From patchwork Fri Nov 17 17:24:05 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1865151
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 01/21] aarch64: Generalise require_immediate_lane_index
Date: Fri, 17 Nov 2023 17:24:05 +0000

require_immediate_lane_index previously hard-coded the assumption that
the group size is determined by the argument immediately before the
index.  However, for SME, there are cases where it should be determined
by an earlier argument instead.
gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_checker::require_immediate_lane_index): Add an argument
	for the index of the indexed vector argument.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_checker::require_immediate_lane_index): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc
	(ternary_bfloat_lane_base::check): Update accordingly.
	(ternary_qq_lane_base::check): Likewise.
	(binary_lane_def::check): Likewise.
	(binary_long_lane_def::check): Likewise.
	(ternary_lane_def::check): Likewise.
	(ternary_lane_rotate_def::check): Likewise.
	(ternary_long_lane_def::check): Likewise.
	(ternary_qq_lane_rotate_def::check): Likewise.
---
 .../aarch64/aarch64-sve-builtins-shapes.cc    | 16 ++++++++--------
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 18 ++++++++++++------
 gcc/config/aarch64/aarch64-sve-builtins.h     |  3 ++-
 3 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index af816c4c9e7..1646afc7a0d 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -941,7 +941,7 @@ struct ternary_bfloat_lane_base
   bool
   check (function_checker &c) const override
   {
-    return c.require_immediate_lane_index (3, N);
+    return c.require_immediate_lane_index (3, 2, N);
   }
 };
 
@@ -956,7 +956,7 @@ struct ternary_qq_lane_base
   bool
   check (function_checker &c) const override
   {
-    return c.require_immediate_lane_index (3, 4);
+    return c.require_immediate_lane_index (3, 0);
   }
 };
 
@@ -1123,7 +1123,7 @@ struct binary_lane_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-    return c.require_immediate_lane_index (2);
+    return c.require_immediate_lane_index (2, 1);
   }
 };
 SHAPE (binary_lane)
@@ -1162,7 +1162,7 @@ struct binary_long_lane_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-    return c.require_immediate_lane_index (2);
+    return c.require_immediate_lane_index (2, 1);
   }
 };
 SHAPE (binary_long_lane)
@@ -2817,7 +2817,7 @@ struct ternary_lane_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-    return c.require_immediate_lane_index (3);
+    return c.require_immediate_lane_index (3, 2);
   }
 };
 SHAPE (ternary_lane)
@@ -2845,7 +2845,7 @@ struct ternary_lane_rotate_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-    return (c.require_immediate_lane_index (3, 2)
+    return (c.require_immediate_lane_index (3, 2, 2)
	    && c.require_immediate_one_of (4, 0, 90, 180, 270));
   }
 };
@@ -2868,7 +2868,7 @@ struct ternary_long_lane_def
   bool
   check (function_checker &c) const override
   {
-    return c.require_immediate_lane_index (3);
+    return c.require_immediate_lane_index (3, 2);
   }
 };
 SHAPE (ternary_long_lane)
@@ -2965,7 +2965,7 @@ struct ternary_qq_lane_rotate_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-    return (c.require_immediate_lane_index (3, 4)
+    return (c.require_immediate_lane_index (3, 0)
	    && c.require_immediate_one_of (4, 0, 90, 180, 270));
   }
 };
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 161a14edde7..75a51565ed2 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2440,20 +2440,26 @@ function_checker::require_immediate_enum (unsigned int rel_argno, tree type)
   return false;
 }
 
-/* Check that argument REL_ARGNO is suitable for indexing argument
-   REL_ARGNO - 1, in groups of GROUP_SIZE elements.  REL_ARGNO counts
-   from the end of the predication arguments.  */
+/* The intrinsic conceptually divides vector argument REL_VEC_ARGNO into
+   groups of GROUP_SIZE elements.  Return true if argument REL_ARGNO is
+   a suitable constant index for selecting one of these groups.  The
+   selection happens within a 128-bit quadword, rather than the whole
+   vector.
+
+   REL_ARGNO and REL_VEC_ARGNO count from the end of the predication
+   arguments.  */
 bool
 function_checker::require_immediate_lane_index (unsigned int rel_argno,
+						unsigned int rel_vec_argno,
						unsigned int group_size)
 {
   unsigned int argno = m_base_arg + rel_argno;
   if (!argument_exists_p (argno))
     return true;
 
-  /* Get the type of the previous argument.  tree_argument_type wants a
-     1-based number, whereas ARGNO is 0-based.  */
-  machine_mode mode = TYPE_MODE (type_argument_type (m_fntype, argno));
+  /* Get the type of the vector argument.  tree_argument_type wants a
+     1-based number, whereas VEC_ARGNO is 0-based.  */
+  unsigned int vec_argno = m_base_arg + rel_vec_argno;
+  machine_mode mode = TYPE_MODE (type_argument_type (m_fntype, vec_argno + 1));
   gcc_assert (VECTOR_MODE_P (mode));
   unsigned int nlanes = 128 / (group_size * GET_MODE_UNIT_BITSIZE (mode));
   return require_immediate_range (rel_argno, 0, nlanes - 1);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index a301570b82e..99bfd906a07 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -463,7 +463,8 @@ public:
   bool require_immediate_either_or (unsigned int, HOST_WIDE_INT,
				    HOST_WIDE_INT);
   bool require_immediate_enum (unsigned int, tree);
-  bool require_immediate_lane_index (unsigned int, unsigned int = 1);
+  bool require_immediate_lane_index (unsigned int, unsigned int,
+				     unsigned int = 1);
   bool require_immediate_one_of (unsigned int, HOST_WIDE_INT, HOST_WIDE_INT,
				 HOST_WIDE_INT, HOST_WIDE_INT);
   bool require_immediate_range (unsigned int, HOST_WIDE_INT, HOST_WIDE_INT);

From patchwork Fri Nov 17 17:24:19 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1865152
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 02/21] aarch64: Add a result_mode helper function
Date: Fri, 17 Nov 2023 17:24:19 +0000

SME will add more intrinsics whose expansion code requires the mode of
the function return value.  This patch adds an associated helper routine.

gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_expander::result_mode): New member function.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svld234_impl::expand): Use it.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_expander::get_reg_target): Likewise.
---
 gcc/config/aarch64/aarch64-sve-builtins-base.cc | 2 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc      | 2 +-
 gcc/config/aarch64/aarch64-sve-builtins.h       | 9 +++++++++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 9010ecca6da..b84e245eb3e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1506,7 +1506,7 @@ public:
   rtx
   expand (function_expander &e) const override
   {
-    machine_mode tuple_mode = TYPE_MODE (TREE_TYPE (e.call_expr));
+    machine_mode tuple_mode = e.result_mode ();
     insn_code icode = convert_optab_handler (vec_mask_load_lanes_optab,
					     tuple_mode, e.vector_mode (0));
     return e.use_contiguous_load_insn (icode);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 75a51565ed2..8b7b885a8f4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2802,7 +2802,7 @@ function_expander::get_fallback_value (machine_mode mode, unsigned int nops,
 rtx
 function_expander::get_reg_target ()
 {
-  machine_mode target_mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)));
+  machine_mode target_mode = result_mode ();
   if (!possible_target || GET_MODE (possible_target) != target_mode)
     possible_target = gen_reg_rtx (target_mode);
   return possible_target;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 99bfd906a07..7cf8f45b3d5 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -529,6 +529,8 @@ public:
   insn_code direct_optab_handler_for_sign (optab, optab,
					   unsigned int = 0,
					   machine_mode = E_VOIDmode);
 
+  machine_mode result_mode () const;
+
   bool overlaps_input_p (rtx);
 
   rtx convert_to_pmode (rtx);
@@ -878,6 +880,13 @@ function_base::call_properties (const function_instance &instance) const
   return flags;
 }
 
+/* Return the mode of the result of a call.  */
+inline machine_mode
+function_expander::result_mode () const
+{
+  return TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)));
+}
+
 }
 
 #endif

From patchwork Fri Nov 17 17:24:35 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1865153
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 03/21] aarch64: Use SVE's RDVL instruction
Date: Fri, 17 Nov 2023 17:24:35 +0000

We didn't previously use SVE's RDVL instruction, since the CNT* forms
are preferred and provide most of the range.  However, there are some
cases that RDVL can handle and CNT* can't, and using RDVL-like
instructions becomes important for SME.

gcc/
	* config/aarch64/aarch64-protos.h (aarch64_sve_rdvl_immediate_p)
	(aarch64_output_sve_rdvl): Declare.
	* config/aarch64/aarch64.cc (aarch64_sve_cnt_factor_p): New
	function, split out from...
	(aarch64_sve_cnt_immediate_p): ...here.
	(aarch64_sve_rdvl_factor_p): New function.
	(aarch64_sve_rdvl_immediate_p): Likewise.
	(aarch64_output_sve_rdvl): Likewise.
	(aarch64_offset_temporaries): Rewrite the SVE handling to use RDVL
	for some cases.
	(aarch64_expand_mov_immediate): Handle RDVL immediates.
	(aarch64_mov_operand_p): Likewise.
	* config/aarch64/constraints.md (Usr): New constraint.
	* config/aarch64/aarch64.md (*mov<mode>_aarch64): Add an RDVL
	alternative.
	(*movsi_aarch64, *movdi_aarch64): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/asm/cntb.c: Tweak expected output.
	* gcc.target/aarch64/sve/acle/asm/cnth.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/cntw.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/cntd.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfb.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfh.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfw.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/prfd.c: Likewise.
	* gcc.target/aarch64/sve/loop_add_4.c: Expect RDVL to be used
	to calculate the -17 and 17 factors.
	* gcc.target/aarch64/sve/pcs/stack_clash_1.c: Likewise the
	18 factor.
---
 gcc/config/aarch64/aarch64-protos.h           |   2 +
 gcc/config/aarch64/aarch64.cc                 | 191 ++++++++++++------
 gcc/config/aarch64/aarch64.md                 |   3 +
 gcc/config/aarch64/constraints.md             |   6 +
 .../gcc.target/aarch64/sve/acle/asm/cntb.c    |  71 +++++--
 .../gcc.target/aarch64/sve/acle/asm/cntd.c    |  12 +-
 .../gcc.target/aarch64/sve/acle/asm/cnth.c    |  20 +-
 .../gcc.target/aarch64/sve/acle/asm/cntw.c    |  16 +-
 .../gcc.target/aarch64/sve/acle/asm/prfb.c    |   6 +-
 .../gcc.target/aarch64/sve/acle/asm/prfd.c    |   4 +-
 .../gcc.target/aarch64/sve/acle/asm/prfh.c    |   4 +-
 .../gcc.target/aarch64/sve/acle/asm/prfw.c    |   4 +-
 .../gcc.target/aarch64/sve/loop_add_4.c       |   6 +-
 .../aarch64/sve/pcs/stack_clash_1.c           |   3 +-
 14 files changed, 225 insertions(+), 123 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 82e83402b75..7ebdec2f58c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -798,6 +798,7 @@ bool aarch64_sve_mode_p (machine_mode);
 HOST_WIDE_INT aarch64_fold_sve_cnt_pat (aarch64_svpattern, unsigned int);
 bool aarch64_sve_cnt_immediate_p (rtx);
 bool aarch64_sve_scalar_inc_dec_immediate_p (rtx);
+bool aarch64_sve_rdvl_immediate_p (rtx);
 bool aarch64_sve_addvl_addpl_immediate_p (rtx);
 bool aarch64_sve_vector_inc_dec_immediate_p (rtx);
 int aarch64_add_offset_temporaries (rtx);
@@ -810,6 +811,7 @@ char *aarch64_output_sve_prefetch (const char *, rtx, const char *);
 char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
 char *aarch64_output_sve_cnt_pat_immediate (const char *, const char *, rtx *);
 char *aarch64_output_sve_scalar_inc_dec (rtx);
+char *aarch64_output_sve_rdvl (rtx);
 char *aarch64_output_sve_addvl_addpl (rtx);
 char *aarch64_output_sve_vector_inc_dec (const char *, rtx);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index fc1492b43ae..622ab763306 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5307,6 +5307,18 @@ aarch64_fold_sve_cnt_pat (aarch64_svpattern pattern, unsigned int nelts_per_vq)
   return -1;
 }
 
+/* Return true if a single CNT[BHWD] instruction can multiply FACTOR
+   by the number of 128-bit quadwords in an SVE vector.  */
+
+static bool
+aarch64_sve_cnt_factor_p (HOST_WIDE_INT factor)
+{
+  /* The coefficient must be [1, 16] * {2, 4, 8, 16}.  */
+  return (IN_RANGE (factor, 2, 16 * 16)
+	  && (factor & 1) == 0
+	  && factor <= 16 * (factor & -factor));
+}
+
 /* Return true if we can move VALUE into a register using a single
    CNT[BHWD] instruction.  */
 
@@ -5314,11 +5326,7 @@ static bool
 aarch64_sve_cnt_immediate_p (poly_int64 value)
 {
   HOST_WIDE_INT factor = value.coeffs[0];
-  /* The coefficient must be [1, 16] * {2, 4, 8, 16}.  */
-  return (value.coeffs[1] == factor
-	  && IN_RANGE (factor, 2, 16 * 16)
-	  && (factor & 1) == 0
-	  && factor <= 16 * (factor & -factor));
+  return value.coeffs[1] == factor && aarch64_sve_cnt_factor_p (factor);
 }
 
 /* Likewise for rtx X.  */
@@ -5434,6 +5442,50 @@ aarch64_output_sve_scalar_inc_dec (rtx offset)
					 -offset_value.coeffs[1], 0);
 }
 
+/* Return true if a single RDVL instruction can multiply FACTOR by the
+   number of 128-bit quadwords in an SVE vector.  */
+
+static bool
+aarch64_sve_rdvl_factor_p (HOST_WIDE_INT factor)
+{
+  return (multiple_p (factor, 16)
+	  && IN_RANGE (factor, -32 * 16, 31 * 16));
+}
+
+/* Return true if we can move VALUE into a register using a single
+   RDVL instruction.  */
+
+static bool
+aarch64_sve_rdvl_immediate_p (poly_int64 value)
+{
+  HOST_WIDE_INT factor = value.coeffs[0];
+  return value.coeffs[1] == factor && aarch64_sve_rdvl_factor_p (factor);
+}
+
+/* Likewise for rtx X.  */
+
+bool
+aarch64_sve_rdvl_immediate_p (rtx x)
+{
+  poly_int64 value;
+  return poly_int_rtx_p (x, &value) && aarch64_sve_rdvl_immediate_p (value);
+}
+
+/* Return the asm string for moving RDVL immediate OFFSET into register
+   operand 0.  */
+
+char *
+aarch64_output_sve_rdvl (rtx offset)
+{
+  static char buffer[sizeof ("rdvl\t%x0, #-") + 3 * sizeof (int)];
+  poly_int64 offset_value = rtx_to_poly_int64 (offset);
+  gcc_assert (aarch64_sve_rdvl_immediate_p (offset_value));
+
+  int factor = offset_value.coeffs[1];
+  snprintf (buffer, sizeof (buffer), "rdvl\t%%x0, #%d", factor / 16);
+  return buffer;
+}
+
 /* Return true if we can add VALUE to a register using a single ADDVL
    or ADDPL instruction.  */
 
@@ -6063,13 +6115,13 @@ aarch64_offset_temporaries (bool add_p, poly_int64 offset)
     count += 1;
   else if (factor != 0)
     {
-      factor = abs (factor);
-      if (factor > 16 * (factor & -factor))
-	/* Need one register for the CNT result and one for the multiplication
-	   factor.  If necessary, the second temporary can be reused for the
-	   constant part of the offset.  */
+      factor /= (HOST_WIDE_INT) least_bit_hwi (factor);
+      if (!IN_RANGE (factor, -32, 31))
+	/* Need one register for the CNT or RDVL result and one for the
+	   multiplication factor.  If necessary, the second temporary
+	   can be reused for the constant part of the offset.  */
	return 2;
-      /* Need one register for the CNT result (which might then
+      /* Need one register for the CNT or RDVL result (which might then
	 be shifted).  */
      count += 1;
    }
@@ -6158,85 +6210,100 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
   /* Otherwise use a CNT-based sequence.  */
   else if (factor != 0)
     {
-      /* Use a subtraction if we have a negative factor.  */
-      rtx_code code = PLUS;
-      if (factor < 0)
-	{
-	  factor = -factor;
-	  code = MINUS;
-	}
+      /* Calculate CNTB * FACTOR / 16 as CNTB * REL_FACTOR * 2**SHIFT,
+	 with negative shifts indicating a shift right.  */
+      HOST_WIDE_INT low_bit = least_bit_hwi (factor);
+      HOST_WIDE_INT rel_factor = factor / low_bit;
+      int shift = exact_log2 (low_bit) - 4;
+      gcc_assert (shift >= -4 && (rel_factor & 1) != 0);
+
+      /* Set CODE, VAL and SHIFT so that [+-] VAL * 2**SHIFT is
+	 equal to CNTB * FACTOR / 16, with CODE being the [+-].
 
-      /* Calculate CNTD * FACTOR / 2.  First try to fold the division
-	 into the multiplication.  */
+	 We can avoid a multiplication if REL_FACTOR is in the range
+	 of RDVL, although there are then various optimizations that
+	 we can try on top.  */
+      rtx_code code = PLUS;
       rtx val;
-      int shift = 0;
-      if (factor & 1)
-	/* Use a right shift by 1.  */
-	shift = -1;
-      else
-	factor /= 2;
-      HOST_WIDE_INT low_bit = factor & -factor;
-      if (factor <= 16 * low_bit)
+      if (IN_RANGE (rel_factor, -32, 31))
	{
-	  if (factor > 16 * 8)
+	  /* Try to use an unshifted CNT[BHWD] or RDVL.  */
+	  if (aarch64_sve_cnt_factor_p (factor)
+	      || aarch64_sve_rdvl_factor_p (factor))
+	    {
+	      val = gen_int_mode (poly_int64 (factor, factor), mode);
+	      shift = 0;
+	    }
+	  /* Try to subtract an unshifted CNT[BHWD].  */
+	  else if (aarch64_sve_cnt_factor_p (-factor))
	    {
-	      /* "CNTB Xn, ALL, MUL #FACTOR" is out of range, so calculate
-		 the value with the minimum multiplier and shift it into
-		 position.  */
-	      int extra_shift = exact_log2 (low_bit);
-	      shift += extra_shift;
-	      factor >>= extra_shift;
+	      code = MINUS;
+	      val = gen_int_mode (poly_int64 (-factor, -factor), mode);
+	      shift = 0;
	    }
-	  val = gen_int_mode (poly_int64 (factor * 2, factor * 2), mode);
+	  /* If subtraction is free, prefer to load a positive constant.
+	     In the best case this will fit a shifted CNTB.  */
+	  else if (src != const0_rtx && rel_factor < 0)
+	    {
+	      code = MINUS;
+	      val = gen_int_mode (-rel_factor * BYTES_PER_SVE_VECTOR, mode);
+	    }
+	  /* Otherwise use a shifted RDVL or CNT[BHWD].  */
+	  else
+	    val = gen_int_mode (rel_factor * BYTES_PER_SVE_VECTOR, mode);
	}
       else
	{
-	  /* Base the factor on LOW_BIT if we can calculate LOW_BIT
-	     directly, since that should increase the chances of being
-	     able to use a shift and add sequence.  If LOW_BIT itself
-	     is out of range, just use CNTD.  */
-	  if (low_bit <= 16 * 8)
-	    factor /= low_bit;
+	  /* If we can calculate CNTB << SHIFT directly, prefer to do that,
+	     since it should increase the chances of being able to use
+	     a shift and add sequence for the multiplication.
+	     If CNTB << SHIFT is out of range, stick with the current
+	     shift factor.  */
+	  if (IN_RANGE (low_bit, 2, 16 * 16))
+	    {
+	      val = gen_int_mode (poly_int64 (low_bit, low_bit), mode);
+	      shift = 0;
+	    }
	  else
-	    low_bit = 1;
+	    val = gen_int_mode (BYTES_PER_SVE_VECTOR, mode);
 
-	  val = gen_int_mode (poly_int64 (low_bit * 2, low_bit * 2), mode);
	  val = aarch64_force_temporary (mode, temp1, val);
 
+	  /* Prefer to multiply by a positive factor and subtract rather
+	     than multiply by a negative factor and add, since positive
+	     values are usually easier to move.  */
+	  if (rel_factor < 0 && src != const0_rtx)
+	    {
+	      rel_factor = -rel_factor;
+	      code = MINUS;
+	    }
+
	  if (can_create_pseudo_p ())
	    {
-	      rtx coeff1 = gen_int_mode (factor, mode);
+	      rtx coeff1 = gen_int_mode (rel_factor, mode);
	      val = expand_mult (mode, val, coeff1, NULL_RTX, true, true);
	    }
	  else
	    {
-	      /* Go back to using a negative multiplication factor if we have
-		 no register from which to subtract.  */
-	      if (code == MINUS && src == const0_rtx)
-		{
-		  factor = -factor;
-		  code = PLUS;
-		}
-	      rtx coeff1 = gen_int_mode (factor, mode);
+	      rtx coeff1 = gen_int_mode (rel_factor, mode);
	      coeff1 = aarch64_force_temporary (mode, temp2, coeff1);
	      val = gen_rtx_MULT (mode, val, coeff1);
	    }
	}
 
+      /* Multiply by 2 ** SHIFT.  */
       if (shift > 0)
	{
-	  /* Multiply by 1 << SHIFT.  */
	  val = aarch64_force_temporary (mode, temp1, val);
	  val = gen_rtx_ASHIFT (mode, val, GEN_INT (shift));
	}
-      else if (shift == -1)
+      else if (shift < 0)
	{
-	  /* Divide by 2.  */
	  val = aarch64_force_temporary (mode, temp1, val);
-	  val = gen_rtx_ASHIFTRT (mode, val, const1_rtx);
+	  val = gen_rtx_ASHIFTRT (mode, val, GEN_INT (-shift));
	}
 
-      /* Calculate SRC +/- CNTD * FACTOR / 2.  */
+      /* Add the result to SRC or subtract the result from SRC.  */
       if (src != const0_rtx)
	{
	  val = aarch64_force_temporary (mode, temp1, val);
@@ -6882,7 +6949,9 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
	      aarch64_report_sve_required ();
	      return;
	    }
-	  if (base == const0_rtx && aarch64_sve_cnt_immediate_p (offset))
+	  if (base == const0_rtx
+	      && (aarch64_sve_cnt_immediate_p (offset)
+		  || aarch64_sve_rdvl_immediate_p (offset)))
	    emit_insn (gen_rtx_SET (dest, imm));
	  else
	    {
@@ -22019,7 +22088,9 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
   if (SYMBOL_REF_P (x) && mode == DImode && CONSTANT_ADDRESS_P (x))
     return true;
 
-  if (TARGET_SVE && aarch64_sve_cnt_immediate_p (x))
+  if (TARGET_SVE
+      && (aarch64_sve_cnt_immediate_p (x)
+	  || aarch64_sve_rdvl_immediate_p (x)))
     return true;
 
   return aarch64_classify_symbolic_expression (x)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 735a70bc3ff..e5f55d98057 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1230,6 +1230,7 @@ (define_insn "*mov<mode>_aarch64"
     [w, D; neon_move , simd ] << aarch64_output_scalar_simd_mov_immediate (operands[1], <MODE>mode);
     /* The "mov_imm" type for CNT is just a placeholder.  */
     [r, Usv ; mov_imm , sve ] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]);
+    [r, Usr ; mov_imm , sve ] << aarch64_output_sve_rdvl (operands[1]);
     [r, m ; load_4 , * ] ldr\t%w0, %1
     [w, m ; load_4 , * ] ldr\t%0, %1
     [m, r Z ; store_4 , * ] str\t%w1, %0
@@ -1289,6 +1290,7 @@ (define_insn_and_split "*movsi_aarch64"
     [r  , n  ; mov_imm  , *   ,16] #
     /* The "mov_imm" type for CNT is just a placeholder.  */
     [r  , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]);
+    [r  , Usr; mov_imm  , sve , 4] << aarch64_output_sve_rdvl (operands[1]);
     [r  , m  ; load_4   , *   , 4] ldr\t%w0, %1
     [w  , m  ; load_4   , fp  , 4] ldr\t%s0, %1
     [m  , r Z; store_4  , *   , 4] str\t%w1, %0
@@ -1324,6 +1326,7 @@ (define_insn_and_split "*movdi_aarch64"
     [r, n  ; mov_imm  , *   ,16] #
     /* The "mov_imm" type for CNT is just a placeholder.  */
     [r, Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]);
+    [r, Usr; mov_imm  , sve , 4] << aarch64_output_sve_rdvl (operands[1]);
     [r, m  ; load_8   , *   , 4] ldr\t%x0, %1
     [w, m  ; load_8   , fp  , 4] ldr\t%d0, %1
     [m, r Z; store_8  , *   , 4] str\t%x1, %0
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index b3922bcb9a8..5c02d15c77a 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -219,6 +219,12 @@ (define_constraint "Ulc"
  (and (match_code "const_int")
       (match_test "aarch64_high_bits_all_ones_p (ival)")))
 
+(define_constraint "Usr"
+  "@internal
+   A constraint that matches a value produced by RDVL."
+  (and (match_code "const_poly_int")
+       (match_test "aarch64_sve_rdvl_immediate_p (op)")))
+
 (define_constraint "Usv"
   "@internal
    A constraint that matches a VG-based constant that can be loaded by
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntb.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntb.c
index 8b8fe8e4f2b..a22d8a28d86 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntb.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntb.c
@@ -51,19 +51,24 @@ PROTO (cntb_15, uint64_t, ()) { return svcntb () * 15; }
 */
 PROTO (cntb_16, uint64_t, ()) { return svcntb () * 16; }
 
-/* Other sequences would be OK.  */
 /*
 ** cntb_17:
-**	cntb	x0, all, mul #16
-**	incb	x0
+**	rdvl	x0, #17
 **	ret
 */
 PROTO (cntb_17, uint64_t, ()) { return svcntb () * 17; }
 
+/*
+** cntb_31:
+**	rdvl	x0, #31
+**	ret
+*/
+PROTO (cntb_31, uint64_t, ()) { return svcntb () * 31; }
+
 /*
 ** cntb_32:
-**	cntd	(x[0-9]+)
-**	lsl	x0, \1, 8
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 5
 **	ret
 */
 PROTO (cntb_32, uint64_t, ()) { return svcntb () * 32; }
@@ -80,16 +85,16 @@ PROTO (cntb_33, uint64_t, ()) { return svcntb () * 33; }
 
 /*
 ** cntb_64:
-**	cntd	(x[0-9]+)
-**	lsl	x0, \1, 9
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 6
 **	ret
 */
 PROTO (cntb_64, uint64_t, ()) { return svcntb () * 64; }
 
 /*
 ** cntb_128:
-**	cntd	(x[0-9]+)
-**	lsl	x0, \1, 10
+**	cntb	(x[0-9]+)
+**	lsl	x0, \1, 7
 **	ret
 */
 PROTO (cntb_128, uint64_t, ()) { return svcntb () * 128; }
@@ -106,46 +111,70 @@ PROTO (cntb_129, uint64_t, ()) { return svcntb () * 129; }
 
 /*
 ** cntb_m1:
-**	cntb	(x[0-9]+)
-**	neg	x0, \1
+**	rdvl	x0, #-1
 **	ret
 */
 PROTO (cntb_m1, uint64_t, ()) { return -svcntb (); }
 
 /*
 ** cntb_m13:
-**	cntb	(x[0-9]+), all, mul #13
-**	neg	x0, \1
+**	rdvl	x0, #-13
 **	ret
 */
 PROTO (cntb_m13, uint64_t, ()) { return -svcntb () * 13; }
 
 /*
 ** cntb_m15:
-**	cntb	(x[0-9]+), all, mul #15
-**	neg	x0, \1
+**	rdvl	x0, #-15
 **	ret
 */
 PROTO (cntb_m15, uint64_t, ()) { return -svcntb () * 15; }
 
 /*
 ** cntb_m16:
-**	cntb	(x[0-9]+), all, mul #16
-**	neg	x0, \1
+**	rdvl	x0, #-16
 **	ret
 */
 PROTO (cntb_m16, uint64_t, ()) { return -svcntb () * 16; }
 
-/* Other sequences would be OK.
*/ /* ** cntb_m17: -** cntb x0, all, mul #16 -** incb x0 -** neg x0, x0 +** rdvl x0, #-17 ** ret */ PROTO (cntb_m17, uint64_t, ()) { return -svcntb () * 17; } +/* +** cntb_m32: +** rdvl x0, #-32 +** ret +*/ +PROTO (cntb_m32, uint64_t, ()) { return -svcntb () * 32; } + +/* +** cntb_m33: +** rdvl x0, #-32 +** decb x0 +** ret +*/ +PROTO (cntb_m33, uint64_t, ()) { return -svcntb () * 33; } + +/* +** cntb_m34: +** rdvl (x[0-9]+), #-17 +** lsl x0, \1, #?1 +** ret +*/ +PROTO (cntb_m34, uint64_t, ()) { return -svcntb () * 34; } + +/* +** cntb_m64: +** rdvl (x[0-9]+), #-1 +** lsl x0, \1, #?6 +** ret +*/ +PROTO (cntb_m64, uint64_t, ()) { return -svcntb () * 64; } + /* ** incb_1: ** incb x0 diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntd.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntd.c index 0d0ed4849f1..090a643b418 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntd.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntd.c @@ -54,8 +54,8 @@ PROTO (cntd_16, uint64_t, ()) { return svcntd () * 16; } /* Other sequences would be OK. */ /* ** cntd_17: -** cntb x0, all, mul #2 -** incd x0 +** rdvl (x[0-9]+), #17 +** asr x0, \1, 3 ** ret */ PROTO (cntd_17, uint64_t, ()) { return svcntd () * 17; } @@ -107,8 +107,7 @@ PROTO (cntd_m15, uint64_t, ()) { return -svcntd () * 15; } /* ** cntd_m16: -** cntb (x[0-9]+), all, mul #2 -** neg x0, \1 +** rdvl x0, #-2 ** ret */ PROTO (cntd_m16, uint64_t, ()) { return -svcntd () * 16; } @@ -116,9 +115,8 @@ PROTO (cntd_m16, uint64_t, ()) { return -svcntd () * 16; } /* Other sequences would be OK. 
*/ /* ** cntd_m17: -** cntb x0, all, mul #2 -** incd x0 -** neg x0, x0 +** rdvl (x[0-9]+), #-17 +** asr x0, \1, 3 ** ret */ PROTO (cntd_m17, uint64_t, ()) { return -svcntd () * 17; } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cnth.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cnth.c index c29930f1591..1a4e7dc0e01 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cnth.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cnth.c @@ -54,8 +54,8 @@ PROTO (cnth_16, uint64_t, ()) { return svcnth () * 16; } /* Other sequences would be OK. */ /* ** cnth_17: -** cntb x0, all, mul #8 -** inch x0 +** rdvl (x[0-9]+), #17 +** asr x0, \1, 1 ** ret */ PROTO (cnth_17, uint64_t, ()) { return svcnth () * 17; } @@ -69,16 +69,16 @@ PROTO (cnth_32, uint64_t, ()) { return svcnth () * 32; } /* ** cnth_64: -** cntd (x[0-9]+) -** lsl x0, \1, 8 +** cntb (x[0-9]+) +** lsl x0, \1, 5 ** ret */ PROTO (cnth_64, uint64_t, ()) { return svcnth () * 64; } /* ** cnth_128: -** cntd (x[0-9]+) -** lsl x0, \1, 9 +** cntb (x[0-9]+) +** lsl x0, \1, 6 ** ret */ PROTO (cnth_128, uint64_t, ()) { return svcnth () * 128; } @@ -109,8 +109,7 @@ PROTO (cnth_m15, uint64_t, ()) { return -svcnth () * 15; } /* ** cnth_m16: -** cntb (x[0-9]+), all, mul #8 -** neg x0, \1 +** rdvl x0, #-8 ** ret */ PROTO (cnth_m16, uint64_t, ()) { return -svcnth () * 16; } @@ -118,9 +117,8 @@ PROTO (cnth_m16, uint64_t, ()) { return -svcnth () * 16; } /* Other sequences would be OK. 
*/ /* ** cnth_m17: -** cntb x0, all, mul #8 -** inch x0 -** neg x0, x0 +** rdvl (x[0-9]+), #-17 +** asr x0, \1, 1 ** ret */ PROTO (cnth_m17, uint64_t, ()) { return -svcnth () * 17; } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntw.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntw.c index e26cc67a467..9d169769094 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntw.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/cntw.c @@ -54,8 +54,8 @@ PROTO (cntw_16, uint64_t, ()) { return svcntw () * 16; } /* Other sequences would be OK. */ /* ** cntw_17: -** cntb x0, all, mul #4 -** incw x0 +** rdvl (x[0-9]+), #17 +** asr x0, \1, 2 ** ret */ PROTO (cntw_17, uint64_t, ()) { return svcntw () * 17; } @@ -76,8 +76,8 @@ PROTO (cntw_64, uint64_t, ()) { return svcntw () * 64; } /* ** cntw_128: -** cntd (x[0-9]+) -** lsl x0, \1, 8 +** cntb (x[0-9]+) +** lsl x0, \1, 5 ** ret */ PROTO (cntw_128, uint64_t, ()) { return svcntw () * 128; } @@ -108,8 +108,7 @@ PROTO (cntw_m15, uint64_t, ()) { return -svcntw () * 15; } /* ** cntw_m16: -** cntb (x[0-9]+), all, mul #4 -** neg x0, \1 +** rdvl (x[0-9]+), #-4 ** ret */ PROTO (cntw_m16, uint64_t, ()) { return -svcntw () * 16; } @@ -117,9 +116,8 @@ PROTO (cntw_m16, uint64_t, ()) { return -svcntw () * 16; } /* Other sequences would be OK. 
*/ /* ** cntw_m17: -** cntb x0, all, mul #4 -** incw x0 -** neg x0, x0 +** rdvl (x[0-9]+), #-17 +** asr x0, \1, 2 ** ret */ PROTO (cntw_m17, uint64_t, ()) { return -svcntw () * 17; } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb.c index c90730a037c..94cd3a0662e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb.c @@ -218,8 +218,8 @@ TEST_PREFETCH (prfb_vnum_31, uint16_t, /* ** prfb_vnum_32: -** cntd (x[0-9]+) -** lsl (x[0-9]+), \1, #?8 +** cntb (x[0-9]+) +** lsl (x[0-9]+), \1, #?5 ** add (x[0-9]+), (\2, x0|x0, \2) ** prfb pldl1keep, p0, \[\3\] ** ret @@ -240,7 +240,7 @@ TEST_PREFETCH (prfb_vnum_m32, uint16_t, /* ** prfb_vnum_m33: ** ... -** prfb pldl1keep, p0, \[x[0-9]+\] +** prfb pldl1keep, p0, \[x[0-9]+(, x[0-9]+)?\] ** ret */ TEST_PREFETCH (prfb_vnum_m33, uint16_t, diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd.c index 869ef3d3eeb..b7a116cf056 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd.c @@ -218,8 +218,8 @@ TEST_PREFETCH (prfd_vnum_31, uint16_t, /* ** prfd_vnum_32: -** cntd (x[0-9]+) -** lsl (x[0-9]+), \1, #?8 +** cntb (x[0-9]+) +** lsl (x[0-9]+), \1, #?5 ** add (x[0-9]+), (\2, x0|x0, \2) ** prfd pldl1keep, p0, \[\3\] ** ret diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh.c index 45a735eaea0..9d3df6bd3a8 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh.c @@ -218,8 +218,8 @@ TEST_PREFETCH (prfh_vnum_31, uint16_t, /* ** prfh_vnum_32: -** cntd (x[0-9]+) -** lsl (x[0-9]+), \1, #?8 +** cntb (x[0-9]+) +** lsl (x[0-9]+), \1, #?5 ** add (x[0-9]+), (\2, x0|x0, \2) ** prfh pldl1keep, p0, \[\3\] ** ret diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw.c index 444187f45d9..6962abab600 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw.c @@ -218,8 +218,8 @@ TEST_PREFETCH (prfw_vnum_31, uint16_t, /* ** prfw_vnum_32: -** cntd (x[0-9]+) -** lsl (x[0-9]+), \1, #?8 +** cntb (x[0-9]+) +** lsl (x[0-9]+), \1, #?5 ** add (x[0-9]+), (\2, x0|x0, \2) ** prfw pldl1keep, p0, \[\3\] ** ret diff --git a/gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c b/gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c index 9ead9c21b35..7f02497e839 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/loop_add_4.c @@ -68,8 +68,7 @@ TEST_ALL (LOOP) /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, w[0-9]+\n} 3 } } */ /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */ /* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */ -/* 2 for the calculations of -17 and 17. */ -/* { dg-final { scan-assembler-times {\tincw\tx[0-9]+\n} 10 } } */ +/* { dg-final { scan-assembler-times {\tincw\tx[0-9]+\n} 8 } } */ /* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #16\n} 1 } } */ /* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #15\n} 1 } } */ @@ -86,8 +85,7 @@ TEST_ALL (LOOP) /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, x[0-9]+\n} 3 } } */ /* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */ /* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */ -/* 2 for the calculations of -17 and 17. 
*/ -/* { dg-final { scan-assembler-times {\tincd\tx[0-9]+\n} 10 } } */ +/* { dg-final { scan-assembler-times {\tincd\tx[0-9]+\n} 8 } } */ /* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d, all, mul #16\n} 1 } } */ /* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d, all, mul #15\n} 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1.c index 110947a6c4a..5de34fc6163 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1.c @@ -6,8 +6,7 @@ /* ** test_1: -** cntd x12, all, mul #9 -** lsl x12, x12, #?4 +** rdvl x12, #18 ** mov x11, sp ** ... ** sub sp, sp, x12 From patchwork Fri Nov 17 17:24:49 2023
From: Richard Sandiford To: gcc-patches@gcc.gnu.org Subject: [PATCH 04/21] aarch64: Make AARCH64_FL_SVE requirements explicit Date: Fri, 17 Nov 2023 17:24:49 +0000
So far, all intrinsics covered by the aarch64-sve-builtins* framework have (naturally enough) required at least SVE. However, arm_sme.h defines a couple of intrinsics that can be called by any code. It's therefore necessary to make the implicit SVE requirement explicit. gcc/ * config/aarch64/aarch64-sve-builtins.cc (function_groups): Remove implied requirement on SVE. * config/aarch64/aarch64-sve-builtins-base.def: Explicitly require SVE. * config/aarch64/aarch64-sve-builtins-sve2.def: Likewise. --- .../aarch64/aarch64-sve-builtins-base.def | 10 +++++----- .../aarch64/aarch64-sve-builtins-sve2.def | 18 +++++++++++++----- gcc/config/aarch64/aarch64-sve-builtins.cc | 2 +- 3 files changed, 19 insertions(+), 11 deletions(-) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def b/gcc/config/aarch64/aarch64-sve-builtins-base.def index 95ae1d71629..0484863d3f7 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.def +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def @@ -17,7 +17,7 @@ along with GCC; see the file COPYING3. If not see .
*/ -#define REQUIRED_EXTENSIONS 0 +#define REQUIRED_EXTENSIONS AARCH64_FL_SVE DEF_SVE_FUNCTION (svabd, binary_opt_n, all_arith, mxz) DEF_SVE_FUNCTION (svabs, unary, all_float_and_signed, mxz) DEF_SVE_FUNCTION (svacge, compare_opt_n, all_float, implicit) @@ -318,7 +318,7 @@ DEF_SVE_FUNCTION (svzip2, binary, all_data, none) DEF_SVE_FUNCTION (svzip2, binary_pred, all_pred, none) #undef REQUIRED_EXTENSIONS -#define REQUIRED_EXTENSIONS AARCH64_FL_BF16 +#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_BF16 DEF_SVE_FUNCTION (svbfdot, ternary_bfloat_opt_n, s_float, none) DEF_SVE_FUNCTION (svbfdot_lane, ternary_bfloat_lanex2, s_float, none) DEF_SVE_FUNCTION (svbfmlalb, ternary_bfloat_opt_n, s_float, none) @@ -330,7 +330,7 @@ DEF_SVE_FUNCTION (svcvt, unary_convert, cvt_bfloat, mxz) DEF_SVE_FUNCTION (svcvtnt, unary_convert_narrowt, cvt_bfloat, mx) #undef REQUIRED_EXTENSIONS -#define REQUIRED_EXTENSIONS AARCH64_FL_I8MM +#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_I8MM DEF_SVE_FUNCTION (svmmla, mmla, s_integer, none) DEF_SVE_FUNCTION (svusmmla, ternary_uintq_intq, s_signed, none) DEF_SVE_FUNCTION (svsudot, ternary_intq_uintq_opt_n, s_signed, none) @@ -339,11 +339,11 @@ DEF_SVE_FUNCTION (svusdot, ternary_uintq_intq_opt_n, s_signed, none) DEF_SVE_FUNCTION (svusdot_lane, ternary_uintq_intq_lane, s_signed, none) #undef REQUIRED_EXTENSIONS -#define REQUIRED_EXTENSIONS AARCH64_FL_F32MM +#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_F32MM DEF_SVE_FUNCTION (svmmla, mmla, s_float, none) #undef REQUIRED_EXTENSIONS -#define REQUIRED_EXTENSIONS AARCH64_FL_F64MM +#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_F64MM DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit) DEF_SVE_FUNCTION (svmmla, mmla, d_float, none) DEF_SVE_FUNCTION (svtrn1q, binary, all_data, none) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def index dd6d1357d51..565393f3081 100644 --- 
a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def @@ -17,7 +17,7 @@ along with GCC; see the file COPYING3. If not see . */ -#define REQUIRED_EXTENSIONS AARCH64_FL_SVE2 +#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_SVE2 DEF_SVE_FUNCTION (svaba, ternary_opt_n, all_integer, none) DEF_SVE_FUNCTION (svabalb, ternary_long_opt_n, hsd_integer, none) DEF_SVE_FUNCTION (svabalt, ternary_long_opt_n, hsd_integer, none) @@ -189,7 +189,9 @@ DEF_SVE_FUNCTION (svwhilewr, compare_ptr, all_data, none) DEF_SVE_FUNCTION (svxar, ternary_shift_right_imm, all_integer, none) #undef REQUIRED_EXTENSIONS -#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_AES) +#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \ + | AARCH64_FL_SVE2 \ + | AARCH64_FL_SVE2_AES) DEF_SVE_FUNCTION (svaesd, binary, b_unsigned, none) DEF_SVE_FUNCTION (svaese, binary, b_unsigned, none) DEF_SVE_FUNCTION (svaesmc, unary, b_unsigned, none) @@ -198,17 +200,23 @@ DEF_SVE_FUNCTION (svpmullb_pair, binary_opt_n, d_unsigned, none) DEF_SVE_FUNCTION (svpmullt_pair, binary_opt_n, d_unsigned, none) #undef REQUIRED_EXTENSIONS -#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_BITPERM) +#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \ + | AARCH64_FL_SVE2 \ + | AARCH64_FL_SVE2_BITPERM) DEF_SVE_FUNCTION (svbdep, binary_opt_n, all_unsigned, none) DEF_SVE_FUNCTION (svbext, binary_opt_n, all_unsigned, none) DEF_SVE_FUNCTION (svbgrp, binary_opt_n, all_unsigned, none) #undef REQUIRED_EXTENSIONS -#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_SHA3) +#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \ + | AARCH64_FL_SVE2 \ + | AARCH64_FL_SVE2_SHA3) DEF_SVE_FUNCTION (svrax1, binary, d_integer, none) #undef REQUIRED_EXTENSIONS -#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_SM4) +#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \ + | AARCH64_FL_SVE2 \ + | AARCH64_FL_SVE2_SM4) DEF_SVE_FUNCTION (svsm4e, binary, s_unsigned, none) 
DEF_SVE_FUNCTION (svsm4ekey, binary, s_unsigned, none) #undef REQUIRED_EXTENSIONS diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index 8b7b885a8f4..676634ca11b 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -525,7 +525,7 @@ static const predication_index preds_z[] = { PRED_z, NUM_PREDS }; static CONSTEXPR const function_group_info function_groups[] = { #define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \ { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, preds_##PREDS, \ - REQUIRED_EXTENSIONS | AARCH64_FL_SVE }, + REQUIRED_EXTENSIONS }, #include "aarch64-sve-builtins.def" }; From patchwork Fri Nov 17 17:25:01 2023
From: Richard Sandiford To: gcc-patches@gcc.gnu.org Subject: [PATCH 05/21] aarch64: Add group suffixes to SVE intrinsics Date: Fri, 17 Nov 2023 17:25:01 +0000
The SME2 ACLE adds a new "group" suffix component to the naming convention for SVE intrinsics. This is also used in the new tuple forms of the svreinterpret intrinsics. This patch adds support for group suffixes and defines the x2, x3 and x4 suffixes that are needed for the svreinterprets. gcc/ * config/aarch64/aarch64-sve-builtins-shapes.cc (build_one): Take a group suffix index parameter. (build_32_64, build_all): Update accordingly. Iterate over all group suffixes. * config/aarch64/aarch64-sve-builtins-sve2.cc (svqrshl_impl::fold) (svqshl_impl::fold, svrshl_impl::fold): Update function_instance constructors. * config/aarch64/aarch64-sve-builtins.cc (group_suffixes): New array. (groups_none): New constant. (function_groups): Initialize the groups field. (function_instance::hash): Hash the group index. (function_builder::get_name): Add the group suffix. (function_builder::add_overloaded_functions): Iterate over all group suffixes. (function_resolver::lookup_form): Take a group suffix parameter. (function_resolver::resolve_to): Likewise. * config/aarch64/aarch64-sve-builtins.def (DEF_SVE_GROUP_SUFFIX): New macro. (x2, x3, x4): New group suffixes. * config/aarch64/aarch64-sve-builtins.h (group_suffix_index): New enum. (group_suffix_info): New structure. (function_group_info::groups): New member variable. (function_instance::group_suffix_id): Likewise. (group_suffixes): New array. (function_instance::operator==): Compare the group suffixes.
(function_instance::group_suffix): New function. --- .../aarch64/aarch64-sve-builtins-shapes.cc | 53 ++++++------ .../aarch64/aarch64-sve-builtins-sve2.cc | 10 +-- gcc/config/aarch64/aarch64-sve-builtins.cc | 84 +++++++++++++------ gcc/config/aarch64/aarch64-sve-builtins.def | 9 ++ gcc/config/aarch64/aarch64-sve-builtins.h | 81 ++++++++++++++---- 5 files changed, 165 insertions(+), 72 deletions(-) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc index 1646afc7a0d..dc255fc59f2 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc @@ -275,18 +275,20 @@ parse_signature (const function_instance &instance, const char *format, } /* Add one function instance for GROUP, using mode suffix MODE_SUFFIX_ID, - the type suffixes at index TI and the predication suffix at index PI. - The other arguments are as for build_all. */ + the type suffixes at index TI, the group suffixes at index GI, and the + predication suffix at index PI. The other arguments are as for + build_all. */ static void build_one (function_builder &b, const char *signature, const function_group_info &group, mode_suffix_index mode_suffix_id, - unsigned int ti, unsigned int pi, bool force_direct_overloads) + unsigned int ti, unsigned int gi, unsigned int pi, + bool force_direct_overloads) { /* Byte forms of svdupq take 16 arguments. 
*/ auto_vec argument_types; function_instance instance (group.base_name, *group.base, *group.shape, mode_suffix_id, group.types[ti], - group.preds[pi]); + group.groups[gi], group.preds[pi]); tree return_type = parse_signature (instance, signature, argument_types); apply_predication (instance, return_type, argument_types); b.add_unique_function (instance, return_type, argument_types, @@ -312,24 +314,26 @@ build_32_64 (function_builder &b, const char *signature, mode_suffix_index mode64, bool force_direct_overloads = false) { for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi) - if (group.types[0][0] == NUM_TYPE_SUFFIXES) - { - gcc_assert (mode32 != MODE_none && mode64 != MODE_none); - build_one (b, signature, group, mode32, 0, pi, - force_direct_overloads); - build_one (b, signature, group, mode64, 0, pi, - force_direct_overloads); - } - else - for (unsigned int ti = 0; group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti) + for (unsigned int gi = 0; group.groups[gi] != NUM_GROUP_SUFFIXES; ++gi) + if (group.types[0][0] == NUM_TYPE_SUFFIXES) { - unsigned int bits = type_suffixes[group.types[ti][0]].element_bits; - gcc_assert (bits == 32 || bits == 64); - mode_suffix_index mode = bits == 32 ? mode32 : mode64; - if (mode != MODE_none) - build_one (b, signature, group, mode, ti, pi, - force_direct_overloads); + gcc_assert (mode32 != MODE_none && mode64 != MODE_none); + build_one (b, signature, group, mode32, 0, gi, pi, + force_direct_overloads); + build_one (b, signature, group, mode64, 0, gi, pi, + force_direct_overloads); } + else + for (unsigned int ti = 0; group.types[ti][0] != NUM_TYPE_SUFFIXES; + ++ti) + { + unsigned int bits = type_suffixes[group.types[ti][0]].element_bits; + gcc_assert (bits == 32 || bits == 64); + mode_suffix_index mode = bits == 32 ? 
mode32 : mode64; + if (mode != MODE_none) + build_one (b, signature, group, mode, ti, gi, pi, + force_direct_overloads); + } } /* For every type and predicate combination in GROUP, add one function @@ -423,10 +427,11 @@ build_all (function_builder &b, const char *signature, bool force_direct_overloads = false) { for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi) - for (unsigned int ti = 0; - ti == 0 || group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti) - build_one (b, signature, group, mode_suffix_id, ti, pi, - force_direct_overloads); + for (unsigned int gi = 0; group.groups[gi] != NUM_GROUP_SUFFIXES; ++gi) + for (unsigned int ti = 0; + ti == 0 || group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti) + build_one (b, signature, group, mode_suffix_id, ti, gi, pi, + force_direct_overloads); } /* TYPE is the largest type suffix associated with the arguments of R, diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc index 9e989fca2ab..73f9e5a899c 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.cc @@ -247,7 +247,7 @@ public: that we can use for sensible shift amounts. */ function_instance instance ("svqshl", functions::svqshl, shapes::binary_int_opt_n, MODE_n, - f.type_suffix_ids, f.pred); + f.type_suffix_ids, GROUP_none, f.pred); return f.redirect_call (instance); } else @@ -256,7 +256,7 @@ public: that we can use for sensible shift amounts. 
*/ function_instance instance ("svrshl", functions::svrshl, shapes::binary_int_opt_n, MODE_n, - f.type_suffix_ids, f.pred); + f.type_suffix_ids, GROUP_none, f.pred); return f.redirect_call (instance); } } @@ -285,7 +285,7 @@ public: -wi::to_wide (amount)); function_instance instance ("svasr", functions::svasr, shapes::binary_uint_opt_n, MODE_n, - f.type_suffix_ids, f.pred); + f.type_suffix_ids, GROUP_none, f.pred); if (f.type_suffix (0).unsigned_p) { instance.base_name = "svlsr"; @@ -317,7 +317,7 @@ public: that we can use for sensible shift amounts. */ function_instance instance ("svlsl", functions::svlsl, shapes::binary_uint_opt_n, MODE_n, - f.type_suffix_ids, f.pred); + f.type_suffix_ids, GROUP_none, f.pred); gcall *call = as_a (f.redirect_call (instance)); gimple_call_set_arg (call, 2, amount); return call; @@ -330,7 +330,7 @@ public: -wi::to_wide (amount)); function_instance instance ("svrshr", functions::svrshr, shapes::shift_right_imm, MODE_n, - f.type_suffix_ids, f.pred); + f.type_suffix_ids, GROUP_none, f.pred); gcall *call = as_a (f.redirect_call (instance)); gimple_call_set_arg (call, 2, amount); return call; diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index 676634ca11b..196534df61e 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -144,6 +144,13 @@ CONSTEXPR const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1] = { 0, VOIDmode } }; +CONSTEXPR const group_suffix_info group_suffixes[] = { +#define DEF_SVE_GROUP_SUFFIX(NAME, VG, VECTORS_PER_TUPLE) \ + { "_" #NAME, VG, VECTORS_PER_TUPLE }, +#include "aarch64-sve-builtins.def" + { "", 0, 1 } +}; + /* Define a TYPES_ macro for each combination of type suffixes that an ACLE function can have, where is the name used in DEF_SVE_FUNCTION entries. 
@@ -483,6 +490,10 @@ DEF_SVE_TYPES_ARRAY (inc_dec_n); DEF_SVE_TYPES_ARRAY (reinterpret); DEF_SVE_TYPES_ARRAY (while); +static const group_suffix_index groups_none[] = { + GROUP_none, NUM_GROUP_SUFFIXES +}; + /* Used by functions that have no governing predicate. */ static const predication_index preds_none[] = { PRED_none, NUM_PREDS }; @@ -524,8 +535,8 @@ static const predication_index preds_z[] = { PRED_z, NUM_PREDS }; /* A list of all SVE ACLE functions. */ static CONSTEXPR const function_group_info function_groups[] = { #define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \ - { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, preds_##PREDS, \ - REQUIRED_EXTENSIONS }, + { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, groups_none, \ + preds_##PREDS, REQUIRED_EXTENSIONS }, #include "aarch64-sve-builtins.def" }; @@ -788,6 +799,7 @@ function_instance::hash () const h.add_int (mode_suffix_id); h.add_int (type_suffix_ids[0]); h.add_int (type_suffix_ids[1]); + h.add_int (group_suffix_id); h.add_int (pred); return h.end (); } @@ -957,6 +969,8 @@ function_builder::get_name (const function_instance &instance, for (unsigned int i = 0; i < 2; ++i) if (!overloaded_p || instance.shape->explicit_type_suffix_p (i)) append_name (instance.type_suffix (i).string); + if (!overloaded_p || instance.shape->explicit_group_suffix_p ()) + append_name (instance.group_suffix ().string); append_name (pred_suffixes[instance.pred]); return finish_name (); } @@ -1113,19 +1127,26 @@ void function_builder::add_overloaded_functions (const function_group_info &group, mode_suffix_index mode) { - unsigned int explicit_type0 = (*group.shape)->explicit_type_suffix_p (0); - unsigned int explicit_type1 = (*group.shape)->explicit_type_suffix_p (1); - for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi) + bool explicit_type0 = (*group.shape)->explicit_type_suffix_p (0); + bool explicit_type1 = (*group.shape)->explicit_type_suffix_p (1); + bool explicit_group = 
(*group.shape)->explicit_group_suffix_p (); + auto add_function = [&](const type_suffix_pair &types, + group_suffix_index group_suffix_id, + unsigned int pi) + { + function_instance instance (group.base_name, *group.base, + *group.shape, mode, types, + group_suffix_id, group.preds[pi]); + add_overloaded_function (instance, group.required_extensions); + }; + + auto add_group_suffix = [&](group_suffix_index group_suffix_id, + unsigned int pi) { if (!explicit_type0 && !explicit_type1) - { - /* Deal with the common case in which there is one overloaded - function for all type combinations. */ - function_instance instance (group.base_name, *group.base, - *group.shape, mode, types_none[0], - group.preds[pi]); - add_overloaded_function (instance, group.required_extensions); - } + /* Deal with the common case in which there is one overloaded + function for all type combinations. */ + add_function (types_none[0], group_suffix_id, pi); else for (unsigned int ti = 0; group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti) @@ -1136,12 +1157,16 @@ function_builder::add_overloaded_functions (const function_group_info &group, explicit_type0 ? group.types[ti][0] : NUM_TYPE_SUFFIXES, explicit_type1 ? group.types[ti][1] : NUM_TYPE_SUFFIXES }; - function_instance instance (group.base_name, *group.base, - *group.shape, mode, types, - group.preds[pi]); - add_overloaded_function (instance, group.required_extensions); + add_function (types, group_suffix_id, pi); } - } + }; + + for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi) + if (explicit_group) + for (unsigned int gi = 0; group.groups[gi] != NUM_GROUP_SUFFIXES; ++gi) + add_group_suffix (group.groups[gi], pi); + else + add_group_suffix (GROUP_none, pi); } /* Register all the functions in GROUP. */ @@ -1213,29 +1238,34 @@ function_resolver::report_no_such_form (type_suffix_index type) } /* Silently check whether there is an instance of the function with the - mode suffix given by MODE and the type suffixes given by TYPE0 and TYPE1. 
- Return its function decl if so, otherwise return null. */ + mode suffix given by MODE, the type suffixes given by TYPE0 and TYPE1, + and the group suffix given by GROUP. Return its function decl if so, + otherwise return null. */ tree function_resolver::lookup_form (mode_suffix_index mode, type_suffix_index type0, - type_suffix_index type1) + type_suffix_index type1, + group_suffix_index group) { type_suffix_pair types = { type0, type1 }; - function_instance instance (base_name, base, shape, mode, types, pred); + function_instance instance (base_name, base, shape, mode, types, + group, pred); registered_function *rfn = function_table->find_with_hash (instance, instance.hash ()); return rfn ? rfn->decl : NULL_TREE; } -/* Resolve the function to one with the mode suffix given by MODE and the - type suffixes given by TYPE0 and TYPE1. Return its function decl on - success, otherwise report an error and return error_mark_node. */ +/* Resolve the function to one with the mode suffix given by MODE, the + type suffixes given by TYPE0 and TYPE1, and group suffix given by + GROUP. Return its function decl on success, otherwise report an + error and return error_mark_node. 
*/ tree function_resolver::resolve_to (mode_suffix_index mode, type_suffix_index type0, - type_suffix_index type1) + type_suffix_index type1, + group_suffix_index group) { - tree res = lookup_form (mode, type0, type1); + tree res = lookup_form (mode, type0, type1, group); if (!res) { if (type1 == NUM_TYPE_SUFFIXES) diff --git a/gcc/config/aarch64/aarch64-sve-builtins.def b/gcc/config/aarch64/aarch64-sve-builtins.def index 534f6e69d72..5fbd486d74e 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.def +++ b/gcc/config/aarch64/aarch64-sve-builtins.def @@ -29,6 +29,10 @@ #define DEF_SVE_TYPE_SUFFIX(A, B, C, D, E) #endif +#ifndef DEF_SVE_GROUP_SUFFIX +#define DEF_SVE_GROUP_SUFFIX(A, B, C) +#endif + #ifndef DEF_SVE_FUNCTION #define DEF_SVE_FUNCTION(A, B, C, D) #endif @@ -95,10 +99,15 @@ DEF_SVE_TYPE_SUFFIX (u16, svuint16_t, unsigned, 16, VNx8HImode) DEF_SVE_TYPE_SUFFIX (u32, svuint32_t, unsigned, 32, VNx4SImode) DEF_SVE_TYPE_SUFFIX (u64, svuint64_t, unsigned, 64, VNx2DImode) +DEF_SVE_GROUP_SUFFIX (x2, 0, 2) +DEF_SVE_GROUP_SUFFIX (x3, 0, 3) +DEF_SVE_GROUP_SUFFIX (x4, 0, 4) + #include "aarch64-sve-builtins-base.def" #include "aarch64-sve-builtins-sve2.def" #undef DEF_SVE_FUNCTION +#undef DEF_SVE_GROUP_SUFFIX #undef DEF_SVE_TYPE_SUFFIX #undef DEF_SVE_TYPE #undef DEF_SVE_MODE diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h index 7cf8f45b3d5..a861e22ae6c 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.h +++ b/gcc/config/aarch64/aarch64-sve-builtins.h @@ -180,6 +180,17 @@ enum type_suffix_index NUM_TYPE_SUFFIXES }; +/* Enumerates the possible group suffixes. Each suffix combines two + optional pieces of information: the vector group size in a ZA index, + and the number of vectors in the largest tuple argument. */ +enum group_suffix_index +{ +#define DEF_SVE_GROUP_SUFFIX(NAME, VG, VECTORS_PER_TUPLE) GROUP_##NAME, +#include "aarch64-sve-builtins.def" + GROUP_none, + NUM_GROUP_SUFFIXES +}; + /* Combines two type suffixes. 
*/ typedef enum type_suffix_index type_suffix_pair[2]; @@ -237,6 +248,21 @@ struct type_suffix_info machine_mode vector_mode : 16; }; +/* Static information about a group suffix. */ +struct group_suffix_info +{ + /* The suffix string itself. */ + const char *string; + + /* If the suffix describes a vector group in a ZA index, this is the + size of that group, otherwise it is zero. */ + unsigned int vg; + + /* The number of vectors in the largest (or only) tuple argument, + or 1 if the suffix does not convey this information. */ + unsigned int vectors_per_tuple; +}; + /* Static information about a set of functions. */ struct function_group_info { @@ -251,14 +277,16 @@ struct function_group_info shapes. */ const function_shape *const *shape; - /* A list of the available type suffixes, and of the available predication - types. The function supports every combination of the two. + /* A list of the available type suffixes, group suffixes, and predication + types. The function supports every combination of the three. + + The list of type suffixes is terminated by two NUM_TYPE_SUFFIXES. + It is lexicographically ordered based on the index value. - The list of type suffixes is terminated by two NUM_TYPE_SUFFIXES - while the list of predication types is terminated by NUM_PREDS. - The list of type suffixes is lexicographically ordered based - on the index value. */ + The list of group suffixes is terminated by NUM_GROUP_SUFFIXES + and the list of predication types is terminated by NUM_PREDS. 
*/ const type_suffix_pair *types; + const group_suffix_index *groups; const predication_index *preds; /* The architecture extensions that the functions require, as a set of @@ -273,7 +301,8 @@ class GTY((user)) function_instance public: function_instance (const char *, const function_base *, const function_shape *, mode_suffix_index, - const type_suffix_pair &, predication_index); + const type_suffix_pair &, group_suffix_index, + predication_index); bool operator== (const function_instance &) const; bool operator!= (const function_instance &) const; @@ -294,6 +323,8 @@ public: units_index displacement_units () const; const type_suffix_info &type_suffix (unsigned int) const; + const group_suffix_info &group_suffix () const; + tree scalar_type (unsigned int) const; tree vector_type (unsigned int) const; tree tuple_type (unsigned int) const; @@ -301,14 +332,14 @@ public: machine_mode vector_mode (unsigned int) const; machine_mode gp_mode (unsigned int) const; - /* The properties of the function. (The explicit "enum"s are required - for gengtype.) */ + /* The properties of the function. 
*/ const char *base_name; const function_base *base; const function_shape *shape; - enum mode_suffix_index mode_suffix_id; + mode_suffix_index mode_suffix_id; type_suffix_pair type_suffix_ids; - enum predication_index pred; + group_suffix_index group_suffix_id; + predication_index pred; }; class registered_function; @@ -390,10 +421,12 @@ public: tree report_no_such_form (type_suffix_index); tree lookup_form (mode_suffix_index, type_suffix_index = NUM_TYPE_SUFFIXES, - type_suffix_index = NUM_TYPE_SUFFIXES); + type_suffix_index = NUM_TYPE_SUFFIXES, + group_suffix_index = GROUP_none); tree resolve_to (mode_suffix_index, type_suffix_index = NUM_TYPE_SUFFIXES, - type_suffix_index = NUM_TYPE_SUFFIXES); + type_suffix_index = NUM_TYPE_SUFFIXES, + group_suffix_index = GROUP_none); type_suffix_index infer_integer_scalar_type (unsigned int); type_suffix_index infer_pointer_type (unsigned int, bool = false); @@ -641,6 +674,11 @@ class function_shape public: virtual bool explicit_type_suffix_p (unsigned int) const = 0; + /* True if the group suffix is present in overloaded names. + This isn't meaningful for pre-SME intrinsics, and true is + more common than false, so provide a default definition. */ + virtual bool explicit_group_suffix_p () const { return true; } + /* Define all functions associated with the given group. 
*/ virtual void build (function_builder &, const function_group_info &) const = 0; @@ -669,6 +707,7 @@ private: extern const type_suffix_info type_suffixes[NUM_TYPE_SUFFIXES + 1]; extern const mode_suffix_info mode_suffixes[MODE_none + 1]; +extern const group_suffix_info group_suffixes[NUM_GROUP_SUFFIXES]; extern tree scalar_types[NUM_VECTOR_TYPES]; extern tree acle_vector_types[MAX_TUPLE_SIZE][NUM_VECTOR_TYPES + 1]; @@ -728,9 +767,11 @@ function_instance (const char *base_name_in, const function_shape *shape_in, mode_suffix_index mode_suffix_id_in, const type_suffix_pair &type_suffix_ids_in, + group_suffix_index group_suffix_id_in, predication_index pred_in) : base_name (base_name_in), base (base_in), shape (shape_in), - mode_suffix_id (mode_suffix_id_in), pred (pred_in) + mode_suffix_id (mode_suffix_id_in), group_suffix_id (group_suffix_id_in), + pred (pred_in) { memcpy (type_suffix_ids, type_suffix_ids_in, sizeof (type_suffix_ids)); } @@ -741,9 +782,10 @@ function_instance::operator== (const function_instance &other) const return (base == other.base && shape == other.shape && mode_suffix_id == other.mode_suffix_id - && pred == other.pred && type_suffix_ids[0] == other.type_suffix_ids[0] - && type_suffix_ids[1] == other.type_suffix_ids[1]); + && type_suffix_ids[1] == other.type_suffix_ids[1] + && group_suffix_id == other.group_suffix_id + && pred == other.pred); } inline bool @@ -815,6 +857,13 @@ function_instance::type_suffix (unsigned int i) const return type_suffixes[type_suffix_ids[i]]; } +/* Return information about the function's group suffix. */ +inline const group_suffix_info & +function_instance::group_suffix () const +{ + return group_suffixes[group_suffix_id]; +} + /* Return the scalar type associated with type suffix I. 
*/ inline tree function_instance::scalar_type (unsigned int i) const From patchwork Fri Nov 17 17:25:18 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1865156
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 06/21] aarch64: Add tuple forms of svreinterpret
Date: Fri, 17 Nov 2023 17:25:18 +0000
SME2 adds a number of intrinsics
that operate on tuples of 2 and 4 vectors. The ACLE therefore extends the existing svreinterpret intrinsics to handle tuples as well. gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svreinterpret_impl::fold): Punt on tuple forms. (svreinterpret_impl::expand): Use tuple_mode instead of vector_mode. * config/aarch64/aarch64-sve-builtins-base.def (svreinterpret): Extend to x1234 groups. * config/aarch64/aarch64-sve-builtins-functions.h (multi_vector_function::vectors_per_tuple): If the function has a group suffix, get the number of vectors from there. * config/aarch64/aarch64-sve-builtins-shapes.h (reinterpret): Declare. * config/aarch64/aarch64-sve-builtins-shapes.cc (reinterpret_def) (reinterpret): New function shape. * config/aarch64/aarch64-sve-builtins.cc (function_groups): Handle DEF_SVE_FUNCTION_GS. (function_resolver::infer_vector_type_and_group_suffix): New function. * config/aarch64/aarch64-sve-builtins.def (DEF_SVE_FUNCTION_GS): New macro. (DEF_SVE_FUNCTION): Forward to DEF_SVE_FUNCTION_GS by default. * config/aarch64/aarch64-sve-builtins.h (function_instance::tuple_mode): New member function. (function_resolver::infer_vector_type_and_group_suffix): Likewise. (function_base::vectors_per_tuple): Take the function instance as argument and get the number from the group suffix. (function_instance::vectors_per_tuple): Update accordingly. * config/aarch64/iterators.md (SVE_FULLx2, SVE_FULLx3, SVE_FULLx4) (SVE_ALL_STRUCT): New mode iterators. (SVE_STRUCT): Redefine in terms of SVE_FULL*. * config/aarch64/aarch64-sve.md (@aarch64_sve_reinterpret) (*aarch64_sve_reinterpret): Extend to SVE structure modes. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_XN): New macro. * gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Add tests for tuple forms. * gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c: Likewise. --- .../aarch64/aarch64-sve-builtins-base.cc | 5 +- .../aarch64/aarch64-sve-builtins-base.def | 2 +- .../aarch64/aarch64-sve-builtins-functions.h | 7 ++- .../aarch64/aarch64-sve-builtins-shapes.cc | 30 +++++++++ .../aarch64/aarch64-sve-builtins-shapes.h | 1 + gcc/config/aarch64/aarch64-sve-builtins.cc | 52 +++++++++++++++- gcc/config/aarch64/aarch64-sve-builtins.def | 8 ++- gcc/config/aarch64/aarch64-sve-builtins.h | 23 ++++++- gcc/config/aarch64/aarch64-sve.md | 8 +-- gcc/config/aarch64/iterators.md | 26 +++++--- .../aarch64/sve/acle/asm/reinterpret_bf16.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_f16.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_f32.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_f64.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_s16.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_s32.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_s64.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_s8.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_u16.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_u32.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_u64.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/reinterpret_u8.c | 62 +++++++++++++++++++ .../aarch64/sve/acle/asm/test_sve_acle.h | 14 +++++ 23 files changed, 900 insertions(+), 20 deletions(-) diff --git 
a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index b84e245eb3e..5b75b903e5f 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -2161,6 +2161,9 @@ public: gimple * fold (gimple_folder &f) const override { + if (f.vectors_per_tuple () > 1) + return NULL; + /* Punt to rtl if the effect of the reinterpret on registers does not conform to GCC's endianness model. */ if (!targetm.can_change_mode_class (f.vector_mode (0), @@ -2177,7 +2180,7 @@ public: rtx expand (function_expander &e) const override { - machine_mode mode = e.vector_mode (0); + machine_mode mode = e.tuple_mode (0); return e.use_exact_insn (code_for_aarch64_sve_reinterpret (mode)); } }; diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def b/gcc/config/aarch64/aarch64-sve-builtins-base.def index 0484863d3f7..4e31f67ac47 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.def +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def @@ -248,7 +248,7 @@ DEF_SVE_FUNCTION (svrdffr, rdffr, none, z_or_none) DEF_SVE_FUNCTION (svrecpe, unary, all_float, none) DEF_SVE_FUNCTION (svrecps, binary, all_float, none) DEF_SVE_FUNCTION (svrecpx, unary, all_float, mxz) -DEF_SVE_FUNCTION (svreinterpret, unary_convert, reinterpret, none) +DEF_SVE_FUNCTION_GS (svreinterpret, reinterpret, reinterpret, x1234, none) DEF_SVE_FUNCTION (svrev, unary, all_data, none) DEF_SVE_FUNCTION (svrev, unary_pred, all_pred, none) DEF_SVE_FUNCTION (svrevb, unary, hsd_integer, mxz) diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h index 2729877d914..4a10102038a 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h +++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h @@ -48,8 +48,13 @@ public: : m_vectors_per_tuple (vectors_per_tuple) {} unsigned int - vectors_per_tuple () const override + vectors_per_tuple (const function_instance 
&fi) const override { + if (fi.group_suffix_id != GROUP_none) + { + gcc_checking_assert (m_vectors_per_tuple == 1); + return fi.group_suffix ().vectors_per_tuple; + } return m_vectors_per_tuple; } diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc index dc255fc59f2..aa5dbb5df9d 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc @@ -2400,6 +2400,36 @@ struct reduction_wide_def : public overloaded_base<0> }; SHAPE (reduction_wide) +/* sv<t0>x<g>_t svfoo_t0[_t1_g](sv<t1>x<g>_t) + + where the target type <t0> must be specified explicitly but the source + type <t1> can be inferred. */ +struct reinterpret_def : public overloaded_base<1> +{ + bool explicit_group_suffix_p () const override { return false; } + + void + build (function_builder &b, const function_group_info &group) const override + { + b.add_overloaded_functions (group, MODE_none); + build_all (b, "t0,t1", group, MODE_none); + } + + tree + resolve (function_resolver &r) const override + { + type_suffix_index type; + group_suffix_index group; + if (!r.check_num_arguments (1) + || !r.infer_vector_type_and_group_suffix (0, &type, &group)) + return error_mark_node; + + return r.resolve_to (r.mode_suffix_id, r.type_suffix_ids[0], + type, group); + } +}; +SHAPE (reinterpret) + /* sv<t0>xN_t svfoo[_t0](sv<t0>xN_t, uint64_t, sv<t0>_t) where the second argument is an integer constant expression in the diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h index 7483c1d04b8..38d494761ae 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.h +++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.h @@ -133,6 +133,7 @@ namespace aarch64_sve extern const function_shape *const rdffr; extern const function_shape *const reduction; extern const function_shape *const reduction_wide; + extern const function_shape *const reinterpret; extern const function_shape *const set;
extern const function_shape *const setffr; extern const function_shape *const shift_left_imm_long; diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index 196534df61e..ced3fcfafdf 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -494,6 +494,10 @@ static const group_suffix_index groups_none[] = { GROUP_none, NUM_GROUP_SUFFIXES }; +static const group_suffix_index groups_x1234[] = { + GROUP_none, GROUP_x2, GROUP_x3, GROUP_x4, NUM_GROUP_SUFFIXES +}; + /* Used by functions that have no governing predicate. */ static const predication_index preds_none[] = { PRED_none, NUM_PREDS }; @@ -534,8 +538,8 @@ static const predication_index preds_z[] = { PRED_z, NUM_PREDS }; /* A list of all SVE ACLE functions. */ static CONSTEXPR const function_group_info function_groups[] = { -#define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \ - { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, groups_none, \ +#define DEF_SVE_FUNCTION_GS(NAME, SHAPE, TYPES, GROUPS, PREDS) \ + { #NAME, &functions::NAME, &shapes::SHAPE, types_##TYPES, groups_##GROUPS, \ preds_##PREDS, REQUIRED_EXTENSIONS }, #include "aarch64-sve-builtins.def" }; @@ -1485,6 +1489,50 @@ function_resolver::infer_tuple_type (unsigned int argno) return infer_vector_or_tuple_type (argno, vectors_per_tuple ()); } +/* Require argument ARGNO to be a single vector or a tuple, inferring both + the vector element type and the number of vectors in a tuple. Return true + on success, storing the type suffix in *TYPE_OUT and the group suffix + in *GROUP_OUT. Report an error and return false on failure. 
*/ +bool +function_resolver:: +infer_vector_type_and_group_suffix (unsigned int argno, + type_suffix_index *type_out, + group_suffix_index *group_out) +{ + tree actual = get_argument_type (argno); + if (actual == error_mark_node) + return false; + + /* A linear search should be OK here, since the code isn't hot and + the number of types is only small. */ + for (unsigned int size_i = 0; size_i < MAX_TUPLE_SIZE; ++size_i) + for (unsigned int suffix_i = 0; suffix_i < NUM_TYPE_SUFFIXES; ++suffix_i) + { + vector_type_index type_i = type_suffixes[suffix_i].vector_type; + tree type = acle_vector_types[size_i][type_i]; + if (type && matches_type_p (type, actual)) + { + if (size_i == 0) + *group_out = GROUP_none; + else if (size_i == 1) + *group_out = GROUP_x2; + else if (size_i == 2) + *group_out = GROUP_x3; + else if (size_i == 3) + *group_out = GROUP_x4; + else + gcc_unreachable (); + *type_out = type_suffix_index (suffix_i); + return true; + } + } + + error_at (location, "passing %qT to argument %d of %qE, which" + " expects an SVE vector or tuple type", + actual, argno + 1, fndecl); + return false; +} + /* Require argument ARGNO to be a vector or scalar argument. Return true if it is, otherwise report an appropriate error. 
*/ bool diff --git a/gcc/config/aarch64/aarch64-sve-builtins.def b/gcc/config/aarch64/aarch64-sve-builtins.def index 5fbd486d74e..14d12f07415 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.def +++ b/gcc/config/aarch64/aarch64-sve-builtins.def @@ -33,8 +33,13 @@ #define DEF_SVE_GROUP_SUFFIX(A, B, C) #endif +#ifndef DEF_SVE_FUNCTION_GS +#define DEF_SVE_FUNCTION_GS(A, B, C, D, E) +#endif + #ifndef DEF_SVE_FUNCTION -#define DEF_SVE_FUNCTION(A, B, C, D) +#define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \ + DEF_SVE_FUNCTION_GS (NAME, SHAPE, TYPES, none, PREDS) #endif DEF_SVE_MODE (n, none, none, none) @@ -107,6 +112,7 @@ DEF_SVE_GROUP_SUFFIX (x4, 0, 4) #include "aarch64-sve-builtins-sve2.def" #undef DEF_SVE_FUNCTION +#undef DEF_SVE_FUNCTION_GS #undef DEF_SVE_GROUP_SUFFIX #undef DEF_SVE_TYPE_SUFFIX #undef DEF_SVE_TYPE diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h index a861e22ae6c..981a57d82d2 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.h +++ b/gcc/config/aarch64/aarch64-sve-builtins.h @@ -330,6 +330,7 @@ public: tree tuple_type (unsigned int) const; unsigned int elements_per_vq (unsigned int i) const; machine_mode vector_mode (unsigned int) const; + machine_mode tuple_mode (unsigned int) const; machine_mode gp_mode (unsigned int) const; /* The properties of the function. */ @@ -436,6 +437,9 @@ public: type_suffix_index infer_unsigned_vector_type (unsigned int); type_suffix_index infer_sd_vector_type (unsigned int); type_suffix_index infer_tuple_type (unsigned int); + bool infer_vector_type_and_group_suffix (unsigned int, + type_suffix_index *, + group_suffix_index *); bool require_vector_or_scalar_type (unsigned int); @@ -627,7 +631,7 @@ public: /* If the function operates on tuples of vectors, return the number of vectors in the tuples, otherwise return 1. 
*/ - virtual unsigned int vectors_per_tuple () const { return 1; } + virtual unsigned int vectors_per_tuple (const function_instance &) const; /* If the function addresses memory, return the type of a single scalar memory element. */ @@ -799,7 +803,7 @@ function_instance::operator!= (const function_instance &other) const inline unsigned int function_instance::vectors_per_tuple () const { - return base->vectors_per_tuple (); + return base->vectors_per_tuple (*this); } /* If the function addresses memory, return the type of a single @@ -903,6 +907,15 @@ function_instance::vector_mode (unsigned int i) const return type_suffix (i).vector_mode; } +/* Return the mode of tuple_type (I). */ +inline machine_mode +function_instance::tuple_mode (unsigned int i) const +{ + if (group_suffix ().vectors_per_tuple > 1) + return TYPE_MODE (tuple_type (i)); + return vector_mode (i); +} + /* Return the mode of the governing predicate to use when operating on type suffix I. */ inline machine_mode @@ -929,6 +942,12 @@ function_base::call_properties (const function_instance &instance) const return flags; } +inline unsigned int +function_base::vectors_per_tuple (const function_instance &instance) const +{ + return instance.group_suffix ().vectors_per_tuple; +} + /* Return the mode of the result of a call. */ inline machine_mode function_expander::result_mode () const diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index cfadac4f1be..e9cebffe3e0 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -787,8 +787,8 @@ (define_insn_and_split "*aarch64_sve_mov_subreg_be" ;; This is equivalent to a subreg on little-endian targets but not for ;; big-endian; see the comment at the head of the file for details. 
(define_expand "@aarch64_sve_reinterpret" - [(set (match_operand:SVE_ALL 0 "register_operand") - (unspec:SVE_ALL + [(set (match_operand:SVE_ALL_STRUCT 0 "register_operand") + (unspec:SVE_ALL_STRUCT [(match_operand 1 "aarch64_any_register_operand")] UNSPEC_REINTERPRET))] "TARGET_SVE" @@ -805,8 +805,8 @@ (define_expand "@aarch64_sve_reinterpret" ;; A pattern for handling type punning on big-endian targets. We use a ;; special predicate for operand 1 to reduce the number of patterns. (define_insn_and_split "*aarch64_sve_reinterpret" - [(set (match_operand:SVE_ALL 0 "register_operand" "=w") - (unspec:SVE_ALL + [(set (match_operand:SVE_ALL_STRUCT 0 "register_operand" "=w") + (unspec:SVE_ALL_STRUCT [(match_operand 1 "aarch64_any_register_operand" "w")] UNSPEC_REINTERPRET))] "TARGET_SVE" diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index a920de99ffc..e7aa7e35ae1 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -430,14 +430,6 @@ (define_mode_iterator VNx4SF_ONLY [VNx4SF]) (define_mode_iterator VNx2DI_ONLY [VNx2DI]) (define_mode_iterator VNx2DF_ONLY [VNx2DF]) -;; All SVE vector structure modes. -(define_mode_iterator SVE_STRUCT [VNx32QI VNx16HI VNx8SI VNx4DI - VNx16BF VNx16HF VNx8SF VNx4DF - VNx48QI VNx24HI VNx12SI VNx6DI - VNx24BF VNx24HF VNx12SF VNx6DF - VNx64QI VNx32HI VNx16SI VNx8DI - VNx32BF VNx32HF VNx16SF VNx8DF]) - ;; All fully-packed SVE vector modes. (define_mode_iterator SVE_FULL [VNx16QI VNx8HI VNx4SI VNx2DI VNx8BF VNx8HF VNx4SF VNx2DF]) @@ -509,6 +501,24 @@ (define_mode_iterator SVE_ALL [VNx16QI VNx8QI VNx4QI VNx2QI VNx2DI VNx2DF]) +;; All SVE 2-vector modes. +(define_mode_iterator SVE_FULLx2 [VNx32QI VNx16HI VNx8SI VNx4DI + VNx16BF VNx16HF VNx8SF VNx4DF]) + +;; All SVE 3-vector modes. +(define_mode_iterator SVE_FULLx3 [VNx48QI VNx24HI VNx12SI VNx6DI + VNx24BF VNx24HF VNx12SF VNx6DF]) + +;; All SVE 4-vector modes. 
+(define_mode_iterator SVE_FULLx4 [VNx64QI VNx32HI VNx16SI VNx8DI + VNx32BF VNx32HF VNx16SF VNx8DF]) + +;; All SVE vector structure modes. +(define_mode_iterator SVE_STRUCT [SVE_FULLx2 SVE_FULLx3 SVE_FULLx4]) + +;; All SVE vector and structure modes. +(define_mode_iterator SVE_ALL_STRUCT [SVE_ALL SVE_STRUCT]) + ;; All SVE integer vector modes. (define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI VNx8HI VNx4HI VNx2HI diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c index 2d2c2a714b9..dd0daf2eff0 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_bf16_u64_tied1, svbfloat16_t, svuint64_t, TEST_DUAL_Z (reinterpret_bf16_u64_untied, svbfloat16_t, svuint64_t, z0 = svreinterpret_bf16_u64 (z4), z0 = svreinterpret_bf16 (z4)) + +/* +** reinterpret_bf16_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_bf16_bf16_x2_tied1, svbfloat16x2_t, svbfloat16x2_t, + z0_res = svreinterpret_bf16_bf16_x2 (z0), + z0_res = svreinterpret_bf16 (z0)) + +/* +** reinterpret_bf16_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_bf16_f32_x2_untied, svbfloat16x2_t, svfloat32x2_t, z0, + svreinterpret_bf16_f32_x2 (z4), + svreinterpret_bf16 (z4)) + +/* +** reinterpret_bf16_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_bf16_s64_x3_tied1, svbfloat16x3_t, svint64x3_t, + z0_res = svreinterpret_bf16_s64_x3 (z0), + z0_res = svreinterpret_bf16 (z0)) + +/* +** reinterpret_bf16_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_bf16_u8_x3_untied, svbfloat16x3_t, svuint8x3_t, z18, + svreinterpret_bf16_u8_x3 (z23), + 
svreinterpret_bf16 (z23)) + +/* +** reinterpret_bf16_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_bf16_u32_x4_tied1, svbfloat16x4_t, svuint32x4_t, + z0_res = svreinterpret_bf16_u32_x4 (z0), + z0_res = svreinterpret_bf16 (z0)) + +/* +** reinterpret_bf16_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_bf16_f64_x4_untied, svbfloat16x4_t, svfloat64x4_t, z28, + svreinterpret_bf16_f64_x4 (z4), + svreinterpret_bf16 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c index 60705e62879..9b6f8227d2a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_f16_u64_tied1, svfloat16_t, svuint64_t, TEST_DUAL_Z (reinterpret_f16_u64_untied, svfloat16_t, svuint64_t, z0 = svreinterpret_f16_u64 (z4), z0 = svreinterpret_f16 (z4)) + +/* +** reinterpret_f16_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_f16_bf16_x2_tied1, svfloat16x2_t, svbfloat16x2_t, + z0_res = svreinterpret_f16_bf16_x2 (z0), + z0_res = svreinterpret_f16 (z0)) + +/* +** reinterpret_f16_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_f16_f32_x2_untied, svfloat16x2_t, svfloat32x2_t, z0, + svreinterpret_f16_f32_x2 (z4), + svreinterpret_f16 (z4)) + +/* +** reinterpret_f16_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_f16_s64_x3_tied1, svfloat16x3_t, svint64x3_t, + z0_res = svreinterpret_f16_s64_x3 (z0), + z0_res = svreinterpret_f16 (z0)) + +/* +** reinterpret_f16_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, 
(z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_f16_u8_x3_untied, svfloat16x3_t, svuint8x3_t, z18, + svreinterpret_f16_u8_x3 (z23), + svreinterpret_f16 (z23)) + +/* +** reinterpret_f16_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_f16_u32_x4_tied1, svfloat16x4_t, svuint32x4_t, + z0_res = svreinterpret_f16_u32_x4 (z0), + z0_res = svreinterpret_f16 (z0)) + +/* +** reinterpret_f16_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_f16_f64_x4_untied, svfloat16x4_t, svfloat64x4_t, z28, + svreinterpret_f16_f64_x4 (z4), + svreinterpret_f16 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c index 06fc46f25de..ce981fce9d8 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_f32_u64_tied1, svfloat32_t, svuint64_t, TEST_DUAL_Z (reinterpret_f32_u64_untied, svfloat32_t, svuint64_t, z0 = svreinterpret_f32_u64 (z4), z0 = svreinterpret_f32 (z4)) + +/* +** reinterpret_f32_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_f32_bf16_x2_tied1, svfloat32x2_t, svbfloat16x2_t, + z0_res = svreinterpret_f32_bf16_x2 (z0), + z0_res = svreinterpret_f32 (z0)) + +/* +** reinterpret_f32_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_f32_f32_x2_untied, svfloat32x2_t, svfloat32x2_t, z0, + svreinterpret_f32_f32_x2 (z4), + svreinterpret_f32 (z4)) + +/* +** reinterpret_f32_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_f32_s64_x3_tied1, svfloat32x3_t, svint64x3_t, + z0_res = svreinterpret_f32_s64_x3 (z0), + z0_res = svreinterpret_f32 (z0)) + +/* +** 
reinterpret_f32_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_f32_u8_x3_untied, svfloat32x3_t, svuint8x3_t, z18, + svreinterpret_f32_u8_x3 (z23), + svreinterpret_f32 (z23)) + +/* +** reinterpret_f32_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_f32_u32_x4_tied1, svfloat32x4_t, svuint32x4_t, + z0_res = svreinterpret_f32_u32_x4 (z0), + z0_res = svreinterpret_f32 (z0)) + +/* +** reinterpret_f32_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_f32_f64_x4_untied, svfloat32x4_t, svfloat64x4_t, z28, + svreinterpret_f32_f64_x4 (z4), + svreinterpret_f32 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c index 003ee3fe220..4f51824ab7e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_f64_u64_tied1, svfloat64_t, svuint64_t, TEST_DUAL_Z (reinterpret_f64_u64_untied, svfloat64_t, svuint64_t, z0 = svreinterpret_f64_u64 (z4), z0 = svreinterpret_f64 (z4)) + +/* +** reinterpret_f64_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_f64_bf16_x2_tied1, svfloat64x2_t, svbfloat16x2_t, + z0_res = svreinterpret_f64_bf16_x2 (z0), + z0_res = svreinterpret_f64 (z0)) + +/* +** reinterpret_f64_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_f64_f32_x2_untied, svfloat64x2_t, svfloat32x2_t, z0, + svreinterpret_f64_f32_x2 (z4), + svreinterpret_f64 (z4)) + +/* +** reinterpret_f64_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV 
(reinterpret_f64_s64_x3_tied1, svfloat64x3_t, svint64x3_t, + z0_res = svreinterpret_f64_s64_x3 (z0), + z0_res = svreinterpret_f64 (z0)) + +/* +** reinterpret_f64_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_f64_u8_x3_untied, svfloat64x3_t, svuint8x3_t, z18, + svreinterpret_f64_u8_x3 (z23), + svreinterpret_f64 (z23)) + +/* +** reinterpret_f64_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_f64_u32_x4_tied1, svfloat64x4_t, svuint32x4_t, + z0_res = svreinterpret_f64_u32_x4 (z0), + z0_res = svreinterpret_f64 (z0)) + +/* +** reinterpret_f64_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_f64_f64_x4_untied, svfloat64x4_t, svfloat64x4_t, z28, + svreinterpret_f64_f64_x4 (z4), + svreinterpret_f64 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c index d62817c2cac..7e15f3e9bd3 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_s16_u64_tied1, svint16_t, svuint64_t, TEST_DUAL_Z (reinterpret_s16_u64_untied, svint16_t, svuint64_t, z0 = svreinterpret_s16_u64 (z4), z0 = svreinterpret_s16 (z4)) + +/* +** reinterpret_s16_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s16_bf16_x2_tied1, svint16x2_t, svbfloat16x2_t, + z0_res = svreinterpret_s16_bf16_x2 (z0), + z0_res = svreinterpret_s16 (z0)) + +/* +** reinterpret_s16_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_s16_f32_x2_untied, svint16x2_t, svfloat32x2_t, z0, + 
svreinterpret_s16_f32_x2 (z4), + svreinterpret_s16 (z4)) + +/* +** reinterpret_s16_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s16_s64_x3_tied1, svint16x3_t, svint64x3_t, + z0_res = svreinterpret_s16_s64_x3 (z0), + z0_res = svreinterpret_s16 (z0)) + +/* +** reinterpret_s16_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_s16_u8_x3_untied, svint16x3_t, svuint8x3_t, z18, + svreinterpret_s16_u8_x3 (z23), + svreinterpret_s16 (z23)) + +/* +** reinterpret_s16_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s16_u32_x4_tied1, svint16x4_t, svuint32x4_t, + z0_res = svreinterpret_s16_u32_x4 (z0), + z0_res = svreinterpret_s16 (z0)) + +/* +** reinterpret_s16_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_s16_f64_x4_untied, svint16x4_t, svfloat64x4_t, z28, + svreinterpret_s16_f64_x4 (z4), + svreinterpret_s16 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c index e1068f244ed..60da8aef333 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_s32_u64_tied1, svint32_t, svuint64_t, TEST_DUAL_Z (reinterpret_s32_u64_untied, svint32_t, svuint64_t, z0 = svreinterpret_s32_u64 (z4), z0 = svreinterpret_s32 (z4)) + +/* +** reinterpret_s32_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s32_bf16_x2_tied1, svint32x2_t, svbfloat16x2_t, + z0_res = svreinterpret_s32_bf16_x2 (z0), + z0_res = svreinterpret_s32 (z0)) + +/* +** reinterpret_s32_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov 
z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_s32_f32_x2_untied, svint32x2_t, svfloat32x2_t, z0, + svreinterpret_s32_f32_x2 (z4), + svreinterpret_s32 (z4)) + +/* +** reinterpret_s32_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s32_s64_x3_tied1, svint32x3_t, svint64x3_t, + z0_res = svreinterpret_s32_s64_x3 (z0), + z0_res = svreinterpret_s32 (z0)) + +/* +** reinterpret_s32_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_s32_u8_x3_untied, svint32x3_t, svuint8x3_t, z18, + svreinterpret_s32_u8_x3 (z23), + svreinterpret_s32 (z23)) + +/* +** reinterpret_s32_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s32_u32_x4_tied1, svint32x4_t, svuint32x4_t, + z0_res = svreinterpret_s32_u32_x4 (z0), + z0_res = svreinterpret_s32 (z0)) + +/* +** reinterpret_s32_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_s32_f64_x4_untied, svint32x4_t, svfloat64x4_t, z28, + svreinterpret_s32_f64_x4 (z4), + svreinterpret_s32 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c index cada7533c53..d705c60dfd7 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_s64_u64_tied1, svint64_t, svuint64_t, TEST_DUAL_Z (reinterpret_s64_u64_untied, svint64_t, svuint64_t, z0 = svreinterpret_s64_u64 (z4), z0 = svreinterpret_s64 (z4)) + +/* +** reinterpret_s64_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s64_bf16_x2_tied1, svint64x2_t, svbfloat16x2_t, + z0_res = svreinterpret_s64_bf16_x2 (z0), + z0_res = svreinterpret_s64 (z0)) + +/* +** 
reinterpret_s64_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_s64_f32_x2_untied, svint64x2_t, svfloat32x2_t, z0, + svreinterpret_s64_f32_x2 (z4), + svreinterpret_s64 (z4)) + +/* +** reinterpret_s64_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s64_s64_x3_tied1, svint64x3_t, svint64x3_t, + z0_res = svreinterpret_s64_s64_x3 (z0), + z0_res = svreinterpret_s64 (z0)) + +/* +** reinterpret_s64_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_s64_u8_x3_untied, svint64x3_t, svuint8x3_t, z18, + svreinterpret_s64_u8_x3 (z23), + svreinterpret_s64 (z23)) + +/* +** reinterpret_s64_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s64_u32_x4_tied1, svint64x4_t, svuint32x4_t, + z0_res = svreinterpret_s64_u32_x4 (z0), + z0_res = svreinterpret_s64 (z0)) + +/* +** reinterpret_s64_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_s64_f64_x4_untied, svint64x4_t, svfloat64x4_t, z28, + svreinterpret_s64_f64_x4 (z4), + svreinterpret_s64 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c index 23a40d0bab7..ab90a54d746 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_s8_u64_tied1, svint8_t, svuint64_t, TEST_DUAL_Z (reinterpret_s8_u64_untied, svint8_t, svuint64_t, z0 = svreinterpret_s8_u64 (z4), z0 = svreinterpret_s8 (z4)) + +/* +** reinterpret_s8_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s8_bf16_x2_tied1, svint8x2_t, 
svbfloat16x2_t, + z0_res = svreinterpret_s8_bf16_x2 (z0), + z0_res = svreinterpret_s8 (z0)) + +/* +** reinterpret_s8_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_s8_f32_x2_untied, svint8x2_t, svfloat32x2_t, z0, + svreinterpret_s8_f32_x2 (z4), + svreinterpret_s8 (z4)) + +/* +** reinterpret_s8_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s8_s64_x3_tied1, svint8x3_t, svint64x3_t, + z0_res = svreinterpret_s8_s64_x3 (z0), + z0_res = svreinterpret_s8 (z0)) + +/* +** reinterpret_s8_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_s8_u8_x3_untied, svint8x3_t, svuint8x3_t, z18, + svreinterpret_s8_u8_x3 (z23), + svreinterpret_s8 (z23)) + +/* +** reinterpret_s8_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_s8_u32_x4_tied1, svint8x4_t, svuint32x4_t, + z0_res = svreinterpret_s8_u32_x4 (z0), + z0_res = svreinterpret_s8 (z0)) + +/* +** reinterpret_s8_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_s8_f64_x4_untied, svint8x4_t, svfloat64x4_t, z28, + svreinterpret_s8_f64_x4 (z4), + svreinterpret_s8 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c index 48e8ecaff44..fcfc0eb9da5 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_u16_u64_tied1, svuint16_t, svuint64_t, TEST_DUAL_Z (reinterpret_u16_u64_untied, svuint16_t, svuint64_t, z0 = svreinterpret_u16_u64 (z4), z0 = svreinterpret_u16 (z4)) + +/* +** 
reinterpret_u16_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u16_bf16_x2_tied1, svuint16x2_t, svbfloat16x2_t, + z0_res = svreinterpret_u16_bf16_x2 (z0), + z0_res = svreinterpret_u16 (z0)) + +/* +** reinterpret_u16_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_u16_f32_x2_untied, svuint16x2_t, svfloat32x2_t, z0, + svreinterpret_u16_f32_x2 (z4), + svreinterpret_u16 (z4)) + +/* +** reinterpret_u16_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u16_s64_x3_tied1, svuint16x3_t, svint64x3_t, + z0_res = svreinterpret_u16_s64_x3 (z0), + z0_res = svreinterpret_u16 (z0)) + +/* +** reinterpret_u16_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_u16_u8_x3_untied, svuint16x3_t, svuint8x3_t, z18, + svreinterpret_u16_u8_x3 (z23), + svreinterpret_u16 (z23)) + +/* +** reinterpret_u16_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u16_u32_x4_tied1, svuint16x4_t, svuint32x4_t, + z0_res = svreinterpret_u16_u32_x4 (z0), + z0_res = svreinterpret_u16 (z0)) + +/* +** reinterpret_u16_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_u16_f64_x4_untied, svuint16x4_t, svfloat64x4_t, z28, + svreinterpret_u16_f64_x4 (z4), + svreinterpret_u16 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c index 1d4e857120e..6d7e05857fe 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_u32_u64_tied1, svuint32_t, svuint64_t, TEST_DUAL_Z 
(reinterpret_u32_u64_untied, svuint32_t, svuint64_t, z0 = svreinterpret_u32_u64 (z4), z0 = svreinterpret_u32 (z4)) + +/* +** reinterpret_u32_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u32_bf16_x2_tied1, svuint32x2_t, svbfloat16x2_t, + z0_res = svreinterpret_u32_bf16_x2 (z0), + z0_res = svreinterpret_u32 (z0)) + +/* +** reinterpret_u32_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_u32_f32_x2_untied, svuint32x2_t, svfloat32x2_t, z0, + svreinterpret_u32_f32_x2 (z4), + svreinterpret_u32 (z4)) + +/* +** reinterpret_u32_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u32_s64_x3_tied1, svuint32x3_t, svint64x3_t, + z0_res = svreinterpret_u32_s64_x3 (z0), + z0_res = svreinterpret_u32 (z0)) + +/* +** reinterpret_u32_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_u32_u8_x3_untied, svuint32x3_t, svuint8x3_t, z18, + svreinterpret_u32_u8_x3 (z23), + svreinterpret_u32 (z23)) + +/* +** reinterpret_u32_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u32_u32_x4_tied1, svuint32x4_t, svuint32x4_t, + z0_res = svreinterpret_u32_u32_x4 (z0), + z0_res = svreinterpret_u32 (z0)) + +/* +** reinterpret_u32_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_u32_f64_x4_untied, svuint32x4_t, svfloat64x4_t, z28, + svreinterpret_u32_f64_x4 (z4), + svreinterpret_u32 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c index 07af69dce8d..55c0baefb6f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c +++ 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_u64_u64_tied1, svuint64_t, svuint64_t, TEST_DUAL_Z (reinterpret_u64_u64_untied, svuint64_t, svuint64_t, z0 = svreinterpret_u64_u64 (z4), z0 = svreinterpret_u64 (z4)) + +/* +** reinterpret_u64_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u64_bf16_x2_tied1, svuint64x2_t, svbfloat16x2_t, + z0_res = svreinterpret_u64_bf16_x2 (z0), + z0_res = svreinterpret_u64 (z0)) + +/* +** reinterpret_u64_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_u64_f32_x2_untied, svuint64x2_t, svfloat32x2_t, z0, + svreinterpret_u64_f32_x2 (z4), + svreinterpret_u64 (z4)) + +/* +** reinterpret_u64_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u64_s64_x3_tied1, svuint64x3_t, svint64x3_t, + z0_res = svreinterpret_u64_s64_x3 (z0), + z0_res = svreinterpret_u64 (z0)) + +/* +** reinterpret_u64_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_u64_u8_x3_untied, svuint64x3_t, svuint8x3_t, z18, + svreinterpret_u64_u8_x3 (z23), + svreinterpret_u64 (z23)) + +/* +** reinterpret_u64_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u64_u32_x4_tied1, svuint64x4_t, svuint32x4_t, + z0_res = svreinterpret_u64_u32_x4 (z0), + z0_res = svreinterpret_u64 (z0)) + +/* +** reinterpret_u64_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_u64_f64_x4_untied, svuint64x4_t, svfloat64x4_t, z28, + svreinterpret_u64_f64_x4 (z4), + svreinterpret_u64 (z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c index a4c7f4c8d21..f7302196162 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c @@ -205,3 +205,65 @@ TEST_DUAL_Z_REV (reinterpret_u8_u64_tied1, svuint8_t, svuint64_t, TEST_DUAL_Z (reinterpret_u8_u64_untied, svuint8_t, svuint64_t, z0 = svreinterpret_u8_u64 (z4), z0 = svreinterpret_u8 (z4)) + +/* +** reinterpret_u8_bf16_x2_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u8_bf16_x2_tied1, svuint8x2_t, svbfloat16x2_t, + z0_res = svreinterpret_u8_bf16_x2 (z0), + z0_res = svreinterpret_u8 (z0)) + +/* +** reinterpret_u8_f32_x2_untied: +** ( +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** | +** mov z0\.d, z4\.d +** mov z1\.d, z5\.d +** ) +** ret +*/ +TEST_DUAL_XN (reinterpret_u8_f32_x2_untied, svuint8x2_t, svfloat32x2_t, z0, + svreinterpret_u8_f32_x2 (z4), + svreinterpret_u8 (z4)) + +/* +** reinterpret_u8_s64_x3_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u8_s64_x3_tied1, svuint8x3_t, svint64x3_t, + z0_res = svreinterpret_u8_s64_x3 (z0), + z0_res = svreinterpret_u8 (z0)) + +/* +** reinterpret_u8_u8_x3_untied: +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** mov (z18|z19|z20)\.d, (z23|z24|z25)\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_u8_u8_x3_untied, svuint8x3_t, svuint8x3_t, z18, + svreinterpret_u8_u8_x3 (z23), + svreinterpret_u8 (z23)) + +/* +** reinterpret_u8_u32_x4_tied1: +** ret +*/ +TEST_DUAL_Z_REV (reinterpret_u8_u32_x4_tied1, svuint8x4_t, svuint32x4_t, + z0_res = svreinterpret_u8_u32_x4 (z0), + z0_res = svreinterpret_u8 (z0)) + +/* +** reinterpret_u8_f64_x4_untied: +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** mov (z28|z29|z30|z31)\.d, z[4-7]\.d +** ret +*/ +TEST_DUAL_XN (reinterpret_u8_f64_x4_untied, svuint8x4_t, svfloat64x4_t, z28, + svreinterpret_u8_f64_x4 (z4), + svreinterpret_u8 
(z4)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h index fbf392b3ed4..2da61ff5c0b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h @@ -421,4 +421,18 @@ return z0_res; \ } +#define TEST_DUAL_XN(NAME, TTYPE1, TTYPE2, RES, CODE1, CODE2) \ + PROTO (NAME, void, ()) \ + { \ + register TTYPE1 z0 __asm ("z0"); \ + register TTYPE2 z4 __asm ("z4"); \ + register TTYPE1 z18 __asm ("z18"); \ + register TTYPE2 z23 __asm ("z23"); \ + register TTYPE1 z28 __asm ("z28"); \ + __asm volatile ("" : "=w" (z0), "=w" (z4), "=w" (z18), \ + "=w" (z23), "=w" (z28)); \ + INVOKE (RES = CODE1, RES = CODE2); \ + __asm volatile ("" :: "w" (RES)); \ + } + #endif
From patchwork Fri Nov 17 17:25:31 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1865157
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 07/21] aarch64: Add arm_streaming(_compatible) attributes
Date: Fri, 17 Nov 2023 17:25:31 +0000
In-Reply-To: (Richard Sandiford's message of "Fri, 17 Nov 2023 17:23:28 +0000")
This patch adds support for recognising the SME arm::streaming and arm::streaming_compatible attributes. These attributes respectively describe whether the processor is definitely in "streaming mode" (PSTATE.SM==1), whether the processor is definitely not in streaming mode (PSTATE.SM==0), or whether we don't know at compile time either way.

As far as the compiler is concerned, this effectively creates three ISA submodes: streaming mode enables things that are not available in non-streaming mode, non-streaming mode enables things that are not available in streaming mode, and streaming-compatible mode has to stick to the common subset. This means that some instructions are conditional on PSTATE.SM==1 and some are conditional on PSTATE.SM==0.

I wondered about recording the streaming state in a new variable. However, the set of available instructions is also influenced by PSTATE.ZA (added later), so I think it makes sense to view this as an instance of a more general mechanism. Also, keeping the PSTATE.SM state in the same flag variable as the other ISA features makes it possible to sum up the requirements of an ACLE function in a single value.

The patch therefore adds a new set of feature flags called "ISA modes".
Unlike the other two sets of flags (optional features and architecture-level features), these ISA modes are not controlled directly by command-line parameters or "target" attributes.

arm::streaming and arm::streaming_compatible are function type attributes rather than function declaration attributes. This means that we need to find somewhere to copy the type information across to a function's target options. The patch does this in aarch64_set_current_function.

We also need to record which ISA mode a callee expects/requires to be active on entry. (The same mode is then active on return.) The patch extends the current UNSPEC_CALLEE_ABI cookie to include this information, as well as the PCS variant that it recorded previously.

The attributes can also be written __arm_streaming and __arm_streaming_compatible. This has two advantages: it triggers an error on compilers that don't understand the attributes, and it eases use on C, where [[...]] attributes were only added in C23.

gcc/
	* config/aarch64/aarch64-isa-modes.def: New file.
	* config/aarch64/aarch64.h: Include it in the feature enumerations.
	(AARCH64_FL_SM_STATE, AARCH64_FL_ISA_MODES): New constants.
	(AARCH64_FL_DEFAULT_ISA_MODE): Likewise.
	(AARCH64_ISA_MODE): New macro.
	(CUMULATIVE_ARGS): Add an isa_mode field.
	* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Declare.
	(aarch64_tlsdesc_abi_id): Return an arm_pcs.
	* config/aarch64/aarch64.cc (attr_streaming_exclusions)
	(aarch64_gnu_attributes, aarch64_gnu_attribute_table)
	(aarch64_arm_attributes, aarch64_arm_attribute_table): New tables.
	(aarch64_attribute_table): Redefine to include the gnu and arm
	attributes.
	(aarch64_fntype_pstate_sm, aarch64_fntype_isa_mode): New functions.
	(aarch64_fndecl_pstate_sm, aarch64_fndecl_isa_mode): Likewise.
	(aarch64_gen_callee_cookie, aarch64_callee_abi): Likewise.
	(aarch64_insn_callee_cookie, aarch64_insn_callee_abi): Use them.
	(aarch64_function_arg, aarch64_output_mi_thunk): Likewise.
	(aarch64_init_cumulative_args): Initialize the isa_mode field.
	(aarch64_output_mi_thunk): Use aarch64_gen_callee_cookie to get
	the ABI cookie.
	(aarch64_override_options): Add the ISA mode to the feature set.
	(aarch64_temporary_target::copy_from_fndecl): Likewise.
	(aarch64_fndecl_options, aarch64_handle_attr_arch): Likewise.
	(aarch64_set_current_function): Maintain the correct ISA mode.
	(aarch64_tlsdesc_abi_id): Return an arm_pcs.
	(aarch64_comp_type_attributes): Handle arm::streaming and
	arm::streaming_compatible.
	* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
	Define __arm_streaming and __arm_streaming_compatible.
	* config/aarch64/aarch64.md (tlsdesc_small_<mode>): Use
	aarch64_gen_callee_cookie to get the ABI cookie.
	* config/aarch64/t-aarch64 (TM_H): Add all feature-related .def files.

gcc/testsuite/
	* gcc.target/aarch64/sme/aarch64-sme.exp: New harness.
	* gcc.target/aarch64/sme/streaming_mode_1.c: New test.
	* gcc.target/aarch64/sme/streaming_mode_2.c: Likewise.
	* gcc.target/aarch64/sme/keyword_macros_1.c: Likewise.
	* g++.target/aarch64/sme/aarch64-sme.exp: New harness.
	* g++.target/aarch64/sme/streaming_mode_1.C: New test.
	* g++.target/aarch64/sme/streaming_mode_2.C: Likewise.
	* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
	* gcc.target/aarch64/auto-init-1.c: Only expect the call insn
	to contain 1 (const_int 0), not 2.
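The extended UNSPEC_CALLEE_ABI cookie packs the callee's ISA mode into the low AARCH64_NUM_ISA_MODES bits and the PCS variant above them (see aarch64_gen_callee_cookie and aarch64_callee_abi in the diff below). A minimal stand-alone sketch of that packing, using illustrative constants rather than GCC's own:

```c
#include <assert.h>
#include <stdint.h>

/* Two ISA modes are defined so far (SM_ON and SM_OFF), so the mode
   field occupies the low two bits; the PCS variant sits above it.
   NUM_ISA_MODES stands in for AARCH64_NUM_ISA_MODES.  */
enum { NUM_ISA_MODES = 2 };

/* Illustrative stand-ins for the arm_pcs enumeration.  */
enum pcs { PCS_AAPCS64 = 0, PCS_SIMD, PCS_SVE, PCS_TLSDESC };

static uint64_t
gen_callee_cookie (uint64_t isa_mode, enum pcs variant)
{
  return isa_mode | ((uint64_t) variant << NUM_ISA_MODES);
}

static enum pcs
cookie_pcs (uint64_t cookie)
{
  return (enum pcs) (cookie >> NUM_ISA_MODES);
}

static uint64_t
cookie_isa_mode (uint64_t cookie)
{
  return cookie & (((uint64_t) 1 << NUM_ISA_MODES) - 1);
}
```

Both fields round-trip, which is what lets aarch64_insn_callee_abi recover the callee's ABI from the cookie stored in the call insn alone.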
--- gcc/config/aarch64/aarch64-c.cc | 14 ++ gcc/config/aarch64/aarch64-isa-modes.def | 35 +++ gcc/config/aarch64/aarch64-protos.h | 3 +- gcc/config/aarch64/aarch64.cc | 233 +++++++++++++++--- gcc/config/aarch64/aarch64.h | 24 +- gcc/config/aarch64/aarch64.md | 3 +- gcc/config/aarch64/t-aarch64 | 5 +- .../g++.target/aarch64/sme/aarch64-sme.exp | 40 +++ .../g++.target/aarch64/sme/keyword_macros_1.C | 4 + .../g++.target/aarch64/sme/streaming_mode_1.C | 142 +++++++++++ .../g++.target/aarch64/sme/streaming_mode_2.C | 25 ++ .../gcc.target/aarch64/auto-init-1.c | 3 +- .../gcc.target/aarch64/sme/aarch64-sme.exp | 40 +++ .../gcc.target/aarch64/sme/keyword_macros_1.c | 4 + .../gcc.target/aarch64/sme/streaming_mode_1.c | 130 ++++++++++ .../gcc.target/aarch64/sme/streaming_mode_2.c | 25 ++ 16 files changed, 685 insertions(+), 45 deletions(-) create mode 100644 gcc/config/aarch64/aarch64-isa-modes.def create mode 100644 gcc/testsuite/g++.target/aarch64/sme/aarch64-sme.exp create mode 100644 gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C create mode 100644 gcc/testsuite/g++.target/aarch64/sme/streaming_mode_1.C create mode 100644 gcc/testsuite/g++.target/aarch64/sme/streaming_mode_2.C create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc index ab8844f6049..1603621b30d 100644 --- a/gcc/config/aarch64/aarch64-c.cc +++ b/gcc/config/aarch64/aarch64-c.cc @@ -72,6 +72,20 @@ aarch64_define_unconditional_macros (cpp_reader *pfile) builtin_define_with_int_value ("__ARM_SIZEOF_WCHAR_T", WCHAR_TYPE_SIZE / 8); builtin_define ("__GCC_ASM_FLAG_OUTPUTS__"); + + /* Define keyword attributes like __arm_streaming as macros that expand + to the associated [[...]] 
attribute. Use __extension__ in the attribute + for C, since the [[...]] syntax was only added in C23. */ +#define DEFINE_ARM_KEYWORD_MACRO(NAME) \ + builtin_define_with_value ("__arm_" NAME, \ + lang_GNU_CXX () \ + ? "[[arm::" NAME "]]" \ + : "[[__extension__ arm::" NAME "]]", 0); + + DEFINE_ARM_KEYWORD_MACRO ("streaming"); + DEFINE_ARM_KEYWORD_MACRO ("streaming_compatible"); + +#undef DEFINE_ARM_KEYWORD_MACRO } /* Undefine/redefine macros that depend on the current backend state and may diff --git a/gcc/config/aarch64/aarch64-isa-modes.def b/gcc/config/aarch64/aarch64-isa-modes.def new file mode 100644 index 00000000000..5915c98a896 --- /dev/null +++ b/gcc/config/aarch64/aarch64-isa-modes.def @@ -0,0 +1,35 @@ +/* Copyright (C) 2023 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +/* This file defines a set of "ISA modes"; in other words, it defines + various bits of runtime state that control the set of available + instructions or that affect the semantics of instructions in some way. + + Before using #include to read this file, define a macro: + + DEF_AARCH64_ISA_MODE(NAME) + + where NAME is the name of the mode. */ + +/* Indicates that PSTATE.SM is known to be 1 or 0 respectively. These + modes are mutually exclusive. If neither mode is active then the state + of PSTATE.SM is not known at compile time. 
*/ +DEF_AARCH64_ISA_MODE(SM_ON) +DEF_AARCH64_ISA_MODE(SM_OFF) + +#undef DEF_AARCH64_ISA_MODE diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 7ebdec2f58c..abc94e482af 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -767,6 +767,7 @@ bool aarch64_constant_address_p (rtx); bool aarch64_emit_approx_div (rtx, rtx, rtx); bool aarch64_emit_approx_sqrt (rtx, rtx, bool); tree aarch64_vector_load_decl (tree); +rtx aarch64_gen_callee_cookie (aarch64_feature_flags, arm_pcs); void aarch64_expand_call (rtx, rtx, rtx, bool); bool aarch64_expand_cpymem_mops (rtx *, bool); bool aarch64_expand_cpymem (rtx *); @@ -852,7 +853,7 @@ bool aarch64_use_return_insn_p (void); const char *aarch64_output_casesi (rtx *); const char *aarch64_output_load_tp (rtx); -unsigned int aarch64_tlsdesc_abi_id (); +arm_pcs aarch64_tlsdesc_abi_id (); enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT); enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx); enum reg_class aarch64_regno_regclass (unsigned); diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 622ab763306..1a4ef2a4396 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -2838,8 +2838,18 @@ handle_aarch64_vector_pcs_attribute (tree *node, tree name, tree, gcc_unreachable (); } +/* Mutually-exclusive function type attributes for controlling PSTATE.SM. */ +static const struct attribute_spec::exclusions attr_streaming_exclusions[] = +{ + /* Attribute name exclusion applies to: + function, type, variable */ + { "streaming", false, true, false }, + { "streaming_compatible", false, true, false }, + { NULL, false, false, false } +}; + /* Table of machine attributes. 
*/ -TARGET_GNU_ATTRIBUTES (aarch64_attribute_table, +static const attribute_spec aarch64_gnu_attributes[] = { /* { name, min_len, max_len, decl_req, type_req, fn_type_req, affects_type_identity, handler, exclude } */ @@ -2851,7 +2861,31 @@ TARGET_GNU_ATTRIBUTES (aarch64_attribute_table, { "Advanced SIMD type", 1, 1, false, true, false, true, NULL, NULL }, { "SVE type", 3, 3, false, true, false, true, NULL, NULL }, { "SVE sizeless type", 0, 0, false, true, false, true, NULL, NULL } -}); +}; + +static const scoped_attribute_specs aarch64_gnu_attribute_table = +{ + "gnu", aarch64_gnu_attributes +}; + +static const attribute_spec aarch64_arm_attributes[] = +{ + { "streaming", 0, 0, false, true, true, true, + NULL, attr_streaming_exclusions }, + { "streaming_compatible", 0, 0, false, true, true, true, + NULL, attr_streaming_exclusions }, +}; + +static const scoped_attribute_specs aarch64_arm_attribute_table = +{ + "arm", aarch64_arm_attributes +}; + +static const scoped_attribute_specs *const aarch64_attribute_table[] = +{ + &aarch64_gnu_attribute_table, + &aarch64_arm_attribute_table +}; typedef enum aarch64_cond_code { @@ -4089,6 +4123,48 @@ aarch64_fntype_abi (const_tree fntype) return default_function_abi; } +/* Return the state of PSTATE.SM on entry to functions of type FNTYPE. */ + +static aarch64_feature_flags +aarch64_fntype_pstate_sm (const_tree fntype) +{ + if (lookup_attribute ("arm", "streaming", TYPE_ATTRIBUTES (fntype))) + return AARCH64_FL_SM_ON; + + if (lookup_attribute ("arm", "streaming_compatible", + TYPE_ATTRIBUTES (fntype))) + return 0; + + return AARCH64_FL_SM_OFF; +} + +/* Return the ISA mode on entry to functions of type FNTYPE. */ + +static aarch64_feature_flags +aarch64_fntype_isa_mode (const_tree fntype) +{ + return aarch64_fntype_pstate_sm (fntype); +} + +/* Return the state of PSTATE.SM when compiling the body of + function FNDECL. This might be different from the state of + PSTATE.SM on entry. 
*/ + +static aarch64_feature_flags +aarch64_fndecl_pstate_sm (const_tree fndecl) +{ + return aarch64_fntype_pstate_sm (TREE_TYPE (fndecl)); +} + +/* Return the ISA mode that should be used to compile the body of + function FNDECL. */ + +static aarch64_feature_flags +aarch64_fndecl_isa_mode (const_tree fndecl) +{ + return aarch64_fndecl_pstate_sm (fndecl); +} + /* Implement TARGET_COMPATIBLE_VECTOR_TYPES_P. */ static bool @@ -4151,17 +4227,46 @@ aarch64_reg_save_mode (unsigned int regno) gcc_unreachable (); } -/* Implement TARGET_INSN_CALLEE_ABI. */ +/* Given the ISA mode on entry to a callee and the ABI of the callee, + return the CONST_INT that should be placed in an UNSPEC_CALLEE_ABI rtx. */ -const predefined_function_abi & -aarch64_insn_callee_abi (const rtx_insn *insn) +rtx +aarch64_gen_callee_cookie (aarch64_feature_flags isa_mode, arm_pcs pcs_variant) +{ + return gen_int_mode ((unsigned int) isa_mode + | (unsigned int) pcs_variant << AARCH64_NUM_ISA_MODES, + DImode); +} + +/* COOKIE is a CONST_INT from an UNSPEC_CALLEE_ABI rtx. Return the + callee's ABI. */ + +static const predefined_function_abi & +aarch64_callee_abi (rtx cookie) +{ + return function_abis[UINTVAL (cookie) >> AARCH64_NUM_ISA_MODES]; +} + +/* INSN is a call instruction. Return the CONST_INT stored in its + UNSPEC_CALLEE_ABI rtx. */ + +static rtx +aarch64_insn_callee_cookie (const rtx_insn *insn) { rtx pat = PATTERN (insn); gcc_assert (GET_CODE (pat) == PARALLEL); rtx unspec = XVECEXP (pat, 0, 1); gcc_assert (GET_CODE (unspec) == UNSPEC && XINT (unspec, 1) == UNSPEC_CALLEE_ABI); - return function_abis[INTVAL (XVECEXP (unspec, 0, 0))]; + return XVECEXP (unspec, 0, 0); +} + +/* Implement TARGET_INSN_CALLEE_ABI. */ + +const predefined_function_abi & +aarch64_insn_callee_abi (const rtx_insn *insn) +{ + return aarch64_callee_abi (aarch64_insn_callee_cookie (insn)); } /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. 
The callee only saves @@ -8086,7 +8191,7 @@ aarch64_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg) || pcum->pcs_variant == ARM_PCS_SVE); if (arg.end_marker_p ()) - return gen_int_mode (pcum->pcs_variant, DImode); + return aarch64_gen_callee_cookie (pcum->isa_mode, pcum->pcs_variant); aarch64_layout_arg (pcum_v, arg); return pcum->aapcs_reg; @@ -8107,9 +8212,15 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum, pcum->aapcs_nextnvrn = 0; pcum->aapcs_nextnprn = 0; if (fntype) - pcum->pcs_variant = (arm_pcs) fntype_abi (fntype).id (); + { + pcum->pcs_variant = (arm_pcs) fntype_abi (fntype).id (); + pcum->isa_mode = aarch64_fntype_isa_mode (fntype); + } else - pcum->pcs_variant = ARM_PCS_AAPCS64; + { + pcum->pcs_variant = ARM_PCS_AAPCS64; + pcum->isa_mode = AARCH64_FL_DEFAULT_ISA_MODE; + } pcum->aapcs_reg = NULL_RTX; pcum->aapcs_arg_processed = false; pcum->aapcs_stack_words = 0; @@ -10715,7 +10826,9 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED, } funexp = XEXP (DECL_RTL (function), 0); funexp = gen_rtx_MEM (FUNCTION_MODE, funexp); - rtx callee_abi = gen_int_mode (fndecl_abi (function).id (), DImode); + auto isa_mode = aarch64_fntype_isa_mode (TREE_TYPE (function)); + auto pcs_variant = arm_pcs (fndecl_abi (function).id ()); + rtx callee_abi = aarch64_gen_callee_cookie (isa_mode, pcs_variant); insn = emit_call_insn (gen_sibcall (funexp, const0_rtx, callee_abi)); SIBLING_CALL_P (insn) = 1; @@ -18873,6 +18986,7 @@ aarch64_override_options (void) SUBTARGET_OVERRIDE_OPTIONS; #endif + auto isa_mode = AARCH64_FL_DEFAULT_ISA_MODE; if (cpu && arch) { /* If both -mcpu and -march are specified, warn if they are not @@ -18885,25 +18999,25 @@ aarch64_override_options (void) } selected_arch = arch->arch; - aarch64_set_asm_isa_flags (arch_isa); + aarch64_set_asm_isa_flags (arch_isa | isa_mode); } else if (cpu) { selected_arch = cpu->arch; - aarch64_set_asm_isa_flags (cpu_isa); + aarch64_set_asm_isa_flags (cpu_isa | isa_mode); } else 
if (arch) { cpu = &all_cores[arch->ident]; selected_arch = arch->arch; - aarch64_set_asm_isa_flags (arch_isa); + aarch64_set_asm_isa_flags (arch_isa | isa_mode); } else { /* No -mcpu or -march specified, so use the default CPU. */ cpu = &all_cores[TARGET_CPU_DEFAULT]; selected_arch = cpu->arch; - aarch64_set_asm_isa_flags (cpu->flags); + aarch64_set_asm_isa_flags (cpu->flags | isa_mode); } selected_tune = tune ? tune->ident : cpu->ident; @@ -19076,6 +19190,21 @@ aarch64_save_restore_target_globals (tree new_tree) TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts (); } +/* Return the target_option_node for FNDECL, or the current options + if FNDECL is null. */ + +static tree +aarch64_fndecl_options (tree fndecl) +{ + if (!fndecl) + return target_option_current_node; + + if (tree options = DECL_FUNCTION_SPECIFIC_TARGET (fndecl)) + return options; + + return target_option_default_node; +} + /* Implement TARGET_SET_CURRENT_FUNCTION. Unpack the codegen decisions like tuning and ISA features from the DECL_FUNCTION_SPECIFIC_TARGET of the function, if such exists. This function may be called multiple @@ -19085,25 +19214,24 @@ aarch64_save_restore_target_globals (tree new_tree) static void aarch64_set_current_function (tree fndecl) { - if (!fndecl || fndecl == aarch64_previous_fndecl) - return; - - tree old_tree = (aarch64_previous_fndecl - ? DECL_FUNCTION_SPECIFIC_TARGET (aarch64_previous_fndecl) - : NULL_TREE); - - tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl); + tree old_tree = aarch64_fndecl_options (aarch64_previous_fndecl); + tree new_tree = aarch64_fndecl_options (fndecl); - /* If current function has no attributes but the previous one did, - use the default node. */ - if (!new_tree && old_tree) - new_tree = target_option_default_node; + auto new_isa_mode = (fndecl + ? aarch64_fndecl_isa_mode (fndecl) + : AARCH64_FL_DEFAULT_ISA_MODE); + auto isa_flags = TREE_TARGET_OPTION (new_tree)->x_aarch64_isa_flags; /* If nothing to do, return. 
#pragma GCC reset or #pragma GCC pop to the default have been handled by aarch64_save_restore_target_globals from aarch64_pragma_target_parse. */ - if (old_tree == new_tree) - return; + if (old_tree == new_tree + && (!fndecl || aarch64_previous_fndecl) + && (isa_flags & AARCH64_FL_ISA_MODES) == new_isa_mode) + { + gcc_assert (AARCH64_ISA_MODE == new_isa_mode); + return; + } aarch64_previous_fndecl = fndecl; @@ -19111,7 +19239,28 @@ aarch64_set_current_function (tree fndecl) cl_target_option_restore (&global_options, &global_options_set, TREE_TARGET_OPTION (new_tree)); + /* The ISA mode can vary based on function type attributes and + function declaration attributes. Make sure that the target + options correctly reflect these attributes. */ + if ((isa_flags & AARCH64_FL_ISA_MODES) != new_isa_mode) + { + auto base_flags = (aarch64_asm_isa_flags & ~AARCH64_FL_ISA_MODES); + aarch64_set_asm_isa_flags (base_flags | new_isa_mode); + + aarch64_override_options_internal (&global_options); + new_tree = build_target_option_node (&global_options, + &global_options_set); + DECL_FUNCTION_SPECIFIC_TARGET (fndecl) = new_tree; + + tree new_optimize = build_optimization_node (&global_options, + &global_options_set); + if (new_optimize != optimization_default_node) + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl) = new_optimize; + } + aarch64_save_restore_target_globals (new_tree); + + gcc_assert (AARCH64_ISA_MODE == new_isa_mode); } /* Enum describing the various ways we can handle attributes. 
@@ -19161,7 +19310,7 @@ aarch64_handle_attr_arch (const char *str) { gcc_assert (tmp_arch); selected_arch = tmp_arch->arch; - aarch64_set_asm_isa_flags (tmp_flags); + aarch64_set_asm_isa_flags (tmp_flags | AARCH64_ISA_MODE); return true; } @@ -19202,7 +19351,7 @@ aarch64_handle_attr_cpu (const char *str) gcc_assert (tmp_cpu); selected_tune = tmp_cpu->ident; selected_arch = tmp_cpu->arch; - aarch64_set_asm_isa_flags (tmp_flags); + aarch64_set_asm_isa_flags (tmp_flags | AARCH64_ISA_MODE); return true; } @@ -19302,7 +19451,7 @@ aarch64_handle_attr_isa_flags (char *str) features if the user wants to handpick specific features. */ if (strncmp ("+nothing", str, 8) == 0) { - isa_flags = 0; + isa_flags = AARCH64_ISA_MODE; str += 8; } @@ -19795,7 +19944,7 @@ aarch64_can_inline_p (tree caller, tree callee) /* Return the ID of the TLDESC ABI, initializing the descriptor if hasn't been already. */ -unsigned int +arm_pcs aarch64_tlsdesc_abi_id () { predefined_function_abi &tlsdesc_abi = function_abis[ARM_PCS_TLSDESC]; @@ -19809,7 +19958,7 @@ aarch64_tlsdesc_abi_id () SET_HARD_REG_BIT (full_reg_clobbers, regno); tlsdesc_abi.initialize (ARM_PCS_TLSDESC, full_reg_clobbers); } - return tlsdesc_abi.id (); + return ARM_PCS_TLSDESC; } /* Return true if SYMBOL_REF X binds locally. 
*/ @@ -27745,22 +27894,26 @@ aarch64_simd_clone_usable (struct cgraph_node *node) static int aarch64_comp_type_attributes (const_tree type1, const_tree type2) { - auto check_attr = [&](const char *name) { - tree attr1 = lookup_attribute (name, TYPE_ATTRIBUTES (type1)); - tree attr2 = lookup_attribute (name, TYPE_ATTRIBUTES (type2)); + auto check_attr = [&](const char *ns, const char *name) { + tree attr1 = lookup_attribute (ns, name, TYPE_ATTRIBUTES (type1)); + tree attr2 = lookup_attribute (ns, name, TYPE_ATTRIBUTES (type2)); if (!attr1 && !attr2) return true; return attr1 && attr2 && attribute_value_equal (attr1, attr2); }; - if (!check_attr ("aarch64_vector_pcs")) + if (!check_attr ("gnu", "aarch64_vector_pcs")) + return 0; + if (!check_attr ("gnu", "Advanced SIMD type")) + return 0; + if (!check_attr ("gnu", "SVE type")) return 0; - if (!check_attr ("Advanced SIMD type")) + if (!check_attr ("gnu", "SVE sizeless type")) return 0; - if (!check_attr ("SVE type")) + if (!check_attr ("arm", "streaming")) return 0; - if (!check_attr ("SVE sizeless type")) + if (!check_attr ("arm", "streaming_compatible")) return 0; return 1; } diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 2f0777a37ac..4c7d9409fbc 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -157,10 +157,13 @@ #ifndef USED_FOR_TARGET -/* Define an enum of all features (architectures and extensions). */ +/* Define an enum of all features (ISA modes, architectures and extensions). + The ISA modes must come first. 
*/ enum class aarch64_feature : unsigned char { +#define DEF_AARCH64_ISA_MODE(IDENT) IDENT, #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) IDENT, #define AARCH64_ARCH(A, B, IDENT, D, E) IDENT, +#include "aarch64-isa-modes.def" #include "aarch64-option-extensions.def" #include "aarch64-arches.def" }; @@ -169,16 +172,34 @@ enum class aarch64_feature : unsigned char { #define HANDLE(IDENT) \ constexpr auto AARCH64_FL_##IDENT \ = aarch64_feature_flags (1) << int (aarch64_feature::IDENT); +#define DEF_AARCH64_ISA_MODE(IDENT) HANDLE (IDENT) #define AARCH64_OPT_EXTENSION(A, IDENT, C, D, E, F) HANDLE (IDENT) #define AARCH64_ARCH(A, B, IDENT, D, E) HANDLE (IDENT) +#include "aarch64-isa-modes.def" #include "aarch64-option-extensions.def" #include "aarch64-arches.def" #undef HANDLE +constexpr auto AARCH64_FL_SM_STATE = AARCH64_FL_SM_ON | AARCH64_FL_SM_OFF; + +constexpr unsigned int AARCH64_NUM_ISA_MODES = (0 +#define DEF_AARCH64_ISA_MODE(IDENT) + 1 +#include "aarch64-isa-modes.def" +); + +/* The mask of all ISA modes. */ +constexpr auto AARCH64_FL_ISA_MODES + = (aarch64_feature_flags (1) << AARCH64_NUM_ISA_MODES) - 1; + +/* The default ISA mode, for functions with no attributes that specify + something to the contrary. */ +constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; + #endif /* Macros to test ISA flags. */ +#define AARCH64_ISA_MODE (aarch64_isa_flags & AARCH64_FL_ISA_MODES) #define AARCH64_ISA_CRC (aarch64_isa_flags & AARCH64_FL_CRC) #define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO) #define AARCH64_ISA_FP (aarch64_isa_flags & AARCH64_FL_FP) @@ -933,6 +954,7 @@ enum arm_pcs typedef struct { enum arm_pcs pcs_variant; + aarch64_feature_flags isa_mode; int aapcs_arg_processed; /* No need to lay out this argument again. */ int aapcs_ncrn; /* Next Core register number. */ int aapcs_nextncrn; /* Next next core register number. 
*/ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e5f55d98057..b4608d1c5e3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -7335,7 +7335,8 @@ (define_expand "tlsdesc_small_" { if (TARGET_SVE) { - rtx abi = gen_int_mode (aarch64_tlsdesc_abi_id (), DImode); + rtx abi = aarch64_gen_callee_cookie (AARCH64_ISA_MODE, + aarch64_tlsdesc_abi_id ()); rtx_insn *call = emit_call_insn (gen_tlsdesc_small_sve_ (operands[0], abi)); RTL_CONST_CALL_P (call) = 1; diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64 index a9a244ab6d6..a4e0aa03274 100644 --- a/gcc/config/aarch64/t-aarch64 +++ b/gcc/config/aarch64/t-aarch64 @@ -20,7 +20,10 @@ TM_H += $(srcdir)/config/aarch64/aarch64-fusion-pairs.def \ $(srcdir)/config/aarch64/aarch64-tuning-flags.def \ - $(srcdir)/config/aarch64/aarch64-option-extensions.def + $(srcdir)/config/aarch64/aarch64-option-extensions.def \ + $(srcdir)/config/aarch64/aarch64-cores.def \ + $(srcdir)/config/aarch64/aarch64-isa-modes.def \ + $(srcdir)/config/aarch64/aarch64-arches.def OPTIONS_H_EXTRA += $(srcdir)/config/aarch64/aarch64-cores.def \ $(srcdir)/config/aarch64/aarch64-arches.def diff --git a/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme.exp b/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme.exp new file mode 100644 index 00000000000..72fcd0bd982 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme.exp @@ -0,0 +1,40 @@ +# Specific regression driver for AArch64 SME. +# Copyright (C) 2009-2023 Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. 
+# +# GCC is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . */ + +# GCC testsuite that uses the `dg.exp' driver. + +# Exit immediately if this isn't an AArch64 target. +if {![istarget aarch64*-*-*] } { + return +} + +# Load support procs. +load_lib g++-dg.exp + +# Initialize `dg'. +dg-init + +aarch64-with-arch-dg-options "" { + # Main loop. + dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ + "" "" +} + +# All done. +dg-finish diff --git a/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C b/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C new file mode 100644 index 00000000000..032485adf95 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C @@ -0,0 +1,4 @@ +/* { dg-options "-std=c++11 -pedantic-errors" } */ + +void f1 () __arm_streaming; +void f2 () __arm_streaming_compatible; diff --git a/gcc/testsuite/g++.target/aarch64/sme/streaming_mode_1.C b/gcc/testsuite/g++.target/aarch64/sme/streaming_mode_1.C new file mode 100644 index 00000000000..c3de726e726 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sme/streaming_mode_1.C @@ -0,0 +1,142 @@ +// { dg-options "" } + +void sc_a () [[arm::streaming_compatible]]; +void sc_a (); // { dg-error "ambiguating new declaration" "" { xfail *-*-* } } + +void sc_b (); +void sc_b () [[arm::streaming_compatible]]; // { dg-error "ambiguating new declaration" } + +void sc_c () [[arm::streaming_compatible]]; +void sc_c () {} // Inherits attribute from declaration (confusingly). 
+ +void sc_d (); +void sc_d () [[arm::streaming_compatible]] {} // { dg-error "ambiguating new declaration" } + +void sc_e () [[arm::streaming_compatible]] {} +void sc_e (); // { dg-error "ambiguating new declaration" "" { xfail *-*-* } } + +void sc_f () {} +void sc_f () [[arm::streaming_compatible]]; // { dg-error "ambiguating new declaration" } + +extern void (*sc_g) (); +extern void (*sc_g) () [[arm::streaming_compatible]]; // { dg-error "conflicting declaration" } + +extern void (*sc_h) () [[arm::streaming_compatible]]; +extern void (*sc_h) (); // { dg-error "conflicting declaration" } + +//---------------------------------------------------------------------------- + +void s_a () [[arm::streaming]]; +void s_a (); // { dg-error "ambiguating new declaration" "" { xfail *-*-* } } + +void s_b (); +void s_b () [[arm::streaming]]; // { dg-error "ambiguating new declaration" } + +void s_c () [[arm::streaming]]; +void s_c () {} // Inherits attribute from declaration (confusingly). + +void s_d (); +void s_d () [[arm::streaming]] {} // { dg-error "ambiguating new declaration" } + +void s_e () [[arm::streaming]] {} +void s_e (); // { dg-error "ambiguating new declaration" "" { xfail *-*-* } } + +void s_f () {} +void s_f () [[arm::streaming]]; // { dg-error "ambiguating new declaration" } + +extern void (*s_g) (); +extern void (*s_g) () [[arm::streaming]]; // { dg-error "conflicting declaration" } + +extern void (*s_h) () [[arm::streaming]]; +extern void (*s_h) (); // { dg-error "conflicting declaration" } + +//---------------------------------------------------------------------------- + +void mixed_a () [[arm::streaming]]; +void mixed_a () [[arm::streaming_compatible]]; // { dg-error "ambiguating new declaration" } + +void mixed_b () [[arm::streaming_compatible]]; +void mixed_b () [[arm::streaming]]; // { dg-error "ambiguating new declaration" } + +void mixed_c () [[arm::streaming]]; +void mixed_c () [[arm::streaming_compatible]] {} // { dg-error "ambiguating new 
declaration" } + +void mixed_d () [[arm::streaming_compatible]]; +void mixed_d () [[arm::streaming]] {} // { dg-error "ambiguating new declaration" } + +void mixed_e () [[arm::streaming]] {} +void mixed_e () [[arm::streaming_compatible]]; // { dg-error "ambiguating new declaration" } + +void mixed_f () [[arm::streaming_compatible]] {} +void mixed_f () [[arm::streaming]]; // { dg-error "ambiguating new declaration" } + +extern void (*mixed_g) () [[arm::streaming_compatible]]; +extern void (*mixed_g) () [[arm::streaming]]; // { dg-error "conflicting declaration" } + +extern void (*mixed_h) () [[arm::streaming]]; +extern void (*mixed_h) () [[arm::streaming_compatible]]; // { dg-error "conflicting declaration" } + +//---------------------------------------------------------------------------- + +void contradiction_1 () [[arm::streaming, arm::streaming_compatible]]; // { dg-warning "conflicts with attribute" } +void contradiction_2 () [[arm::streaming_compatible, arm::streaming]]; // { dg-warning "conflicts with attribute" } + +int [[arm::streaming_compatible]] int_attr; // { dg-warning "attribute ignored" } +void [[arm::streaming_compatible]] ret_attr (); // { dg-warning "attribute ignored" } +void *[[arm::streaming]] ptr_attr; // { dg-warning "only applies to function types" } + +typedef void s_callback () [[arm::streaming]]; +typedef void sc_callback () [[arm::streaming_compatible]]; + +typedef void contradiction_callback_1 () [[arm::streaming, arm::streaming_compatible]]; // { dg-warning "conflicts with attribute" } +typedef void contradiction_callback_2 () [[arm::streaming_compatible, arm::streaming]]; // { dg-warning "conflicts with attribute" } + +void (*contradiction_callback_ptr_1) () [[arm::streaming, arm::streaming_compatible]]; // { dg-warning "conflicts with attribute" } +void (*contradiction_callback_ptr_2) () [[arm::streaming_compatible, arm::streaming]]; // { dg-warning "conflicts with attribute" } + +struct s { + void (*contradiction_callback_ptr_1) () 
[[arm::streaming, arm::streaming_compatible]]; // { dg-warning "conflicts with attribute" } + void (*contradiction_callback_ptr_2) () [[arm::streaming_compatible, arm::streaming]]; // { dg-warning "conflicts with attribute" } +}; + +//---------------------------------------------------------------------------- + +void keyword_ok_1 () __arm_streaming; +void keyword_ok_1 () __arm_streaming; + +void keyword_ok_2 () __arm_streaming; +void keyword_ok_2 () [[arm::streaming]]; + +void keyword_ok_3 () [[arm::streaming]]; +void keyword_ok_3 () __arm_streaming; + +void keyword_ok_4 () __arm_streaming [[arm::streaming]]; + +void keyword_ok_5 () __arm_streaming_compatible; +void keyword_ok_5 () [[arm::streaming_compatible]]; + +//---------------------------------------------------------------------------- + +void keyword_contradiction_1 () __arm_streaming; +void keyword_contradiction_1 (); // { dg-error "ambiguating new declaration" "" { xfail *-*-* } } + +void keyword_contradiction_2 (); +void keyword_contradiction_2 () __arm_streaming; // { dg-error "ambiguating new declaration" } + +void keyword_contradiction_3 () __arm_streaming; +void keyword_contradiction_3 () [[arm::streaming_compatible]]; // { dg-error "ambiguating new declaration" } + +void keyword_contradiction_4 () [[arm::streaming_compatible]]; +void keyword_contradiction_4 () __arm_streaming; // { dg-error "ambiguating new declaration" } + +//---------------------------------------------------------------------------- + +struct s1 +{ + virtual void f () [[arm::streaming]]; +}; + +struct s2 : public s1 +{ + void f () override; // { dg-error "conflicting type attributes" } +}; diff --git a/gcc/testsuite/g++.target/aarch64/sme/streaming_mode_2.C b/gcc/testsuite/g++.target/aarch64/sme/streaming_mode_2.C new file mode 100644 index 00000000000..f2dd2db9b6f --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sme/streaming_mode_2.C @@ -0,0 +1,25 @@ +// { dg-options "" } + +void sc_fn () [[arm::streaming_compatible]]; 
+void s_fn () [[arm::streaming]]; +void ns_fn (); + +void (*sc_fn_ptr) () [[arm::streaming_compatible]]; +void (*s_fn_ptr) () [[arm::streaming]]; +void (*ns_fn_ptr) (); + +void +f () +{ + sc_fn_ptr = sc_fn; + sc_fn_ptr = s_fn; // { dg-error "invalid conversion" } + sc_fn_ptr = ns_fn; // { dg-error "invalid conversion" } + + s_fn_ptr = sc_fn; // { dg-error "invalid conversion" } + s_fn_ptr = s_fn; + s_fn_ptr = ns_fn; // { dg-error "invalid conversion" } + + ns_fn_ptr = sc_fn; // { dg-error "invalid conversion" } + ns_fn_ptr = s_fn; // { dg-error "invalid conversion" } + ns_fn_ptr = ns_fn; +} diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c index 0fa470880bf..45bb02561ed 100644 --- a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c +++ b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c @@ -29,4 +29,5 @@ void foo() return; } -/* { dg-final { scan-rtl-dump-times "const_int 0" 11 "expand" } } */ +/* Includes 1 for the call instruction and 1 for a nop. */ +/* { dg-final { scan-rtl-dump-times "const_int 0" 10 "expand" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp new file mode 100644 index 00000000000..c990e59247a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp @@ -0,0 +1,40 @@ +# Specific regression driver for AArch64 SME. +# Copyright (C) 2009-2023 Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . */ + +# GCC testsuite that uses the `dg.exp' driver. + +# Exit immediately if this isn't an AArch64 target. +if {![istarget aarch64*-*-*] } { + return +} + +# Load support procs. +load_lib gcc-dg.exp + +# Initialize `dg'. +dg-init + +aarch64-with-arch-dg-options "" { + # Main loop. + dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ + "" "" +} + +# All done. +dg-finish diff --git a/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c b/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c new file mode 100644 index 00000000000..8f1b836764e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c @@ -0,0 +1,4 @@ +/* { dg-options "-std=c90 -pedantic-errors" } */ + +void f1 () __arm_streaming; +void f2 () __arm_streaming_compatible; diff --git a/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c new file mode 100644 index 00000000000..8874b05b882 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_1.c @@ -0,0 +1,130 @@ +// { dg-options "" } + +void sc_a () [[arm::streaming_compatible]]; +void sc_a (); // { dg-error "conflicting types" } + +void sc_b (); +void sc_b () [[arm::streaming_compatible]]; // { dg-error "conflicting types" } + +void sc_c () [[arm::streaming_compatible]]; +void sc_c () {} // Inherits attribute from declaration (confusingly). 
+ +void sc_d (); +void sc_d () [[arm::streaming_compatible]] {} // { dg-error "conflicting types" } + +void sc_e () [[arm::streaming_compatible]] {} +void sc_e (); // { dg-error "conflicting types" } + +void sc_f () {} +void sc_f () [[arm::streaming_compatible]]; // { dg-error "conflicting types" } + +extern void (*sc_g) (); +extern void (*sc_g) () [[arm::streaming_compatible]]; // { dg-error "conflicting types" } + +extern void (*sc_h) () [[arm::streaming_compatible]]; +extern void (*sc_h) (); // { dg-error "conflicting types" } + +//---------------------------------------------------------------------------- + +void s_a () [[arm::streaming]]; +void s_a (); // { dg-error "conflicting types" } + +void s_b (); +void s_b () [[arm::streaming]]; // { dg-error "conflicting types" } + +void s_c () [[arm::streaming]]; +void s_c () {} // Inherits attribute from declaration (confusingly). + +void s_d (); +void s_d () [[arm::streaming]] {} // { dg-error "conflicting types" } + +void s_e () [[arm::streaming]] {} +void s_e (); // { dg-error "conflicting types" } + +void s_f () {} +void s_f () [[arm::streaming]]; // { dg-error "conflicting types" } + +extern void (*s_g) (); +extern void (*s_g) () [[arm::streaming]]; // { dg-error "conflicting types" } + +extern void (*s_h) () [[arm::streaming]]; +extern void (*s_h) (); // { dg-error "conflicting types" } + +//---------------------------------------------------------------------------- + +void mixed_a () [[arm::streaming]]; +void mixed_a () [[arm::streaming_compatible]]; // { dg-error "conflicting types" } + +void mixed_b () [[arm::streaming_compatible]]; +void mixed_b () [[arm::streaming]]; // { dg-error "conflicting types" } + +void mixed_c () [[arm::streaming]]; +void mixed_c () [[arm::streaming_compatible]] {} // { dg-error "conflicting types" } + +void mixed_d () [[arm::streaming_compatible]]; +void mixed_d () [[arm::streaming]] {} // { dg-error "conflicting types" } + +void mixed_e () [[arm::streaming]] {} +void mixed_e () 
[[arm::streaming_compatible]]; // { dg-error "conflicting types" } + +void mixed_f () [[arm::streaming_compatible]] {} +void mixed_f () [[arm::streaming]]; // { dg-error "conflicting types" } + +extern void (*mixed_g) () [[arm::streaming_compatible]]; +extern void (*mixed_g) () [[arm::streaming]]; // { dg-error "conflicting types" } + +extern void (*mixed_h) () [[arm::streaming]]; +extern void (*mixed_h) () [[arm::streaming_compatible]]; // { dg-error "conflicting types" } + +//---------------------------------------------------------------------------- + +void contradiction_1 () [[arm::streaming, arm::streaming_compatible]]; // { dg-warning "conflicts with attribute" } +void contradiction_2 () [[arm::streaming_compatible, arm::streaming]]; // { dg-warning "conflicts with attribute" } + +int [[arm::streaming_compatible]] int_attr; // { dg-warning "only applies to function types" } +void [[arm::streaming_compatible]] ret_attr (); // { dg-warning "only applies to function types" } +void *[[arm::streaming]] ptr_attr; // { dg-warning "only applies to function types" } + +typedef void s_callback () [[arm::streaming]]; +typedef void sc_callback () [[arm::streaming_compatible]]; + +typedef void contradiction_callback_1 () [[arm::streaming, arm::streaming_compatible]]; // { dg-warning "conflicts with attribute" } +typedef void contradiction_callback_2 () [[arm::streaming_compatible, arm::streaming]]; // { dg-warning "conflicts with attribute" } + +void (*contradiction_callback_ptr_1) () [[arm::streaming, arm::streaming_compatible]]; // { dg-warning "conflicts with attribute" } +void (*contradiction_callback_ptr_2) () [[arm::streaming_compatible, arm::streaming]]; // { dg-warning "conflicts with attribute" } + +struct s { + void (*contradiction_callback_ptr_1) () [[arm::streaming, arm::streaming_compatible]]; // { dg-warning "conflicts with attribute" } + void (*contradiction_callback_ptr_2) () [[arm::streaming_compatible, arm::streaming]]; // { dg-warning "conflicts with 
attribute" } +}; + +//---------------------------------------------------------------------------- + +void keyword_ok_1 () __arm_streaming; +void keyword_ok_1 () __arm_streaming; + +void keyword_ok_2 () __arm_streaming; +void keyword_ok_2 () [[arm::streaming]]; + +void keyword_ok_3 () [[arm::streaming]]; +void keyword_ok_3 () __arm_streaming; + +void keyword_ok_4 () __arm_streaming [[arm::streaming]]; + +void keyword_ok_5 () __arm_streaming_compatible; +void keyword_ok_5 () [[arm::streaming_compatible]]; + +//---------------------------------------------------------------------------- + +void keyword_contradiction_1 () __arm_streaming; +void keyword_contradiction_1 (); // { dg-error "conflicting types" } + +void keyword_contradiction_2 (); +void keyword_contradiction_2 () __arm_streaming; // { dg-error "conflicting types" } + +void keyword_contradiction_3 () __arm_streaming; +void keyword_contradiction_3 () [[arm::streaming_compatible]]; // { dg-error "conflicting types" } + +void keyword_contradiction_4 () [[arm::streaming_compatible]]; +void keyword_contradiction_4 () __arm_streaming; // { dg-error "conflicting types" } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c new file mode 100644 index 00000000000..1e328c817d1 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_2.c @@ -0,0 +1,25 @@ +// { dg-options "" } + +void sc_fn () [[arm::streaming_compatible]]; +void s_fn () [[arm::streaming]]; +void ns_fn (); + +void (*sc_fn_ptr) () [[arm::streaming_compatible]]; +void (*s_fn_ptr) () [[arm::streaming]]; +void (*ns_fn_ptr) (); + +void +f () +{ + sc_fn_ptr = sc_fn; + sc_fn_ptr = s_fn; // { dg-warning "incompatible pointer type" } + sc_fn_ptr = ns_fn; // { dg-warning "incompatible pointer type" } + + s_fn_ptr = sc_fn; // { dg-warning "incompatible pointer type" } + s_fn_ptr = s_fn; + s_fn_ptr = ns_fn; // { dg-warning "incompatible pointer type" } + + ns_fn_ptr = sc_fn; 
// { dg-warning "incompatible pointer type" }
+  ns_fn_ptr = s_fn; // { dg-warning "incompatible pointer type" }
+  ns_fn_ptr = ns_fn;
+}

From patchwork Fri Nov 17 17:25:45 2023
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 08/21] aarch64: Add +sme
Date: Fri, 17 Nov 2023 17:25:45 +0000

This patch adds the +sme ISA feature and requires it to
be present when compiling arm_streaming code. (arm_streaming_compatible code does not necessarily assume the presence of SME. It just has to work when SME is present and streaming mode is enabled.) gcc/ * doc/invoke.texi: Document SME. * doc/sourcebuild.texi: Document aarch64_sve. * config/aarch64/aarch64-option-extensions.def (sme): Define. * config/aarch64/aarch64.h (AARCH64_ISA_SME): New macro. (TARGET_SME): Likewise. * config/aarch64/aarch64.cc (aarch64_override_options_internal): Ensure that SME is present when compiling streaming code. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_aarch64_sme): New target test. * gcc.target/aarch64/sme/aarch64-sme.exp: Force SME to be enabled if it isn't by default. * g++.target/aarch64/sme/aarch64-sme.exp: Likewise. * gcc.target/aarch64/sme/streaming_mode_3.c: New test. --- .../aarch64/aarch64-option-extensions.def | 2 + gcc/config/aarch64/aarch64.cc | 33 ++++++++++ gcc/config/aarch64/aarch64.h | 5 ++ gcc/doc/invoke.texi | 2 + gcc/doc/sourcebuild.texi | 2 + .../g++.target/aarch64/sme/aarch64-sme.exp | 10 ++- .../gcc.target/aarch64/sme/aarch64-sme.exp | 10 ++- .../gcc.target/aarch64/sme/streaming_mode_3.c | 63 +++++++++++++++++++ .../gcc.target/aarch64/sme/streaming_mode_4.c | 22 +++++++ gcc/testsuite/lib/target-supports.exp | 12 ++++ 10 files changed, 157 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index 825f3bf7758..fb9ff1b66b2 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -151,4 +151,6 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "") AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc") +AARCH64_OPT_EXTENSION("sme", SME, (BF16, SVE2), (), (), "sme") + #undef AARCH64_OPT_EXTENSION 
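For context, the attribute syntax that this option gates places the attribute after the function declarator, as in the tests above. The following is a minimal, hypothetical sketch (not part of the patch): the macros expand to nothing on compilers without SME support, so the snippet builds anywhere, while on an SME-capable GCC the streaming variant would require the +sme extension added here.

```c
/* Hypothetical portability shim: on a compiler that defines
   __ARM_FEATURE_SME the attributes take effect; elsewhere they
   expand to nothing and the functions are plain C.  */
#if defined (__ARM_FEATURE_SME)
#define STREAMING [[arm::streaming]]
#define STREAMING_COMPATIBLE [[arm::streaming_compatible]]
#else
#define STREAMING
#define STREAMING_COMPATIBLE
#endif

/* Streaming function: its body may assume PSTATE.SM is enabled,
   so compiling it needs the SME ISA extension.  */
int s_add (int a, int b) STREAMING { return a + b; }

/* Streaming-compatible function: must be valid whether or not
   streaming mode is enabled.  */
int sc_add (int a, int b) STREAMING_COMPATIBLE { return a + b; }

/* Ordinary non-streaming function.  */
int ns_add (int a, int b) { return a + b; }
```

The three variants have distinct function types, which is why the streaming_mode tests above diagnose assignments between mismatched function pointers.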
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 1a4ef2a4396..fcaea87c737 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -11737,6 +11737,23 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2) return true; } +/* Implement TARGET_START_CALL_ARGS. */ + +static void +aarch64_start_call_args (cumulative_args_t ca_v) +{ + CUMULATIVE_ARGS *ca = get_cumulative_args (ca_v); + + if (!TARGET_SME && (ca->isa_mode & AARCH64_FL_SM_ON)) + { + error ("calling a streaming function requires the ISA extension %qs", + "sme"); + inform (input_location, "you can enable %qs using the command-line" + " option %<-march%>, or by using the %" + " attribute or pragma", "sme"); + } +} + /* This function is used by the call expanders of the machine description. RESULT is the register in which the result is returned. It's NULL for "call" and "sibcall". @@ -18541,6 +18558,19 @@ aarch64_override_options_internal (struct gcc_options *opts) && !fixed_regs[R18_REGNUM]) error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>"); + if ((opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON) + && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME)) + { + error ("streaming functions require the ISA extension %qs", "sme"); + inform (input_location, "you can enable %qs using the command-line" + " option %<-march%>, or by using the %" + " attribute or pragma", "sme"); + opts->x_target_flags &= ~MASK_GENERAL_REGS_ONLY; + auto new_flags = (opts->x_aarch64_asm_isa_flags + | feature_deps::SME ().enable); + aarch64_set_asm_isa_flags (opts, new_flags); + } + initialize_aarch64_code_model (opts); initialize_aarch64_tls_size (opts); aarch64_tpidr_register = opts->x_aarch64_tpidr_reg; @@ -28607,6 +28637,9 @@ aarch64_run_selftests (void) #undef TARGET_FUNCTION_VALUE_REGNO_P #define TARGET_FUNCTION_VALUE_REGNO_P aarch64_function_value_regno_p +#undef TARGET_START_CALL_ARGS +#define TARGET_START_CALL_ARGS aarch64_start_call_args + #undef 
TARGET_GIMPLE_FOLD_BUILTIN #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 4c7d9409fbc..ded640e8c7b 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -214,6 +214,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; #define AARCH64_ISA_SVE2_BITPERM (aarch64_isa_flags & AARCH64_FL_SVE2_BITPERM) #define AARCH64_ISA_SVE2_SHA3 (aarch64_isa_flags & AARCH64_FL_SVE2_SHA3) #define AARCH64_ISA_SVE2_SM4 (aarch64_isa_flags & AARCH64_FL_SVE2_SM4) +#define AARCH64_ISA_SME (aarch64_isa_flags & AARCH64_FL_SME) #define AARCH64_ISA_V8_3A (aarch64_isa_flags & AARCH64_FL_V8_3A) #define AARCH64_ISA_DOTPROD (aarch64_isa_flags & AARCH64_FL_DOTPROD) #define AARCH64_ISA_AES (aarch64_isa_flags & AARCH64_FL_AES) @@ -293,6 +294,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; /* SVE2 SM4 instructions, enabled through +sve2-sm4. */ #define TARGET_SVE2_SM4 (AARCH64_ISA_SVE2_SM4) +/* SME instructions, enabled through +sme. Note that this does not + imply anything about the state of PSTATE.SM. */ +#define TARGET_SME (AARCH64_ISA_SME) + /* ARMv8.3-A features. */ #define TARGET_ARMV8_3 (AARCH64_ISA_V8_3A) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index ba99a701f3e..31a3f7c567d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -21059,6 +21059,8 @@ Enable the Flag Manipulation instructions Extension. Enable the Pointer Authentication Extension. @item cssc Enable the Common Short Sequence Compression instructions. +@item sme +Enable the Scalable Matrix Extension. @end table diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index eaa75f00f5c..448f5e08578 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2316,6 +2316,8 @@ AArch64 target which generates instruction sequences for big endian. 
@item aarch64_small_fpic Binutils installed on test system supports relocation types required by -fpic for AArch64 small memory model. +@item aarch64_sme +AArch64 target that generates instructions for SME. @item aarch64_sve_hw AArch64 target that is able to generate and execute SVE code (regardless of whether it does so by default). diff --git a/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme.exp b/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme.exp index 72fcd0bd982..1c3e69cde12 100644 --- a/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme.exp +++ b/gcc/testsuite/g++.target/aarch64/sme/aarch64-sme.exp @@ -30,10 +30,16 @@ load_lib g++-dg.exp # Initialize `dg'. dg-init -aarch64-with-arch-dg-options "" { +if { [check_effective_target_aarch64_sme] } { + set sme_flags "" +} else { + set sme_flags "-march=armv9-a+sme" +} + +aarch64-with-arch-dg-options $sme_flags { # Main loop. dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ - "" "" + "" $sme_flags } # All done. diff --git a/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp index c990e59247a..011310e8061 100644 --- a/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp +++ b/gcc/testsuite/gcc.target/aarch64/sme/aarch64-sme.exp @@ -30,10 +30,16 @@ load_lib gcc-dg.exp # Initialize `dg'. dg-init -aarch64-with-arch-dg-options "" { +if { [check_effective_target_aarch64_sme] } { + set sme_flags "" +} else { + set sme_flags "-march=armv9-a+sme" +} + +aarch64-with-arch-dg-options $sme_flags { # Main loop. dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ - "" "" + "" $sme_flags } # All done. 
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c new file mode 100644 index 00000000000..45ec92321b2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c @@ -0,0 +1,63 @@ +// { dg-options "" } + +#pragma GCC target "+nosme" + +void sc_a () [[arm::streaming_compatible]] {} +void s_a () [[arm::streaming]] {} // { dg-error "streaming functions require the ISA extension 'sme'" } +void ns_a () {} + +void sc_b () [[arm::streaming_compatible]] {} +void ns_b () {} +void s_b () [[arm::streaming]] {} // { dg-error "streaming functions require the ISA extension 'sme'" } + +void sc_c () [[arm::streaming_compatible]] {} +void sc_d () [[arm::streaming_compatible]] {} + +void s_c () [[arm::streaming]] {} // { dg-error "streaming functions require the ISA extension 'sme'" } +void s_d () [[arm::streaming]] {} // { dg-error "streaming functions require the ISA extension 'sme'" } + +void ns_c () {} +void ns_d () {} + +void sc_e () [[arm::streaming_compatible]]; +void s_e () [[arm::streaming]]; +void ns_e (); + +#pragma GCC target "+sme" + +void sc_f () [[arm::streaming_compatible]] {} +void s_f () [[arm::streaming]] {} +void ns_f () {} + +void sc_g () [[arm::streaming_compatible]] {} +void ns_g () {} +void s_g () [[arm::streaming]] {} + +void sc_h () [[arm::streaming_compatible]] {} +void sc_i () [[arm::streaming_compatible]] {} + +void s_h () [[arm::streaming]] {} +void s_i () [[arm::streaming]] {} + +void ns_h () {} +void ns_i () {} + +void sc_j () [[arm::streaming_compatible]]; +void s_j () [[arm::streaming]]; +void ns_j (); + +#pragma GCC target "+sme" + +void sc_k () [[arm::streaming_compatible]] {} + +#pragma GCC target "+nosme" +#pragma GCC target "+sme" + +void s_k () [[arm::streaming]] {} + +#pragma GCC target "+nosme" +#pragma GCC target "+sme" + +void ns_k () {} + +#pragma GCC target "+nosme" diff --git a/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c 
b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c new file mode 100644 index 00000000000..50e92f2e18a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c @@ -0,0 +1,22 @@ +// { dg-options "-mgeneral-regs-only" } + +void sc_a () [[arm::streaming_compatible]] {} +void s_a () [[arm::streaming]] {} // { dg-error "streaming functions require the ISA extension 'sme'" } +void ns_a () {} + +void sc_b () [[arm::streaming_compatible]] {} +void ns_b () {} +void s_b () [[arm::streaming]] {} // { dg-error "streaming functions require the ISA extension 'sme'" } + +void sc_c () [[arm::streaming_compatible]] {} +void sc_d () [[arm::streaming_compatible]] {} + +void s_c () [[arm::streaming]] {} // { dg-error "streaming functions require the ISA extension 'sme'" } +void s_d () [[arm::streaming]] {} // { dg-error "streaming functions require the ISA extension 'sme'" } + +void ns_c () {} +void ns_d () {} + +void sc_e () [[arm::streaming_compatible]]; +void s_e () [[arm::streaming]]; +void ns_e (); diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 1a7bea96c1e..a78a210b79f 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4413,6 +4413,18 @@ proc aarch64_sve_bits { } { }] } +# Return 1 if this is an AArch64 target that generates instructions for SME. 
+proc check_effective_target_aarch64_sme { } {
+    if { ![istarget aarch64*-*-*] } {
+	return 0
+    }
+    return [check_no_compiler_messages aarch64_sme assembly {
+	#if !defined (__ARM_FEATURE_SME)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is a compiler supporting ARC atomic operations
 proc check_effective_target_arc_atomic { } {
     return [check_no_compiler_messages arc_atomic assembly {

From patchwork Fri Nov 17 17:25:59 2023
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 09/21] aarch64: Distinguish streaming-compatible AdvSIMD insns
Date: Fri, 17 Nov 2023 17:25:59 +0000

The vast majority of Advanced SIMD instructions are not available in
streaming mode, but some of the load/store/move instructions are.

This patch adds a new target feature macro called TARGET_BASE_SIMD
for this streaming-compatible subset.

The vector-to-vector move instructions are not streaming-compatible,
so we need to use the SVE move instructions where enabled, or fall
back to the nofp16 handling otherwise.

I haven't found a good way of testing the SVE EXT alternative in
aarch64_simd_mov_from_high, but I'd rather provide it than not.

gcc/
	* config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro.
	(TARGET_SIMD): Require PSTATE.SM to be 0.
	(AARCH64_ISA_SM_OFF): New macro.
	* config/aarch64/aarch64.cc (aarch64_array_mode_supported_p):
	Allow Advanced SIMD structure modes for TARGET_BASE_SIMD.
	(aarch64_print_operand): Support '%Z'.
	(aarch64_secondary_reload): Expect SVE moves to be used for
	Advanced SIMD modes if SVE is enabled and non-streaming Advanced
	SIMD isn't.
	(aarch64_register_move_cost): Likewise.
	(aarch64_simd_container_mode): Extend Advanced SIMD mode handling
	to TARGET_BASE_SIMD.
	(aarch64_expand_cpymem): Expand commentary.
	* config/aarch64/aarch64.md (arches): Add base_simd and
	nobase_simd.
	(arch_enabled): Handle it.
	(*mov_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD.
	(*movti_aarch64): Use an SVE move instruction if non-streaming
	SIMD isn't available.
	(*mov_aarch64): Likewise.
	(load_pair_dw_tftf): Extend to TARGET_BASE_SIMD.
	(store_pair_dw_tftf): Likewise.
	(loadwb_pair_): Likewise.
	(storewb_pair_): Likewise.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov): Allow UMOV
	in streaming mode.
	(*aarch64_simd_mov): Use an SVE move instruction if non-streaming
	SIMD isn't available.
	(aarch64_store_lane0): Depend on TARGET_FLOAT rather than
	TARGET_SIMD.
	(aarch64_simd_mov_from_low): Likewise.
Use fmov if Advanced SIMD is completely disabled. (aarch64_simd_mov_from_high): Use SVE EXT instructions if non-streaming SIMD isn't available. gcc/testsuite/ * gcc.target/aarch64/movdf_2.c: New test. * gcc.target/aarch64/movdi_3.c: Likewise. * gcc.target/aarch64/movhf_2.c: Likewise. * gcc.target/aarch64/movhi_2.c: Likewise. * gcc.target/aarch64/movqi_2.c: Likewise. * gcc.target/aarch64/movsf_2.c: Likewise. * gcc.target/aarch64/movsi_2.c: Likewise. * gcc.target/aarch64/movtf_3.c: Likewise. * gcc.target/aarch64/movtf_4.c: Likewise. * gcc.target/aarch64/movti_3.c: Likewise. * gcc.target/aarch64/movti_4.c: Likewise. * gcc.target/aarch64/movv16qi_4.c: Likewise. * gcc.target/aarch64/movv16qi_5.c: Likewise. * gcc.target/aarch64/movv8qi_4.c: Likewise. * gcc.target/aarch64/sme/arm_neon_1.c: Likewise. * gcc.target/aarch64/sme/arm_neon_2.c: Likewise. * gcc.target/aarch64/sme/arm_neon_3.c: Likewise. --- gcc/config/aarch64/aarch64-simd.md | 48 +++++------ gcc/config/aarch64/aarch64.cc | 16 ++-- gcc/config/aarch64/aarch64.h | 12 ++- gcc/config/aarch64/aarch64.md | 79 +++++++++-------- gcc/testsuite/gcc.target/aarch64/movdf_2.c | 51 +++++++++++ gcc/testsuite/gcc.target/aarch64/movdi_3.c | 59 +++++++++++++ gcc/testsuite/gcc.target/aarch64/movhf_2.c | 53 ++++++++++++ gcc/testsuite/gcc.target/aarch64/movhi_2.c | 61 +++++++++++++ gcc/testsuite/gcc.target/aarch64/movqi_2.c | 59 +++++++++++++ gcc/testsuite/gcc.target/aarch64/movsf_2.c | 51 +++++++++++ gcc/testsuite/gcc.target/aarch64/movsi_2.c | 59 +++++++++++++ gcc/testsuite/gcc.target/aarch64/movtf_3.c | 81 +++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movtf_4.c | 78 +++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movti_3.c | 86 +++++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movti_4.c | 83 ++++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movv16qi_4.c | 82 ++++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movv16qi_5.c | 79 +++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movv8qi_4.c | 55 ++++++++++++ 
.../gcc.target/aarch64/sme/arm_neon_1.c | 13 +++ .../gcc.target/aarch64/sme/arm_neon_2.c | 11 +++ .../gcc.target/aarch64/sme/arm_neon_3.c | 11 +++ 21 files changed, 1060 insertions(+), 67 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/movdf_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movdi_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movhf_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movhi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movqi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movsf_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movsi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c6f2d582837..f94ee74799e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -149,20 +149,20 @@ (define_insn_and_split "*aarch64_simd_mov" && (register_operand (operands[0], mode) || aarch64_simd_reg_or_zero (operands[1], mode))" {@ [cons: =0, 1; attrs: type, arch, length] - [w , m ; neon_load1_1reg , * , *] ldr\t%d0, %1 - [r , m ; load_8 , * , *] ldr\t%x0, %1 - [m , Dz; store_8 , * , *] str\txzr, %0 - [m , w ; neon_store1_1reg, * , *] str\t%d1, %0 - [m , r ; store_8 , * , *] str\t%x1, %0 - [w , w ; neon_logic , simd, *] mov\t%0., %1. 
- [w , w ; neon_logic , * , *] fmov\t%d0, %d1 - [?r, w ; neon_to_gp , simd, *] umov\t%0, %1.d[0] - [?r, w ; neon_to_gp , * , *] fmov\t%x0, %d1 - [?w, r ; f_mcr , * , *] fmov\t%d0, %1 - [?r, r ; mov_reg , * , *] mov\t%0, %1 - [w , Dn; neon_move , simd, *] << aarch64_output_simd_mov_immediate (operands[1], 64); - [w , Dz; f_mcr , * , *] fmov\t%d0, xzr - [w , Dx; neon_move , simd, 8] # + [w , m ; neon_load1_1reg , * , *] ldr\t%d0, %1 + [r , m ; load_8 , * , *] ldr\t%x0, %1 + [m , Dz; store_8 , * , *] str\txzr, %0 + [m , w ; neon_store1_1reg, * , *] str\t%d1, %0 + [m , r ; store_8 , * , *] str\t%x1, %0 + [w , w ; neon_logic , simd , *] mov\t%0., %1. + [w , w ; neon_logic , * , *] fmov\t%d0, %d1 + [?r, w ; neon_to_gp , base_simd, *] umov\t%0, %1.d[0] + [?r, w ; neon_to_gp , * , *] fmov\t%x0, %d1 + [?w, r ; f_mcr , * , *] fmov\t%d0, %1 + [?r, r ; mov_reg , * , *] mov\t%0, %1 + [w , Dn; neon_move , simd , *] << aarch64_output_simd_mov_immediate (operands[1], 64); + [w , Dz; f_mcr , * , *] fmov\t%d0, xzr + [w , Dx; neon_move , simd , 8] # } "CONST_INT_P (operands[1]) && aarch64_simd_special_constant_p (operands[1], mode) @@ -185,6 +185,7 @@ (define_insn_and_split "*aarch64_simd_mov" [Umn, Dz; store_16 , * , 4] stp\txzr, xzr, %0 [m , w ; neon_store1_1reg, * , 4] str\t%q1, %0 [w , w ; neon_logic , simd, 4] mov\t%0., %1. 
+ [w , w ; * , sve , 4] mov\t%Z0.d, %Z1.d [?r , w ; multiple , * , 8] # [?w , r ; multiple , * , 8] # [?r , r ; multiple , * , 8] # @@ -225,7 +226,7 @@ (define_insn "aarch64_store_lane0" [(set (match_operand: 0 "memory_operand" "=m") (vec_select: (match_operand:VALL_F16 1 "register_operand" "w") (parallel [(match_operand 2 "const_int_operand" "n")])))] - "TARGET_SIMD + "TARGET_FLOAT && ENDIAN_LANE_N (, INTVAL (operands[2])) == 0" "str\\t%1, %0" [(set_attr "type" "neon_store1_1reg")] @@ -374,18 +375,18 @@ (define_insn_and_split "aarch64_simd_mov_from_low" (vec_select: (match_operand:VQMOV_NO2E 1 "register_operand") (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half")))] - "TARGET_SIMD" - {@ [ cons: =0 , 1 ; attrs: type ] - [ w , w ; mov_reg ] # - [ ?r , w ; neon_to_gp ] umov\t%0, %1.d[0] + "TARGET_FLOAT" + {@ [ cons: =0 , 1 ; attrs: type , arch ] + [ w , w ; mov_reg , simd ] # + [ ?r , w ; neon_to_gp , base_simd ] umov\t%0, %1.d[0] + [ ?r , w ; f_mrc , * ] fmov\t%0, %d1 } "&& reload_completed && aarch64_simd_register (operands[0], mode)" [(set (match_dup 0) (match_dup 1))] { operands[1] = aarch64_replace_reg_mode (operands[1], mode); } - [ - (set_attr "length" "4")] + [(set_attr "length" "4")] ) (define_insn "aarch64_simd_mov_from_high" @@ -396,12 +397,11 @@ (define_insn "aarch64_simd_mov_from_high" "TARGET_FLOAT" {@ [ cons: =0 , 1 ; attrs: type , arch ] [ w , w ; neon_dup , simd ] dup\t%d0, %1.d[1] + [ w , w ; * , sve ] ext\t%Z0.b, %Z0.b, %Z0.b, #8 [ ?r , w ; neon_to_gp , simd ] umov\t%0, %1.d[1] [ ?r , w ; f_mrc , * ] fmov\t%0, %1.d[1] } - [ - - (set_attr "length" "4")] + [(set_attr "length" "4")] ) (define_insn "orn3" diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index fcaea87c737..af9f3876532 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -3774,7 +3774,7 @@ static bool aarch64_array_mode_supported_p (machine_mode mode, unsigned HOST_WIDE_INT nelems) { - if (TARGET_SIMD + if (TARGET_BASE_SIMD && 
(AARCH64_VALID_SIMD_QREG_MODE (mode) || AARCH64_VALID_SIMD_DREG_MODE (mode)) && (nelems >= 2 && nelems <= 4)) @@ -13171,8 +13171,8 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x, return NO_REGS; } - /* Without the TARGET_SIMD instructions we cannot move a Q register - to a Q register directly. We need a scratch. */ + /* Without the TARGET_SIMD or TARGET_SVE instructions we cannot move a + Q register to a Q register directly. We need a scratch. */ if (REG_P (x) && (mode == TFmode || mode == TImode @@ -15765,7 +15765,7 @@ aarch64_register_move_cost (machine_mode mode, secondary reload. A general register is used as a scratch to move the upper DI value and the lower DI value is moved directly, hence the cost is the sum of three moves. */ - if (! TARGET_SIMD) + if (!TARGET_SIMD && !TARGET_SVE) return regmove_cost->GP2FP + regmove_cost->FP2GP + regmove_cost->FP2FP; return regmove_cost->FP2FP; @@ -21374,7 +21374,7 @@ aarch64_simd_container_mode (scalar_mode mode, poly_int64 width) return aarch64_full_sve_mode (mode).else_mode (word_mode); gcc_assert (known_eq (width, 64) || known_eq (width, 128)); - if (TARGET_SIMD) + if (TARGET_BASE_SIMD) { if (known_eq (width, 128)) return aarch64_vq_mode (mode).else_mode (word_mode); @@ -25764,7 +25764,11 @@ aarch64_expand_cpymem (rtx *operands) int copy_bits = 256; /* Default to 256-bit LDP/STP on large copies, however small copies, no SIMD - support or slow 256-bit LDP/STP fall back to 128-bit chunks. */ + support or slow 256-bit LDP/STP fall back to 128-bit chunks. + + ??? Although it would be possible to use LDP/STP Qn in streaming mode + (so using TARGET_BASE_SIMD instead of TARGET_SIMD), it isn't clear + whether that would improve performance. 
*/ if (size <= 24 || !TARGET_SIMD || (aarch64_tune_params.extra_tuning_flags diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index ded640e8c7b..687c1317b4f 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -61,8 +61,15 @@ #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN) /* AdvSIMD is supported in the default configuration, unless disabled by - -mgeneral-regs-only or by the +nosimd extension. */ -#define TARGET_SIMD (AARCH64_ISA_SIMD) + -mgeneral-regs-only or by the +nosimd extension. The set of available + instructions is then subdivided into: + + - the "base" set, available both in SME streaming mode and in + non-streaming mode + + - the full set, available only in non-streaming mode. */ +#define TARGET_BASE_SIMD (AARCH64_ISA_SIMD) +#define TARGET_SIMD (AARCH64_ISA_SIMD && AARCH64_ISA_SM_OFF) #define TARGET_FLOAT (AARCH64_ISA_FP) #define UNITS_PER_WORD 8 @@ -199,6 +206,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; /* Macros to test ISA flags. */ +#define AARCH64_ISA_SM_OFF (aarch64_isa_flags & AARCH64_FL_SM_OFF) #define AARCH64_ISA_MODE (aarch64_isa_flags & AARCH64_FL_ISA_MODES) #define AARCH64_ISA_CRC (aarch64_isa_flags & AARCH64_FL_CRC) #define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index b4608d1c5e3..9585879a1b1 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -366,7 +366,8 @@ (define_constants ;; As a convenience, "fp_q" means "fp" + the ability to move between ;; Q registers and is equivalent to "simd". 
-(define_enum "arches" [ any rcpc8_4 fp fp_q simd nosimd sve fp16]) +(define_enum "arches" [any rcpc8_4 fp fp_q base_simd nobase_simd + simd nosimd sve fp16]) (define_enum_attr "arch" "arches" (const_string "any")) @@ -394,6 +395,12 @@ (define_attr "arch_enabled" "no,yes" (and (eq_attr "arch" "fp") (match_test "TARGET_FLOAT")) + (and (eq_attr "arch" "base_simd") + (match_test "TARGET_BASE_SIMD")) + + (and (eq_attr "arch" "nobase_simd") + (match_test "!TARGET_BASE_SIMD")) + (and (eq_attr "arch" "fp_q, simd") (match_test "TARGET_SIMD")) @@ -1224,23 +1231,23 @@ (define_insn "*mov_aarch64" "(register_operand (operands[0], mode) || aarch64_reg_or_zero (operands[1], mode))" {@ [cons: =0, 1; attrs: type, arch] - [w, Z ; neon_move , simd ] movi\t%0., #0 - [r, r ; mov_reg , * ] mov\t%w0, %w1 - [r, M ; mov_imm , * ] mov\t%w0, %1 - [w, D; neon_move , simd ] << aarch64_output_scalar_simd_mov_immediate (operands[1], mode); + [w, Z ; neon_move , simd ] movi\t%0., #0 + [r, r ; mov_reg , * ] mov\t%w0, %w1 + [r, M ; mov_imm , * ] mov\t%w0, %1 + [w, D; neon_move , simd ] << aarch64_output_scalar_simd_mov_immediate (operands[1], mode); /* The "mov_imm" type for CNT is just a placeholder. 
*/ - [r, Usv ; mov_imm , sve ] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); - [r, Usr ; mov_imm , sve ] << aarch64_output_sve_rdvl (operands[1]); - [r, m ; load_4 , * ] ldr\t%w0, %1 - [w, m ; load_4 , * ] ldr\t%0, %1 - [m, r Z ; store_4 , * ] str\\t%w1, %0 - [m, w ; store_4 , * ] str\t%1, %0 - [r, w ; neon_to_gp , simd ] umov\t%w0, %1.[0] - [r, w ; neon_to_gp , nosimd] fmov\t%w0, %s1 - [w, r Z ; neon_from_gp, simd ] dup\t%0., %w1 - [w, r Z ; neon_from_gp, nosimd] fmov\t%s0, %w1 - [w, w ; neon_dup , simd ] dup\t%0, %1.[0] - [w, w ; neon_dup , nosimd] fmov\t%s0, %s1 + [r, Usv ; mov_imm , sve ] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); + [r, Usr ; mov_imm , sve ] << aarch64_output_sve_rdvl (operands[1]); + [r, m ; load_4 , * ] ldr\t%w0, %1 + [w, m ; load_4 , * ] ldr\t%0, %1 + [m, r Z ; store_4 , * ] str\\t%w1, %0 + [m, w ; store_4 , * ] str\t%1, %0 + [r, w ; neon_to_gp , base_simd ] umov\t%w0, %1.[0] + [r, w ; neon_to_gp , nobase_simd] fmov\t%w0, %s1 + [w, r Z ; neon_from_gp, simd ] dup\t%0., %w1 + [w, r Z ; neon_from_gp, nosimd ] fmov\t%s0, %w1 + [w, w ; neon_dup , simd ] dup\t%0, %1.[0] + [w, w ; neon_dup , nosimd ] fmov\t%s0, %s1 } ) @@ -1405,9 +1412,9 @@ (define_expand "movti" (define_insn "*movti_aarch64" [(set (match_operand:TI 0 - "nonimmediate_operand" "= r,w,w,w, r,w,r,m,m,w,m") + "nonimmediate_operand" "= r,w,w,w, r,w,w,r,m,m,w,m") (match_operand:TI 1 - "aarch64_movti_operand" " rUti,Z,Z,r, w,w,m,r,Z,m,w"))] + "aarch64_movti_operand" " rUti,Z,Z,r, w,w,w,m,r,Z,m,w"))] "(register_operand (operands[0], TImode) || aarch64_reg_or_zero (operands[1], TImode))" "@ @@ -1417,16 +1424,17 @@ (define_insn "*movti_aarch64" # # mov\\t%0.16b, %1.16b + mov\\t%Z0.d, %Z1.d ldp\\t%0, %H0, %1 stp\\t%1, %H1, %0 stp\\txzr, xzr, %0 ldr\\t%q0, %1 str\\t%q1, %0" - [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q, \ + [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q,*,\ load_16,store_16,store_16,\ 
load_16,store_16") - (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4") - (set_attr "arch" "*,simd,*,*,*,simd,*,*,*,fp,fp")] + (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4,4") + (set_attr "arch" "*,simd,*,*,*,simd,sve,*,*,*,fp,fp")] ) ;; Split a TImode register-register or register-immediate move into @@ -1553,13 +1561,14 @@ (define_split (define_insn "*mov_aarch64" [(set (match_operand:TFD 0 - "nonimmediate_operand" "=w,?r ,w ,?r,w,?w,w,m,?r,m ,m") + "nonimmediate_operand" "=w,w,?r ,w ,?r,w,?w,w,m,?r,m ,m") (match_operand:TFD 1 - "general_operand" " w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))] + "general_operand" " w,w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))] "TARGET_FLOAT && (register_operand (operands[0], mode) || aarch64_reg_or_fp_zero (operands[1], mode))" "@ mov\\t%0.16b, %1.16b + mov\\t%Z0.d, %Z1.d # # # @@ -1570,10 +1579,10 @@ (define_insn "*mov_aarch64" ldp\\t%0, %H0, %1 stp\\t%1, %H1, %0 stp\\txzr, xzr, %0" - [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\ + [(set_attr "type" "logic_reg,*,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\ f_loadd,f_stored,load_16,store_16,store_16") - (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4") - (set_attr "arch" "simd,*,*,*,simd,*,*,*,*,*,*")] + (set_attr "length" "4,4,8,8,8,4,4,4,4,4,4,4") + (set_attr "arch" "simd,sve,*,*,*,simd,*,*,*,*,*,*")] ) (define_split @@ -1767,7 +1776,7 @@ (define_insn "load_pair_dw_" (match_operand:TX 1 "aarch64_mem_pair_operand" "Ump")) (set (match_operand:TX2 2 "register_operand" "=w") (match_operand:TX2 3 "memory_operand" "m"))] - "TARGET_SIMD + "TARGET_BASE_SIMD && rtx_equal_p (XEXP (operands[3], 0), plus_constant (Pmode, XEXP (operands[1], 0), @@ -1815,11 +1824,11 @@ (define_insn "store_pair_dw_" (match_operand:TX 1 "register_operand" "w")) (set (match_operand:TX2 2 "memory_operand" "=m") (match_operand:TX2 3 "register_operand" "w"))] - "TARGET_SIMD && - rtx_equal_p (XEXP (operands[2], 0), - plus_constant (Pmode, - XEXP (operands[0], 0), - GET_MODE_SIZE (TFmode)))" + "TARGET_BASE_SIMD + && rtx_equal_p (XEXP 
(operands[2], 0), + plus_constant (Pmode, + XEXP (operands[0], 0), + GET_MODE_SIZE (TFmode)))" "stp\\t%q1, %q3, %z0" [(set_attr "type" "neon_stp_q") (set_attr "fp" "yes")] @@ -1867,7 +1876,7 @@ (define_insn "loadwb_pair_" (set (match_operand:TX 3 "register_operand" "=w") (mem:TX (plus:P (match_dup 1) (match_operand:P 5 "const_int_operand" "n"))))])] - "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (mode)" + "TARGET_BASE_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (mode)" "ldp\\t%q2, %q3, [%1], %4" [(set_attr "type" "neon_ldp_q")] ) @@ -1917,7 +1926,7 @@ (define_insn "storewb_pair_" (set (mem:TX (plus:P (match_dup 0) (match_operand:P 5 "const_int_operand" "n"))) (match_operand:TX 3 "register_operand" "w"))])] - "TARGET_SIMD + "TARGET_BASE_SIMD && INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (mode)" "stp\\t%q2, %q3, [%0, %4]!" diff --git a/gcc/testsuite/gcc.target/aarch64/movdf_2.c b/gcc/testsuite/gcc.target/aarch64/movdf_2.c new file mode 100644 index 00000000000..0d459d31760 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movdf_2.c @@ -0,0 +1,51 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +/* +** fpr_to_fpr: +** fmov d0, d1 +** ret +*/ +double +fpr_to_fpr (double q0, double q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov d0, x0 +** ret +*/ +double +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register double x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +double +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: +** fmov x0, d0 +** ret +*/ +void +fpr_to_gpr (double q0) [[arm::streaming_compatible]] +{ + register double x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_3.c b/gcc/testsuite/gcc.target/aarch64/movdi_3.c new file mode 100644 index 
00000000000..31b2cbbaeb0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movdi_3.c @@ -0,0 +1,59 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +/* +** fpr_to_fpr: +** fmov d0, d1 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register uint64_t q0 asm ("q0"); + register uint64_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov d0, x0 +** ret +*/ +void +gpr_to_fpr (uint64_t x0) [[arm::streaming_compatible]] +{ + register uint64_t q0 asm ("q0"); + q0 = x0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register uint64_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** fmov x0, d0 +** ret +*/ +uint64_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register uint64_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movhf_2.c b/gcc/testsuite/gcc.target/aarch64/movhf_2.c new file mode 100644 index 00000000000..3292b0de8d1 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movhf_2.c @@ -0,0 +1,53 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nothing+simd" + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +_Float16 +fpr_to_fpr (_Float16 q0, _Float16 q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +_Float16 +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register _Float16 w0 asm ("w0"); + asm volatile ("" : "=r" (w0)); + return w0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +_Float16 +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: +** fmov w0, s0 +** ret +*/ +void 
+fpr_to_gpr (_Float16 q0) [[arm::streaming_compatible]] +{ + register _Float16 w0 asm ("w0"); + w0 = q0; + asm volatile ("" :: "r" (w0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movhi_2.c b/gcc/testsuite/gcc.target/aarch64/movhi_2.c new file mode 100644 index 00000000000..dbbf3486f58 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movhi_2.c @@ -0,0 +1,61 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nothing+simd" + +#include + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register uint16_t q0 asm ("q0"); + register uint16_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +void +gpr_to_fpr (uint16_t w0) [[arm::streaming_compatible]] +{ + register uint16_t q0 asm ("q0"); + q0 = w0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register uint16_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** umov w0, v0.h\[0\] +** ret +*/ +uint16_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register uint16_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movqi_2.c b/gcc/testsuite/gcc.target/aarch64/movqi_2.c new file mode 100644 index 00000000000..aec087e4e2c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movqi_2.c @@ -0,0 +1,59 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register uint8_t q0 asm ("q0"); + register uint8_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: 
"w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +void +gpr_to_fpr (uint8_t w0) [[arm::streaming_compatible]] +{ + register uint8_t q0 asm ("q0"); + q0 = w0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register uint8_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** umov w0, v0.b\[0\] +** ret +*/ +uint8_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register uint8_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movsf_2.c b/gcc/testsuite/gcc.target/aarch64/movsf_2.c new file mode 100644 index 00000000000..7fed4b22f7a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movsf_2.c @@ -0,0 +1,51 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +float +fpr_to_fpr (float q0, float q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +float +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register float w0 asm ("w0"); + asm volatile ("" : "=r" (w0)); + return w0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +float +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: +** fmov w0, s0 +** ret +*/ +void +fpr_to_gpr (float q0) [[arm::streaming_compatible]] +{ + register float w0 asm ("w0"); + w0 = q0; + asm volatile ("" :: "r" (w0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movsi_2.c b/gcc/testsuite/gcc.target/aarch64/movsi_2.c new file mode 100644 index 00000000000..c14d2468af3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movsi_2.c @@ -0,0 +1,59 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +/* +** fpr_to_fpr: +** fmov s0, s1 +** 
ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register uint32_t q0 asm ("q0"); + register uint32_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +void +gpr_to_fpr (uint32_t w0) [[arm::streaming_compatible]] +{ + register uint32_t q0 asm ("q0"); + q0 = w0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register uint32_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** fmov w0, s0 +** ret +*/ +uint32_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register uint32_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_3.c b/gcc/testsuite/gcc.target/aarch64/movtf_3.c new file mode 100644 index 00000000000..dd164a41855 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movtf_3.c @@ -0,0 +1,81 @@ +/* { dg-do assemble } */ +/* { dg-require-effective-target large_long_double } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nosve" + +/* +** fpr_to_fpr: +** sub sp, sp, #16 +** str q1, \[sp\] +** ldr q0, \[sp\] +** add sp, sp, #?16 +** ret +*/ +long double +fpr_to_fpr (long double q0, long double q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +long double +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register long double x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +long double +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: { target 
aarch64_little_endian } +** ( +** fmov x0, d0 +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** fmov x0, d0 +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** fmov x1, d0 +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** fmov x1, d0 +** ) +** ret +*/ +void +fpr_to_gpr (long double q0) [[arm::streaming_compatible]] +{ + register long double x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_4.c b/gcc/testsuite/gcc.target/aarch64/movtf_4.c new file mode 100644 index 00000000000..faf9703e2b6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movtf_4.c @@ -0,0 +1,78 @@ +/* { dg-do assemble } */ +/* { dg-require-effective-target large_long_double } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+sve" + +/* +** fpr_to_fpr: +** mov z0.d, z1.d +** ret +*/ +long double +fpr_to_fpr (long double q0, long double q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +long double +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register long double x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +long double +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** fmov x0, d0 +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** fmov x0, d0 +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** fmov x1, d0 +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** fmov x1, d0 +** ) +** ret +*/ +void +fpr_to_gpr (long double q0) [[arm::streaming_compatible]] +{ + register long double x0 asm ("x0"); + x0 = q0; + asm 
volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movti_3.c b/gcc/testsuite/gcc.target/aarch64/movti_3.c new file mode 100644 index 00000000000..243109181d6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movti_3.c @@ -0,0 +1,86 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nosve" + +/* +** fpr_to_fpr: +** sub sp, sp, #16 +** str q1, \[sp\] +** ldr q0, \[sp\] +** add sp, sp, #?16 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + register __int128_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +void +gpr_to_fpr (__int128_t x0) [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + q0 = x0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** fmov x0, d0 +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** fmov x0, d0 +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** fmov x1, d0 +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** fmov x1, d0 +** ) +** ret +*/ +__int128_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movti_4.c b/gcc/testsuite/gcc.target/aarch64/movti_4.c new file mode 100644 index 00000000000..a70feccb0e3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movti_4.c @@ -0,0 
+1,83 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+sve" + +/* +** fpr_to_fpr: +** mov z0\.d, z1\.d +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + register __int128_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +void +gpr_to_fpr (__int128_t x0) [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + q0 = x0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** fmov x0, d0 +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** fmov x0, d0 +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** fmov x1, d0 +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** fmov x1, d0 +** ) +** ret +*/ +__int128_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c new file mode 100644 index 00000000000..7bec888b71d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c @@ -0,0 +1,82 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nosve" + +typedef unsigned char v16qi __attribute__((vector_size(16))); + +/* +** fpr_to_fpr: +** sub sp, sp, #16 +** str q1, 
\[sp\] +** ldr q0, \[sp\] +** add sp, sp, #?16 +** ret +*/ +v16qi +fpr_to_fpr (v16qi q0, v16qi q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +v16qi +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register v16qi x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +v16qi +zero_to_fpr () [[arm::streaming_compatible]] +{ + return (v16qi) {}; +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** umov x0, v0.d\[0\] +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** umov x0, v0.d\[0\] +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** umov x1, v0.d\[0\] +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** umov x1, v0.d\[0\] +** ) +** ret +*/ +void +fpr_to_gpr (v16qi q0) [[arm::streaming_compatible]] +{ + register v16qi x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c new file mode 100644 index 00000000000..2d36342b3f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c @@ -0,0 +1,79 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+sve" + +typedef unsigned char v16qi __attribute__((vector_size(16))); + +/* +** fpr_to_fpr: +** mov z0.d, z1.d +** ret +*/ +v16qi +fpr_to_fpr (v16qi q0, v16qi q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +v16qi +gpr_to_fpr () [[arm::streaming_compatible]] +{ + 
register v16qi x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +v16qi +zero_to_fpr () [[arm::streaming_compatible]] +{ + return (v16qi) {}; +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** umov x0, v0.d\[0\] +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** umov x0, v0.d\[0\] +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** umov x1, v0.d\[0\] +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** umov x1, v0.d\[0\] +** ) +** ret +*/ +void +fpr_to_gpr (v16qi q0) [[arm::streaming_compatible]] +{ + register v16qi x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c new file mode 100644 index 00000000000..12ae25a3a4a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c @@ -0,0 +1,55 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nosve" + +typedef unsigned char v8qi __attribute__((vector_size(8))); + +/* +** fpr_to_fpr: +** fmov d0, d1 +** ret +*/ +v8qi +fpr_to_fpr (v8qi q0, v8qi q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov d0, x0 +** ret +*/ +v8qi +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register v8qi x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +v8qi +zero_to_fpr () [[arm::streaming_compatible]] +{ + return (v8qi) {}; +} + +/* +** fpr_to_gpr: +** umov x0, v0\.d\[0\] +** ret +*/ +void +fpr_to_gpr (v8qi q0) [[arm::streaming_compatible]] +{ + register v8qi x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c new file mode 100644 index 00000000000..5b5346cf435 --- 
/dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c @@ -0,0 +1,13 @@ +// { dg-options "" } + +#include + +#pragma GCC target "+nosme" + +// { dg-error {inlining failed.*'vhaddq_s32'} "" { target *-*-* } 0 } + +int32x4_t +foo (int32x4_t x, int32x4_t y) [[arm::streaming_compatible]] +{ + return vhaddq_s32 (x, y); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c new file mode 100644 index 00000000000..2092c4471f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c @@ -0,0 +1,11 @@ +// { dg-options "" } + +#include + +// { dg-error {inlining failed.*'vhaddq_s32'} "" { target *-*-* } 0 } + +int32x4_t +foo (int32x4_t x, int32x4_t y) [[arm::streaming_compatible]] +{ + return vhaddq_s32 (x, y); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c new file mode 100644 index 00000000000..36794e5b0df --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c @@ -0,0 +1,11 @@ +// { dg-options "" } + +#include + +// { dg-error {inlining failed.*'vhaddq_s32'} "" { target *-*-* } 0 } + +int32x4_t +foo (int32x4_t x, int32x4_t y) [[arm::streaming]] +{ + return vhaddq_s32 (x, y); +} From patchwork Fri Nov 17 17:26:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1865159 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 
From: Richard Sandiford To: gcc-patches@gcc.gnu.org Subject: [PATCH 10/21] aarch64: Mark relevant SVE instructions as non-streaming Date: Fri, 17 Nov 2023 17:26:12 +0000 In-Reply-To: (Richard Sandiford's message of "Fri, 17 Nov 2023 17:23:28 +0000") Following on from the previous Advanced SIMD patch, this one divides SVE instructions into non-streaming and streaming-compatible groups. gcc/ * config/aarch64/aarch64.h (TARGET_NON_STREAMING): New macro. (TARGET_SVE2_AES, TARGET_SVE2_BITPERM): Use it. (TARGET_SVE2_SHA3, TARGET_SVE2_SM4): Likewise. * config/aarch64/aarch64-sve-builtins-base.def: Separate out the functions that require PSTATE.SM to be 0 and guard them with AARCH64_FL_SM_OFF. * config/aarch64/aarch64-sve-builtins-sve2.def: Likewise. * config/aarch64/aarch64-sve-builtins.cc (check_required_extensions): Enforce AARCH64_FL_SM_OFF requirements. * config/aarch64/aarch64-sve.md (aarch64_wrffr): Require TARGET_NON_STREAMING. (aarch64_rdffr, aarch64_rdffr_z, *aarch64_rdffr_z_ptest): Likewise. (*aarch64_rdffr_ptest, *aarch64_rdffr_z_cc, *aarch64_rdffr_cc) (@aarch64_ldf1): Likewise. (@aarch64_ldf1_) (gather_load): Likewise. (mask_gather_load): Likewise. 
(mask_gather_load): Likewise. (*mask_gather_load_xtw_unpacked): Likewise. (*mask_gather_load_sxtw): Likewise. (*mask_gather_load_uxtw): Likewise. (@aarch64_gather_load_) (@aarch64_gather_load_ ): Likewise. (*aarch64_gather_load_ _xtw_unpacked) (*aarch64_gather_load_ _sxtw): Likewise. (*aarch64_gather_load_ _uxtw): Likewise. (@aarch64_ldff1_gather, @aarch64_ldff1_gather): Likewise. (*aarch64_ldff1_gather_sxtw): Likewise. (*aarch64_ldff1_gather_uxtw): Likewise. (@aarch64_ldff1_gather_ ): Likewise. (@aarch64_ldff1_gather_ ): Likewise. (*aarch64_ldff1_gather_ _sxtw): Likewise. (*aarch64_ldff1_gather_ _uxtw): Likewise. (@aarch64_sve_gather_prefetch) (@aarch64_sve_gather_prefetch) (*aarch64_sve_gather_prefetch_sxtw) (*aarch64_sve_gather_prefetch_uxtw) (scatter_store): Likewise. (mask_scatter_store): Likewise. (*mask_scatter_store_xtw_unpacked) (*mask_scatter_store_sxtw): Likewise. (*mask_scatter_store_uxtw): Likewise. (@aarch64_scatter_store_trunc) (@aarch64_scatter_store_trunc) (*aarch64_scatter_store_trunc_sxtw) (*aarch64_scatter_store_trunc_uxtw) (@aarch64_sve_ld1ro, @aarch64_adr): Likewise. (*aarch64_adr_sxtw, *aarch64_adr_uxtw_unspec): Likewise. (*aarch64_adr_uxtw_and, @aarch64_adr_shift): Likewise. (*aarch64_adr_shift, *aarch64_adr_shift_sxtw): Likewise. (*aarch64_adr_shift_uxtw, @aarch64_sve_add_): Likewise. (@aarch64_sve_, fold_left_plus_): Likewise. (mask_fold_left_plus_, @aarch64_sve_compact): Likewise. * config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt) (@aarch64_gather_ldnt_ ): Likewise. (@aarch64_sve2_histcnt, @aarch64_sve2_histseg): Likewise. (@aarch64_pred_): Likewise. (*aarch64_pred__cc): Likewise. (*aarch64_pred__ptest): Likewise. * config/aarch64/iterators.md (SVE_FP_UNARY_INT): Make FEXPA depend on TARGET_NON_STREAMING. (SVE_BFLOAT_TERNARY_LONG): Likewise BFMMLA. gcc/testsuite/ * g++.target/aarch64/sve/aarch64-ssve.exp: New harness. * g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Add -DSTREAMING_COMPATIBLE to the list of options. 
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise. * gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise. * gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise. Fix pasto in variable name. * gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Mark functions as streaming-compatible if STREAMING_COMPATIBLE is defined. * gcc.target/aarch64/sve/acle/asm/adda_f16.c: Disable for streaming-compatible code. * gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adrb.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adrd.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adrh.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adrw.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/compact_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/expa_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/expa_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/expa_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mmla_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/prfb_gather.c: Likewise. 
* gcc.target/aarch64/sve/acle/asm/prfd_gather.c: Likewise. * gcc.target/aarch64/sve/acle/asm/prfh_gather.c: Likewise. * gcc.target/aarch64/sve/acle/asm/prfw_gather.c: Likewise. * gcc.target/aarch64/sve/acle/asm/rdffr_1.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tmad_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tmad_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tmad_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tsmul_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tsmul_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tsmul_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tssel_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tssel_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/tssel_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/usmmla_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise. 
* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bdep_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bdep_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bdep_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bdep_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bext_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bext_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bext_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bext_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histseg_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/histseg_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c: Likewise. 
* gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/match_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/match_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/match_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/match_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rax1_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/rax1_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c: Likewise. 
* gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c: Likewise. --- .../aarch64/aarch64-sve-builtins-base.def | 158 +++++---- .../aarch64/aarch64-sve-builtins-sve2.def | 63 ++-- gcc/config/aarch64/aarch64-sve-builtins.cc | 7 + gcc/config/aarch64/aarch64-sve.md | 124 +++---- gcc/config/aarch64/aarch64-sve2.md | 14 +- gcc/config/aarch64/aarch64.h | 11 +- gcc/config/aarch64/iterators.md | 4 +- .../g++.target/aarch64/sve/aarch64-ssve.exp | 308 ++++++++++++++++++ .../aarch64/sve/acle/aarch64-sve-acle-asm.exp | 1 + .../sve2/acle/aarch64-sve2-acle-asm.exp | 1 + .../aarch64/sve/acle/aarch64-sve-acle-asm.exp | 1 + .../aarch64/sve/acle/asm/adda_f16.c | 1 + .../aarch64/sve/acle/asm/adda_f32.c | 1 + .../aarch64/sve/acle/asm/adda_f64.c | 1 + .../gcc.target/aarch64/sve/acle/asm/adrb.c | 1 + .../gcc.target/aarch64/sve/acle/asm/adrd.c | 1 + .../gcc.target/aarch64/sve/acle/asm/adrh.c | 1 + .../gcc.target/aarch64/sve/acle/asm/adrw.c | 1 + .../aarch64/sve/acle/asm/bfmmla_f32.c | 1 + .../aarch64/sve/acle/asm/compact_f32.c | 1 + .../aarch64/sve/acle/asm/compact_f64.c | 1 + .../aarch64/sve/acle/asm/compact_s32.c | 1 + .../aarch64/sve/acle/asm/compact_s64.c | 1 + .../aarch64/sve/acle/asm/compact_u32.c | 1 + .../aarch64/sve/acle/asm/compact_u64.c | 1 + 
.../aarch64/sve/acle/asm/expa_f16.c | 1 + .../aarch64/sve/acle/asm/expa_f32.c | 1 + .../aarch64/sve/acle/asm/expa_f64.c | 1 + .../aarch64/sve/acle/asm/ld1_gather_f32.c | 1 + .../aarch64/sve/acle/asm/ld1_gather_f64.c | 1 + .../aarch64/sve/acle/asm/ld1_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ld1_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ld1_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ld1_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ld1ro_bf16.c | 1 + .../aarch64/sve/acle/asm/ld1ro_f16.c | 1 + .../aarch64/sve/acle/asm/ld1ro_f32.c | 1 + .../aarch64/sve/acle/asm/ld1ro_f64.c | 1 + .../aarch64/sve/acle/asm/ld1ro_s16.c | 1 + .../aarch64/sve/acle/asm/ld1ro_s32.c | 1 + .../aarch64/sve/acle/asm/ld1ro_s64.c | 1 + .../aarch64/sve/acle/asm/ld1ro_s8.c | 1 + .../aarch64/sve/acle/asm/ld1ro_u16.c | 1 + .../aarch64/sve/acle/asm/ld1ro_u32.c | 1 + .../aarch64/sve/acle/asm/ld1ro_u64.c | 1 + .../aarch64/sve/acle/asm/ld1ro_u8.c | 1 + .../aarch64/sve/acle/asm/ld1sb_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ld1sb_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ld1sb_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ld1sb_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ld1sh_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ld1sh_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ld1sh_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ld1sh_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ld1sw_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ld1sw_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ld1ub_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ld1ub_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ld1ub_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ld1ub_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ld1uh_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ld1uh_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ld1uh_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ld1uh_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ld1uw_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ld1uw_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1_bf16.c | 1 + 
.../aarch64/sve/acle/asm/ldff1_f16.c | 1 + .../aarch64/sve/acle/asm/ldff1_f32.c | 1 + .../aarch64/sve/acle/asm/ldff1_f64.c | 1 + .../aarch64/sve/acle/asm/ldff1_gather_f32.c | 1 + .../aarch64/sve/acle/asm/ldff1_gather_f64.c | 1 + .../aarch64/sve/acle/asm/ldff1_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1_s16.c | 1 + .../aarch64/sve/acle/asm/ldff1_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1_s8.c | 1 + .../aarch64/sve/acle/asm/ldff1_u16.c | 1 + .../aarch64/sve/acle/asm/ldff1_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1_u8.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_s16.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_u16.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1sb_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1sh_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1sh_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1sh_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1sh_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1sh_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1sh_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1sh_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1sh_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1sw_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1sw_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1sw_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1sw_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_gather_u32.c | 1 + 
.../aarch64/sve/acle/asm/ldff1ub_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_s16.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_u16.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1ub_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1uh_gather_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1uh_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1uh_gather_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1uh_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1uh_s32.c | 1 + .../aarch64/sve/acle/asm/ldff1uh_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1uh_u32.c | 1 + .../aarch64/sve/acle/asm/ldff1uh_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1uw_gather_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1uw_gather_u64.c | 1 + .../aarch64/sve/acle/asm/ldff1uw_s64.c | 1 + .../aarch64/sve/acle/asm/ldff1uw_u64.c | 1 + .../aarch64/sve/acle/asm/ldnf1_bf16.c | 1 + .../aarch64/sve/acle/asm/ldnf1_f16.c | 1 + .../aarch64/sve/acle/asm/ldnf1_f32.c | 1 + .../aarch64/sve/acle/asm/ldnf1_f64.c | 1 + .../aarch64/sve/acle/asm/ldnf1_s16.c | 1 + .../aarch64/sve/acle/asm/ldnf1_s32.c | 1 + .../aarch64/sve/acle/asm/ldnf1_s64.c | 1 + .../aarch64/sve/acle/asm/ldnf1_s8.c | 1 + .../aarch64/sve/acle/asm/ldnf1_u16.c | 1 + .../aarch64/sve/acle/asm/ldnf1_u32.c | 1 + .../aarch64/sve/acle/asm/ldnf1_u64.c | 1 + .../aarch64/sve/acle/asm/ldnf1_u8.c | 1 + .../aarch64/sve/acle/asm/ldnf1sb_s16.c | 1 + .../aarch64/sve/acle/asm/ldnf1sb_s32.c | 1 + .../aarch64/sve/acle/asm/ldnf1sb_s64.c | 1 + .../aarch64/sve/acle/asm/ldnf1sb_u16.c | 1 + .../aarch64/sve/acle/asm/ldnf1sb_u32.c | 1 + .../aarch64/sve/acle/asm/ldnf1sb_u64.c | 1 + .../aarch64/sve/acle/asm/ldnf1sh_s32.c | 1 + .../aarch64/sve/acle/asm/ldnf1sh_s64.c | 1 + .../aarch64/sve/acle/asm/ldnf1sh_u32.c | 1 + .../aarch64/sve/acle/asm/ldnf1sh_u64.c | 1 + .../aarch64/sve/acle/asm/ldnf1sw_s64.c | 1 + .../aarch64/sve/acle/asm/ldnf1sw_u64.c | 1 + .../aarch64/sve/acle/asm/ldnf1ub_s16.c | 
1 + .../aarch64/sve/acle/asm/ldnf1ub_s32.c | 1 + .../aarch64/sve/acle/asm/ldnf1ub_s64.c | 1 + .../aarch64/sve/acle/asm/ldnf1ub_u16.c | 1 + .../aarch64/sve/acle/asm/ldnf1ub_u32.c | 1 + .../aarch64/sve/acle/asm/ldnf1ub_u64.c | 1 + .../aarch64/sve/acle/asm/ldnf1uh_s32.c | 1 + .../aarch64/sve/acle/asm/ldnf1uh_s64.c | 1 + .../aarch64/sve/acle/asm/ldnf1uh_u32.c | 1 + .../aarch64/sve/acle/asm/ldnf1uh_u64.c | 1 + .../aarch64/sve/acle/asm/ldnf1uw_s64.c | 1 + .../aarch64/sve/acle/asm/ldnf1uw_u64.c | 1 + .../aarch64/sve/acle/asm/mmla_f32.c | 1 + .../aarch64/sve/acle/asm/mmla_f64.c | 1 + .../aarch64/sve/acle/asm/mmla_s32.c | 1 + .../aarch64/sve/acle/asm/mmla_u32.c | 1 + .../aarch64/sve/acle/asm/prfb_gather.c | 1 + .../aarch64/sve/acle/asm/prfd_gather.c | 1 + .../aarch64/sve/acle/asm/prfh_gather.c | 1 + .../aarch64/sve/acle/asm/prfw_gather.c | 1 + .../gcc.target/aarch64/sve/acle/asm/rdffr_1.c | 1 + .../aarch64/sve/acle/asm/st1_scatter_f32.c | 1 + .../aarch64/sve/acle/asm/st1_scatter_f64.c | 1 + .../aarch64/sve/acle/asm/st1_scatter_s32.c | 1 + .../aarch64/sve/acle/asm/st1_scatter_s64.c | 1 + .../aarch64/sve/acle/asm/st1_scatter_u32.c | 1 + .../aarch64/sve/acle/asm/st1_scatter_u64.c | 1 + .../aarch64/sve/acle/asm/st1b_scatter_s32.c | 1 + .../aarch64/sve/acle/asm/st1b_scatter_s64.c | 1 + .../aarch64/sve/acle/asm/st1b_scatter_u32.c | 1 + .../aarch64/sve/acle/asm/st1b_scatter_u64.c | 1 + .../aarch64/sve/acle/asm/st1h_scatter_s32.c | 1 + .../aarch64/sve/acle/asm/st1h_scatter_s64.c | 1 + .../aarch64/sve/acle/asm/st1h_scatter_u32.c | 1 + .../aarch64/sve/acle/asm/st1h_scatter_u64.c | 1 + .../aarch64/sve/acle/asm/st1w_scatter_s64.c | 1 + .../aarch64/sve/acle/asm/st1w_scatter_u64.c | 1 + .../aarch64/sve/acle/asm/test_sve_acle.h | 11 +- .../aarch64/sve/acle/asm/tmad_f16.c | 1 + .../aarch64/sve/acle/asm/tmad_f32.c | 1 + .../aarch64/sve/acle/asm/tmad_f64.c | 1 + .../aarch64/sve/acle/asm/tsmul_f16.c | 1 + .../aarch64/sve/acle/asm/tsmul_f32.c | 1 + .../aarch64/sve/acle/asm/tsmul_f64.c | 1 + 
.../aarch64/sve/acle/asm/tssel_f16.c | 1 + .../aarch64/sve/acle/asm/tssel_f32.c | 1 + .../aarch64/sve/acle/asm/tssel_f64.c | 1 + .../aarch64/sve/acle/asm/usmmla_s32.c | 1 + .../sve2/acle/aarch64-sve2-acle-asm.exp | 1 + .../aarch64/sve2/acle/asm/aesd_u8.c | 1 + .../aarch64/sve2/acle/asm/aese_u8.c | 1 + .../aarch64/sve2/acle/asm/aesimc_u8.c | 1 + .../aarch64/sve2/acle/asm/aesmc_u8.c | 1 + .../aarch64/sve2/acle/asm/bdep_u16.c | 1 + .../aarch64/sve2/acle/asm/bdep_u32.c | 1 + .../aarch64/sve2/acle/asm/bdep_u64.c | 1 + .../aarch64/sve2/acle/asm/bdep_u8.c | 1 + .../aarch64/sve2/acle/asm/bext_u16.c | 1 + .../aarch64/sve2/acle/asm/bext_u32.c | 1 + .../aarch64/sve2/acle/asm/bext_u64.c | 1 + .../aarch64/sve2/acle/asm/bext_u8.c | 1 + .../aarch64/sve2/acle/asm/bgrp_u16.c | 1 + .../aarch64/sve2/acle/asm/bgrp_u32.c | 1 + .../aarch64/sve2/acle/asm/bgrp_u64.c | 1 + .../aarch64/sve2/acle/asm/bgrp_u8.c | 1 + .../aarch64/sve2/acle/asm/histcnt_s32.c | 1 + .../aarch64/sve2/acle/asm/histcnt_s64.c | 1 + .../aarch64/sve2/acle/asm/histcnt_u32.c | 1 + .../aarch64/sve2/acle/asm/histcnt_u64.c | 1 + .../aarch64/sve2/acle/asm/histseg_s8.c | 1 + .../aarch64/sve2/acle/asm/histseg_u8.c | 1 + .../aarch64/sve2/acle/asm/ldnt1_gather_f32.c | 1 + .../aarch64/sve2/acle/asm/ldnt1_gather_f64.c | 1 + .../aarch64/sve2/acle/asm/ldnt1_gather_s32.c | 1 + .../aarch64/sve2/acle/asm/ldnt1_gather_s64.c | 1 + .../aarch64/sve2/acle/asm/ldnt1_gather_u32.c | 1 + .../aarch64/sve2/acle/asm/ldnt1_gather_u64.c | 1 + .../sve2/acle/asm/ldnt1sb_gather_s32.c | 1 + .../sve2/acle/asm/ldnt1sb_gather_s64.c | 1 + .../sve2/acle/asm/ldnt1sb_gather_u32.c | 1 + .../sve2/acle/asm/ldnt1sb_gather_u64.c | 1 + .../sve2/acle/asm/ldnt1sh_gather_s32.c | 1 + .../sve2/acle/asm/ldnt1sh_gather_s64.c | 1 + .../sve2/acle/asm/ldnt1sh_gather_u32.c | 1 + .../sve2/acle/asm/ldnt1sh_gather_u64.c | 1 + .../sve2/acle/asm/ldnt1sw_gather_s64.c | 1 + .../sve2/acle/asm/ldnt1sw_gather_u64.c | 1 + .../sve2/acle/asm/ldnt1ub_gather_s32.c | 1 + 
 .../sve2/acle/asm/ldnt1ub_gather_s64.c        |  1 +
 .../sve2/acle/asm/ldnt1ub_gather_u32.c        |  1 +
 .../sve2/acle/asm/ldnt1ub_gather_u64.c        |  1 +
 .../sve2/acle/asm/ldnt1uh_gather_s32.c        |  1 +
 .../sve2/acle/asm/ldnt1uh_gather_s64.c        |  1 +
 .../sve2/acle/asm/ldnt1uh_gather_u32.c        |  1 +
 .../sve2/acle/asm/ldnt1uh_gather_u64.c        |  1 +
 .../sve2/acle/asm/ldnt1uw_gather_s64.c        |  1 +
 .../sve2/acle/asm/ldnt1uw_gather_u64.c        |  1 +
 .../aarch64/sve2/acle/asm/match_s16.c         |  1 +
 .../aarch64/sve2/acle/asm/match_s8.c          |  1 +
 .../aarch64/sve2/acle/asm/match_u16.c         |  1 +
 .../aarch64/sve2/acle/asm/match_u8.c          |  1 +
 .../aarch64/sve2/acle/asm/nmatch_s16.c        |  1 +
 .../aarch64/sve2/acle/asm/nmatch_s8.c         |  1 +
 .../aarch64/sve2/acle/asm/nmatch_u16.c        |  1 +
 .../aarch64/sve2/acle/asm/nmatch_u8.c         |  1 +
 .../aarch64/sve2/acle/asm/pmullb_pair_u64.c   |  1 +
 .../aarch64/sve2/acle/asm/pmullt_pair_u64.c   |  1 +
 .../aarch64/sve2/acle/asm/rax1_s64.c          |  1 +
 .../aarch64/sve2/acle/asm/rax1_u64.c          |  1 +
 .../aarch64/sve2/acle/asm/sm4e_u32.c          |  1 +
 .../aarch64/sve2/acle/asm/sm4ekey_u32.c       |  1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_f32.c |  1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_f64.c |  1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_s32.c |  1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_s64.c |  1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_u32.c |  1 +
 .../aarch64/sve2/acle/asm/stnt1_scatter_u64.c |  1 +
 .../sve2/acle/asm/stnt1b_scatter_s32.c        |  1 +
 .../sve2/acle/asm/stnt1b_scatter_s64.c        |  1 +
 .../sve2/acle/asm/stnt1b_scatter_u32.c        |  1 +
 .../sve2/acle/asm/stnt1b_scatter_u64.c        |  1 +
 .../sve2/acle/asm/stnt1h_scatter_s32.c        |  1 +
 .../sve2/acle/asm/stnt1h_scatter_s64.c        |  1 +
 .../sve2/acle/asm/stnt1h_scatter_u32.c        |  1 +
 .../sve2/acle/asm/stnt1h_scatter_u64.c        |  1 +
 .../sve2/acle/asm/stnt1w_scatter_s64.c        |  1 +
 .../sve2/acle/asm/stnt1w_scatter_u64.c        |  1 +
 279 files changed, 805 insertions(+), 165 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index 4e31f67ac47..ac53f35220d 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -25,12 +25,7 @@ DEF_SVE_FUNCTION (svacgt, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svacle, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svaclt, compare_opt_n, all_float, implicit)
 DEF_SVE_FUNCTION (svadd, binary_opt_n, all_arith, mxz)
-DEF_SVE_FUNCTION (svadda, fold_left, all_float, implicit)
 DEF_SVE_FUNCTION (svaddv, reduction_wide, all_arith, implicit)
-DEF_SVE_FUNCTION (svadrb, adr_offset, none, none)
-DEF_SVE_FUNCTION (svadrd, adr_index, none, none)
-DEF_SVE_FUNCTION (svadrh, adr_index, none, none)
-DEF_SVE_FUNCTION (svadrw, adr_index, none, none)
 DEF_SVE_FUNCTION (svand, binary_opt_n, all_integer, mxz)
 DEF_SVE_FUNCTION (svand, binary_opt_n, b, z)
 DEF_SVE_FUNCTION (svandv, reduction, all_integer, implicit)
@@ -75,7 +70,6 @@ DEF_SVE_FUNCTION (svcnth_pat, count_pat, none, none)
 DEF_SVE_FUNCTION (svcntp, count_pred, all_pred, implicit)
 DEF_SVE_FUNCTION (svcntw, count_inherent, none, none)
 DEF_SVE_FUNCTION (svcntw_pat, count_pat, none, none)
-DEF_SVE_FUNCTION (svcompact, unary, sd_data, implicit)
 DEF_SVE_FUNCTION (svcreate2, create, all_data, none)
 DEF_SVE_FUNCTION (svcreate3, create, all_data, none)
 DEF_SVE_FUNCTION (svcreate4, create, all_data, none)
@@ -93,7 +87,6 @@ DEF_SVE_FUNCTION (svdupq_lane, binary_uint64_n, all_data, none)
 DEF_SVE_FUNCTION (sveor, binary_opt_n, all_integer, mxz)
 DEF_SVE_FUNCTION (sveor, binary_opt_n, b, z)
 DEF_SVE_FUNCTION (sveorv, reduction, all_integer, implicit)
-DEF_SVE_FUNCTION (svexpa, unary_uint, all_float, none)
 DEF_SVE_FUNCTION (svext, ext, all_data, none)
 DEF_SVE_FUNCTION (svextb, unary, hsd_integer, mxz)
 DEF_SVE_FUNCTION (svexth, unary, sd_integer, mxz)
@@ -106,51 +99,13 @@ DEF_SVE_FUNCTION (svinsr, binary_n, all_data, none)
 DEF_SVE_FUNCTION (svlasta, reduction, all_data, implicit)
 DEF_SVE_FUNCTION (svlastb, reduction, all_data, implicit)
 DEF_SVE_FUNCTION (svld1, load, all_data, implicit)
-DEF_SVE_FUNCTION (svld1_gather, load_gather_sv, sd_data, implicit)
-DEF_SVE_FUNCTION (svld1_gather, load_gather_vs, sd_data, implicit)
 DEF_SVE_FUNCTION (svld1rq, load_replicate, all_data, implicit)
 DEF_SVE_FUNCTION (svld1sb, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svld1sb_gather, load_ext_gather_offset, sd_integer, implicit)
 DEF_SVE_FUNCTION (svld1sh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svld1sh_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svld1sh_gather, load_ext_gather_index, sd_integer, implicit)
 DEF_SVE_FUNCTION (svld1sw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svld1sw_gather, load_ext_gather_offset, d_integer, implicit)
-DEF_SVE_FUNCTION (svld1sw_gather, load_ext_gather_index, d_integer, implicit)
 DEF_SVE_FUNCTION (svld1ub, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svld1ub_gather, load_ext_gather_offset, sd_integer, implicit)
 DEF_SVE_FUNCTION (svld1uh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svld1uh_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svld1uh_gather, load_ext_gather_index, sd_integer, implicit)
 DEF_SVE_FUNCTION (svld1uw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svld1uw_gather, load_ext_gather_offset, d_integer, implicit)
-DEF_SVE_FUNCTION (svld1uw_gather, load_ext_gather_index, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1, load, all_data, implicit)
-DEF_SVE_FUNCTION (svldff1_gather, load_gather_sv, sd_data, implicit)
-DEF_SVE_FUNCTION (svldff1_gather, load_gather_vs, sd_data, implicit)
-DEF_SVE_FUNCTION (svldff1sb, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sb_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sh_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sh_gather, load_ext_gather_index, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sw_gather, load_ext_gather_offset, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1sw_gather, load_ext_gather_index, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1ub, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1ub_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uh_gather, load_ext_gather_offset, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uh_gather, load_ext_gather_index, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uw_gather, load_ext_gather_offset, d_integer, implicit)
-DEF_SVE_FUNCTION (svldff1uw_gather, load_ext_gather_index, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1, load, all_data, implicit)
-DEF_SVE_FUNCTION (svldnf1sb, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1sh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1sw, load_ext, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1ub, load_ext, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1uh, load_ext, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnf1uw, load_ext, d_integer, implicit)
 DEF_SVE_FUNCTION (svldnt1, load, all_data, implicit)
 DEF_SVE_FUNCTION (svld2, load, all_data, implicit)
 DEF_SVE_FUNCTION (svld3, load, all_data, implicit)
@@ -173,7 +128,6 @@ DEF_SVE_FUNCTION (svmla, ternary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmla_lane, ternary_lane, all_float, none)
 DEF_SVE_FUNCTION (svmls, ternary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmls_lane, ternary_lane, all_float, none)
-DEF_SVE_FUNCTION (svmmla, mmla, none, none)
 DEF_SVE_FUNCTION (svmov, unary, b, z)
 DEF_SVE_FUNCTION (svmsb, ternary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svmul, binary_opt_n, all_arith, mxz)
@@ -197,13 +151,9 @@ DEF_SVE_FUNCTION (svpfalse, inherent_b, b, none)
 DEF_SVE_FUNCTION (svpfirst, unary, b, implicit)
 DEF_SVE_FUNCTION (svpnext, unary_pred, all_pred, implicit)
 DEF_SVE_FUNCTION (svprfb, prefetch, none, implicit)
-DEF_SVE_FUNCTION (svprfb_gather, prefetch_gather_offset, none, implicit)
 DEF_SVE_FUNCTION (svprfd, prefetch, none, implicit)
-DEF_SVE_FUNCTION (svprfd_gather, prefetch_gather_index, none, implicit)
 DEF_SVE_FUNCTION (svprfh, prefetch, none, implicit)
-DEF_SVE_FUNCTION (svprfh_gather, prefetch_gather_index, none, implicit)
 DEF_SVE_FUNCTION (svprfw, prefetch, none, implicit)
-DEF_SVE_FUNCTION (svprfw_gather, prefetch_gather_index, none, implicit)
 DEF_SVE_FUNCTION (svptest_any, ptest, none, implicit)
 DEF_SVE_FUNCTION (svptest_first, ptest, none, implicit)
 DEF_SVE_FUNCTION (svptest_last, ptest, none, implicit)
@@ -244,7 +194,6 @@ DEF_SVE_FUNCTION (svqincw_pat, inc_dec_pat, s_integer, none)
 DEF_SVE_FUNCTION (svqincw_pat, inc_dec_pat, sd_integer, none)
 DEF_SVE_FUNCTION (svqsub, binary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (svrbit, unary, all_integer, mxz)
-DEF_SVE_FUNCTION (svrdffr, rdffr, none, z_or_none)
 DEF_SVE_FUNCTION (svrecpe, unary, all_float, none)
 DEF_SVE_FUNCTION (svrecps, binary, all_float, none)
 DEF_SVE_FUNCTION (svrecpx, unary, all_float, mxz)
@@ -269,20 +218,12 @@ DEF_SVE_FUNCTION (svsel, binary, b, implicit)
 DEF_SVE_FUNCTION (svset2, set, all_data, none)
 DEF_SVE_FUNCTION (svset3, set, all_data, none)
 DEF_SVE_FUNCTION (svset4, set, all_data, none)
-DEF_SVE_FUNCTION (svsetffr, setffr, none, none)
 DEF_SVE_FUNCTION (svsplice, binary, all_data, implicit)
 DEF_SVE_FUNCTION (svsqrt, unary, all_float, mxz)
 DEF_SVE_FUNCTION (svst1, store, all_data, implicit)
-DEF_SVE_FUNCTION (svst1_scatter, store_scatter_index, sd_data, implicit)
-DEF_SVE_FUNCTION (svst1_scatter, store_scatter_offset, sd_data, implicit)
 DEF_SVE_FUNCTION (svst1b, store, hsd_integer, implicit)
-DEF_SVE_FUNCTION (svst1b_scatter, store_scatter_offset, sd_integer, implicit)
 DEF_SVE_FUNCTION (svst1h, store, sd_integer, implicit)
-DEF_SVE_FUNCTION (svst1h_scatter, store_scatter_index, sd_integer, implicit)
-DEF_SVE_FUNCTION (svst1h_scatter, store_scatter_offset, sd_integer, implicit)
 DEF_SVE_FUNCTION (svst1w, store, d_integer, implicit)
-DEF_SVE_FUNCTION (svst1w_scatter, store_scatter_index, d_integer, implicit)
-DEF_SVE_FUNCTION (svst1w_scatter, store_scatter_offset, d_integer, implicit)
 DEF_SVE_FUNCTION (svst2, store, all_data, implicit)
 DEF_SVE_FUNCTION (svst3, store, all_data, implicit)
 DEF_SVE_FUNCTION (svst4, store, all_data, implicit)
@@ -290,13 +231,10 @@ DEF_SVE_FUNCTION (svstnt1, store, all_data, implicit)
 DEF_SVE_FUNCTION (svsub, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svsubr, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svtbl, binary_uint, all_data, none)
-DEF_SVE_FUNCTION (svtmad, tmad, all_float, none)
 DEF_SVE_FUNCTION (svtrn1, binary, all_data, none)
 DEF_SVE_FUNCTION (svtrn1, binary_pred, all_pred, none)
 DEF_SVE_FUNCTION (svtrn2, binary, all_data, none)
 DEF_SVE_FUNCTION (svtrn2, binary_pred, all_pred, none)
-DEF_SVE_FUNCTION (svtsmul, binary_uint, all_float, none)
-DEF_SVE_FUNCTION (svtssel, binary_uint, all_float, none)
 DEF_SVE_FUNCTION (svundef, inherent, all_data, none)
 DEF_SVE_FUNCTION (svundef2, inherent, all_data, none)
 DEF_SVE_FUNCTION (svundef3, inherent, all_data, none)
@@ -311,13 +249,78 @@ DEF_SVE_FUNCTION (svuzp2, binary, all_data, none)
 DEF_SVE_FUNCTION (svuzp2, binary_pred, all_pred, none)
 DEF_SVE_FUNCTION (svwhilele, compare_scalar, while, none)
 DEF_SVE_FUNCTION (svwhilelt, compare_scalar, while, none)
-DEF_SVE_FUNCTION (svwrffr, setffr, none, implicit)
 DEF_SVE_FUNCTION (svzip1, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip1, binary_pred, all_pred, none)
 DEF_SVE_FUNCTION (svzip2, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2, binary_pred, all_pred, none)
 #undef REQUIRED_EXTENSIONS
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_SM_OFF
+DEF_SVE_FUNCTION (svadda, fold_left, all_float, implicit)
+DEF_SVE_FUNCTION (svadrb, adr_offset, none, none)
+DEF_SVE_FUNCTION (svadrd, adr_index, none, none)
+DEF_SVE_FUNCTION (svadrh, adr_index, none, none)
+DEF_SVE_FUNCTION (svadrw, adr_index, none, none)
+DEF_SVE_FUNCTION (svcompact, unary, sd_data, implicit)
+DEF_SVE_FUNCTION (svexpa, unary_uint, all_float, none)
+DEF_SVE_FUNCTION (svld1_gather, load_gather_sv, sd_data, implicit)
+DEF_SVE_FUNCTION (svld1_gather, load_gather_vs, sd_data, implicit)
+DEF_SVE_FUNCTION (svld1sb_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1sh_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1sh_gather, load_ext_gather_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1sw_gather, load_ext_gather_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svld1sw_gather, load_ext_gather_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svld1ub_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1uh_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1uh_gather, load_ext_gather_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svld1uw_gather, load_ext_gather_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svld1uw_gather, load_ext_gather_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1, load, all_data, implicit)
+DEF_SVE_FUNCTION (svldff1_gather, load_gather_sv, sd_data, implicit)
+DEF_SVE_FUNCTION (svldff1_gather, load_gather_vs, sd_data, implicit)
+DEF_SVE_FUNCTION (svldff1sb, load_ext, hsd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sb_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sh, load_ext, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sh_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sh_gather, load_ext_gather_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sw, load_ext, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sw_gather, load_ext_gather_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1sw_gather, load_ext_gather_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1ub, load_ext, hsd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1ub_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uh, load_ext, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uh_gather, load_ext_gather_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uh_gather, load_ext_gather_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uw, load_ext, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uw_gather, load_ext_gather_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svldff1uw_gather, load_ext_gather_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1, load, all_data, implicit)
+DEF_SVE_FUNCTION (svldnf1sb, load_ext, hsd_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1sh, load_ext, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1sw, load_ext, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1ub, load_ext, hsd_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1uh, load_ext, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnf1uw, load_ext, d_integer, implicit)
+DEF_SVE_FUNCTION (svmmla, mmla, none, none)
+DEF_SVE_FUNCTION (svprfb_gather, prefetch_gather_offset, none, implicit)
+DEF_SVE_FUNCTION (svprfd_gather, prefetch_gather_index, none, implicit)
+DEF_SVE_FUNCTION (svprfh_gather, prefetch_gather_index, none, implicit)
+DEF_SVE_FUNCTION (svprfw_gather, prefetch_gather_index, none, implicit)
+DEF_SVE_FUNCTION (svrdffr, rdffr, none, z_or_none)
+DEF_SVE_FUNCTION (svsetffr, setffr, none, none)
+DEF_SVE_FUNCTION (svst1_scatter, store_scatter_index, sd_data, implicit)
+DEF_SVE_FUNCTION (svst1_scatter, store_scatter_offset, sd_data, implicit)
+DEF_SVE_FUNCTION (svst1b_scatter, store_scatter_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svst1h_scatter, store_scatter_index, sd_integer, implicit)
+DEF_SVE_FUNCTION (svst1h_scatter, store_scatter_offset, sd_integer, implicit)
+DEF_SVE_FUNCTION (svst1w_scatter, store_scatter_index, d_integer, implicit)
+DEF_SVE_FUNCTION (svst1w_scatter, store_scatter_offset, d_integer, implicit)
+DEF_SVE_FUNCTION (svtmad, tmad, all_float, none)
+DEF_SVE_FUNCTION (svtsmul, binary_uint, all_float, none)
+DEF_SVE_FUNCTION (svtssel, binary_uint, all_float, none)
+DEF_SVE_FUNCTION (svwrffr, setffr, none, implicit)
+#undef REQUIRED_EXTENSIONS
+
 #define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_BF16
 DEF_SVE_FUNCTION (svbfdot, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfdot_lane, ternary_bfloat_lanex2, s_float, none)
@@ -325,27 +328,37 @@ DEF_SVE_FUNCTION (svbfmlalb, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalb_lane, ternary_bfloat_lane, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalt, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalt_lane, ternary_bfloat_lane, s_float, none)
-DEF_SVE_FUNCTION (svbfmmla, ternary_bfloat, s_float, none)
 DEF_SVE_FUNCTION (svcvt, unary_convert, cvt_bfloat, mxz)
 DEF_SVE_FUNCTION (svcvtnt, unary_convert_narrowt, cvt_bfloat, mx)
 #undef REQUIRED_EXTENSIONS
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_BF16 \
+			     | AARCH64_FL_SM_OFF)
+DEF_SVE_FUNCTION (svbfmmla, ternary_bfloat, s_float, none)
+#undef REQUIRED_EXTENSIONS
+
 #define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_I8MM
-DEF_SVE_FUNCTION (svmmla, mmla, s_integer, none)
-DEF_SVE_FUNCTION (svusmmla, ternary_uintq_intq, s_signed, none)
 DEF_SVE_FUNCTION (svsudot, ternary_intq_uintq_opt_n, s_signed, none)
 DEF_SVE_FUNCTION (svsudot_lane, ternary_intq_uintq_lane, s_signed, none)
 DEF_SVE_FUNCTION (svusdot, ternary_uintq_intq_opt_n, s_signed, none)
 DEF_SVE_FUNCTION (svusdot_lane, ternary_uintq_intq_lane, s_signed, none)
 #undef REQUIRED_EXTENSIONS
-#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_F32MM
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_I8MM \
+			     | AARCH64_FL_SM_OFF)
+DEF_SVE_FUNCTION (svmmla, mmla, s_integer, none)
+DEF_SVE_FUNCTION (svusmmla, ternary_uintq_intq, s_signed, none)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_F32MM \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svmmla, mmla, s_float, none)
 #undef REQUIRED_EXTENSIONS
 
 #define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_F64MM
-DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
-DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
 DEF_SVE_FUNCTION (svtrn1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svtrn2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svuzp1q, binary, all_data, none)
@@ -353,3 +366,10 @@ DEF_SVE_FUNCTION (svuzp2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip1q, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2q, binary, all_data, none)
 #undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_F64MM \
+			     | AARCH64_FL_SM_OFF)
+DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
+DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
+#undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index 565393f3081..4aac1ac942a 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
@@ -51,24 +51,9 @@ DEF_SVE_FUNCTION (sveor3, ternary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (sveorbt, ternary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (sveortb, ternary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (svhadd, binary_opt_n, all_integer, mxz)
-DEF_SVE_FUNCTION (svhistcnt, binary_to_uint, sd_integer, z)
-DEF_SVE_FUNCTION (svhistseg, binary_to_uint, b_integer, none)
 DEF_SVE_FUNCTION (svhsub, binary_opt_n, all_integer, mxz)
 DEF_SVE_FUNCTION (svhsubr, binary_opt_n, all_integer, mxz)
-DEF_SVE_FUNCTION (svldnt1_gather, load_gather_sv_restricted, sd_data, implicit)
-DEF_SVE_FUNCTION (svldnt1_gather, load_gather_vs, sd_data, implicit)
-DEF_SVE_FUNCTION (svldnt1sb_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1sh_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1sh_gather, load_ext_gather_index_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1sw_gather, load_ext_gather_offset_restricted, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1sw_gather, load_ext_gather_index_restricted, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1ub_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1uh_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1uh_gather, load_ext_gather_index_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_offset_restricted, d_integer, implicit)
-DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_index_restricted, d_integer, implicit)
 DEF_SVE_FUNCTION (svlogb, unary_to_int, all_float, mxz)
-DEF_SVE_FUNCTION (svmatch, compare, bh_integer, implicit)
 DEF_SVE_FUNCTION (svmaxp, binary, all_arith, mx)
 DEF_SVE_FUNCTION (svmaxnmp, binary, all_float, mx)
 DEF_SVE_FUNCTION (svmla_lane, ternary_lane, hsd_integer, none)
@@ -91,7 +76,6 @@ DEF_SVE_FUNCTION (svmullb_lane, binary_long_lane, sd_integer, none)
 DEF_SVE_FUNCTION (svmullt, binary_long_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svmullt_lane, binary_long_lane, sd_integer, none)
 DEF_SVE_FUNCTION (svnbsl, ternary_opt_n, all_integer, none)
-DEF_SVE_FUNCTION (svnmatch, compare, bh_integer, implicit)
 DEF_SVE_FUNCTION (svpmul, binary_opt_n, b_unsigned, none)
 DEF_SVE_FUNCTION (svpmullb, binary_long_opt_n, hd_unsigned, none)
 DEF_SVE_FUNCTION (svpmullb_pair, binary_opt_n, bs_unsigned, none)
@@ -164,13 +148,6 @@ DEF_SVE_FUNCTION (svsli, ternary_shift_left_imm, all_integer, none)
 DEF_SVE_FUNCTION (svsqadd, binary_int_opt_n, all_unsigned, mxz)
 DEF_SVE_FUNCTION (svsra, ternary_shift_right_imm, all_integer, none)
 DEF_SVE_FUNCTION (svsri, ternary_shift_right_imm, all_integer, none)
-DEF_SVE_FUNCTION (svstnt1_scatter, store_scatter_index_restricted, sd_data, implicit)
-DEF_SVE_FUNCTION (svstnt1_scatter, store_scatter_offset_restricted, sd_data, implicit)
-DEF_SVE_FUNCTION (svstnt1b_scatter, store_scatter_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svstnt1h_scatter, store_scatter_index_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svstnt1h_scatter, store_scatter_offset_restricted, sd_integer, implicit)
-DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_index_restricted, d_integer, implicit)
-DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_offset_restricted, d_integer, implicit)
 DEF_SVE_FUNCTION (svsubhnb, binary_narrowb_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svsubhnt, binary_narrowt_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svsublb, binary_long_opt_n, hsd_integer, none)
@@ -191,7 +168,36 @@ DEF_SVE_FUNCTION (svxar, ternary_shift_right_imm, all_integer, none)
 
 #define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
			      | AARCH64_FL_SVE2 \
-			      | AARCH64_FL_SVE2_AES)
+			      | AARCH64_FL_SM_OFF)
+DEF_SVE_FUNCTION (svhistcnt, binary_to_uint, sd_integer, z)
+DEF_SVE_FUNCTION (svhistseg, binary_to_uint, b_integer, none)
+DEF_SVE_FUNCTION (svldnt1_gather, load_gather_sv_restricted, sd_data, implicit)
+DEF_SVE_FUNCTION (svldnt1_gather, load_gather_vs, sd_data, implicit)
+DEF_SVE_FUNCTION (svldnt1sb_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1sh_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1sh_gather, load_ext_gather_index_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1sw_gather, load_ext_gather_offset_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1sw_gather, load_ext_gather_index_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1ub_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1uh_gather, load_ext_gather_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1uh_gather, load_ext_gather_index_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_offset_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svldnt1uw_gather, load_ext_gather_index_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svmatch, compare, bh_integer, implicit)
+DEF_SVE_FUNCTION (svnmatch, compare, bh_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1_scatter, store_scatter_index_restricted, sd_data, implicit)
+DEF_SVE_FUNCTION (svstnt1_scatter, store_scatter_offset_restricted, sd_data, implicit)
+DEF_SVE_FUNCTION (svstnt1b_scatter, store_scatter_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1h_scatter, store_scatter_index_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1h_scatter, store_scatter_offset_restricted, sd_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_index_restricted, d_integer, implicit)
+DEF_SVE_FUNCTION (svstnt1w_scatter, store_scatter_offset_restricted, d_integer, implicit)
+#undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+			     | AARCH64_FL_SVE2 \
+			     | AARCH64_FL_SVE2_AES \
+			     | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svaesd, binary, b_unsigned, none)
 DEF_SVE_FUNCTION (svaese, binary, b_unsigned, none)
 DEF_SVE_FUNCTION (svaesmc, unary, b_unsigned, none)
@@ -202,7 +208,8 @@ DEF_SVE_FUNCTION (svpmullt_pair, binary_opt_n, d_unsigned, none)
 
 #define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
			      | AARCH64_FL_SVE2 \
-			      | AARCH64_FL_SVE2_BITPERM)
+			      | AARCH64_FL_SVE2_BITPERM \
+			      | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svbdep, binary_opt_n, all_unsigned, none)
 DEF_SVE_FUNCTION (svbext, binary_opt_n, all_unsigned, none)
 DEF_SVE_FUNCTION (svbgrp, binary_opt_n, all_unsigned, none)
@@ -210,13 +217,15 @@ DEF_SVE_FUNCTION (svbgrp, binary_opt_n, all_unsigned, none)
 
 #define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
			      | AARCH64_FL_SVE2 \
-			      | AARCH64_FL_SVE2_SHA3)
+			      | AARCH64_FL_SVE2_SHA3 \
+			      | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svrax1, binary, d_integer, none)
 #undef REQUIRED_EXTENSIONS
 
 #define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
			      | AARCH64_FL_SVE2 \
-			      | AARCH64_FL_SVE2_SM4)
+			      | AARCH64_FL_SVE2_SM4 \
+			      | AARCH64_FL_SM_OFF)
 DEF_SVE_FUNCTION (svsm4e, binary, s_unsigned, none)
 DEF_SVE_FUNCTION (svsm4ekey, binary, s_unsigned, none)
 #undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index ced3fcfafdf..41e7d88bffa 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -715,6 +715,13 @@ check_required_extensions (location_t location, tree fndecl,
   if (missing_extensions == 0)
     return check_required_registers (location, fndecl);
 
+  if (missing_extensions & AARCH64_FL_SM_OFF)
+    {
+      error_at (location, "ACLE function %qD cannot be called when"
+		" SME streaming mode is enabled", fndecl);
+      return false;
+    }
+
   static const struct {
     aarch64_feature_flags flag;
     const char *name;
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index e9cebffe3e0..3f48e4cdf26 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -1086,7 +1086,7 @@ (define_insn "aarch64_wrffr"
 	(match_operand:VNx16BI 0 "aarch64_simd_reg_or_minus_one")) (set (reg:VNx16BI FFRT_REGNUM) (unspec:VNx16BI [(match_dup 0)] UNSPEC_WRFFR))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [ cons: 0 ] [ Dm ] setffr [ Upa ] wrffr\t%0.b
@@ -1128,7 +1128,7 @@ (define_insn "aarch64_copy_ffr_to_ffrt"
 (define_insn "aarch64_rdffr" [(set (match_operand:VNx16BI 0 "register_operand" "=Upa") (reg:VNx16BI FFRT_REGNUM))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffr\t%0.b" )
@@ -1138,7 +1138,7 @@ (define_insn "aarch64_rdffr_z"
 	(and:VNx16BI (reg:VNx16BI FFRT_REGNUM) (match_operand:VNx16BI 1 "register_operand" "Upa")))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffr\t%0.b, %1/z" )
@@ -1154,7 +1154,7 @@ (define_insn "*aarch64_rdffr_z_ptest"
 	(match_dup 1))] UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0 "=Upa"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffrs\t%0.b, %1/z" )
@@ -1168,7 +1168,7 @@ (define_insn "*aarch64_rdffr_ptest"
 	(reg:VNx16BI FFRT_REGNUM)] UNSPEC_PTEST)) (clobber (match_scratch:VNx16BI 0 "=Upa"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffrs\t%0.b, %1/z" )
@@ -1187,7 +1187,7 @@ (define_insn "*aarch64_rdffr_z_cc"
 	(and:VNx16BI (reg:VNx16BI FFRT_REGNUM) (match_dup 1)))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffrs\t%0.b, %1/z" )
@@ -1202,7 +1202,7 @@ (define_insn "*aarch64_rdffr_cc"
 	UNSPEC_PTEST)) (set (match_operand:VNx16BI 0 "register_operand" "=Upa") (reg:VNx16BI FFRT_REGNUM))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "rdffrs\t%0.b, %1/z" )
@@ -1332,7 +1332,7 @@ (define_insn "@aarch64_ldf1"
 	(match_operand:SVE_FULL 1 "aarch64_sve_ldf1_operand" "Ut") (reg:VNx16BI FFRT_REGNUM)] SVE_LDFF1_LDNF1))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   "ldf1\t%0., %2/z, %1" )
@@ -1366,7 +1366,9 @@ (define_insn_and_rewrite "@aarch64_ldf1_ & ) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~ & ) == 0"
   "ldf1\t%0., %2/z, %1" "&& !CONSTANT_P (operands[3])" {
@@ -1414,7 +1416,7 @@ (define_expand "gather_load"
 	(match_operand:DI 4 "aarch64_gather_scale_operand_") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   { operands[5] = aarch64_ptrue_reg (mode); }
@@ -1432,7 +1434,7 @@ (define_insn "mask_gather_load"
 	(match_operand:DI 4 "aarch64_gather_scale_operand_") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5 ] [&w, Z, w, Ui1, Ui1, Upl] ld1\t%0.s, %5/z, [%2.s] [?w, Z, 0, Ui1, Ui1, Upl] ^
@@ -1461,7 +1463,7 @@ (define_insn "mask_gather_load"
 	(match_operand:DI 4 "aarch64_gather_scale_operand_") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, Z, w, i, Ui1, Upl] ld1\t%0.d, %5/z, [%2.d] [?w, Z, 0, i, Ui1, Upl] ^
@@ -1489,7 +1491,7 @@ (define_insn_and_rewrite "*mask_gather_load_xtw_unpac
 	(match_operand:DI 4 "aarch64_gather_scale_operand_") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, rk, w, i, Ui1, Upl ] ld1\t%0.d, %5/z, [%1, %2.d, xtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1519,7 +1521,7 @@ (define_insn_and_rewrite "*mask_gather_load_sxtw"
 	(match_operand:DI 4 "aarch64_gather_scale_operand_") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, rk, w, i, Ui1, Upl ] ld1\t%0.d, %5/z, [%1, %2.d, sxtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1546,7 +1548,7 @@ (define_insn "*mask_gather_load_uxtw"
 	(match_operand:DI 4 "aarch64_gather_scale_operand_") (mem:BLK (scratch))] UNSPEC_LD1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, rk, w, i, Ui1, Upl ] ld1\t%0.d, %5/z, [%1, %2.d, uxtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1583,7 +1585,9 @@ (define_insn_and_rewrite "@aarch64_gather_load_
 	(mem:BLK (scratch))] UNSPEC_LD1_GATHER))] UNSPEC_PRED_X))]
-  "TARGET_SVE && (~ & ) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~ & ) == 0"
   {@ [cons: =0, 1, 2, 3, 4, 5, 6] [&w, Z, w, Ui1, Ui1, Upl, UplDnm] ld1\t%0.s, %5/z, [%2.s] [?w, Z, 0, Ui1, Ui1, Upl, UplDnm] ^
@@ -1620,7 +1624,9 @@ (define_insn_and_rewrite "@aarch64_gather_load_ & ) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~ & ) == 0"
   {@ [cons: =0, 1, 2, 3, 4, 5, 6] [&w, Z, w, i, Ui1, Upl, UplDnm] ld1\t%0.d, %5/z, [%2.d] [?w, Z, 0, i, Ui1, Upl, UplDnm] ^
@@ -1656,7 +1662,9 @@ (define_insn_and_rewrite "*aarch64_gather_load_ & ) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~ & ) == 0"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, rk, w, i, Ui1, Upl ] ld1\t%0.d, %5/z, [%1, %2.d, xtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1691,7 +1699,9 @@ (define_insn_and_rewrite "*aarch64_gather_load_ & ) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~ & ) == 0"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, rk, w, i, Ui1, Upl ] ld1\t%0.d, %5/z, [%1, %2.d, sxtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1723,7 +1733,9 @@ (define_insn_and_rewrite "*aarch64_gather_load_ & ) == 0"
+  "TARGET_SVE
+   && TARGET_NON_STREAMING
+   && (~ & ) == 0"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, rk, w, i, Ui1, Upl ] ld1\t%0.d, %5/z, [%1, %2.d, uxtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1757,7 +1769,7 @@ (define_insn "@aarch64_ldff1_gather"
 	(mem:BLK (scratch)) (reg:VNx16BI FFRT_REGNUM)] UNSPEC_LDFF1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5 ] [&w, Z, w, i, Ui1, Upl] ldff1w\t%0.s, %5/z, [%2.s] [?w, Z, 0, i, Ui1, Upl] ^
@@ -1787,7 +1799,7 @@ (define_insn "@aarch64_ldff1_gather"
 	(mem:BLK (scratch)) (reg:VNx16BI FFRT_REGNUM)] UNSPEC_LDFF1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5 ] [&w, Z, w, i, Ui1, Upl ] ldff1d\t%0.d, %5/z, [%2.d] [?w, Z, 0, i, Ui1, Upl ] ^
@@ -1817,7 +1829,7 @@ (define_insn_and_rewrite "*aarch64_ldff1_gather_sxtw"
 	(mem:BLK (scratch)) (reg:VNx16BI FFRT_REGNUM)] UNSPEC_LDFF1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, rk, w, i, Ui1, Upl ] ldff1d\t%0.d, %5/z, [%1, %2.d, sxtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1844,7 +1856,7 @@ (define_insn "*aarch64_ldff1_gather_uxtw"
 	(mem:BLK (scratch)) (reg:VNx16BI FFRT_REGNUM)] UNSPEC_LDFF1_GATHER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [cons: =0, 1, 2, 3, 4, 5] [&w, rk, w, i, Ui1, Upl ] ldff1d\t%0.d, %5/z, [%1, %2.d, uxtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1882,7 +1894,7 @@ (define_insn_and_rewrite "@aarch64_ldff1_gather_\t%0.s, %5/z, [%2.s] [?w, Z, 0, i, Ui1, Upl, UplDnm] ^
@@ -1920,7 +1932,7 @@ (define_insn_and_rewrite "@aarch64_ldff1_gather_\t%0.d, %5/z, [%2.d] [?w, Z, 0, i, Ui1, Upl, UplDnm] ^
@@ -1958,7 +1970,7 @@ (define_insn_and_rewrite "*aarch64_ldff1_gather_\t%0.d, %5/z, [%1, %2.d, sxtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -1990,7 +2002,7 @@ (define_insn_and_rewrite "*aarch64_ldff1_gather_\t%0.d, %5/z, [%1, %2.d, uxtw] [?w, rk, 0, i, Ui1, Upl ] ^
@@ -2068,7 +2080,7 @@ (define_insn "@aarch64_sve_gather_prefetch"
 	UNSPEC_SVE_PREFETCH_GATHER) (match_operand:DI 7 "const_int_operand") (match_operand:DI 8 "const_int_operand"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   { static const char *const insns[][2] = { "prf", "%0, [%2.s]",
@@ -2097,7 +2109,7 @@ (define_insn "@aarch64_sve_gather_prefetch"
 	UNSPEC_SVE_PREFETCH_GATHER) (match_operand:DI 7 "const_int_operand") (match_operand:DI 8 "const_int_operand"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   { static const char *const insns[][2] = { "prf", "%0, [%2.d]",
@@ -2128,7 +2140,7 @@ (define_insn_and_rewrite "*aarch64_sve_gather_prefetch_ux
 	UNSPEC_SVE_PREFETCH_GATHER) (match_operand:DI 7 "const_int_operand") (match_operand:DI 8 "const_int_operand"))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   { static const char *const insns[][2] = { "prfb", "%0, [%1, %2.d, uxtw]",
@@ -2325,7 +2337,7 @@ (define_expand "scatter_store"
 	(match_operand:DI 3 "aarch64_gather_scale_operand_") (match_operand:SVE_24 4 "register_operand")] UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   { operands[5] = aarch64_ptrue_reg (mode); }
@@ -2343,7 +2355,7 @@ (define_insn "mask_scatter_store"
 	(match_operand:DI 3 "aarch64_gather_scale_operand_") (match_operand:SVE_4 4 "register_operand")] UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [ cons: 0 , 1 , 2 , 3 , 4 , 5 ] [ Z , w , Ui1 , Ui1 , w , Upl ] st1\t%4.s, %5, [%1.s] [ vgw , w , Ui1 , Ui1 , w , Upl ] st1\t%4.s, %5, [%1.s, #%0]
@@ -2366,7 +2378,7 @@ (define_insn "mask_scatter_store"
 	(match_operand:DI 3 "aarch64_gather_scale_operand_") (match_operand:SVE_2 4 "register_operand")] UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [ cons: 0 , 1 , 3 , 4 , 5 ] [ Z , w , Ui1 , w , Upl ] st1\t%4.d, %5, [%1.d] [ vgd , w , Ui1 , w , Upl ] st1\t%4.d, %5, [%1.d, #%0]
@@ -2390,7 +2402,7 @@ (define_insn_and_rewrite "*mask_scatter_store_xtw_unp
 	(match_operand:DI 3 "aarch64_gather_scale_operand_") (match_operand:SVE_2 4 "register_operand")] UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [ cons: 0 , 1 , 3 , 4 , 5 ] [ rk , w , Ui1 , w , Upl ] st1\t%4.d, %5, [%0, %1.d, xtw] [ rk , w , i , w , Upl ] st1\t%4.d, %5, [%0, %1.d, xtw %p3]
@@ -2418,7 +2430,7 @@ (define_insn_and_rewrite "*mask_scatter_store_sxtw"
 	(match_operand:DI 3 "aarch64_gather_scale_operand_") (match_operand:SVE_2 4 "register_operand")] UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [ cons: 0 , 1 , 3 , 4 , 5 ] [ rk , w , Ui1 , w , Upl ] st1\t%4.d, %5, [%0, %1.d, sxtw] [ rk , w , i , w , Upl ] st1\t%4.d, %5, [%0, %1.d, sxtw %p3]
@@ -2443,7 +2455,7 @@ (define_insn "*mask_scatter_store_uxtw"
 	(match_operand:DI 3 "aarch64_gather_scale_operand_") (match_operand:SVE_2 4 "register_operand")] UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [ cons: 0 , 1 , 3 , 4 , 5 ] [ rk , w , Ui1 , w , Upl ] st1\t%4.d, %5, [%0, %1.d, uxtw] [ rk , w , i , w , Upl ] st1\t%4.d, %5, [%0, %1.d, uxtw %p3]
@@ -2472,7 +2484,7 @@ (define_insn "@aarch64_scatter_store_trunc"
 	(truncate:VNx4_NARROW (match_operand:VNx4_WIDE 4 "register_operand"))] UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [ cons: 1 , 2 , 4 , 5 ] [ w , Ui1 , w , Upl ] st1\t%4.s, %5, [%1.s] [ w , Ui1 , w , Upl ] st1\t%4.s, %5, [%1.s, #%0]
@@ -2496,7 +2508,7 @@ (define_insn "@aarch64_scatter_store_trunc"
 	(truncate:VNx2_NARROW (match_operand:VNx2_WIDE 4 "register_operand"))] UNSPEC_ST1_SCATTER))]
-  "TARGET_SVE"
+  "TARGET_SVE && TARGET_NON_STREAMING"
   {@ [ cons: 1 , 4 , 5 ] [ w , w , Upl ] st1\t%4.d, %5, [%1.d] [ w , w , Upl ] st1\t%4.d, %5, [%1.d, #%0]
@@ -2522,7 +2534,7 @@ (define_insn_and_rewrite "*aarch64_scatter_store_trunc\t%4.d, %5, [%0, %1.d, sxtw] [ rk , w , w , Upl ] st1\t%4.d, %5, [%0, %1.d, sxtw %p3]
@@ -2547,7 +2559,7 @@ (define_insn "*aarch64_scatter_store_trunc_uxt (truncate:VNx2_NARROW (match_operand:VNx2_WIDE 4 "register_operand"))]
UNSPEC_ST1_SCATTER))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" {@ [ cons: 0 , 1 , 4 , 5 ] [ rk , w , w , Upl ] st1\t%4.d, %5, [%0, %1.d, uxtw] [ rk , w , w , Upl ] st1\t%4.d, %5, [%0, %1.d, uxtw %p3] @@ -2727,7 +2739,7 @@ (define_insn "@aarch64_sve_ld1ro" (match_operand:OI 1 "aarch64_sve_ld1ro_operand_" "UO")] UNSPEC_LD1RO))] - "TARGET_SVE_F64MM" + "TARGET_SVE_F64MM && TARGET_NON_STREAMING" { operands[1] = gen_rtx_MEM (mode, XEXP (operands[1], 0)); return "ld1ro\t%0., %2/z, %1"; @@ -3971,7 +3983,7 @@ (define_insn "@aarch64_adr" [(match_operand:SVE_FULL_SDI 1 "register_operand" "w") (match_operand:SVE_FULL_SDI 2 "register_operand" "w")] UNSPEC_ADR))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0., [%1., %2.]" ) @@ -3987,7 +3999,7 @@ (define_insn_and_rewrite "*aarch64_adr_sxtw" (match_operand:VNx2DI 2 "register_operand" "w")))] UNSPEC_PRED_X)] UNSPEC_ADR))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0.d, [%1.d, %2.d, sxtw]" "&& !CONSTANT_P (operands[3])" { @@ -4004,7 +4016,7 @@ (define_insn "*aarch64_adr_uxtw_unspec" (match_operand:VNx2DI 2 "register_operand" "w") (match_operand:VNx2DI 3 "aarch64_sve_uxtw_immediate"))] UNSPEC_ADR))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0.d, [%1.d, %2.d, uxtw]" ) @@ -4016,7 +4028,7 @@ (define_insn "*aarch64_adr_uxtw_and" (match_operand:VNx2DI 2 "register_operand" "w") (match_operand:VNx2DI 3 "aarch64_sve_uxtw_immediate")) (match_operand:VNx2DI 1 "register_operand" "w")))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0.d, [%1.d, %2.d, uxtw]" ) @@ -4031,7 +4043,7 @@ (define_expand "@aarch64_adr_shift" (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))] UNSPEC_PRED_X) (match_operand:SVE_FULL_SDI 1 "register_operand")))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" { operands[4] = CONSTM1_RTX (mode); } @@ -4047,7 +4059,7 @@ (define_insn_and_rewrite "*aarch64_adr_shift" (match_operand:SVE_24I 3 "const_1_to_3_operand"))] UNSPEC_PRED_X) 
(match_operand:SVE_24I 1 "register_operand" "w")))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0., [%1., %2., lsl %3]" "&& !CONSTANT_P (operands[4])" { @@ -4071,7 +4083,7 @@ (define_insn_and_rewrite "*aarch64_adr_shift_sxtw" (match_operand:VNx2DI 3 "const_1_to_3_operand"))] UNSPEC_PRED_X) (match_operand:VNx2DI 1 "register_operand" "w")))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0.d, [%1.d, %2.d, sxtw %3]" "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))" { @@ -4092,7 +4104,7 @@ (define_insn_and_rewrite "*aarch64_adr_shift_uxtw" (match_operand:VNx2DI 3 "const_1_to_3_operand"))] UNSPEC_PRED_X) (match_operand:VNx2DI 1 "register_operand" "w")))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0.d, [%1.d, %2.d, uxtw %3]" "&& !CONSTANT_P (operands[5])" { @@ -7197,7 +7209,7 @@ (define_insn "@aarch64_sve_add_" (match_operand: 3 "register_operand")] MATMUL) (match_operand:VNx4SI_ONLY 1 "register_operand")))] - "TARGET_SVE_I8MM" + "TARGET_SVE_I8MM && TARGET_NON_STREAMING" {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] [ w , 0 , w , w ; * ] mmla\t%0.s, %2.b, %3.b [ ?&w , w , w , w ; yes ] movprfx\t%0, %1\;mmla\t%0.s, %2.b, %3.b @@ -7772,7 +7784,7 @@ (define_insn "@aarch64_sve_" (match_operand:SVE_MATMULF 3 "register_operand") (match_operand:SVE_MATMULF 1 "register_operand")] FMMLA))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] [ w , 0 , w , w ; * ] \t%0., %2., %3. [ ?&w , w , w , w ; yes ] movprfx\t%0, %1\;\t%0., %2., %3. 
@@ -8841,7 +8853,7 @@ (define_expand "fold_left_plus_" (match_operand: 1 "register_operand") (match_operand:SVE_FULL_F 2 "register_operand")] UNSPEC_FADDA))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" { operands[3] = aarch64_ptrue_reg (mode); } @@ -8854,7 +8866,7 @@ (define_insn "mask_fold_left_plus_" (match_operand: 1 "register_operand" "0") (match_operand:SVE_FULL_F 2 "register_operand" "w")] UNSPEC_FADDA))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "fadda\t%0, %3, %0, %2." ) @@ -8908,7 +8920,7 @@ (define_insn "@aarch64_sve_compact" [(match_operand: 1 "register_operand" "Upl") (match_operand:SVE_FULL_SD 2 "register_operand" "w")] UNSPEC_SVE_COMPACT))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" "compact\t%0., %1, %2." ) diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index ffa964d6060..79e19699bc4 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -109,7 +109,7 @@ (define_insn "@aarch64_gather_ldnt" (match_operand: 3 "register_operand") (mem:BLK (scratch))] UNSPEC_LDNT1_GATHER))] - "TARGET_SVE2" + "TARGET_SVE2 && TARGET_NON_STREAMING" {@ [cons: =0, 1, 2, 3] [&w, Upl, Z, w ] ldnt1\t%0., %1/z, [%3.] [?w, Upl, Z, 0 ] ^ @@ -132,6 +132,7 @@ (define_insn_and_rewrite "@aarch64_gather_ldnt_ & ) == 0" {@ [cons: =0, 1, 2, 3, 4] [&w, Upl, Z, w, UplDnm] ldnt1\t%0., %1/z, [%3.] @@ -165,7 +166,7 @@ (define_insn "@aarch64_scatter_stnt" (match_operand:SVE_FULL_SD 3 "register_operand")] UNSPEC_STNT1_SCATTER))] - "TARGET_SVE" + "TARGET_SVE && TARGET_NON_STREAMING" {@ [ cons: 0 , 1 , 2 , 3 ] [ Upl , Z , w , w ] stnt1\t%3., %0, [%2.] [ Upl , r , w , w ] stnt1\t%3., %0, [%2., %1] @@ -183,6 +184,7 @@ (define_insn "@aarch64_scatter_stnt_" (match_operand:SVE_FULL_SDI 3 "register_operand"))] UNSPEC_STNT1_SCATTER))] "TARGET_SVE2 + && TARGET_NON_STREAMING && (~ & ) == 0" {@ [ cons: 0 , 1 , 2 , 3 ] [ Upl , Z , w , w ] stnt1\t%3., %0, [%2.] 
@@ -2469,7 +2471,7 @@ (define_insn "@aarch64_sve2_histcnt" (match_operand:SVE_FULL_SDI 2 "register_operand" "w") (match_operand:SVE_FULL_SDI 3 "register_operand" "w")] UNSPEC_HISTCNT))] - "TARGET_SVE2" + "TARGET_SVE2 && TARGET_NON_STREAMING" "histcnt\t%0., %1/z, %2., %3." ) @@ -2479,7 +2481,7 @@ (define_insn "@aarch64_sve2_histseg" [(match_operand:VNx16QI_ONLY 1 "register_operand" "w") (match_operand:VNx16QI_ONLY 2 "register_operand" "w")] UNSPEC_HISTSEG))] - "TARGET_SVE2" + "TARGET_SVE2 && TARGET_NON_STREAMING" "histseg\t%0., %1., %2." ) @@ -2503,7 +2505,7 @@ (define_insn "@aarch64_pred_" SVE2_MATCH)] UNSPEC_PRED_Z)) (clobber (reg:CC_NZC CC_REGNUM))] - "TARGET_SVE2" + "TARGET_SVE2 && TARGET_NON_STREAMING" "\t%0., %1/z, %3., %4." ) @@ -2534,6 +2536,7 @@ (define_insn_and_rewrite "*aarch64_pred__cc" SVE2_MATCH)] UNSPEC_PRED_Z))] "TARGET_SVE2 + && TARGET_NON_STREAMING && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" "\t%0., %1/z, %2., %3." "&& !rtx_equal_p (operands[4], operands[6])" @@ -2561,6 +2564,7 @@ (define_insn_and_rewrite "*aarch64_pred__ptest" UNSPEC_PTEST)) (clobber (match_scratch: 0 "=Upa"))] "TARGET_SVE2 + && TARGET_NON_STREAMING && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" "\t%0., %1/z, %2., %3." "&& !rtx_equal_p (operands[4], operands[6])" diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 687c1317b4f..0ea8b2d3524 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -253,6 +253,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; #define AARCH64_ISA_LS64 (aarch64_isa_flags & AARCH64_FL_LS64) #define AARCH64_ISA_CSSC (aarch64_isa_flags & AARCH64_FL_CSSC) +/* The current function is a normal non-streaming function. */ +#define TARGET_NON_STREAMING (AARCH64_ISA_SM_OFF) + /* Crypto is an optional extension to AdvSIMD. 
*/ #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO) @@ -291,16 +294,16 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; #define TARGET_SVE2 (AARCH64_ISA_SVE2) /* SVE2 AES instructions, enabled through +sve2-aes. */ -#define TARGET_SVE2_AES (AARCH64_ISA_SVE2_AES) +#define TARGET_SVE2_AES (AARCH64_ISA_SVE2_AES && TARGET_NON_STREAMING) /* SVE2 BITPERM instructions, enabled through +sve2-bitperm. */ -#define TARGET_SVE2_BITPERM (AARCH64_ISA_SVE2_BITPERM) +#define TARGET_SVE2_BITPERM (AARCH64_ISA_SVE2_BITPERM && TARGET_NON_STREAMING) /* SVE2 SHA3 instructions, enabled through +sve2-sha3. */ -#define TARGET_SVE2_SHA3 (AARCH64_ISA_SVE2_SHA3) +#define TARGET_SVE2_SHA3 (AARCH64_ISA_SVE2_SHA3 && TARGET_NON_STREAMING) /* SVE2 SM4 instructions, enabled through +sve2-sm4. */ -#define TARGET_SVE2_SM4 (AARCH64_ISA_SVE2_SM4) +#define TARGET_SVE2_SM4 (AARCH64_ISA_SVE2_SM4 && TARGET_NON_STREAMING) /* SME instructions, enabled through +sme. Note that this does not imply anything about the state of PSTATE.SM. 
*/ diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index e7aa7e35ae1..5f7cd886283 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -2707,7 +2707,7 @@ (define_int_iterator SVE_INT_UNARY [UNSPEC_RBIT UNSPEC_REVB (define_int_iterator SVE_FP_UNARY [UNSPEC_FRECPE UNSPEC_RSQRTE]) -(define_int_iterator SVE_FP_UNARY_INT [UNSPEC_FEXPA]) +(define_int_iterator SVE_FP_UNARY_INT [(UNSPEC_FEXPA "TARGET_NON_STREAMING")]) (define_int_iterator SVE_INT_SHIFT_IMM [UNSPEC_ASRD (UNSPEC_SQSHLU "TARGET_SVE2") @@ -2721,7 +2721,7 @@ (define_int_iterator SVE_FP_BINARY_INT [UNSPEC_FTSMUL UNSPEC_FTSSEL]) (define_int_iterator SVE_BFLOAT_TERNARY_LONG [UNSPEC_BFDOT UNSPEC_BFMLALB UNSPEC_BFMLALT - UNSPEC_BFMMLA]) + (UNSPEC_BFMMLA "TARGET_NON_STREAMING")]) (define_int_iterator SVE_BFLOAT_TERNARY_LONG_LANE [UNSPEC_BFDOT UNSPEC_BFMLALB diff --git a/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp b/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp new file mode 100644 index 00000000000..d6a5a561a33 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sve/aarch64-ssve.exp @@ -0,0 +1,308 @@ +# Specific regression driver for AArch64 SME. +# Copyright (C) 2009-2023 Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . */ + +# Test whether certain SVE instructions are accepted or rejected in +# SME streaming mode. 
+ +# Exit immediately if this isn't an AArch64 target. +if {![istarget aarch64*-*-*] } { + return +} + +load_lib gcc-defs.exp + +gcc_parallel_test_enable 0 + +# Code shared by all tests. +set preamble { +#include <arm_sve.h> + +#pragma GCC target "+i8mm+f32mm+f64mm+sve2+sve2-bitperm+sve2-sm4+sve2-aes+sve2-sha3+sme" + +extern svbool_t &pred; + +extern svint8_t &s8; +extern svint32_t &s32; + +extern svuint8_t &u8; +extern svuint16_t &u16; +extern svuint32_t &u32; +extern svuint64_t &u64; + +extern svbfloat16_t &bf16; +extern svfloat32_t &f32; + +extern void *void_ptr; + +extern int8_t *s8_ptr; +extern int16_t *s16_ptr; +extern int32_t *s32_ptr; + +extern uint8_t *u8_ptr; +extern uint16_t *u16_ptr; +extern uint32_t *u32_ptr; +extern uint64_t *u64_ptr; + +extern uint64_t indx; +} + +# Wrap a standalone call in a streaming-compatible function. +set sc_harness { +void +foo () [[arm::streaming_compatible]] +{ + $CALL; +} +} + +# HARNESS is some source code that should be appended to the preamble +# variable defined above. It includes the string "$CALL", which should be +# replaced by the function call in CALL. The result after both steps is +# a complete C++ translation unit. +# +# Try compiling the C++ code and see what output GCC produces. +# The expected output is either: +# +# - empty, if SHOULD_PASS is true +# - a message rejecting CALL in streaming mode, if SHOULD_PASS is false +# +# CALL is simple enough that it can be used in test names. 
+proc check_ssve_call { harness name call should_pass } { + global preamble + + set filename test-[pid] + set fd [open $filename.cc w] + puts $fd $preamble + puts -nonewline $fd [string map [list {$CALL} $call] $harness] + close $fd + remote_download host $filename.cc + + set test "streaming SVE call $name" + + set gcc_output [g++_target_compile $filename.cc $filename.s assembly ""] + remote_file build delete $filename.cc $filename.s + + if { [string equal $gcc_output ""] } { + if { $should_pass } { + pass $test + } else { + fail $test + } + return + } + + set lines [split $gcc_output "\n"] + set error_text "cannot be called when SME streaming mode is enabled" + if { [llength $lines] == 3 + && [string first "In function" [lindex $lines 0]] >= 0 + && [string first $error_text [lindex $lines 1]] >= 0 + && [string equal [lindex $lines 2] ""] } { + if { $should_pass } { + fail $test + } else { + pass $test + } + return + } + + verbose -log "$test: unexpected output" + fail $test +} + +# Apply check_ssve_call to each line in CALLS. The other arguments are +# as for check_ssve_call. +proc check_ssve_calls { harness calls should_pass } { + foreach line [split $calls "\n"] { + set call [string trim $line] + if { [string equal $call ""] } { + continue + } + check_ssve_call $harness "$call" $call $should_pass + } +} + +# A small selection of things that are valid in streaming mode. +set streaming_ok { + s8 = svadd_x (pred, s8, s8) + s8 = svld1 (pred, s8_ptr) +} + +# This order follows the list in the SME manual. 
+set nonstreaming_only { + u32 = svadrb_offset (u32, u32) + u64 = svadrb_offset (u64, u64) + u32 = svadrh_index (u32, u32) + u64 = svadrh_index (u64, u64) + u32 = svadrw_index (u32, u32) + u64 = svadrw_index (u64, u64) + u32 = svadrd_index (u32, u32) + u64 = svadrd_index (u64, u64) + u8 = svaesd (u8, u8) + u8 = svaese (u8, u8) + u8 = svaesimc (u8) + u8 = svaesmc (u8) + u8 = svbdep (u8, u8) + u8 = svbext (u8, u8) + f32 = svbfmmla (f32, bf16, bf16) + u8 = svbgrp (u8, u8) + u32 = svcompact (pred, u32) + f32 = svadda (pred, 1.0f, f32) + f32 = svexpa (u32) + f32 = svmmla (f32, f32, f32) + f32 = svtmad (f32, f32, 0) + f32 = svtsmul (f32, u32) + f32 = svtssel (f32, u32) + u32 = svhistcnt_z (pred, u32, u32) + u8 = svhistseg (u8, u8) + u32 = svld1ub_gather_offset_u32 (pred, u8_ptr, u32) + u32 = svld1ub_gather_offset_u32 (pred, u32, 1) + u64 = svld1_gather_index (pred, u64_ptr, u64) + u64 = svld1_gather_index_u64 (pred, u64, 1) + u32 = svld1uh_gather_index_u32 (pred, u16_ptr, u32) + u32 = svld1uh_gather_index_u32 (pred, u32, 1) + u8 = svld1ro (pred, u8_ptr + indx) + u8 = svld1ro (pred, u8_ptr + 1) + u16 = svld1ro (pred, u16_ptr + indx) + u16 = svld1ro (pred, u16_ptr + 1) + u32 = svld1ro (pred, u32_ptr + indx) + u32 = svld1ro (pred, u32_ptr + 1) + u64 = svld1ro (pred, u64_ptr + indx) + u64 = svld1ro (pred, u64_ptr + 1) + u32 = svld1sb_gather_offset_u32 (pred, s8_ptr, u32) + u32 = svld1sb_gather_offset_u32 (pred, u32, 1) + u32 = svld1sh_gather_index_u32 (pred, s16_ptr, u32) + u32 = svld1sh_gather_index_u32 (pred, u32, 1) + u64 = svld1sw_gather_index_u64 (pred, s32_ptr, u64) + u64 = svld1sw_gather_index_u64 (pred, u64, 1) + u64 = svld1uw_gather_index_u64 (pred, u32_ptr, u64) + u64 = svld1uw_gather_index_u64 (pred, u64, 1) + u32 = svld1_gather_index (pred, u32_ptr, u32) + u32 = svld1_gather_index_u32 (pred, u32, 1) + u8 = svldff1(pred, u8_ptr) + u16 = svldff1ub_u16(pred, u8_ptr) + u32 = svldff1ub_u32(pred, u8_ptr) + u64 = svldff1ub_u64(pred, u8_ptr) + u32 = 
svldff1ub_gather_offset_u32 (pred, u8_ptr, u32) + u32 = svldff1ub_gather_offset_u32 (pred, u32, 1) + u64 = svldff1(pred, u64_ptr) + u64 = svldff1_gather_index (pred, u64_ptr, u64) + u64 = svldff1_gather_index_u64 (pred, u64, 1) + u16 = svldff1(pred, u16_ptr) + u32 = svldff1uh_u32(pred, u16_ptr) + u64 = svldff1uh_u64(pred, u16_ptr) + u32 = svldff1uh_gather_offset_u32 (pred, u16_ptr, u32) + u32 = svldff1uh_gather_offset_u32 (pred, u32, 1) + u16 = svldff1sb_u16(pred, s8_ptr) + u32 = svldff1sb_u32(pred, s8_ptr) + u64 = svldff1sb_u64(pred, s8_ptr) + u32 = svldff1sb_gather_offset_u32 (pred, s8_ptr, u32) + u32 = svldff1sb_gather_offset_u32 (pred, u32, 1) + u32 = svldff1sh_u32(pred, s16_ptr) + u64 = svldff1sh_u64(pred, s16_ptr) + u32 = svldff1sh_gather_offset_u32 (pred, s16_ptr, u32) + u32 = svldff1sh_gather_offset_u32 (pred, u32, 1) + u64 = svldff1sw_u64(pred, s32_ptr) + u64 = svldff1sw_gather_offset_u64 (pred, s32_ptr, u64) + u64 = svldff1sw_gather_offset_u64 (pred, u64, 1) + u32 = svldff1(pred, u32_ptr) + u32 = svldff1_gather_index (pred, u32_ptr, u32) + u32 = svldff1_gather_index_u32 (pred, u32, 1) + u64 = svldff1uw_u64(pred, u32_ptr) + u64 = svldff1uw_gather_offset_u64 (pred, u32_ptr, u64) + u64 = svldff1uw_gather_offset_u64 (pred, u64, 1) + u8 = svldnf1(pred, u8_ptr) + u16 = svldnf1ub_u16(pred, u8_ptr) + u32 = svldnf1ub_u32(pred, u8_ptr) + u64 = svldnf1ub_u64(pred, u8_ptr) + u64 = svldnf1(pred, u64_ptr) + u16 = svldnf1(pred, u16_ptr) + u32 = svldnf1uh_u32(pred, u16_ptr) + u64 = svldnf1uh_u64(pred, u16_ptr) + u16 = svldnf1sb_u16(pred, s8_ptr) + u32 = svldnf1sb_u32(pred, s8_ptr) + u64 = svldnf1sb_u64(pred, s8_ptr) + u32 = svldnf1sh_u32(pred, s16_ptr) + u64 = svldnf1sh_u64(pred, s16_ptr) + u64 = svldnf1sw_u64(pred, s32_ptr) + u32 = svldnf1(pred, u32_ptr) + u64 = svldnf1uw_u64(pred, u32_ptr) + u32 = svldnt1ub_gather_offset_u32 (pred, u8_ptr, u32) + u32 = svldnt1ub_gather_offset_u32 (pred, u32, 1) + u64 = svldnt1_gather_index (pred, u64_ptr, u64) + u64 = 
svldnt1_gather_index_u64 (pred, u64, 1) + u32 = svldnt1uh_gather_offset_u32 (pred, u16_ptr, u32) + u32 = svldnt1uh_gather_offset_u32 (pred, u32, 1) + u32 = svldnt1sb_gather_offset_u32 (pred, s8_ptr, u32) + u32 = svldnt1sb_gather_offset_u32 (pred, u32, 1) + u32 = svldnt1sh_gather_offset_u32 (pred, s16_ptr, u32) + u32 = svldnt1sh_gather_offset_u32 (pred, u32, 1) + u64 = svldnt1sw_gather_offset_u64 (pred, s32_ptr, u64) + u64 = svldnt1sw_gather_offset_u64 (pred, u64, 1) + u64 = svldnt1uw_gather_offset_u64 (pred, u32_ptr, u64) + u64 = svldnt1uw_gather_offset_u64 (pred, u64, 1) + u32 = svldnt1_gather_offset (pred, u32_ptr, u32) + u32 = svldnt1_gather_offset_u32 (pred, u32, 1) + pred = svmatch (pred, u8, u8) + pred = svnmatch (pred, u8, u8) + u64 = svpmullb_pair (u64, u64) + u64 = svpmullt_pair (u64, u64) + svprfb_gather_offset (pred, void_ptr, u64, SV_PLDL1KEEP) + svprfb_gather_offset (pred, u64, 1, SV_PLDL1KEEP) + svprfd_gather_index (pred, void_ptr, u64, SV_PLDL1KEEP) + svprfd_gather_index (pred, u64, 1, SV_PLDL1KEEP) + svprfh_gather_index (pred, void_ptr, u64, SV_PLDL1KEEP) + svprfh_gather_index (pred, u64, 1, SV_PLDL1KEEP) + svprfw_gather_index (pred, void_ptr, u64, SV_PLDL1KEEP) + svprfw_gather_index (pred, u64, 1, SV_PLDL1KEEP) + u64 = svrax1 (u64, u64) + pred = svrdffr () + pred = svrdffr_z (pred) + svsetffr () + u32 = svsm4e (u32, u32) + u32 = svsm4ekey (u32, u32) + s32 = svmmla (s32, s8, s8) + svst1b_scatter_offset (pred, u8_ptr, u32, u32) + svst1b_scatter_offset (pred, u32, 1, u32) + svst1_scatter_index (pred, u64_ptr, u64, u64) + svst1_scatter_index (pred, u64, 1, u64) + svst1h_scatter_index (pred, u16_ptr, u32, u32) + svst1h_scatter_index (pred, u32, 1, u32) + svst1w_scatter_index (pred, u32_ptr, u64, u64) + svst1w_scatter_index (pred, u64, 1, u64) + svst1_scatter_index (pred, u32_ptr, u32, u32) + svst1_scatter_index (pred, u32, 1, u32) + svstnt1b_scatter_offset (pred, u8_ptr, u32, u32) + svstnt1b_scatter_offset (pred, u32, 1, u32) + svstnt1_scatter_offset 
(pred, u64_ptr, u64, u64) + svstnt1_scatter_offset (pred, u64, 1, u64) + svstnt1h_scatter_offset (pred, u16_ptr, u32, u32) + svstnt1h_scatter_offset (pred, u32, 1, u32) + svstnt1w_scatter_offset (pred, u32_ptr, u64, u64) + svstnt1w_scatter_offset (pred, u64, 1, u64) + svstnt1_scatter_offset (pred, u32_ptr, u32, u32) + svstnt1_scatter_offset (pred, u32, 1, u32) + u32 = svmmla (u32, u8, u8) + s32 = svusmmla (s32, u8, s8) + svwrffr (pred) +} + +check_ssve_calls $sc_harness $streaming_ok 1 +check_ssve_calls $sc_harness $nonstreaming_only 0 + +gcc_parallel_test_enable 1 diff --git a/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp b/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp index 5b40d0d5c39..4b4ee10a014 100644 --- a/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp +++ b/gcc/testsuite/g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp @@ -50,6 +50,7 @@ if { [info exists gcc_runtest_parallelize_limit_minor] } { torture-init set-torture-options { "-std=c++98 -O0 -g" + "-std=c++11 -O0 -DSTREAMING_COMPATIBLE" "-std=c++98 -O1 -g" "-std=c++11 -O2 -g" "-std=c++14 -O3 -g" diff --git a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp index b605da8770b..9cd2efd05cb 100644 --- a/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp +++ b/gcc/testsuite/g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp @@ -53,6 +53,7 @@ if { [info exists gcc_runtest_parallelize_limit_minor] } { torture-init set-torture-options { "-std=c++98 -O0 -g" + "-std=c++11 -O0 -DSTREAMING_COMPATIBLE" "-std=c++98 -O1 -g" "-std=c++11 -O2 -g" "-std=c++14 -O3 -g" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp b/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp index ba4704e54f4..eee7c420ffd 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp +++ 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp @@ -50,6 +50,7 @@ if { [info exists gcc_runtest_parallelize_limit_minor] } { torture-init set-torture-options { "-std=c90 -O0 -g" + "-std=c90 -O0 -DSTREAMING_COMPATIBLE" "-std=c90 -O1 -g" "-std=c99 -O2 -g" "-std=c11 -O3 -g" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f16.c index 642c45ab492..d381d881d82 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f32.c index 79bdd3d8048..e0b908837a0 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f64.c index c8f56772218..fd730c85153 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adda_f64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrb.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrb.c index a61eec9712e..5dcdc54b007 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrb.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrb.c @@ -1,3 
+1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrd.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrd.c index 970485bd67d..d9d16ce3f7d 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrd.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrd.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrh.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrh.c index d06f51fe35b..a358c240389 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrh.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrh.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrw.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrw.c index b23f25a1125..bd1e9af0a6d 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrw.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/adrw.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c index b1d98fbf536..4bb2912a45a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bfmmla_f32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-additional-options "-march=armv8.2-a+sve+bf16" } */ /* { 
dg-require-effective-target aarch64_asm_bf16_ok } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f32.c
index 2e80d6830ca..d261ec00b92 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f64.c
index e0bc33efec2..024b0510faa 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s32.c
index e4634982bf6..0b32dfb609c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s64.c
index 71cb97b8a2a..38688dbca73 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u32.c
index 954329a0b2f..a3e89cc97a1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u64.c
index ec664845f4a..602ab048c99 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/compact_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f16.c
index 5a5411e46cb..87c26e6ea6b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f32.c
index 4ded1c5756e..5e9839537c7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f64.c
index c31f9ccb5b2..b117df2a4b1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/expa_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c
index 00b68ff290c..8b972f61b49 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c
index 47127960c0d..413d4d62d4e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c
index 9b6335547f5..b3df7d154cf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c
index c9cea3ad8c7..0da1e52966b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c
index 2cccc8d4906..a3304c4197a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c
index 6ee1d48ab0c..73ef94805dc 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c
index cb1801778d4..fe909b666c9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_bf16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c
index 86081edbd65..30ba3063900 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c
index c8df00f8a02..cf62fada91a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c
index 2fb9d5b7486..b9fde4dac69 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c
index 3cd211b1646..35b7dd1d27e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c
index 44b16ed5f72..57b6a6567c0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c
index 3aa9a15eeee..bd7e28478e2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c
index 49aff5146f2..1438000038e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c
index 00bf9e129f5..145b0b7f3aa 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c
index 9e9b3290a12..9f150631b94 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c
index 64ec628714b..8dd75d13607 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c
index 22701320bf7..f154545868b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ro_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 /* { dg-additional-options "-march=armv8.6-a+f64mm" } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c
index 16a5316a9e4..06249ad4c5c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c
index 3f953247ea1..8d141e133e6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c
index 424de65a6fe..77836cbf652 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c
index aa375bea2e3..f4b24ab419a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sb_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c
index ed07b4dfcfa..1b978236845 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c
index 20ca4272059..2009dec812e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c
index e3a85a23fb6..0e1d4896665 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c
index 3a0094fba59..115d7d3a996 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sh_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c
index 4d076b4861a..5dc44421ca4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c
index ffa85eb3e73..fac4ec41c00 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1sw_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c
index a9c4182659e..f57df42266d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c
index 99af86ddf82..0c069fa4f44 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c
index 77c7e0a2dff..98102e01393 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c
index b605f8b67e3..f86a34d1248 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1ub_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c
index 84fb5c335d7..13937187895 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c
index 44700179322..f0338aae6b4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c
index 09d3cc8c298..5810bc0accb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c
index f3dcf03cd81..52e95abb9b4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uh_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c
index f4e9d5db970..0889eefdddd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c
index 854d19233f5..fb144d756ab 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ld1uw_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c
index 80f6468700e..1f997480ea8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_bf16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f16.c
index 13ce863c96a..60405d0a0ed 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f32.c
index 2fcc633906c..225e9969dd2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f64.c
index cc15b927aba..366e36afdbe 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c
index 7e330c04221..b84b9bcdda7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c
index d0e47f0bf19..e779b071283 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c
index 66bf0f74630..17e0f9aa2d8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c
index faf71bf9dd5..030f187b152 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c
index 41c7dc9cf31..fb86530166f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c
index 8b53ce94f85..5be30a2d842 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s16.c
index 1d5fde0e639..61d242c074b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s32.c
index 97a36e88499..afe748ef939 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s64.c
index c018a4c1ca6..bee22285539 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s8.c
index cf620d1f4b0..ccaac2ca4eb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u16.c
index 1fa819296cb..c8416f99df9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u32.c
index 5224ec40ac8..ec26a82ca19 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u64.c
index 18e87f2b805..e211f179486 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u8.c
index 83883fca43a..24dfe452f03 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c
index c2a676807a5..f7e3977bfcf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c
index 2f2a04d24bb..7f2a829a8e4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c
index e3e83a205cb..685f628088d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c
index 769f2c266e9..49a7a85367f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_gather_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c
index e0a748c6a6b..1d30c7ba618 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c
index 86716da9ba1..c2b3f42cb5b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c
index e7a4aa6e93d..585a6241e0b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c
index 69ba96d52e2..ebb2f0f66f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */
 
 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c
index e1a1873f0a4..f4ea96cf91c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { !
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c index 0a49cbcc07f..e3735239c4e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sb_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c index b633335dc71..67e70361b5c 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c index 32a4309b633..5755c79bc1a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c index 73a9be8923b..a5848999573 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c index 94ea73b6306..b1875120980 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c index 81b64e836b8..bffac936527 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c index 453b3ff244a..a4acb1e5ea9 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c index bbbed79dc35..828288cd825 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c index 5430e256b46..e3432c46c27 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sh_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c index e5da8a83dc3..78aa34ec055 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c index 41142875673..9dad1212c81 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c index d795ace6391..33b6c10ddc5 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c index 6caf2f5045d..e8c9c845f95 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1sw_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c index af0be08d21c..b1c9c81357f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c index 43124dd8930..9ab776a218f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c index 90c4e58a275..745740dfa3f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c index 302623a400b..3a7bd6a436b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c index 88ad2d1dc61..ade0704f7ad 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c index e8e06411f98..5d3e0ce95e5 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c index 21d02ddb721..08ae802ee26 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c index 904cb027e3e..d8dc5e15738 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c index a400123188b..042ae5a9f02 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c index a9a98a68362..d0844fa5197 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1ub_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c index d02e443428a..12460105d0e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c index 663a73d2715..536331371b0 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c index 5e0ef067f54..602e6a686e6 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c index 1cfae1b9532..4b307b3416e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c index abb3d769a74..db205b1ef7b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c index 6e330e8e8a8..0eac877eb82 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c index 4eb5323e957..266ecf167fe 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c index ebac26e7d37..bdd725e4a35 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uh_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c index 6c0daea52b5..ab2c79da782 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c index 0e400c6790f..361d7de05d8 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c index ac97798991c..8adcec3d512 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c index c7ab0617106..781fc1a9c66 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldff1uw_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c index 947a896e778..93b4425ecb5 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_bf16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c index cf017868839..d47d748c76c 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c index 83b73ec8e09..e390d685797 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c index 778096e826b..97a0e39e7c8 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_f64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c index 592c8237de3..21008d7f9ca 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c index 634092af8ea..8a3d795b309 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c index 4a03f66767a..c0b57a2f3fc 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c index 162ee176ad5..6714152d93c 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_s8.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c index e920ac43b45..3df404d77bb 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c index 65e28c5c206..e899a4a6ff4 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c index 70d3f27d87a..ab69656cfa8 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c index 5c29f1d196a..5d7b074973e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1_u8.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c index e04b9a7887f..5b53c885d6a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c index 0553fc98da4..992eba7cc2f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c index 61a474fdf52..99e0f8bd091 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c index be63d8bf9b2..fe23913f23c 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c index 4f52490b4a8..6deb39770a1 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c index 73f50d182a5..e76457da6cd 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sb_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c
index 08c7dc6dd4d..e49a7f8ed49 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c
index 6a41bc26b7f..00b40281c24 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c
index 2f7718730f1..41560af330f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c
index d7f1a68a4cd..0acf4b34916 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sh_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c
index 5b483e4aa1d..5782128982c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c
index 62121ce0a44..8249c4c3f79 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1sw_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c
index 8fe13411f31..e59c451f790 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c
index 50122e3b786..d788576e275 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c
index d7cce11b60c..b21fdb96491 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c
index 7bf82c3b6c0..1ae41b002ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c
index e2fef064b47..e3d8fb3b5f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c
index 57c61e122ac..df9a0c07fa7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1ub_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c
index ed9686c4ed5..c3467d84675 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c
index a3107f562b8..bf3355e9986 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c
index 93d5abaf76e..bcc3eb3fd8f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c
index 32d36a84ce3..4c01c13ac3f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uh_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c
index 373922791d0..3c655659115 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c
index b3c3be1d01f..b222a0dc648 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ldnf1uw_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f32.c
index f66dbf397c4..e1c7f47dc96 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_f32mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+f32mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f64.c
index 49dc0607cff..c45caa70001 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_f64mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+f64mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_s32.c
index e7ce009acfc..dc155461c61 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_i8mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+sve+i8mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_u32.c
index 81f5166fbf9..43d601a471d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mmla_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_i8mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+sve+i8mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb_gather.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb_gather.c
index c4bfbbbf7d7..f32cfbfcb19 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb_gather.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfb_gather.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd_gather.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd_gather.c
index a84acb1a106..8a4293b6253 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd_gather.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfd_gather.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh_gather.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh_gather.c
index 04b7a15758c..6beca4b8e0f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh_gather.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfh_gather.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw_gather.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw_gather.c
index 2bbae1b9e02..6af44ac8290 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw_gather.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/prfw_gather.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rdffr_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rdffr_1.c
index 5564e967fcf..7e28ef6412f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rdffr_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/rdffr_1.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c
index cb6774ad04f..1efd4344532 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c
index fe978bbe5f1..f50c43e8309 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c
index d244e701a81..bb6fb10b83f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c
index 5c4ebf440bc..19ec78e9e6e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c
index fe3f7259f24..57fbb91b0ef 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c
index 23212356625..60018be5b80 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c
index d59033356be..fb1bb29dbe2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c
index c7a35f1b470..65ee9a071fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c
index e098cb9b77e..ceec6193952 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c
index 058d1313fc2..aeedbc6d7a7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1b_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c
index 2a23d41f3a1..2d69d085bc0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c
index 6a1adb05609..3e5733ef9bb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c
index 12197315d09..5cd330a3dec 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c
index 7021ea68f49..0ee9948cb4e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1h_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c
index 2363f592b19..f18bedce1ca 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c
index 767c009b4f7..6850865ec9a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st1w_scatter_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
index 2da61ff5c0b..d8916809b8e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
@@ -11,10 +11,17 @@
 #error "Please define -DTEST_OVERLOADS or -DTEST_FULL"
 #endif

+#ifdef STREAMING_COMPATIBLE
+#define ATTR __arm_streaming_compatible
+#else
+#define ATTR
+#endif
+
 #ifdef __cplusplus
-#define PROTO(NAME, RET, ARGS) extern "C" RET NAME ARGS; RET NAME ARGS
+#define PROTO(NAME, RET, ARGS) \
+  extern "C" RET NAME ARGS ATTR; RET NAME ARGS ATTR
 #else
-#define PROTO(NAME, RET, ARGS) RET NAME ARGS
+#define PROTO(NAME, RET, ARGS) RET NAME ARGS ATTR
 #endif

 #define TEST_UNIFORM_Z(NAME, TYPE, CODE1, CODE2) \
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f16.c
index 3a00716e37f..c0b03a0d331 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f32.c
index b73d420fbac..8eef8a12ca8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f64.c
index fc31928a6c3..5c96c55796c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tmad_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f16.c
index 94bc696eb07..9deed667f89 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f32.c
index d0ec91882d2..749ea8664be 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f64.c
index 23e0da3f7a0..053abcb26e9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tsmul_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f16.c
index e7c3ea03b81..3ab251fe04a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f32.c
index 022573a191d..6c6471c5e56 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f64.c
index ffcdf4224b3..9559e0f352d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/tssel_f64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/usmmla_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/usmmla_s32.c
index 9440f3fd919..a0dd7e334aa 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/usmmla_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/usmmla_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-require-effective-target aarch64_asm_i8mm_ok } */
 /* { dg-additional-options "-march=armv8.2-a+sve+i8mm" } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp b/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
index 0ad6463d832..f62782ef40b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp
@@ -52,6 +52,7 @@ if { [info exists gcc_runtest_parallelize_limit_minor] } {
 torture-init
 set-torture-options {
     "-std=c90 -O0 -g"
+    "-std=c90 -O0 -DSTREAMING_COMPATIBLE"
     "-std=c90 -O1 -g"
     "-std=c99 -O2 -g"
     "-std=c11 -O3 -g"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesd_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesd_u8.c
index 384b6ffc9aa..65ba09471ac 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesd_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesd_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aese_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aese_u8.c
index 6381bce1661..f902c3c1d32 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aese_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aese_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c
index 76259326467..dab06b79a95 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c
index 30e83d381dc..7e7cc65be5d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u16.c
index 14230850f70..c1a4e10614f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u32.c
index 7f08df4baa2..4f14cc4c432 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u64.c
index 7f7cbbeebad..091253ec60b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u8.c
index b420323b906..deb1ad27d90 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bdep_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u16.c
index 50a647918e5..9efa501efa8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u32.c
index 9f98b843c1a..18963da5bd3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u64.c
index 9dbaec1b762..91591f93b88 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u8.c
index 81ed5a463a0..1211587ef41 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bext_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c
index 70aeae3f329..72868bea7f6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u16.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c
index 6e19e38d897..c8923816fe4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c
index 27fa40f4777..86989529faf 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c
index b667e03e3a4..5cd941a7a6e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/bgrp_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c
index 7bf783a7c18..53d6c5c5636 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c
index 001f5f0f187..c6d9862e31f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_s64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c
index d93091adc55..cb11a00261b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c
index 3b889802395..0bb06cdb45d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histcnt_u64.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_s8.c
index 380ccdf85a5..ce3458e5ef6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_s8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_u8.c
index f43292f0ccd..7b1eff811c5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/histseg_u8.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */

 #include "test_sve_acle.h"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c
index 102810e25c8..17e3673a4a7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f32.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */
 /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { !
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c index a0ed71227e8..8ce32e9f9ff 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_f64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c index 94c64971c77..b7e1d7a99c8 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c index a0aa6703f9c..b0789ad21ce 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c index e1479684e82..df09eaa7680 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c index 77cdcfebafe..5f185ea824b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c index bb729483fcd..71fece575d9 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c index de5b693140c..1183e72f0fb 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c index d01ec18e442..4d5e6e7716f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c index b96e94353f1..ed329a23f19 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sb_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c index 1dcfbc0fb95..6dbd6cea0f6 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c index 4166ed0a6c8..4ea3335a29f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c index 7680344da28..d5545151994 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c index 2427c83ab67..18c8ca44e7b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sh_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c index 2f538e847c2..41bff31d021 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c index ace1c2f2fe5..30b8f6948f7 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1sw_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c index d3b29eb193d..8750d11af0f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c index 3bc406620d7..f7981991a6a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c index 0af4b40b851..4d5ee4ef4ef 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c index fe28d78ed46..005c29c0644 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1ub_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c index 985432615ca..92613b16685 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c index 3c5baeee60e..be2e6d126e8 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c index 4d945e9f994..4d122059f72 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c index 680238ac4f7..e3bc1044cd7 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uh_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c index 787ae9defb2..9efa4b2cbf0 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c index 4810bc3c45c..4ded4454df1 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/ldnt1uw_gather_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s16.c index baebc7693c6..d0ce8129475 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s8.c index f35a753791d..03473906aa2 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_s8.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u16.c index 0bdf4462f3d..2a8b4d250ab 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ 
#include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u8.c index 6d78692bdb4..8409276d905 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/match_u8.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c index 935b19a1040..044ba1de397 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c index 8a00b30f308..6c2d890fa41 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_s8.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c index 868c20a11e5..863e31054e2 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u16.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c index af6b5816513..a62783db763 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/nmatch_u8.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c index 944609214a1..1fd85e0ce80 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullb_pair_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c index 90e2e991f9b..300d885abb0 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/pmullt_pair_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_s64.c index ea80d40dbdf..9dbc7183992 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git 
a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_u64.c index b237c7edd5a..5caa2a5443b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/rax1_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c index cf6a2a95235..96c20dcaac4 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c index 58ad33c5ddb..e72384108e6 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/sm4ekey_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c index 3f928e20eac..75539f6928f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c index 8a35c76b90a..c0d47d0c13f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_f64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c index bd600268228..80fb3e8695b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c index 0bfa2616ef5..edd2bc41832 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c index fbfa008c1d5..a6e5059def9 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c index c283135c4ec..067e5b109c3 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1_scatter_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c index bf6ba597362..498fe82e5c2 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c index a24d0c89c76..614f5fb1a49 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c index 2b05a7720bd..ce2c482afbd 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c index a13c5f5bb9d..593dc193975 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1b_scatter_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c index 4e012f61f34..b9d06c1c5ab 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c index e934a708d89..006e0e24dec 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c index db21821eb58..8cd7cb86ab3 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u32.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c index 53f930da1fc..972ee36896b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1h_scatter_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c index ec6c837d907..368a17c4769 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_s64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! ilp32 } } } } */ #include "test_sve_acle.h" diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c index 3c5d96de4f8..57d60a350de 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c +++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/stnt1w_scatter_u64.c @@ -1,3 +1,4 @@ +/* { dg-skip-if "" { *-*-* } { "-DSTREAMING_COMPATIBLE" } { "" } } */ /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" { target { ! 
 ilp32 } } } } */
 #include "test_sve_acle.h"

From patchwork Fri Nov 17 17:26:26 2023
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 11/21] aarch64: Switch PSTATE.SM around calls
Date: Fri, 17 Nov 2023 17:26:26 +0000

This patch adds
support for switching to the appropriate SME mode for each call.
Switching to streaming mode requires an SMSTART SM instruction and
switching to non-streaming mode requires an SMSTOP SM instruction.
If the call is being made from streaming-compatible code, these
switches are conditional on the current mode being the opposite
of the one that the call needs.

Since changing PSTATE.SM changes the vector length and effectively
changes the ISA, the code to do the switching has to be emitted late.
The patch does this using a new pass that runs next to late
prologue/epilogue insertion.  (It doesn't use md_reorg because later
additions need the CFG.)

If a streaming-compatible function needs to switch mode for a call,
it must restore the original mode afterwards.  The old mode must
therefore be available immediately after the call.  The easiest way
of ensuring this is to force the use of a hard frame pointer and
ensure that the old state is saved at an in-range offset from there.

Changing modes clobbers the Z and P registers, so we need to save and
restore live Z and P state around each mode switch.  However, mode
switches are not expected to be performance critical, so it seemed
better to err on the side of being correct rather than trying to
optimise the save and restore with surrounding code.

gcc/
	* config/aarch64/aarch64-passes.def
	(pass_late_thread_prologue_and_epilogue): New pass.
	* config/aarch64/aarch64-sme.md: New file.
	* config/aarch64/aarch64.md: Include it.
	(*tb1): Rename to...
	(@aarch64_tb): ...this.
	(call, call_value, sibcall, sibcall_value): Don't require operand 2
	to be a CONST_INT.
	* config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Return
	the insn.
	(make_pass_switch_pstate_sm): Declare.
	* config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): New macro.
	(CALL_USED_REGISTER): Mark VG as call-preserved.
	(aarch64_frame::old_svcr_offset): New member variable.
	(machine_function::call_switches_sm_state): Likewise.
	(CUMULATIVE_ARGS::num_sme_mode_switch_args): Likewise.
	(CUMULATIVE_ARGS::sme_mode_switch_args): Likewise.
	* config/aarch64/aarch64.cc: Include tree-pass.h and cfgbuild.h.
	(aarch64_cfun_incoming_pstate_sm): New function.
	(aarch64_call_switches_pstate_sm): Likewise.
	(aarch64_reg_save_mode): Return DImode for VG_REGNUM.
	(aarch64_callee_isa_mode): New function.
	(aarch64_insn_callee_isa_mode): Likewise.
	(aarch64_guard_switch_pstate_sm): Likewise.
	(aarch64_switch_pstate_sm): Likewise.
	(aarch64_sme_mode_switch_regs): New class.
	(aarch64_record_sme_mode_switch_args): New function.
	(aarch64_finish_sme_mode_switch_args): Likewise.
	(aarch64_function_arg): Handle the end marker by returning a
	PARALLEL that contains the ABI cookie that we used previously
	alongside the result of aarch64_finish_sme_mode_switch_args.
	(aarch64_init_cumulative_args): Initialize num_sme_mode_switch_args.
	(aarch64_function_arg_advance): If a call would switch SM state,
	record all argument registers that would need to be saved around
	the mode switch.
	(aarch64_need_old_pstate_sm): New function.
	(aarch64_layout_frame): Decide whether the frame needs to store the
	incoming value of PSTATE.SM and allocate a save slot for it if so.
	If a function switches SME state, arrange to save the old value of
	the DWARF VG register.  Handle the case where this is the only
	register save slot above the FP.
	(aarch64_save_callee_saves): Handle saves of the DWARF VG register.
	(aarch64_get_separate_components): Prevent such saves from being
	shrink-wrapped.
	(aarch64_old_svcr_mem): New function.
	(aarch64_read_old_svcr): Likewise.
	(aarch64_guard_switch_pstate_sm): Likewise.
	(aarch64_expand_prologue): Handle saves of the DWARF VG register.
	Initialize any SVCR save slot.
	(aarch64_expand_call): Allow the cookie to be a PARALLEL that
	contains both the UNSPEC_CALLEE_ABI value and a list of registers
	that need to be preserved across a change to PSTATE.SM.  If the
	call does involve such a change to PSTATE.SM, record the registers
	that would be clobbered by this process.  Also emit an instruction
	to mark the temporary change in VG.  Update call_switches_pstate_sm.
	(aarch64_emit_call_insn): Return the emitted instruction.
	(aarch64_frame_pointer_required): New function.
	(aarch64_conditional_register_usage): Prevent VG_REGNUM from being
	treated as a register operand.
	(aarch64_switch_pstate_sm_for_call): New function.
	(pass_data_switch_pstate_sm): New pass variable.
	(pass_switch_pstate_sm): New pass class.
	(make_pass_switch_pstate_sm): New function.
	(TARGET_FRAME_POINTER_REQUIRED): Define.
	* config/aarch64/t-aarch64 (s-check-sve-md): Add aarch64-sme.md.

gcc/testsuite/
	* gcc.target/aarch64/sme/call_sm_switch_1.c: New test.
	* gcc.target/aarch64/sme/call_sm_switch_2.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_3.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_4.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_5.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_6.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_7.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_8.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_9.c: Likewise.
	* gcc.target/aarch64/sme/call_sm_switch_10.c: Likewise.
---
 gcc/config/aarch64/aarch64-passes.def         |   1 +
 gcc/config/aarch64/aarch64-protos.h           |   3 +-
 gcc/config/aarch64/aarch64-sme.md             | 171 ++++
 gcc/config/aarch64/aarch64.cc                 | 883 +++++++++++++++++-
 gcc/config/aarch64/aarch64.h                  |  25 +-
 gcc/config/aarch64/aarch64.md                 |  13 +-
 gcc/config/aarch64/t-aarch64                  |   5 +-
 .../gcc.target/aarch64/sme/call_sm_switch_1.c | 233 +++++
 .../aarch64/sme/call_sm_switch_10.c           |  37 +
 .../gcc.target/aarch64/sme/call_sm_switch_2.c |  43 +
 .../gcc.target/aarch64/sme/call_sm_switch_3.c | 166 ++++
 .../gcc.target/aarch64/sme/call_sm_switch_4.c |  43 +
 .../gcc.target/aarch64/sme/call_sm_switch_5.c | 318 +++++++
 .../gcc.target/aarch64/sme/call_sm_switch_6.c |  45 +
 .../gcc.target/aarch64/sme/call_sm_switch_7.c | 516 ++++++++++
 .../gcc.target/aarch64/sme/call_sm_switch_8.c |  87 ++
 .../gcc.target/aarch64/sme/call_sm_switch_9.c | 103 ++
 17 files changed, 2668 insertions(+), 24 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-sme.md
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c

diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 6ace797b738..662a13fd5e6 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -20,6 +20,7 @@ INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
 INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
+INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstate_sm);
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
 INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index abc94e482af..d3a2c693f85 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -913,7 +913,7 @@ void aarch64_sve_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
				    const_tree, unsigned, bool = false);
 void aarch64_init_expanders (void);
-void aarch64_emit_call_insn (rtx);
+rtx_call_insn *aarch64_emit_call_insn (rtx);
 void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
@@ -1054,6 +1054,7 @@ rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
 rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
+rtl_opt_pass *make_pass_switch_pstate_sm (gcc::context *ctxt);

 poly_uint64 aarch64_regmode_natural_size (machine_mode);
diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md
new file mode 100644
index 00000000000..52427b4f17a
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -0,0 +1,171 @@
+;; Machine description for AArch64 SME.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;; +;; GCC is distributed in the hope that it will be useful, but +;; WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;; General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; . + +;; The file is organised into the following sections (search for the full +;; line): +;; +;; == State management +;; ---- Test current state +;; ---- PSTATE.SM management + +;; ========================================================================= +;; == State management +;; ========================================================================= +;; +;; Many of the instructions in this section are only valid when SME is +;; present. However, they don't have a TARGET_SME condition since +;; (a) they are only emitted under direct control of aarch64 code and +;; (b) they are sometimes used conditionally, particularly in streaming- +;; compatible code. +;; +;; ========================================================================= + +;; ------------------------------------------------------------------------- +;; ---- Test current state +;; ------------------------------------------------------------------------- + +(define_c_enum "unspec" [ + UNSPEC_OLD_VG_SAVED + UNSPEC_UPDATE_VG + UNSPEC_GET_SME_STATE + UNSPEC_READ_SVCR +]) + +;; A marker instruction to say that the old value of the DWARF VG register +;; has been saved to the stack, for CFI purposes. Operand 0 is the old +;; value of the register and operand 1 is the save slot. +(define_insn "aarch64_old_vg_saved" + [(set (reg:DI VG_REGNUM) + (unspec:DI [(match_operand 0) + (match_operand 1)] UNSPEC_OLD_VG_SAVED))] + "" + "" + [(set_attr "type" "no_insn")] +) + +;; A marker to indicate places where a call temporarily changes VG. 
+(define_insn "aarch64_update_vg" + [(set (reg:DI VG_REGNUM) + (unspec:DI [(reg:DI VG_REGNUM)] UNSPEC_UPDATE_VG))] + "" + "" + [(set_attr "type" "no_insn")] +) + +(define_insn "aarch64_get_sme_state" + [(set (reg:TI R0_REGNUM) + (unspec_volatile:TI [(const_int 0)] UNSPEC_GET_SME_STATE)) + (clobber (reg:DI R16_REGNUM)) + (clobber (reg:DI R17_REGNUM)) + (clobber (reg:DI R18_REGNUM)) + (clobber (reg:DI R30_REGNUM)) + (clobber (reg:CC CC_REGNUM))] + "" + "bl\t__arm_sme_state" +) + +(define_insn "aarch64_read_svcr" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec_volatile:DI [(const_int 0)] UNSPEC_READ_SVCR))] + "" + "mrs\t%0, svcr" +) + +;; ------------------------------------------------------------------------- +;; ---- PSTATE.SM management +;; ------------------------------------------------------------------------- +;; Includes: +;; - SMSTART SM +;; - SMSTOP SM +;; ------------------------------------------------------------------------- + +(define_c_enum "unspec" [ + UNSPEC_SMSTART_SM + UNSPEC_SMSTOP_SM +]) + +;; Turn on streaming mode. This clobbers all SVE state. +;; +;; Depend on VG_REGNUM to ensure that the VG save slot has already been +;; initialized. 
+(define_insn "aarch64_smstart_sm" + [(unspec_volatile [(const_int 0)] UNSPEC_SMSTART_SM) + (use (reg:DI VG_REGNUM)) + (clobber (reg:V4x16QI V0_REGNUM)) + (clobber (reg:V4x16QI V4_REGNUM)) + (clobber (reg:V4x16QI V8_REGNUM)) + (clobber (reg:V4x16QI V12_REGNUM)) + (clobber (reg:V4x16QI V16_REGNUM)) + (clobber (reg:V4x16QI V20_REGNUM)) + (clobber (reg:V4x16QI V24_REGNUM)) + (clobber (reg:V4x16QI V28_REGNUM)) + (clobber (reg:VNx16BI P0_REGNUM)) + (clobber (reg:VNx16BI P1_REGNUM)) + (clobber (reg:VNx16BI P2_REGNUM)) + (clobber (reg:VNx16BI P3_REGNUM)) + (clobber (reg:VNx16BI P4_REGNUM)) + (clobber (reg:VNx16BI P5_REGNUM)) + (clobber (reg:VNx16BI P6_REGNUM)) + (clobber (reg:VNx16BI P7_REGNUM)) + (clobber (reg:VNx16BI P8_REGNUM)) + (clobber (reg:VNx16BI P9_REGNUM)) + (clobber (reg:VNx16BI P10_REGNUM)) + (clobber (reg:VNx16BI P11_REGNUM)) + (clobber (reg:VNx16BI P12_REGNUM)) + (clobber (reg:VNx16BI P13_REGNUM)) + (clobber (reg:VNx16BI P14_REGNUM)) + (clobber (reg:VNx16BI P15_REGNUM))] + "" + "smstart\tsm" +) + +;; Turn off streaming mode. This clobbers all SVE state. +;; +;; Depend on VG_REGNUM to ensure that the VG save slot has already been +;; initialized. 
+(define_insn "aarch64_smstop_sm" + [(unspec_volatile [(const_int 0)] UNSPEC_SMSTOP_SM) + (use (reg:DI VG_REGNUM)) + (clobber (reg:V4x16QI V0_REGNUM)) + (clobber (reg:V4x16QI V4_REGNUM)) + (clobber (reg:V4x16QI V8_REGNUM)) + (clobber (reg:V4x16QI V12_REGNUM)) + (clobber (reg:V4x16QI V16_REGNUM)) + (clobber (reg:V4x16QI V20_REGNUM)) + (clobber (reg:V4x16QI V24_REGNUM)) + (clobber (reg:V4x16QI V28_REGNUM)) + (clobber (reg:VNx16BI P0_REGNUM)) + (clobber (reg:VNx16BI P1_REGNUM)) + (clobber (reg:VNx16BI P2_REGNUM)) + (clobber (reg:VNx16BI P3_REGNUM)) + (clobber (reg:VNx16BI P4_REGNUM)) + (clobber (reg:VNx16BI P5_REGNUM)) + (clobber (reg:VNx16BI P6_REGNUM)) + (clobber (reg:VNx16BI P7_REGNUM)) + (clobber (reg:VNx16BI P8_REGNUM)) + (clobber (reg:VNx16BI P9_REGNUM)) + (clobber (reg:VNx16BI P10_REGNUM)) + (clobber (reg:VNx16BI P11_REGNUM)) + (clobber (reg:VNx16BI P12_REGNUM)) + (clobber (reg:VNx16BI P13_REGNUM)) + (clobber (reg:VNx16BI P14_REGNUM)) + (clobber (reg:VNx16BI P15_REGNUM))] + "" + "smstop\tsm" +) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index af9f3876532..6d5e9056c65 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -85,6 +85,8 @@ #include "config/arm/aarch-common.h" #include "config/arm/aarch-common-protos.h" #include "ssa.h" +#include "tree-pass.h" +#include "cfgbuild.h" /* This file should be included last. */ #include "target-def.h" @@ -4165,6 +4167,26 @@ aarch64_fndecl_isa_mode (const_tree fndecl) return aarch64_fndecl_pstate_sm (fndecl); } +/* Return the state of PSTATE.SM on entry to the current function. + This might be different from the state of PSTATE.SM in the function + body. */ + +static aarch64_feature_flags +aarch64_cfun_incoming_pstate_sm () +{ + return aarch64_fntype_pstate_sm (TREE_TYPE (cfun->decl)); +} + +/* Return true if a call from the current function to a function with + ISA mode CALLEE_MODE would involve a change to PSTATE.SM around + the BL instruction. 
*/ + +static bool +aarch64_call_switches_pstate_sm (aarch64_feature_flags callee_mode) +{ + return (callee_mode & ~AARCH64_ISA_MODE & AARCH64_FL_SM_STATE) != 0; +} + /* Implement TARGET_COMPATIBLE_VECTOR_TYPES_P. */ static bool @@ -4188,7 +4210,7 @@ aarch64_emit_cfi_for_reg_p (unsigned int regno) static machine_mode aarch64_reg_save_mode (unsigned int regno) { - if (GP_REGNUM_P (regno)) + if (GP_REGNUM_P (regno) || regno == VG_REGNUM) return DImode; if (FP_REGNUM_P (regno)) @@ -4247,6 +4269,16 @@ aarch64_callee_abi (rtx cookie) return function_abis[UINTVAL (cookie) >> AARCH64_NUM_ISA_MODES]; } +/* COOKIE is a CONST_INT from an UNSPEC_CALLEE_ABI rtx. Return the + required ISA mode on entry to the callee, which is also the ISA + mode on return from the callee. */ + +static aarch64_feature_flags +aarch64_callee_isa_mode (rtx cookie) +{ + return UINTVAL (cookie) & AARCH64_FL_ISA_MODES; +} + /* INSN is a call instruction. Return the CONST_INT stored in its UNSPEC_CALLEE_ABI rtx. */ @@ -4269,6 +4301,15 @@ aarch64_insn_callee_abi (const rtx_insn *insn) return aarch64_callee_abi (aarch64_insn_callee_cookie (insn)); } +/* INSN is a call instruction. Return the required ISA mode on entry to + the callee, which is also the ISA mode on return from the callee. */ + +static aarch64_feature_flags +aarch64_insn_callee_isa_mode (const rtx_insn *insn) +{ + return aarch64_callee_isa_mode (aarch64_insn_callee_cookie (insn)); +} + /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The callee only saves the lower 64 bits of a 128-bit register. Tell the compiler the callee clobbers the top 64 bits when restoring the bottom 64 bits. */ @@ -6482,6 +6523,437 @@ aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p, temp1, temp2, frame_related_p, emit_move_imm); } +/* A streaming-compatible function needs to switch temporarily to the known + PSTATE.SM mode described by LOCAL_MODE. 
The low bit of OLD_SVCR contains + the runtime state of PSTATE.SM in the streaming-compatible code, before + the start of the switch to LOCAL_MODE. + + Emit instructions to branch around the mode switch if PSTATE.SM already + matches LOCAL_MODE. Return the label that the branch jumps to. */ + +static rtx_insn * +aarch64_guard_switch_pstate_sm (rtx old_svcr, aarch64_feature_flags local_mode) +{ + local_mode &= AARCH64_FL_SM_STATE; + gcc_assert (local_mode != 0); + auto already_ok_cond = (local_mode & AARCH64_FL_SM_ON ? NE : EQ); + auto *label = gen_label_rtx (); + auto *jump = emit_jump_insn (gen_aarch64_tb (already_ok_cond, DImode, DImode, + old_svcr, const0_rtx, label)); + JUMP_LABEL (jump) = label; + return label; +} + +/* Emit code to switch from the PSTATE.SM state in OLD_MODE to the PSTATE.SM + state in NEW_MODE. This is known to involve either an SMSTART SM or + an SMSTOP SM. */ + +static void +aarch64_switch_pstate_sm (aarch64_feature_flags old_mode, + aarch64_feature_flags new_mode) +{ + old_mode &= AARCH64_FL_SM_STATE; + new_mode &= AARCH64_FL_SM_STATE; + gcc_assert (old_mode != new_mode); + + if ((new_mode & AARCH64_FL_SM_ON) + || (new_mode == 0 && (old_mode & AARCH64_FL_SM_OFF))) + emit_insn (gen_aarch64_smstart_sm ()); + else + emit_insn (gen_aarch64_smstop_sm ()); +} + +/* As a side-effect, SMSTART SM and SMSTOP SM clobber the contents of all + FP and predicate registers. This class emits code to preserve any + necessary registers around the mode switch. + + The class uses four approaches to saving and restoring contents, enumerated + by group_type: + + - GPR: save and restore the contents of FP registers using GPRs. + This is used if the FP register contains no more than 64 significant + bits. The registers used are FIRST_GPR onwards. + + - MEM_128: save and restore 128-bit SIMD registers using memory. + + - MEM_SVE_PRED: save and restore full SVE predicate registers using memory. 
+ + - MEM_SVE_DATA: save and restore full SVE vector registers using memory. + + The save slots within each memory group are consecutive, with the + MEM_SVE_PRED slots occupying a region below the MEM_SVE_DATA slots. + + There will only be two mode switches for each use of SME, so they should + not be particularly performance-sensitive. It's also rare for SIMD, SVE + or predicate registers to be live across mode switches. We therefore + don't preallocate the save slots but instead allocate them locally on + demand. This makes the code emitted by the class self-contained. */ + +class aarch64_sme_mode_switch_regs +{ +public: + static const unsigned int FIRST_GPR = R10_REGNUM; + + void add_reg (machine_mode, unsigned int); + void add_call_args (rtx_call_insn *); + void add_call_result (rtx_call_insn *); + + void emit_prologue (); + void emit_epilogue (); + + /* The number of GPRs needed to save FP registers, starting from + FIRST_GPR. */ + unsigned int num_gprs () { return m_group_count[GPR]; } + +private: + enum sequence { PROLOGUE, EPILOGUE }; + enum group_type { GPR, MEM_128, MEM_SVE_PRED, MEM_SVE_DATA, NUM_GROUPS }; + + /* Information about the save location for one FP, SIMD, SVE data, or + SVE predicate register. */ + struct save_location { + /* The register to be saved. */ + rtx reg; + + /* Which group the save location belongs to. */ + group_type group; + + /* A zero-based index of the register within the group. */ + unsigned int index; + }; + + unsigned int sve_data_headroom (); + rtx get_slot_mem (machine_mode, poly_int64); + void emit_stack_adjust (sequence, poly_int64); + void emit_mem_move (sequence, const save_location &, poly_int64); + + void emit_gpr_moves (sequence); + void emit_mem_128_moves (sequence); + void emit_sve_sp_adjust (sequence); + void emit_sve_pred_moves (sequence); + void emit_sve_data_moves (sequence); + + /* All save locations, in no particular order. */ + auto_vec m_save_locations; + + /* The number of registers in each group. 
*/ + unsigned int m_group_count[NUM_GROUPS] = {}; +}; + +/* Record that (reg:MODE REGNO) needs to be preserved around the mode + switch. */ + +void +aarch64_sme_mode_switch_regs::add_reg (machine_mode mode, unsigned int regno) +{ + if (!FP_REGNUM_P (regno) && !PR_REGNUM_P (regno)) + return; + + unsigned int end_regno = end_hard_regno (mode, regno); + unsigned int vec_flags = aarch64_classify_vector_mode (mode); + gcc_assert ((vec_flags & VEC_STRUCT) || end_regno == regno + 1); + for (; regno < end_regno; regno++) + { + machine_mode submode = mode; + if (vec_flags & VEC_STRUCT) + { + if (vec_flags & VEC_SVE_DATA) + submode = SVE_BYTE_MODE; + else if (vec_flags & VEC_PARTIAL) + submode = V8QImode; + else + submode = V16QImode; + } + save_location loc; + loc.reg = gen_rtx_REG (submode, regno); + if (vec_flags == VEC_SVE_PRED) + { + gcc_assert (PR_REGNUM_P (regno)); + loc.group = MEM_SVE_PRED; + } + else + { + gcc_assert (FP_REGNUM_P (regno)); + if (known_le (GET_MODE_SIZE (submode), 8)) + loc.group = GPR; + else if (known_eq (GET_MODE_SIZE (submode), 16)) + loc.group = MEM_128; + else + loc.group = MEM_SVE_DATA; + } + loc.index = m_group_count[loc.group]++; + m_save_locations.quick_push (loc); + } +} + +/* Record that the arguments to CALL_INSN need to be preserved around + the mode switch. */ + +void +aarch64_sme_mode_switch_regs::add_call_args (rtx_call_insn *call_insn) +{ + for (rtx node = CALL_INSN_FUNCTION_USAGE (call_insn); + node; node = XEXP (node, 1)) + { + rtx item = XEXP (node, 0); + if (GET_CODE (item) != USE) + continue; + item = XEXP (item, 0); + if (!REG_P (item)) + continue; + add_reg (GET_MODE (item), REGNO (item)); + } +} + +/* Record that the return value from CALL_INSN (if any) needs to be + preserved around the mode switch. 
*/ + +void +aarch64_sme_mode_switch_regs::add_call_result (rtx_call_insn *call_insn) +{ + rtx pat = PATTERN (call_insn); + gcc_assert (GET_CODE (pat) == PARALLEL); + pat = XVECEXP (pat, 0, 0); + if (GET_CODE (pat) == CALL) + return; + rtx dest = SET_DEST (pat); + if (GET_CODE (dest) == PARALLEL) + for (int i = 0; i < XVECLEN (dest, 0); ++i) + { + rtx x = XVECEXP (dest, 0, i); + gcc_assert (GET_CODE (x) == EXPR_LIST); + rtx reg = XEXP (x, 0); + add_reg (GET_MODE (reg), REGNO (reg)); + } + else + add_reg (GET_MODE (dest), REGNO (dest)); +} + +/* Emit code to save registers before the mode switch. */ + +void +aarch64_sme_mode_switch_regs::emit_prologue () +{ + emit_sve_sp_adjust (PROLOGUE); + emit_sve_pred_moves (PROLOGUE); + emit_sve_data_moves (PROLOGUE); + emit_mem_128_moves (PROLOGUE); + emit_gpr_moves (PROLOGUE); +} + +/* Emit code to restore registers after the mode switch. */ + +void +aarch64_sme_mode_switch_regs::emit_epilogue () +{ + emit_gpr_moves (EPILOGUE); + emit_mem_128_moves (EPILOGUE); + emit_sve_pred_moves (EPILOGUE); + emit_sve_data_moves (EPILOGUE); + emit_sve_sp_adjust (EPILOGUE); +} + +/* The SVE predicate registers are stored below the SVE data registers, + with the predicate save area being padded to a data-register-sized + boundary. Return the size of this padded area as a whole number + of data register slots. */ + +unsigned int +aarch64_sme_mode_switch_regs::sve_data_headroom () +{ + return CEIL (m_group_count[MEM_SVE_PRED], 8); +} + +/* Return a memory reference of mode MODE to OFFSET bytes from the + stack pointer. */ + +rtx +aarch64_sme_mode_switch_regs::get_slot_mem (machine_mode mode, + poly_int64 offset) +{ + rtx addr = plus_constant (Pmode, stack_pointer_rtx, offset); + return gen_rtx_MEM (mode, addr); +} + +/* Allocate or deallocate SIZE bytes of stack space: SEQ decides which. 
*/ + +void +aarch64_sme_mode_switch_regs::emit_stack_adjust (sequence seq, + poly_int64 size) +{ + if (seq == PROLOGUE) + size = -size; + emit_insn (gen_rtx_SET (stack_pointer_rtx, + plus_constant (Pmode, stack_pointer_rtx, size))); +} + +/* Save or restore the register in LOC, whose slot is OFFSET bytes from + the stack pointer. SEQ chooses between saving and restoring. */ + +void +aarch64_sme_mode_switch_regs::emit_mem_move (sequence seq, + const save_location &loc, + poly_int64 offset) +{ + rtx mem = get_slot_mem (GET_MODE (loc.reg), offset); + if (seq == PROLOGUE) + emit_move_insn (mem, loc.reg); + else + emit_move_insn (loc.reg, mem); +} + +/* Emit instructions to save or restore the GPR group. SEQ chooses between + saving and restoring. */ + +void +aarch64_sme_mode_switch_regs::emit_gpr_moves (sequence seq) +{ + for (auto &loc : m_save_locations) + if (loc.group == GPR) + { + gcc_assert (loc.index < 8); + rtx gpr = gen_rtx_REG (GET_MODE (loc.reg), FIRST_GPR + loc.index); + if (seq == PROLOGUE) + emit_move_insn (gpr, loc.reg); + else + emit_move_insn (loc.reg, gpr); + } +} + +/* Emit instructions to save or restore the MEM_128 group. SEQ chooses + between saving and restoring. */ + +void +aarch64_sme_mode_switch_regs::emit_mem_128_moves (sequence seq) +{ + HOST_WIDE_INT count = m_group_count[MEM_128]; + if (count == 0) + return; + + auto sp = stack_pointer_rtx; + auto sp_adjust = (seq == PROLOGUE ? -count : count) * 16; + + /* Pick a common mode that supports LDR & STR with pre/post-modification + and LDP & STP with pre/post-modification. */ + auto mode = TFmode; + + /* An instruction pattern that should be emitted at the end. */ + rtx last_pat = NULL_RTX; + + /* A previous MEM_128 location that hasn't been handled yet. */ + save_location *prev_loc = nullptr; + + /* Look for LDP/STPs and record any leftover LDR/STR in PREV_LOC. 
*/ + for (auto &loc : m_save_locations) + if (loc.group == MEM_128) + { + if (!prev_loc) + { + prev_loc = &loc; + continue; + } + gcc_assert (loc.index == prev_loc->index + 1); + + /* The offset of the base of the save area from the current + stack pointer. */ + HOST_WIDE_INT bias = 0; + if (prev_loc->index == 0 && seq == PROLOGUE) + bias = sp_adjust; + + /* Get the two sets in the LDP/STP. */ + rtx ops[] = { + gen_rtx_REG (mode, REGNO (prev_loc->reg)), + get_slot_mem (mode, prev_loc->index * 16 + bias), + gen_rtx_REG (mode, REGNO (loc.reg)), + get_slot_mem (mode, loc.index * 16 + bias) + }; + unsigned int lhs = (seq == PROLOGUE); + rtx set1 = gen_rtx_SET (ops[lhs], ops[1 - lhs]); + rtx set2 = gen_rtx_SET (ops[lhs + 2], ops[3 - lhs]); + + /* Combine the sets with any stack allocation/deallocation. */ + rtvec vec; + if (prev_loc->index == 0) + { + rtx plus_sp = plus_constant (Pmode, sp, sp_adjust); + vec = gen_rtvec (3, gen_rtx_SET (sp, plus_sp), set1, set2); + } + else + vec = gen_rtvec (2, set1, set2); + rtx pat = gen_rtx_PARALLEL (VOIDmode, vec); + + /* Queue a deallocation to the end, otherwise emit the + instruction now. */ + if (seq == EPILOGUE && prev_loc->index == 0) + last_pat = pat; + else + emit_insn (pat); + prev_loc = nullptr; + } + + /* Handle any leftover LDR/STR. */ + if (prev_loc) + { + rtx reg = gen_rtx_REG (mode, REGNO (prev_loc->reg)); + rtx addr; + if (prev_loc->index != 0) + addr = plus_constant (Pmode, sp, prev_loc->index * 16); + else if (seq == PROLOGUE) + { + rtx allocate = plus_constant (Pmode, sp, -count * 16); + addr = gen_rtx_PRE_MODIFY (Pmode, sp, allocate); + } + else + { + rtx deallocate = plus_constant (Pmode, sp, count * 16); + addr = gen_rtx_POST_MODIFY (Pmode, sp, deallocate); + } + rtx mem = gen_rtx_MEM (mode, addr); + if (seq == PROLOGUE) + emit_move_insn (mem, reg); + else + emit_move_insn (reg, mem); + } + + if (last_pat) + emit_insn (last_pat); +} + +/* Allocate or deallocate the stack space needed by the SVE groups. 
+   SEQ chooses between allocating and deallocating.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_sp_adjust (sequence seq)
+{
+  if (unsigned int count = m_group_count[MEM_SVE_DATA] + sve_data_headroom ())
+    emit_stack_adjust (seq, count * BYTES_PER_SVE_VECTOR);
+}
+
+/* Save or restore the MEM_SVE_DATA group.  SEQ chooses between saving
+   and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_data_moves (sequence seq)
+{
+  for (auto &loc : m_save_locations)
+    if (loc.group == MEM_SVE_DATA)
+      {
+	auto index = loc.index + sve_data_headroom ();
+	emit_mem_move (seq, loc, index * BYTES_PER_SVE_VECTOR);
+      }
+}
+
+/* Save or restore the MEM_SVE_PRED group.  SEQ chooses between saving
+   and restoring.  */
+
+void
+aarch64_sme_mode_switch_regs::emit_sve_pred_moves (sequence seq)
+{
+  for (auto &loc : m_save_locations)
+    if (loc.group == MEM_SVE_PRED)
+      emit_mem_move (seq, loc, loc.index * BYTES_PER_SVE_PRED);
+}
+
 /* Set DEST to (vec_series BASE STEP).  */

 static void
@@ -8180,6 +8652,40 @@ on_stack:
   return;
 }

+/* Add the current argument register to the set of those that need
+   to be saved and restored around a change to PSTATE.SM.  */
+
+static void
+aarch64_record_sme_mode_switch_args (CUMULATIVE_ARGS *pcum)
+{
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, pcum->aapcs_reg, NONCONST)
+    {
+      rtx x = *iter;
+      if (REG_P (x) && (FP_REGNUM_P (REGNO (x)) || PR_REGNUM_P (REGNO (x))))
+	{
+	  unsigned int i = pcum->num_sme_mode_switch_args++;
+	  gcc_assert (i < ARRAY_SIZE (pcum->sme_mode_switch_args));
+	  pcum->sme_mode_switch_args[i] = x;
+	}
+    }
+}
+
+/* Return a parallel that contains all the registers that need to be
+   saved around a change to PSTATE.SM.  Return const0_rtx if there is
+   no such mode switch, or if no registers need to be saved.
+   */
+
+static rtx
+aarch64_finish_sme_mode_switch_args (CUMULATIVE_ARGS *pcum)
+{
+  if (!pcum->num_sme_mode_switch_args)
+    return const0_rtx;
+
+  auto argvec = gen_rtvec_v (pcum->num_sme_mode_switch_args,
+			     pcum->sme_mode_switch_args);
+  return gen_rtx_PARALLEL (VOIDmode, argvec);
+}
+
 /* Implement TARGET_FUNCTION_ARG.  */

 static rtx
@@ -8191,7 +8697,13 @@ aarch64_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
	      || pcum->pcs_variant == ARM_PCS_SVE);

   if (arg.end_marker_p ())
-    return aarch64_gen_callee_cookie (pcum->isa_mode, pcum->pcs_variant);
+    {
+      rtx abi_cookie = aarch64_gen_callee_cookie (pcum->isa_mode,
+						  pcum->pcs_variant);
+      rtx sme_mode_switch_args = aarch64_finish_sme_mode_switch_args (pcum);
+      return gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, abi_cookie,
+						    sme_mode_switch_args));
+    }

   aarch64_layout_arg (pcum_v, arg);
   return pcum->aapcs_reg;
@@ -8226,6 +8738,7 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
   pcum->aapcs_stack_words = 0;
   pcum->aapcs_stack_size = 0;
   pcum->silent_p = silent_p;
+  pcum->num_sme_mode_switch_args = 0;

   if (!silent_p
       && !TARGET_FLOAT
@@ -8266,6 +8779,10 @@ aarch64_function_arg_advance (cumulative_args_t pcum_v,
       aarch64_layout_arg (pcum_v, arg);
       gcc_assert ((pcum->aapcs_reg != NULL_RTX)
		  != (pcum->aapcs_stack_words != 0));
+      if (pcum->aapcs_reg
+	  && aarch64_call_switches_pstate_sm (pcum->isa_mode))
+	aarch64_record_sme_mode_switch_args (pcum);
+
       pcum->aapcs_arg_processed = false;
       pcum->aapcs_ncrn = pcum->aapcs_nextncrn;
       pcum->aapcs_nvrn = pcum->aapcs_nextnvrn;
@@ -8720,6 +9237,30 @@ aarch64_save_regs_above_locals_p ()
   return crtl->stack_protect_guard;
 }

+/* Return true if the current function needs to record the incoming
+   value of PSTATE.SM.  */
+static bool
+aarch64_need_old_pstate_sm ()
+{
+  /* Exit early if the incoming value of PSTATE.SM is known at
+     compile time.
+     */
+  if (aarch64_cfun_incoming_pstate_sm () != 0)
+    return false;
+
+  if (cfun->machine->call_switches_pstate_sm)
+    for (auto insn = get_insns (); insn; insn = NEXT_INSN (insn))
+      if (auto *call = dyn_cast<rtx_call_insn *> (insn))
+	if (!SIBLING_CALL_P (call))
+	  {
+	    /* Return true if there is a call to a non-streaming-compatible
+	       function.  */
+	    auto callee_isa_mode = aarch64_insn_callee_isa_mode (call);
+	    if (aarch64_call_switches_pstate_sm (callee_isa_mode))
+	      return true;
+	  }
+  return false;
+}
+
 /* Mark the registers that need to be saved by the callee and calculate
    the size of the callee-saved registers area and frame record (both FP
    and LR may be omitted).  */
@@ -8753,6 +9294,7 @@ aarch64_layout_frame (void)
   /* First mark all the registers that really need to be saved...  */
   for (regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
     frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
+  frame.old_svcr_offset = SLOT_NOT_REQUIRED;

   /* ... that includes the eh data registers (if needed)...  */
   if (crtl->calls_eh_return)
@@ -8905,6 +9447,21 @@ aarch64_layout_frame (void)
     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
       allocate_gpr_slot (regno);

+  if (aarch64_need_old_pstate_sm ())
+    {
+      frame.old_svcr_offset = offset;
+      offset += UNITS_PER_WORD;
+    }
+
+  /* If the current function changes the SVE vector length, ensure that the
+     old value of the DWARF VG register is saved and available in the CFI,
+     so that outer frames with VL-sized offsets can be processed correctly.
+     */
+  if (cfun->machine->call_switches_pstate_sm)
+    {
+      frame.reg_offset[VG_REGNUM] = offset;
+      offset += UNITS_PER_WORD;
+    }
+
   poly_int64 max_int_offset = offset;
   offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
   bool has_align_gap = maybe_ne (offset, max_int_offset);
@@ -8942,8 +9499,6 @@ aarch64_layout_frame (void)
       if (push_regs.size () > 1)
	frame.wb_push_candidate2 = push_regs[1];
     }
-  else
-    gcc_assert (known_eq (saved_regs_size, below_hard_fp_saved_regs_size));

   /* With stack-clash, a register must be saved in non-leaf functions.
      The saving of the bottommost register counts as an implicit probe,
@@ -9051,7 +9606,8 @@ aarch64_layout_frame (void)
       frame.initial_adjust = frame.frame_size - frame.bytes_below_saved_regs;
       frame.final_adjust = frame.bytes_below_saved_regs;
     }
-  else if (frame.bytes_above_hard_fp.is_constant (&const_above_fp)
+  else if (frame.wb_push_candidate1 != INVALID_REGNUM
+	   && frame.bytes_above_hard_fp.is_constant (&const_above_fp)
	   && const_above_fp < max_push_offset)
     {
       /* Frame with large area below the saved registers, or with SVE saves,
@@ -9486,7 +10042,13 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
       machine_mode mode = aarch64_reg_save_mode (regno);
       rtx reg = gen_rtx_REG (mode, regno);
+      rtx move_src = reg;
       offset = frame.reg_offset[regno] - bytes_below_sp;
+      if (regno == VG_REGNUM)
+	{
+	  move_src = gen_rtx_REG (DImode, IP0_REGNUM);
+	  emit_move_insn (move_src, gen_int_mode (aarch64_sve_vg, DImode));
+	}
       rtx base_rtx = stack_pointer_rtx;
       poly_int64 sp_offset = offset;

       if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
	aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
					     offset, ptrue);
-      else if (GP_REGNUM_P (regno)
+      else if (GP_REGNUM_P (REGNO (reg))
	       && (!offset.is_constant (&const_offset) || const_offset >= 512))
	{
	  poly_int64 fp_offset = frame.bytes_below_hard_fp - bytes_below_sp;
@@ -9517,6 +10079,7 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
       unsigned int regno2;
       if (!aarch64_sve_mode_p (mode)
+	  && reg == move_src
	  && i + 1 < regs.size ()
	  && (regno2 = regs[i + 1], !skip_save_p (regno2))
	  && known_eq (GET_MODE_SIZE (mode),
@@ -9548,17 +10111,24 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
	}
       else if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
	{
-	  insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, reg));
+	  insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, move_src));
	  need_cfa_note_p = true;
	}
       else if (aarch64_sve_mode_p (mode))
-	insn = emit_insn (gen_rtx_SET (mem, reg));
+	insn = emit_insn (gen_rtx_SET (mem, move_src));
       else
-	insn = emit_move_insn (mem, reg);
+	insn = emit_move_insn (mem, move_src);

       RTX_FRAME_RELATED_P (insn) = frame_related_p;
       if (frame_related_p && need_cfa_note_p)
	aarch64_add_cfa_expression (insn, reg, stack_pointer_rtx, sp_offset);
+      else if (frame_related_p && move_src != reg)
+	add_reg_note (insn, REG_FRAME_RELATED_EXPR, gen_rtx_SET (mem, reg));
+
+      /* Emit a fake instruction to indicate that the VG save slot has
+	 been initialized.  */
+      if (regno == VG_REGNUM)
+	emit_insn (gen_aarch64_old_vg_saved (move_src, mem));
     }
 }

@@ -9781,6 +10351,10 @@ aarch64_get_separate_components (void)
       bitmap_clear_bit (components, frame.hard_fp_save_and_probe);
     }

+  /* The VG save sequence needs a temporary GPR.  Punt for now on trying
+     to find one.  */
+  bitmap_clear_bit (components, VG_REGNUM);
+
   return components;
 }

@@ -10276,6 +10850,47 @@ aarch64_epilogue_uses (int regno)
   return 0;
 }

+/* The current function's frame has a save slot for the incoming state
+   of SVCR.  Return a legitimate memory for the slot, based on the hard
+   frame pointer.  */
+
+static rtx
+aarch64_old_svcr_mem ()
+{
+  gcc_assert (frame_pointer_needed
+	      && known_ge (cfun->machine->frame.old_svcr_offset, 0));
+  rtx base = hard_frame_pointer_rtx;
+  poly_int64 offset = (0
+		       /* hard fp -> bottom of frame.  */
+		       - cfun->machine->frame.bytes_below_hard_fp
+		       /* bottom of frame -> save slot.
+			  */
+		       + cfun->machine->frame.old_svcr_offset);
+  return gen_frame_mem (DImode, plus_constant (Pmode, base, offset));
+}
+
+/* The current function's frame has a save slot for the incoming state
+   of SVCR.  Load the slot into register REGNO and return the register.  */
+
+static rtx
+aarch64_read_old_svcr (unsigned int regno)
+{
+  rtx svcr = gen_rtx_REG (DImode, regno);
+  emit_move_insn (svcr, aarch64_old_svcr_mem ());
+  return svcr;
+}
+
+/* Like the rtx version of aarch64_guard_switch_pstate_sm, but first
+   load the incoming value of SVCR from its save slot into temporary
+   register REGNO.  */
+
+static rtx_insn *
+aarch64_guard_switch_pstate_sm (unsigned int regno,
+				aarch64_feature_flags local_mode)
+{
+  rtx old_svcr = aarch64_read_old_svcr (regno);
+  return aarch64_guard_switch_pstate_sm (old_svcr, local_mode);
+}
+
 /* AArch64 stack frames generated by this compiler look like:

	+-------------------------------+
@@ -10490,6 +11105,12 @@ aarch64_expand_prologue (void)
   aarch64_save_callee_saves (bytes_below_sp, frame.saved_gprs, true,
			     emit_frame_chain);
+  if (maybe_ge (frame.reg_offset[VG_REGNUM], 0))
+    {
+      unsigned int saved_regs[] = { VG_REGNUM };
+      aarch64_save_callee_saves (bytes_below_sp, saved_regs, true,
+				 emit_frame_chain);
+    }
   if (maybe_ne (sve_callee_adjust, 0))
     {
       gcc_assert (!flag_stack_clash_protection
@@ -10511,6 +11132,40 @@ aarch64_expand_prologue (void)
			     !frame_pointer_needed, true);
   if (emit_frame_chain && maybe_ne (final_adjust, 0))
     aarch64_emit_stack_tie (hard_frame_pointer_rtx);
+
+  /* Save the incoming value of PSTATE.SM, if required.
+     */
+  if (known_ge (frame.old_svcr_offset, 0))
+    {
+      rtx mem = aarch64_old_svcr_mem ();
+      MEM_VOLATILE_P (mem) = 1;
+      if (TARGET_SME)
+	{
+	  rtx reg = gen_rtx_REG (DImode, IP0_REGNUM);
+	  emit_insn (gen_aarch64_read_svcr (reg));
+	  emit_move_insn (mem, reg);
+	}
+      else
+	{
+	  rtx old_r0 = NULL_RTX, old_r1 = NULL_RTX;
+	  auto &args = crtl->args.info;
+	  if (args.aapcs_ncrn > 0)
+	    {
+	      old_r0 = gen_rtx_REG (DImode, PROBE_STACK_FIRST_REGNUM);
+	      emit_move_insn (old_r0, gen_rtx_REG (DImode, R0_REGNUM));
+	    }
+	  if (args.aapcs_ncrn > 1)
+	    {
+	      old_r1 = gen_rtx_REG (DImode, PROBE_STACK_SECOND_REGNUM);
+	      emit_move_insn (old_r1, gen_rtx_REG (DImode, R1_REGNUM));
+	    }
+	  emit_insn (gen_aarch64_get_sme_state ());
+	  emit_move_insn (mem, gen_rtx_REG (DImode, R0_REGNUM));
+	  if (old_r0)
+	    emit_move_insn (gen_rtx_REG (DImode, R0_REGNUM), old_r0);
+	  if (old_r1)
+	    emit_move_insn (gen_rtx_REG (DImode, R1_REGNUM), old_r1);
+	}
+    }
 }

 /* Return TRUE if we can use a simple_return insn.
@@ -11758,17 +12413,33 @@ aarch64_start_call_args (cumulative_args_t ca_v)
    RESULT is the register in which the result is returned.  It's NULL for
    "call" and "sibcall".
    MEM is the location of the function call.
-   CALLEE_ABI is a const_int that gives the arm_pcs of the callee.
+   COOKIE is either:
+   - a const_int that gives the argument to the call's UNSPEC_CALLEE_ABI.
+   - a PARALLEL that contains such a const_int as its first element.
+     The second element is a PARALLEL that lists all the argument
+     registers that need to be saved and restored around a change
+     in PSTATE.SM, or const0_rtx if no such switch is needed.
    SIBCALL indicates whether this function call is normal call or sibling call.
    It will generate different pattern accordingly.
    */

 void
-aarch64_expand_call (rtx result, rtx mem, rtx callee_abi, bool sibcall)
+aarch64_expand_call (rtx result, rtx mem, rtx cookie, bool sibcall)
 {
   rtx call, callee, tmp;
   rtvec vec;
   machine_mode mode;

+  rtx callee_abi = cookie;
+  rtx sme_mode_switch_args = const0_rtx;
+  if (GET_CODE (cookie) == PARALLEL)
+    {
+      callee_abi = XVECEXP (cookie, 0, 0);
+      sme_mode_switch_args = XVECEXP (cookie, 0, 1);
+    }
+
+  gcc_assert (CONST_INT_P (callee_abi));
+  auto callee_isa_mode = aarch64_callee_isa_mode (callee_abi);
+
   gcc_assert (MEM_P (mem));
   callee = XEXP (mem, 0);
   mode = GET_MODE (callee);
@@ -11793,26 +12464,75 @@ aarch64_expand_call (rtx result, rtx mem, rtx callee_abi, bool sibcall)
   else
     tmp = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNUM));

-  gcc_assert (CONST_INT_P (callee_abi));
   callee_abi = gen_rtx_UNSPEC (DImode, gen_rtvec (1, callee_abi),
			       UNSPEC_CALLEE_ABI);

   vec = gen_rtvec (3, call, callee_abi, tmp);
   call = gen_rtx_PARALLEL (VOIDmode, vec);

-  aarch64_emit_call_insn (call);
+  auto call_insn = aarch64_emit_call_insn (call);
+
+  /* Check whether the call requires a change to PSTATE.SM.  We can't
+     emit the instructions to change PSTATE.SM yet, since they involve
+     a change in vector length and a change in instruction set, which
+     cannot be represented in RTL.
+
+     For now, just record which registers will be clobbered and used
+     by the changes to PSTATE.SM.
+     */
+  if (!sibcall && aarch64_call_switches_pstate_sm (callee_isa_mode))
+    {
+      aarch64_sme_mode_switch_regs args_switch;
+      if (sme_mode_switch_args != const0_rtx)
+	{
+	  unsigned int num_args = XVECLEN (sme_mode_switch_args, 0);
+	  for (unsigned int i = 0; i < num_args; ++i)
+	    {
+	      rtx x = XVECEXP (sme_mode_switch_args, 0, i);
+	      args_switch.add_reg (GET_MODE (x), REGNO (x));
+	    }
+	}
+
+      aarch64_sme_mode_switch_regs result_switch;
+      if (result)
+	result_switch.add_call_result (call_insn);
+
+      unsigned int num_gprs = MAX (args_switch.num_gprs (),
+				   result_switch.num_gprs ());
+      for (unsigned int i = 0; i < num_gprs; ++i)
+	clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+		     gen_rtx_REG (DImode, args_switch.FIRST_GPR + i));
+
+      for (int regno = V0_REGNUM; regno < V0_REGNUM + 32; regno += 4)
+	clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+		     gen_rtx_REG (V4x16QImode, regno));
+
+      for (int regno = P0_REGNUM; regno < P0_REGNUM + 16; regno += 1)
+	clobber_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+		     gen_rtx_REG (VNx16BImode, regno));
+
+      /* Ensure that the VG save slot has been initialized.  Also emit
+	 an instruction to model the effect of the temporary clobber
+	 of VG, so that the prologue/epilogue pass sees the need to
+	 save the old value.  */
+      use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn),
+	       gen_rtx_REG (DImode, VG_REGNUM));
+      emit_insn_before (gen_aarch64_update_vg (), call_insn);
+
+      cfun->machine->call_switches_pstate_sm = true;
+    }
 }

 /* Emit call insn with PAT and do aarch64-specific handling.
    */

-void
+rtx_call_insn *
 aarch64_emit_call_insn (rtx pat)
 {
-  rtx insn = emit_call_insn (pat);
+  auto insn = emit_call_insn (pat);

   rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn);
   clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
   clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));
+  return as_a<rtx_call_insn *> (insn);
 }

 machine_mode
@@ -13224,6 +13944,16 @@ aarch64_secondary_memory_needed (machine_mode mode, reg_class_t class1,
   return false;
 }

+/* Implement TARGET_FRAME_POINTER_REQUIRED.  */
+
+static bool
+aarch64_frame_pointer_required ()
+{
+  /* If the function needs to record the incoming value of PSTATE.SM,
+     make sure that the slot is accessible from the frame pointer.  */
+  return aarch64_need_old_pstate_sm ();
+}
+
 static bool
 aarch64_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to)
 {
@@ -20805,7 +21535,8 @@ aarch64_conditional_register_usage (void)
	  call_used_regs[i] = 1;
	}

-  /* Only allow the FFR and FFRT to be accessed via special patterns.  */
+  /* Only allow these registers to be accessed via special patterns.  */
+  CLEAR_HARD_REG_BIT (operand_reg_set, VG_REGNUM);
   CLEAR_HARD_REG_BIT (operand_reg_set, FFR_REGNUM);
   CLEAR_HARD_REG_BIT (operand_reg_set, FFRT_REGNUM);

@@ -28376,6 +29107,123 @@ aarch64_pars_overlap_p (rtx par1, rtx par2)
   return false;
 }

+/* If CALL involves a change in PSTATE.SM, emit the instructions needed
+   to switch to the new mode and the instructions needed to restore the
+   original mode.  Return true if something changed.  */
+static bool
+aarch64_switch_pstate_sm_for_call (rtx_call_insn *call)
+{
+  /* Mode switches for sibling calls are handled via the epilogue.  */
+  if (SIBLING_CALL_P (call))
+    return false;
+
+  auto callee_isa_mode = aarch64_insn_callee_isa_mode (call);
+  if (!aarch64_call_switches_pstate_sm (callee_isa_mode))
+    return false;
+
+  /* Switch mode before the call, preserving any argument registers
+     across the switch.
+     */
+  start_sequence ();
+  rtx_insn *args_guard_label = nullptr;
+  if (TARGET_STREAMING_COMPATIBLE)
+    args_guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
+						       callee_isa_mode);
+  aarch64_sme_mode_switch_regs args_switch;
+  args_switch.add_call_args (call);
+  args_switch.emit_prologue ();
+  aarch64_switch_pstate_sm (AARCH64_ISA_MODE, callee_isa_mode);
+  args_switch.emit_epilogue ();
+  if (args_guard_label)
+    emit_label (args_guard_label);
+  auto args_seq = get_insns ();
+  end_sequence ();
+  emit_insn_before (args_seq, call);
+
+  if (find_reg_note (call, REG_NORETURN, NULL_RTX))
+    return true;
+
+  /* Switch mode after the call, preserving any return registers across
+     the switch.  */
+  start_sequence ();
+  rtx_insn *return_guard_label = nullptr;
+  if (TARGET_STREAMING_COMPATIBLE)
+    return_guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
+							 callee_isa_mode);
+  aarch64_sme_mode_switch_regs return_switch;
+  return_switch.add_call_result (call);
+  return_switch.emit_prologue ();
+  aarch64_switch_pstate_sm (callee_isa_mode, AARCH64_ISA_MODE);
+  return_switch.emit_epilogue ();
+  if (return_guard_label)
+    emit_label (return_guard_label);
+  auto result_seq = get_insns ();
+  end_sequence ();
+  emit_insn_after (result_seq, call);
+  return true;
+}
+
+namespace {
+
+const pass_data pass_data_switch_pstate_sm =
+{
+  RTL_PASS, // type
+  "smstarts", // name
+  OPTGROUP_NONE, // optinfo_flags
+  TV_NONE, // tv_id
+  0, // properties_required
+  0, // properties_provided
+  0, // properties_destroyed
+  0, // todo_flags_start
+  TODO_df_finish, // todo_flags_finish
+};
+
+class pass_switch_pstate_sm : public rtl_opt_pass
+{
+public:
+  pass_switch_pstate_sm (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_switch_pstate_sm, ctxt)
+  {}
+
+  // opt_pass methods:
+  bool gate (function *) override final;
+  unsigned int execute (function *) override final;
+};
+
+bool
+pass_switch_pstate_sm::gate (function *)
+{
+  return cfun->machine->call_switches_pstate_sm;
+}
+
+/* Emit any
+   instructions needed to switch PSTATE.SM.  */
+unsigned int
+pass_switch_pstate_sm::execute (function *fn)
+{
+  basic_block bb;
+
+  auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+  bitmap_clear (blocks);
+  FOR_EACH_BB_FN (bb, fn)
+    {
+      rtx_insn *insn;
+      FOR_BB_INSNS (bb, insn)
+	if (auto *call = dyn_cast<rtx_call_insn *> (insn))
+	  if (aarch64_switch_pstate_sm_for_call (call))
+	    bitmap_set_bit (blocks, bb->index);
+    }
+  find_many_sub_basic_blocks (blocks);
+  clear_aux_for_blocks ();
+  return 0;
+}
+
+}
+
+rtl_opt_pass *
+make_pass_switch_pstate_sm (gcc::context *ctxt)
+{
+  return new pass_switch_pstate_sm (ctxt);
+}
+
 /* Target-specific selftests.  */

 #if CHECKING_P
@@ -28563,6 +29411,9 @@ aarch64_run_selftests (void)
 #undef TARGET_CALLEE_COPIES
 #define TARGET_CALLEE_COPIES hook_bool_CUMULATIVE_ARGS_arg_info_false

+#undef TARGET_FRAME_POINTER_REQUIRED
+#define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE aarch64_can_eliminate

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 0ea8b2d3524..693acde7eb9 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -256,6 +256,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 /* The current function is a normal non-streaming function.  */
 #define TARGET_NON_STREAMING (AARCH64_ISA_SM_OFF)

+/* The current function has a streaming-compatible body.  */
+#define TARGET_STREAMING_COMPATIBLE \
+  ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0)
+
 /* Crypto is an optional extension to AdvSIMD.
    */
 #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO)

@@ -477,7 +481,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
     0, 0, 0, 0, 0, 0, 0, 0,	/* V8 - V15 */		\
     1, 1, 1, 1, 1, 1, 1, 1,	/* V16 - V23 */		\
     1, 1, 1, 1, 1, 1, 1, 1,	/* V24 - V31 */		\
-    1, 1, 1, 1,			/* SFP, AP, CC, VG */	\
+    1, 1, 1, 0,			/* SFP, AP, CC, VG */	\
     1, 1, 1, 1, 1, 1, 1, 1,	/* P0 - P7 */		\
     1, 1, 1, 1, 1, 1, 1, 1,	/* P8 - P15 */		\
     1, 1			/* FFR and FFRT */	\
@@ -814,6 +818,13 @@ struct GTY (()) aarch64_frame
   vec *saved_fprs;
   vec *saved_prs;

+  /* The offset from the base of the frame of a 64-bit slot whose low
+     bit contains the incoming value of PSTATE.SM.  This slot must be
+     within reach of the hard frame pointer.
+
+     The offset is -1 if such a slot isn't needed.  */
+  poly_int64 old_svcr_offset;
+
   /* The number of extra stack bytes taken up by register varargs.
     This area is allocated by the callee at the very top of the
     frame.  This value is rounded up to a multiple of
@@ -922,6 +933,12 @@ typedef struct GTY (()) machine_function
   /* One entry for each general purpose register.  */
   rtx call_via[SP_REGNUM];

   bool label_is_assembled;
+
+  /* True if we've expanded at least one call to a function that changes
+     PSTATE.SM.  This should only be used for saving compile time: false
+     guarantees that no such mode switch exists.  */
+  bool call_switches_pstate_sm;
+
   /* A set of all decls that have been passed to a vld1 intrinsic in the
      current function.  This is used to help guide the vector cost
      model.  */
   hash_set<tree> *vector_load_decls;
@@ -990,6 +1007,12 @@ typedef struct
				   stack arg area so far.  */
   bool silent_p;		/* True if we should act silently, rather than
				   raise an error for invalid calls.  */
+
+  /* A list of registers that need to be saved and restored around a
+     change to PSTATE.SM.  An auto_vec would be more convenient, but those
+     can't be copied.
+     */
+  unsigned int num_sme_mode_switch_args;
+  rtx sme_mode_switch_args[12];
 } CUMULATIVE_ARGS;
 #endif

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9585879a1b1..9b586b5170b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -956,7 +956,7 @@ (define_expand "tbranch_<code><mode>3"
				 operands[1]);
 })

-(define_insn "*tb<optab><ALLI:mode><GPI:mode>1"
+(define_insn "@aarch64_tb<optab><ALLI:mode><GPI:mode>"
   [(set (pc) (if_then_else (EQL (zero_extract:GPI (match_operand:ALLI 0 "register_operand" "r")
						  (const_int 1)
@@ -1043,7 +1043,7 @@ (define_expand "call"
   [(parallel
      [(call (match_operand 0 "memory_operand")
	    (match_operand 1 "general_operand"))
-      (unspec:DI [(match_operand 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 2)] UNSPEC_CALLEE_ABI)
       (clobber (reg:DI LR_REGNUM))])]
   ""
   "
@@ -1070,7 +1070,7 @@ (define_expand "call_value"
   [(parallel
      [(set (match_operand 0 "")
	   (call (match_operand 1 "memory_operand")
		 (match_operand 2 "general_operand")))
-      (unspec:DI [(match_operand 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 3)] UNSPEC_CALLEE_ABI)
       (clobber (reg:DI LR_REGNUM))])]
   ""
   "
@@ -1097,7 +1097,7 @@ (define_expand "sibcall"
   [(parallel
      [(call (match_operand 0 "memory_operand")
	    (match_operand 1 "general_operand"))
-      (unspec:DI [(match_operand 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 2)] UNSPEC_CALLEE_ABI)
       (return)])]
   ""
   {
@@ -1111,7 +1111,7 @@ (define_expand "sibcall_value"
   [(parallel
      [(set (match_operand 0 "")
	   (call (match_operand 1 "memory_operand")
		 (match_operand 2 "general_operand")))
-      (unspec:DI [(match_operand 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
+      (unspec:DI [(match_operand 3)] UNSPEC_CALLEE_ABI)
       (return)])]
   ""
   {
@@ -8048,3 +8048,6 @@ (define_insn "patchable_area"

 ;; SVE2.
 (include "aarch64-sve2.md")
+
+;; SME and extensions
+(include "aarch64-sme.md")
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index a4e0aa03274..cff56dc9f55 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -186,9 +186,12 @@ MULTILIB_DIRNAMES = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
 insn-conditions.md: s-check-sve-md
 s-check-sve-md: $(srcdir)/config/aarch64/check-sve-md.awk \
		$(srcdir)/config/aarch64/aarch64-sve.md \
-		$(srcdir)/config/aarch64/aarch64-sve2.md
+		$(srcdir)/config/aarch64/aarch64-sve2.md \
+		$(srcdir)/config/aarch64/aarch64-sme.md
	$(AWK) -f $(srcdir)/config/aarch64/check-sve-md.awk \
	  $(srcdir)/config/aarch64/aarch64-sve.md
	$(AWK) -f $(srcdir)/config/aarch64/check-sve-md.awk \
	  $(srcdir)/config/aarch64/aarch64-sve2.md
+	$(AWK) -f $(srcdir)/config/aarch64/check-sve-md.awk \
+	  $(srcdir)/config/aarch64/aarch64-sme.md
	$(STAMP) s-check-sve-md
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
new file mode 100644
index 00000000000..a2de55773af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
@@ -0,0 +1,233 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+void ns_callee ();
+void s_callee () [[arm::streaming]];
+void sc_callee () [[arm::streaming_compatible]];
+
+void ns_callee_stack (int, int, int, int, int, int, int, int, int);
+
+struct callbacks {
+  void (*ns_ptr) ();
+  void (*s_ptr) () [[arm::streaming]];
+  void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+/*
+** n_caller:	{ target lp64 }
+**	stp	x30, (x19|x2[0-8]), \[sp, #?-96\]!
+**	cntd	x16
+**	str	x16, \[sp, #?16\]
+**	stp	d8, d9, \[sp, #?32\]
+**	stp	d10, d11, \[sp, #?48\]
+**	stp	d12, d13, \[sp, #?64\]
+**	stp	d14, d15, \[sp, #?80\]
+**	mov	\1, x0
+**	bl	ns_callee
+**	smstart	sm
+**	bl	s_callee
+**	smstop	sm
+**	bl	sc_callee
+**	ldr	(x[0-9]+), \[\1\]
+**	blr	\2
+**	ldr	(x[0-9]+), \[\1, #?8\]
+**	smstart	sm
+**	blr	\3
+**	smstop	sm
+**	ldr	(x[0-9]+), \[\1, #?16\]
+**	blr	\4
+**	ldp	d8, d9, \[sp, #?32\]
+**	ldp	d10, d11, \[sp, #?48\]
+**	ldp	d12, d13, \[sp, #?64\]
+**	ldp	d14, d15, \[sp, #?80\]
+**	ldp	x30, \1, \[sp\], #?96
+**	ret
+*/
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** s_caller:	{ target lp64 }
+**	stp	x30, (x19|x2[0-8]), \[sp, #?-96\]!
+**	cntd	x16
+**	str	x16, \[sp, #?16\]
+**	stp	d8, d9, \[sp, #?32\]
+**	stp	d10, d11, \[sp, #?48\]
+**	stp	d12, d13, \[sp, #?64\]
+**	stp	d14, d15, \[sp, #?80\]
+**	mov	\1, x0
+**	smstop	sm
+**	bl	ns_callee
+**	smstart	sm
+**	bl	s_callee
+**	bl	sc_callee
+**	ldr	(x[0-9]+), \[\1\]
+**	smstop	sm
+**	blr	\2
+**	smstart	sm
+**	ldr	(x[0-9]+), \[\1, #?8\]
+**	blr	\3
+**	ldr	(x[0-9]+), \[\1, #?16\]
+**	blr	\4
+**	ldp	d8, d9, \[sp, #?32\]
+**	ldp	d10, d11, \[sp, #?48\]
+**	ldp	d12, d13, \[sp, #?64\]
+**	ldp	d14, d15, \[sp, #?80\]
+**	ldp	x30, \1, \[sp\], #?96
+**	ret
+*/
+void
+s_caller (struct callbacks *c) [[arm::streaming]]
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+/*
+** sc_caller_sme:
+**	stp	x29, x30, \[sp, #?-96\]!
+**	mov	x29, sp
+**	cntd	x16
+**	str	x16, \[sp, #?24\]
+**	stp	d8, d9, \[sp, #?32\]
+**	stp	d10, d11, \[sp, #?48\]
+**	stp	d12, d13, \[sp, #?64\]
+**	stp	d14, d15, \[sp, #?80\]
+**	mrs	x16, svcr
+**	str	x16, \[x29, #?16\]
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	smstop	sm
+**	bl	ns_callee
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	smstart	sm
+**	ldr	x16, \[x29, #?16\]
+**	tbnz	x16, 0, .*
+**	smstart	sm
+**	bl	s_callee
+**	ldr	x16, \[x29, #?16\]
+**	tbnz	x16, 0, .*
+**	smstop	sm
+**	bl	sc_callee
+**	ldp	d8, d9, \[sp, #?32\]
+**	ldp	d10, d11, \[sp, #?48\]
+**	ldp	d12, d13, \[sp, #?64\]
+**	ldp	d14, d15, \[sp, #?80\]
+**	ldp	x29, x30, \[sp\], #?96
+**	ret
+*/
+void
+sc_caller_sme () [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  s_callee ();
+  sc_callee ();
+}
+
+#pragma GCC target "+nosme"
+
+/*
+** sc_caller:
+**	stp	x29, x30, \[sp, #?-96\]!
+**	mov	x29, sp
+**	cntd	x16
+**	str	x16, \[sp, #?24\]
+**	stp	d8, d9, \[sp, #?32\]
+**	stp	d10, d11, \[sp, #?48\]
+**	stp	d12, d13, \[sp, #?64\]
+**	stp	d14, d15, \[sp, #?80\]
+**	bl	__arm_sme_state
+**	str	x0, \[x29, #?16\]
+** ...
+**	bl	sc_callee
+**	ldp	d8, d9, \[sp, #?32\]
+**	ldp	d10, d11, \[sp, #?48\]
+**	ldp	d12, d13, \[sp, #?64\]
+**	ldp	d14, d15, \[sp, #?80\]
+**	ldp	x29, x30, \[sp\], #?96
+**	ret
+*/
+void
+sc_caller () [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_x0:
+** ...
+**	mov	x10, x0
+**	bl	__arm_sme_state
+** ...
+**	str	wzr, \[x10\]
+** ...
+*/
+void
+sc_caller_x0 (int *ptr) [[arm::streaming_compatible]]
+{
+  *ptr = 0;
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_x1:
+** ...
+**	mov	x10, x0
+**	mov	x11, x1
+**	bl	__arm_sme_state
+** ...
+**	str	w11, \[x10\]
+** ...
+*/
+void
+sc_caller_x1 (int *ptr, int a) [[arm::streaming_compatible]]
+{
+  *ptr = a;
+  ns_callee ();
+  sc_callee ();
+}
+
+/*
+** sc_caller_stack:
+**	sub	sp, sp, #112
+**	stp	x29, x30, \[sp, #?16\]
+**	add	x29, sp, #?16
+** ...
+**	stp	d8, d9, \[sp, #?48\]
+** ...
+**	bl	__arm_sme_state
+**	str	x0, \[x29, #?16\]
+** ...
+**	bl	ns_callee_stack
+**	ldr	x16, \[x29, #?16\]
+**	tbz	x16, 0, .*
+**	smstart	sm
+** ...
+*/
+void
+sc_caller_stack () [[arm::streaming_compatible]]
+{
+  ns_callee_stack (0, 0, 0, 0, 0, 0, 0, 0, 0);
+}
+
+/* { dg-final { scan-assembler {n_caller:(?:(?!ret).)*\.cfi_offset 46, -80\n} } } */
+/* { dg-final { scan-assembler {s_caller:(?:(?!ret).)*\.cfi_offset 46, -80\n} } } */
+/* { dg-final { scan-assembler {sc_caller_sme:(?:(?!ret).)*\.cfi_offset 46, -72\n} } } */
+/* { dg-final { scan-assembler {sc_caller:(?:(?!ret).)*\.cfi_offset 46, -72\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
new file mode 100644
index 00000000000..49c5e4a6acb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_10.c
@@ -0,0 +1,37 @@
+// { dg-options "" }
+
+#pragma GCC target "+nosme"
+
+void ns_callee ();
+void s_callee () [[arm::streaming]];
+void sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  void (*ns_ptr) ();
+  void (*s_ptr) () [[arm::streaming]];
+  void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  s_callee (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  c->sc_ptr ();
+}
+
+void
+sc_caller_sme (struct callbacks *c) [[arm::streaming_compatible]]
+{
+  ns_callee ();
+  s_callee (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->s_ptr (); // { dg-error "calling a streaming function requires the ISA extension 'sme'" }
+  c->sc_ptr ();
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
new file mode 100644
index 00000000000..890fcbc5b1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_2.c
@@ -0,0 +1,43 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+
+void ns_callee ();
+void s_callee () [[arm::streaming]];
+void sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  void (*ns_ptr) ();
+  void (*s_ptr) () [[arm::streaming]];
+  void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+void
+n_caller (struct callbacks *c)
+{
+  ns_callee ();
+  sc_callee ();
+
+  c->ns_ptr ();
+  c->sc_ptr ();
+}
+
+void
+s_caller (struct callbacks *c) [[arm::streaming]]
+{
+  s_callee ();
+  sc_callee ();
+
+  c->s_ptr ();
+  c->sc_ptr ();
+}
+
+void
+sc_caller (struct callbacks *c) [[arm::streaming_compatible]]
+{
+  sc_callee ();
+
+  c->sc_ptr ();
+}
+
+// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } }
+// { dg-final { scan-assembler-not {smstart\tsm} } }
+// { dg-final { scan-assembler-not {smstop\tsm} } }
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
new file mode 100644
index 00000000000..ed999d08560
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
@@ -0,0 +1,166 @@
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+__attribute__((aarch64_vector_pcs)) void ns_callee ();
+__attribute__((aarch64_vector_pcs)) void s_callee () [[arm::streaming]];
+__attribute__((aarch64_vector_pcs)) void sc_callee () [[arm::streaming_compatible]];
+
+struct callbacks {
+  __attribute__((aarch64_vector_pcs)) void (*ns_ptr) ();
+  __attribute__((aarch64_vector_pcs)) void (*s_ptr) () [[arm::streaming]];
+  __attribute__((aarch64_vector_pcs)) void (*sc_ptr) () [[arm::streaming_compatible]];
+};
+
+/*
+** n_caller:	{ target lp64 }
+**	stp	x30, (x19|x2[0-8]), \[sp, #?-288\]!
+** cntd x16 +** str x16, \[sp, #?16\] +** stp q8, q9, \[sp, #?32\] +** stp q10, q11, \[sp, #?64\] +** stp q12, q13, \[sp, #?96\] +** stp q14, q15, \[sp, #?128\] +** stp q16, q17, \[sp, #?160\] +** stp q18, q19, \[sp, #?192\] +** stp q20, q21, \[sp, #?224\] +** stp q22, q23, \[sp, #?256\] +** mov \1, x0 +** bl ns_callee +** smstart sm +** bl s_callee +** smstop sm +** bl sc_callee +** ldr (x[0-9]+), \[\1\] +** blr \2 +** ldr (x[0-9]+), \[\1, #?8\] +** smstart sm +** blr \3 +** smstop sm +** ldr (x[0-9]+), \[\1, #?16\] +** blr \4 +** ldp q8, q9, \[sp, #?32\] +** ldp q10, q11, \[sp, #?64\] +** ldp q12, q13, \[sp, #?96\] +** ldp q14, q15, \[sp, #?128\] +** ldp q16, q17, \[sp, #?160\] +** ldp q18, q19, \[sp, #?192\] +** ldp q20, q21, \[sp, #?224\] +** ldp q22, q23, \[sp, #?256\] +** ldp x30, \1, \[sp\], #?288 +** ret +*/ +void __attribute__((aarch64_vector_pcs)) +n_caller (struct callbacks *c) +{ + ns_callee (); + s_callee (); + sc_callee (); + + c->ns_ptr (); + c->s_ptr (); + c->sc_ptr (); +} + +/* +** s_caller: { target lp64 } +** stp x30, (x19|x2[0-8]), \[sp, #?-288\]! 
+** cntd x16 +** str x16, \[sp, #?16\] +** stp q8, q9, \[sp, #?32\] +** stp q10, q11, \[sp, #?64\] +** stp q12, q13, \[sp, #?96\] +** stp q14, q15, \[sp, #?128\] +** stp q16, q17, \[sp, #?160\] +** stp q18, q19, \[sp, #?192\] +** stp q20, q21, \[sp, #?224\] +** stp q22, q23, \[sp, #?256\] +** mov \1, x0 +** smstop sm +** bl ns_callee +** smstart sm +** bl s_callee +** bl sc_callee +** ldr (x[0-9]+), \[\1\] +** smstop sm +** blr \2 +** smstart sm +** ldr (x[0-9]+), \[\1, #?8\] +** blr \3 +** ldr (x[0-9]+), \[\1, #?16\] +** blr \4 +** ldp q8, q9, \[sp, #?32\] +** ldp q10, q11, \[sp, #?64\] +** ldp q12, q13, \[sp, #?96\] +** ldp q14, q15, \[sp, #?128\] +** ldp q16, q17, \[sp, #?160\] +** ldp q18, q19, \[sp, #?192\] +** ldp q20, q21, \[sp, #?224\] +** ldp q22, q23, \[sp, #?256\] +** ldp x30, \1, \[sp\], #?288 +** ret +*/ +void __attribute__((aarch64_vector_pcs)) +s_caller (struct callbacks *c) [[arm::streaming]] +{ + ns_callee (); + s_callee (); + sc_callee (); + + c->ns_ptr (); + c->s_ptr (); + c->sc_ptr (); +} + +/* +** sc_caller: +** stp x29, x30, \[sp, #?-288\]! 
+** mov x29, sp +** cntd x16 +** str x16, \[sp, #?24\] +** stp q8, q9, \[sp, #?32\] +** stp q10, q11, \[sp, #?64\] +** stp q12, q13, \[sp, #?96\] +** stp q14, q15, \[sp, #?128\] +** stp q16, q17, \[sp, #?160\] +** stp q18, q19, \[sp, #?192\] +** stp q20, q21, \[sp, #?224\] +** stp q22, q23, \[sp, #?256\] +** mrs x16, svcr +** str x16, \[x29, #?16\] +** ldr x16, \[x29, #?16\] +** tbz x16, 0, .* +** smstop sm +** bl ns_callee +** ldr x16, \[x29, #?16\] +** tbz x16, 0, .* +** smstart sm +** ldr x16, \[x29, #?16\] +** tbnz x16, 0, .* +** smstart sm +** bl s_callee +** ldr x16, \[x29, #?16\] +** tbnz x16, 0, .* +** smstop sm +** bl sc_callee +** ldp q8, q9, \[sp, #?32\] +** ldp q10, q11, \[sp, #?64\] +** ldp q12, q13, \[sp, #?96\] +** ldp q14, q15, \[sp, #?128\] +** ldp q16, q17, \[sp, #?160\] +** ldp q18, q19, \[sp, #?192\] +** ldp q20, q21, \[sp, #?224\] +** ldp q22, q23, \[sp, #?256\] +** ldp x29, x30, \[sp\], #?288 +** ret +*/ +void __attribute__((aarch64_vector_pcs)) +sc_caller () [[arm::streaming_compatible]] +{ + ns_callee (); + s_callee (); + sc_callee (); +} + +/* { dg-final { scan-assembler {n_caller:(?:(?!ret).)*\.cfi_offset 46, -272\n} } } */ +/* { dg-final { scan-assembler {s_caller:(?:(?!ret).)*\.cfi_offset 46, -272\n} } } */ +/* { dg-final { scan-assembler {sc_caller:(?:(?!ret).)*\.cfi_offset 46, -264\n} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c new file mode 100644 index 00000000000..f93a67f974a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_4.c @@ -0,0 +1,43 @@ +// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" } + +__attribute__((aarch64_vector_pcs)) void ns_callee (); +__attribute__((aarch64_vector_pcs)) void s_callee () [[arm::streaming]]; +__attribute__((aarch64_vector_pcs)) void sc_callee () [[arm::streaming_compatible]]; + +struct callbacks { + __attribute__((aarch64_vector_pcs)) void (*ns_ptr) (); + 
__attribute__((aarch64_vector_pcs)) void (*s_ptr) () [[arm::streaming]]; + __attribute__((aarch64_vector_pcs)) void (*sc_ptr) () [[arm::streaming_compatible]]; +}; + +void __attribute__((aarch64_vector_pcs)) +n_caller (struct callbacks *c) +{ + ns_callee (); + sc_callee (); + + c->ns_ptr (); + c->sc_ptr (); +} + +void __attribute__((aarch64_vector_pcs)) +s_caller (struct callbacks *c) [[arm::streaming]] +{ + s_callee (); + sc_callee (); + + c->s_ptr (); + c->sc_ptr (); +} + +void __attribute__((aarch64_vector_pcs)) +sc_caller (struct callbacks *c) [[arm::streaming_compatible]] +{ + sc_callee (); + + c->sc_ptr (); +} + +// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } } +// { dg-final { scan-assembler-not {smstart\tsm} } } +// { dg-final { scan-assembler-not {smstop\tsm} } } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c new file mode 100644 index 00000000000..be9b5cc0410 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c @@ -0,0 +1,318 @@ +// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" } +// { dg-final { check-function-bodies "**" "" } } + +#include + +svbool_t ns_callee (); + svbool_t s_callee () [[arm::streaming]]; + svbool_t sc_callee () [[arm::streaming_compatible]]; + +struct callbacks { + svbool_t (*ns_ptr) (); + svbool_t (*s_ptr) () [[arm::streaming]]; + svbool_t (*sc_ptr) () [[arm::streaming_compatible]]; +}; + +/* +** n_caller: { target lp64 } +** stp x30, (x19|x2[0-8]), \[sp, #?-32\]! 
+** cntd x16 +** str x16, \[sp, #?16\] +** addvl sp, sp, #-18 +** str p4, \[sp\] +** str p5, \[sp, #1, mul vl\] +** str p6, \[sp, #2, mul vl\] +** str p7, \[sp, #3, mul vl\] +** str p8, \[sp, #4, mul vl\] +** str p9, \[sp, #5, mul vl\] +** str p10, \[sp, #6, mul vl\] +** str p11, \[sp, #7, mul vl\] +** str p12, \[sp, #8, mul vl\] +** str p13, \[sp, #9, mul vl\] +** str p14, \[sp, #10, mul vl\] +** str p15, \[sp, #11, mul vl\] +** str z8, \[sp, #2, mul vl\] +** str z9, \[sp, #3, mul vl\] +** str z10, \[sp, #4, mul vl\] +** str z11, \[sp, #5, mul vl\] +** str z12, \[sp, #6, mul vl\] +** str z13, \[sp, #7, mul vl\] +** str z14, \[sp, #8, mul vl\] +** str z15, \[sp, #9, mul vl\] +** str z16, \[sp, #10, mul vl\] +** str z17, \[sp, #11, mul vl\] +** str z18, \[sp, #12, mul vl\] +** str z19, \[sp, #13, mul vl\] +** str z20, \[sp, #14, mul vl\] +** str z21, \[sp, #15, mul vl\] +** str z22, \[sp, #16, mul vl\] +** str z23, \[sp, #17, mul vl\] +** mov \1, x0 +** bl ns_callee +** smstart sm +** bl s_callee +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstop sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** bl sc_callee +** ldr (x[0-9]+), \[\1\] +** blr \2 +** ldr (x[0-9]+), \[\1, #?8\] +** smstart sm +** blr \3 +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstop sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** ldr (x[0-9]+), \[\1, #?16\] +** blr \4 +** ldr z8, \[sp, #2, mul vl\] +** ldr z9, \[sp, #3, mul vl\] +** ldr z10, \[sp, #4, mul vl\] +** ldr z11, \[sp, #5, mul vl\] +** ldr z12, \[sp, #6, mul vl\] +** ldr z13, \[sp, #7, mul vl\] +** ldr z14, \[sp, #8, mul vl\] +** ldr z15, \[sp, #9, mul vl\] +** ldr z16, \[sp, #10, mul vl\] +** ldr z17, \[sp, #11, mul vl\] +** ldr z18, \[sp, #12, mul vl\] +** ldr z19, \[sp, #13, mul vl\] +** ldr z20, \[sp, #14, mul vl\] +** ldr z21, \[sp, #15, mul vl\] +** ldr z22, \[sp, #16, mul vl\] +** ldr z23, \[sp, #17, mul vl\] +** ldr p4, \[sp\] +** ldr p5, \[sp, #1, mul vl\] +** ldr p6, \[sp, #2, mul vl\] +** ldr p7, \[sp, #3, mul vl\] +** ldr p8, 
\[sp, #4, mul vl\] +** ldr p9, \[sp, #5, mul vl\] +** ldr p10, \[sp, #6, mul vl\] +** ldr p11, \[sp, #7, mul vl\] +** ldr p12, \[sp, #8, mul vl\] +** ldr p13, \[sp, #9, mul vl\] +** ldr p14, \[sp, #10, mul vl\] +** ldr p15, \[sp, #11, mul vl\] +** addvl sp, sp, #18 +** ldp x30, \1, \[sp\], #?32 +** ret +*/ +svbool_t +n_caller (struct callbacks *c) +{ + ns_callee (); + s_callee (); + sc_callee (); + + c->ns_ptr (); + c->s_ptr (); + return c->sc_ptr (); +} + +/* +** s_caller: { target lp64 } +** stp x30, (x19|x2[0-8]), \[sp, #?-32\]! +** cntd x16 +** str x16, \[sp, #?16\] +** addvl sp, sp, #-18 +** str p4, \[sp\] +** str p5, \[sp, #1, mul vl\] +** str p6, \[sp, #2, mul vl\] +** str p7, \[sp, #3, mul vl\] +** str p8, \[sp, #4, mul vl\] +** str p9, \[sp, #5, mul vl\] +** str p10, \[sp, #6, mul vl\] +** str p11, \[sp, #7, mul vl\] +** str p12, \[sp, #8, mul vl\] +** str p13, \[sp, #9, mul vl\] +** str p14, \[sp, #10, mul vl\] +** str p15, \[sp, #11, mul vl\] +** str z8, \[sp, #2, mul vl\] +** str z9, \[sp, #3, mul vl\] +** str z10, \[sp, #4, mul vl\] +** str z11, \[sp, #5, mul vl\] +** str z12, \[sp, #6, mul vl\] +** str z13, \[sp, #7, mul vl\] +** str z14, \[sp, #8, mul vl\] +** str z15, \[sp, #9, mul vl\] +** str z16, \[sp, #10, mul vl\] +** str z17, \[sp, #11, mul vl\] +** str z18, \[sp, #12, mul vl\] +** str z19, \[sp, #13, mul vl\] +** str z20, \[sp, #14, mul vl\] +** str z21, \[sp, #15, mul vl\] +** str z22, \[sp, #16, mul vl\] +** str z23, \[sp, #17, mul vl\] +** mov \1, x0 +** smstop sm +** bl ns_callee +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstart sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** bl s_callee +** bl sc_callee +** ldr (x[0-9]+), \[\1\] +** smstop sm +** blr \2 +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstart sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** ldr (x[0-9]+), \[\1, #?8\] +** blr \3 +** ldr (x[0-9]+), \[\1, #?16\] +** blr \4 +** ldr z8, \[sp, #2, mul vl\] +** ldr z9, \[sp, #3, mul vl\] +** ldr z10, \[sp, #4, mul vl\] +** ldr z11, 
\[sp, #5, mul vl\] +** ldr z12, \[sp, #6, mul vl\] +** ldr z13, \[sp, #7, mul vl\] +** ldr z14, \[sp, #8, mul vl\] +** ldr z15, \[sp, #9, mul vl\] +** ldr z16, \[sp, #10, mul vl\] +** ldr z17, \[sp, #11, mul vl\] +** ldr z18, \[sp, #12, mul vl\] +** ldr z19, \[sp, #13, mul vl\] +** ldr z20, \[sp, #14, mul vl\] +** ldr z21, \[sp, #15, mul vl\] +** ldr z22, \[sp, #16, mul vl\] +** ldr z23, \[sp, #17, mul vl\] +** ldr p4, \[sp\] +** ldr p5, \[sp, #1, mul vl\] +** ldr p6, \[sp, #2, mul vl\] +** ldr p7, \[sp, #3, mul vl\] +** ldr p8, \[sp, #4, mul vl\] +** ldr p9, \[sp, #5, mul vl\] +** ldr p10, \[sp, #6, mul vl\] +** ldr p11, \[sp, #7, mul vl\] +** ldr p12, \[sp, #8, mul vl\] +** ldr p13, \[sp, #9, mul vl\] +** ldr p14, \[sp, #10, mul vl\] +** ldr p15, \[sp, #11, mul vl\] +** addvl sp, sp, #18 +** ldp x30, \1, \[sp\], #?32 +** ret +*/ +svbool_t +s_caller (struct callbacks *c) [[arm::streaming]] +{ + ns_callee (); + s_callee (); + sc_callee (); + + c->ns_ptr (); + c->s_ptr (); + return c->sc_ptr (); +} + +/* +** sc_caller: +** stp x29, x30, \[sp, #?-32\]! 
+** mov x29, sp +** cntd x16 +** str x16, \[sp, #?24\] +** addvl sp, sp, #-18 +** str p4, \[sp\] +** str p5, \[sp, #1, mul vl\] +** str p6, \[sp, #2, mul vl\] +** str p7, \[sp, #3, mul vl\] +** str p8, \[sp, #4, mul vl\] +** str p9, \[sp, #5, mul vl\] +** str p10, \[sp, #6, mul vl\] +** str p11, \[sp, #7, mul vl\] +** str p12, \[sp, #8, mul vl\] +** str p13, \[sp, #9, mul vl\] +** str p14, \[sp, #10, mul vl\] +** str p15, \[sp, #11, mul vl\] +** str z8, \[sp, #2, mul vl\] +** str z9, \[sp, #3, mul vl\] +** str z10, \[sp, #4, mul vl\] +** str z11, \[sp, #5, mul vl\] +** str z12, \[sp, #6, mul vl\] +** str z13, \[sp, #7, mul vl\] +** str z14, \[sp, #8, mul vl\] +** str z15, \[sp, #9, mul vl\] +** str z16, \[sp, #10, mul vl\] +** str z17, \[sp, #11, mul vl\] +** str z18, \[sp, #12, mul vl\] +** str z19, \[sp, #13, mul vl\] +** str z20, \[sp, #14, mul vl\] +** str z21, \[sp, #15, mul vl\] +** str z22, \[sp, #16, mul vl\] +** str z23, \[sp, #17, mul vl\] +** mrs x16, svcr +** str x16, \[x29, #?16\] +** ldr x16, \[x29, #?16\] +** tbz x16, 0, .* +** smstop sm +** bl ns_callee +** ldr x16, \[x29, #?16\] +** tbz x16, 0, .* +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstart sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** ldr x16, \[x29, #?16\] +** tbnz x16, 0, .* +** smstart sm +** bl s_callee +** ldr x16, \[x29, #?16\] +** tbnz x16, 0, .* +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstop sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** bl sc_callee +** ldr z8, \[sp, #2, mul vl\] +** ldr z9, \[sp, #3, mul vl\] +** ldr z10, \[sp, #4, mul vl\] +** ldr z11, \[sp, #5, mul vl\] +** ldr z12, \[sp, #6, mul vl\] +** ldr z13, \[sp, #7, mul vl\] +** ldr z14, \[sp, #8, mul vl\] +** ldr z15, \[sp, #9, mul vl\] +** ldr z16, \[sp, #10, mul vl\] +** ldr z17, \[sp, #11, mul vl\] +** ldr z18, \[sp, #12, mul vl\] +** ldr z19, \[sp, #13, mul vl\] +** ldr z20, \[sp, #14, mul vl\] +** ldr z21, \[sp, #15, mul vl\] +** ldr z22, \[sp, #16, mul vl\] +** ldr z23, \[sp, #17, mul vl\] +** ldr p4, 
\[sp\] +** ldr p5, \[sp, #1, mul vl\] +** ldr p6, \[sp, #2, mul vl\] +** ldr p7, \[sp, #3, mul vl\] +** ldr p8, \[sp, #4, mul vl\] +** ldr p9, \[sp, #5, mul vl\] +** ldr p10, \[sp, #6, mul vl\] +** ldr p11, \[sp, #7, mul vl\] +** ldr p12, \[sp, #8, mul vl\] +** ldr p13, \[sp, #9, mul vl\] +** ldr p14, \[sp, #10, mul vl\] +** ldr p15, \[sp, #11, mul vl\] +** addvl sp, sp, #18 +** ldp x29, x30, \[sp\], #?32 +** ret +*/ +svbool_t +sc_caller () [[arm::streaming_compatible]] +{ + ns_callee (); + s_callee (); + return sc_callee (); +} + +/* { dg-final { scan-assembler {n_caller:(?:(?!ret).)*\.cfi_offset 46, -16\n} } } */ +/* { dg-final { scan-assembler {s_caller:(?:(?!ret).)*\.cfi_offset 46, -16\n} } } */ +/* { dg-final { scan-assembler {sc_caller:(?:(?!ret).)*\.cfi_offset 46, -8\n} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c new file mode 100644 index 00000000000..0f6bc4f6c9a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_6.c @@ -0,0 +1,45 @@ +// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" } + +#include + +svbool_t ns_callee (); + svbool_t s_callee () [[arm::streaming]]; + svbool_t sc_callee () [[arm::streaming_compatible]]; + +struct callbacks { + svbool_t (*ns_ptr) (); + svbool_t (*s_ptr) () [[arm::streaming]]; + svbool_t (*sc_ptr) () [[arm::streaming_compatible]]; +}; + +svbool_t +n_caller (struct callbacks *c) +{ + ns_callee (); + sc_callee (); + + c->ns_ptr (); + return c->sc_ptr (); +} + +svbool_t +s_caller (struct callbacks *c) [[arm::streaming]] +{ + s_callee (); + sc_callee (); + + c->s_ptr (); + return c->sc_ptr (); +} + +svbool_t +sc_caller (struct callbacks *c) [[arm::streaming_compatible]] +{ + sc_callee (); + + return c->sc_ptr (); +} + +// { dg-final { scan-assembler-not {[dpqz][0-9]+,} } } +// { dg-final { scan-assembler-not {smstart\tsm} } } +// { dg-final { scan-assembler-not {smstop\tsm} } } diff --git 
a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c new file mode 100644 index 00000000000..6482a489fc5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_7.c @@ -0,0 +1,516 @@ +// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" } +// { dg-final { check-function-bodies "**" "" } } + +#include +#include + +double produce_d0 (); +void consume_d0 (double); + +/* +** test_d0: +** ... +** smstop sm +** bl produce_d0 +** fmov x10, d0 +** smstart sm +** fmov d0, x10 +** fmov x10, d0 +** smstop sm +** fmov d0, x10 +** bl consume_d0 +** ... +*/ +void +test_d0 () [[arm::streaming]] +{ + double res = produce_d0 (); + asm volatile (""); + consume_d0 (res); +} + +int8x8_t produce_d0_vec (); +void consume_d0_vec (int8x8_t); + +/* +** test_d0_vec: +** ... +** smstop sm +** bl produce_d0_vec +** ( +** fmov x10, d0 +** | +** umov x10, v0.d\[0\] +** ) +** smstart sm +** fmov d0, x10 +** ( +** fmov x10, d0 +** | +** umov x10, v0.d\[0\] +** ) +** smstop sm +** fmov d0, x10 +** bl consume_d0_vec +** ... +*/ +void +test_d0_vec () [[arm::streaming]] +{ + int8x8_t res = produce_d0_vec (); + asm volatile (""); + consume_d0_vec (res); +} + +int8x16_t produce_q0 (); +void consume_q0 (int8x16_t); + +/* +** test_q0: +** ... +** smstop sm +** bl produce_q0 +** str q0, \[sp, #?-16\]! +** smstart sm +** ldr q0, \[sp\], #?16 +** str q0, \[sp, #?-16\]! +** smstop sm +** ldr q0, \[sp\], #?16 +** bl consume_q0 +** ... +*/ +void +test_q0 () [[arm::streaming]] +{ + int8x16_t res = produce_q0 (); + asm volatile (""); + consume_q0 (res); +} + +int8x16x2_t produce_q1 (); +void consume_q1 (int8x16x2_t); + +/* +** test_q1: +** ... +** smstop sm +** bl produce_q1 +** stp q0, q1, \[sp, #?-32\]! +** smstart sm +** ldp q0, q1, \[sp\], #?32 +** stp q0, q1, \[sp, #?-32\]! +** smstop sm +** ldp q0, q1, \[sp\], #?32 +** bl consume_q1 +** ... 
+*/ +void +test_q1 () [[arm::streaming]] +{ + int8x16x2_t res = produce_q1 (); + asm volatile (""); + consume_q1 (res); +} + +int8x16x3_t produce_q2 (); +void consume_q2 (int8x16x3_t); + +/* +** test_q2: +** ... +** smstop sm +** bl produce_q2 +** stp q0, q1, \[sp, #?-48\]! +** str q2, \[sp, #?32\] +** smstart sm +** ldr q2, \[sp, #?32\] +** ldp q0, q1, \[sp\], #?48 +** stp q0, q1, \[sp, #?-48\]! +** str q2, \[sp, #?32\] +** smstop sm +** ldr q2, \[sp, #?32\] +** ldp q0, q1, \[sp\], #?48 +** bl consume_q2 +** ... +*/ +void +test_q2 () [[arm::streaming]] +{ + int8x16x3_t res = produce_q2 (); + asm volatile (""); + consume_q2 (res); +} + +int8x16x4_t produce_q3 (); +void consume_q3 (int8x16x4_t); + +/* +** test_q3: +** ... +** smstop sm +** bl produce_q3 +** stp q0, q1, \[sp, #?-64\]! +** stp q2, q3, \[sp, #?32\] +** smstart sm +** ldp q2, q3, \[sp, #?32\] +** ldp q0, q1, \[sp\], #?64 +** stp q0, q1, \[sp, #?-64\]! +** stp q2, q3, \[sp, #?32\] +** smstop sm +** ldp q2, q3, \[sp, #?32\] +** ldp q0, q1, \[sp\], #?64 +** bl consume_q3 +** ... +*/ +void +test_q3 () [[arm::streaming]] +{ + int8x16x4_t res = produce_q3 (); + asm volatile (""); + consume_q3 (res); +} + +svint8_t produce_z0 (); +void consume_z0 (svint8_t); + +/* +** test_z0: +** ... +** smstop sm +** bl produce_z0 +** addvl sp, sp, #-1 +** str z0, \[sp\] +** smstart sm +** ldr z0, \[sp\] +** addvl sp, sp, #1 +** addvl sp, sp, #-1 +** str z0, \[sp\] +** smstop sm +** ldr z0, \[sp\] +** addvl sp, sp, #1 +** bl consume_z0 +** ... +*/ +void +test_z0 () [[arm::streaming]] +{ + svint8_t res = produce_z0 (); + asm volatile (""); + consume_z0 (res); +} + +svint8x4_t produce_z3 (); +void consume_z3 (svint8x4_t); + +/* +** test_z3: +** ... 
+** smstop sm +** bl produce_z3 +** addvl sp, sp, #-4 +** str z0, \[sp\] +** str z1, \[sp, #1, mul vl\] +** str z2, \[sp, #2, mul vl\] +** str z3, \[sp, #3, mul vl\] +** smstart sm +** ldr z0, \[sp\] +** ldr z1, \[sp, #1, mul vl\] +** ldr z2, \[sp, #2, mul vl\] +** ldr z3, \[sp, #3, mul vl\] +** addvl sp, sp, #4 +** addvl sp, sp, #-4 +** str z0, \[sp\] +** str z1, \[sp, #1, mul vl\] +** str z2, \[sp, #2, mul vl\] +** str z3, \[sp, #3, mul vl\] +** smstop sm +** ldr z0, \[sp\] +** ldr z1, \[sp, #1, mul vl\] +** ldr z2, \[sp, #2, mul vl\] +** ldr z3, \[sp, #3, mul vl\] +** addvl sp, sp, #4 +** bl consume_z3 +** ... +*/ +void +test_z3 () [[arm::streaming]] +{ + svint8x4_t res = produce_z3 (); + asm volatile (""); + consume_z3 (res); +} + +svbool_t produce_p0 (); +void consume_p0 (svbool_t); + +/* +** test_p0: +** ... +** smstop sm +** bl produce_p0 +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstart sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstop sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** bl consume_p0 +** ... +*/ +void +test_p0 () [[arm::streaming]] +{ + svbool_t res = produce_p0 (); + asm volatile (""); + consume_p0 (res); +} + +void consume_d7 (double, double, double, double, double, double, double, + double); + +/* +** test_d7: +** ... +** fmov x10, d0 +** fmov x11, d1 +** fmov x12, d2 +** fmov x13, d3 +** fmov x14, d4 +** fmov x15, d5 +** fmov x16, d6 +** fmov x17, d7 +** smstop sm +** fmov d0, x10 +** fmov d1, x11 +** fmov d2, x12 +** fmov d3, x13 +** fmov d4, x14 +** fmov d5, x15 +** fmov d6, x16 +** fmov d7, x17 +** bl consume_d7 +** ... +*/ +void +test_d7 () [[arm::streaming]] +{ + consume_d7 (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0); +} + +void consume_d7_vec (int8x8_t, int8x8_t, int8x8_t, int8x8_t, int8x8_t, + int8x8_t, int8x8_t, int8x8_t); + +/* +** test_d7_vec: +** ... 
+** ( +** fmov x10, d0 +** fmov x11, d1 +** fmov x12, d2 +** fmov x13, d3 +** fmov x14, d4 +** fmov x15, d5 +** fmov x16, d6 +** fmov x17, d7 +** | +** umov x10, v0.d\[0\] +** umov x11, v1.d\[0\] +** umov x12, v2.d\[0\] +** umov x13, v3.d\[0\] +** umov x14, v4.d\[0\] +** umov x15, v5.d\[0\] +** umov x16, v6.d\[0\] +** umov x17, v7.d\[0\] +** ) +** smstop sm +** fmov d0, x10 +** fmov d1, x11 +** fmov d2, x12 +** fmov d3, x13 +** fmov d4, x14 +** fmov d5, x15 +** fmov d6, x16 +** fmov d7, x17 +** bl consume_d7_vec +** ... +*/ +void +test_d7_vec (int8x8_t *ptr) [[arm::streaming]] +{ + consume_d7_vec (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr); +} + +void consume_q7 (int8x16_t, int8x16_t, int8x16_t, int8x16_t, int8x16_t, + int8x16_t, int8x16_t, int8x16_t); + +/* +** test_q7: +** ... +** stp q0, q1, \[sp, #?-128\]! +** stp q2, q3, \[sp, #?32\] +** stp q4, q5, \[sp, #?64\] +** stp q6, q7, \[sp, #?96\] +** smstop sm +** ldp q2, q3, \[sp, #?32\] +** ldp q4, q5, \[sp, #?64\] +** ldp q6, q7, \[sp, #?96\] +** ldp q0, q1, \[sp\], #?128 +** bl consume_q7 +** ... +*/ +void +test_q7 (int8x16_t *ptr) [[arm::streaming]] +{ + consume_q7 (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr); +} + +void consume_z7 (svint8_t, svint8_t, svint8_t, svint8_t, svint8_t, + svint8_t, svint8_t, svint8_t); + +/* +** test_z7: +** ... +** addvl sp, sp, #-8 +** str z0, \[sp\] +** str z1, \[sp, #1, mul vl\] +** str z2, \[sp, #2, mul vl\] +** str z3, \[sp, #3, mul vl\] +** str z4, \[sp, #4, mul vl\] +** str z5, \[sp, #5, mul vl\] +** str z6, \[sp, #6, mul vl\] +** str z7, \[sp, #7, mul vl\] +** smstop sm +** ldr z0, \[sp\] +** ldr z1, \[sp, #1, mul vl\] +** ldr z2, \[sp, #2, mul vl\] +** ldr z3, \[sp, #3, mul vl\] +** ldr z4, \[sp, #4, mul vl\] +** ldr z5, \[sp, #5, mul vl\] +** ldr z6, \[sp, #6, mul vl\] +** ldr z7, \[sp, #7, mul vl\] +** addvl sp, sp, #8 +** bl consume_z7 +** ... 
+*/ +void +test_z7 (svint8_t *ptr) [[arm::streaming]] +{ + consume_z7 (*ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr, *ptr); +} + +void consume_p3 (svbool_t, svbool_t, svbool_t, svbool_t); + +/* +** test_p3: +** ... +** addvl sp, sp, #-1 +** str p0, \[sp\] +** str p1, \[sp, #1, mul vl\] +** str p2, \[sp, #2, mul vl\] +** str p3, \[sp, #3, mul vl\] +** smstop sm +** ldr p0, \[sp\] +** ldr p1, \[sp, #1, mul vl\] +** ldr p2, \[sp, #2, mul vl\] +** ldr p3, \[sp, #3, mul vl\] +** addvl sp, sp, #1 +** bl consume_p3 +** ... +*/ +void +test_p3 (svbool_t *ptr) [[arm::streaming]] +{ + consume_p3 (*ptr, *ptr, *ptr, *ptr); +} + +void consume_mixed (float, double, float32x4_t, svfloat32_t, + float, double, float64x2_t, svfloat64_t, + svbool_t, svbool_t, svbool_t, svbool_t); + +/* +** test_mixed: +** ... +** addvl sp, sp, #-3 +** str p0, \[sp\] +** str p1, \[sp, #1, mul vl\] +** str p2, \[sp, #2, mul vl\] +** str p3, \[sp, #3, mul vl\] +** str z3, \[sp, #1, mul vl\] +** str z7, \[sp, #2, mul vl\] +** stp q2, q6, \[sp, #?-32\]! +** fmov w10, s0 +** fmov x11, d1 +** fmov w12, s4 +** fmov x13, d5 +** smstop sm +** fmov s0, w10 +** fmov d1, x11 +** fmov s4, w12 +** fmov d5, x13 +** ldp q2, q6, \[sp\], #?32 +** ldr p0, \[sp\] +** ldr p1, \[sp, #1, mul vl\] +** ldr p2, \[sp, #2, mul vl\] +** ldr p3, \[sp, #3, mul vl\] +** ldr z3, \[sp, #1, mul vl\] +** ldr z7, \[sp, #2, mul vl\] +** addvl sp, sp, #3 +** bl consume_mixed +** ... +*/ +void +test_mixed (float32x4_t *float32x4_ptr, + svfloat32_t *svfloat32_ptr, + float64x2_t *float64x2_ptr, + svfloat64_t *svfloat64_ptr, + svbool_t *svbool_ptr) [[arm::streaming]] +{ + consume_mixed (1.0f, 2.0, *float32x4_ptr, *svfloat32_ptr, + 3.0f, 4.0, *float64x2_ptr, *svfloat64_ptr, + *svbool_ptr, *svbool_ptr, *svbool_ptr, *svbool_ptr); +} + +void consume_varargs (float, ...); + +/* +** test_varargs: +** ... +** stp q3, q7, \[sp, #?-32\]! 
+** fmov w10, s0 +** fmov x11, d1 +** ( +** fmov x12, d2 +** | +** umov x12, v2.d\[0\] +** ) +** fmov x13, d4 +** fmov x14, d5 +** ( +** fmov x15, d6 +** | +** umov x15, v6.d\[0\] +** ) +** smstop sm +** fmov s0, w10 +** fmov d1, x11 +** fmov d2, x12 +** fmov d4, x13 +** fmov d5, x14 +** fmov d6, x15 +** ldp q3, q7, \[sp\], #?32 +** bl consume_varargs +** ... +*/ +void +test_varargs (float32x2_t *float32x2_ptr, + float32x4_t *float32x4_ptr, + float64x1_t *float64x1_ptr, + float64x2_t *float64x2_ptr) [[arm::streaming]] +{ + consume_varargs (1.0f, 2.0, *float32x2_ptr, *float32x4_ptr, + 3.0f, 4.0, *float64x1_ptr, *float64x2_ptr); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c new file mode 100644 index 00000000000..f44724df32f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c @@ -0,0 +1,87 @@ +// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -msve-vector-bits=128" } +// { dg-final { check-function-bodies "**" "" } } + +#include + +svint8_t produce_z0 (); +void consume_z0 (svint8_t); + +/* +** test_z0: +** ... +** smstop sm +** bl produce_z0 +** str q0, \[sp, #?-16\]! +** smstart sm +** ldr q0, \[sp\], #?16 +** str q0, \[sp, #?-16\]! +** smstop sm +** ldr q0, \[sp\], #?16 +** bl consume_z0 +** ... +*/ +void +test_z0 () [[arm::streaming]] +{ + svint8_t res = produce_z0 (); + asm volatile (""); + consume_z0 (res); +} + +svint8x4_t produce_z3 (); +void consume_z3 (svint8x4_t); + +/* +** test_z3: +** ... +** smstop sm +** bl produce_z3 +** stp q0, q1, \[sp, #?-64\]! +** stp q2, q3, \[sp, #?32\] +** smstart sm +** ldp q2, q3, \[sp, #?32\] +** ldp q0, q1, \[sp\], #?64 +** stp q0, q1, \[sp, #?-64\]! +** stp q2, q3, \[sp, #?32\] +** smstop sm +** ldp q2, q3, \[sp, #?32\] +** ldp q0, q1, \[sp\], #?64 +** bl consume_z3 +** ... 
+*/ +void +test_z3 () [[arm::streaming]] +{ + svint8x4_t res = produce_z3 (); + asm volatile (""); + consume_z3 (res); +} + +svbool_t produce_p0 (); +void consume_p0 (svbool_t); + +/* +** test_p0: +** ... +** smstop sm +** bl produce_p0 +** sub sp, sp, #?16 +** str p0, \[sp\] +** smstart sm +** ldr p0, \[sp\] +** add sp, sp, #?16 +** sub sp, sp, #?16 +** str p0, \[sp\] +** smstop sm +** ldr p0, \[sp\] +** add sp, sp, #?16 +** bl consume_p0 +** ... +*/ +void +test_p0 () [[arm::streaming]] +{ + svbool_t res = produce_p0 (); + asm volatile (""); + consume_p0 (res); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c new file mode 100644 index 00000000000..83b4073eef3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_9.c @@ -0,0 +1,103 @@ +// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls -msve-vector-bits=256" } +// { dg-final { check-function-bodies "**" "" } } + +#include + +svint8_t produce_z0 (); +void consume_z0 (svint8_t); + +/* +** test_z0: +** ... +** smstop sm +** bl produce_z0 +** sub sp, sp, #?32 +** str z0, \[sp\] +** smstart sm +** ldr z0, \[sp\] +** add sp, sp, #?32 +** sub sp, sp, #?32 +** str z0, \[sp\] +** smstop sm +** ldr z0, \[sp\] +** add sp, sp, #?32 +** bl consume_z0 +** ... +*/ +void +test_z0 () [[arm::streaming]] +{ + svint8_t res = produce_z0 (); + asm volatile (""); + consume_z0 (res); +} + +svint8x4_t produce_z3 (); +void consume_z3 (svint8x4_t); + +/* +** test_z3: +** ... 
+** smstop sm +** bl produce_z3 +** sub sp, sp, #?128 +** str z0, \[sp\] +** str z1, \[sp, #1, mul vl\] +** str z2, \[sp, #2, mul vl\] +** str z3, \[sp, #3, mul vl\] +** smstart sm +** ldr z0, \[sp\] +** ldr z1, \[sp, #1, mul vl\] +** ldr z2, \[sp, #2, mul vl\] +** ldr z3, \[sp, #3, mul vl\] +** add sp, sp, #?128 +** sub sp, sp, #?128 +** str z0, \[sp\] +** str z1, \[sp, #1, mul vl\] +** str z2, \[sp, #2, mul vl\] +** str z3, \[sp, #3, mul vl\] +** smstop sm +** ldr z0, \[sp\] +** ldr z1, \[sp, #1, mul vl\] +** ldr z2, \[sp, #2, mul vl\] +** ldr z3, \[sp, #3, mul vl\] +** add sp, sp, #?128 +** bl consume_z3 +** ... +*/ +void +test_z3 () [[arm::streaming]] +{ + svint8x4_t res = produce_z3 (); + asm volatile (""); + consume_z3 (res); +} + +svbool_t produce_p0 (); +void consume_p0 (svbool_t); + +/* +** test_p0: +** ... +** smstop sm +** bl produce_p0 +** sub sp, sp, #?32 +** str p0, \[sp\] +** smstart sm +** ldr p0, \[sp\] +** add sp, sp, #?32 +** sub sp, sp, #?32 +** str p0, \[sp\] +** smstop sm +** ldr p0, \[sp\] +** add sp, sp, #?32 +** bl consume_p0 +** ... 
+*/ +void +test_p0 () [[arm::streaming]] +{ + svbool_t res = produce_p0 (); + asm volatile (""); + consume_p0 (res); +} From patchwork Fri Nov 17 17:26:40 2023 From: Richard Sandiford To: gcc-patches@gcc.gnu.org Subject: [PATCH 12/21] aarch64: Add support for SME ZA attributes Date: Fri, 17 Nov 2023 17:26:40 +0000
SME has an array called ZA that can be enabled and disabled separately from
streaming mode.  A status bit called PSTATE.ZA indicates whether ZA is
currently enabled or not.

In C and C++, the state of PSTATE.ZA is controlled using function attributes.
There are four attributes that can be attached to function types to indicate
that the function shares ZA with its caller.  These are:

- arm::in("za")
- arm::out("za")
- arm::inout("za")
- arm::preserves("za")

If a function's type has one of these shared-ZA attributes, PSTATE.ZA is
specified to be 1 on entry to the function and on return from the function.
Otherwise, the caller and callee have separate ZA contexts; they do not use
ZA to share data.

Although normal non-shared-ZA functions have a separate ZA context from
their callers, nested uses of ZA are expected to be rare.  The ABI therefore
defines a cooperative lazy saving scheme that allows saves and restores of
ZA to be kept to a minimum.  (Callers still have the option of doing a full
save and restore if they prefer.)

Functions that want to use ZA internally have an arm::new("za") attribute,
which tells the compiler to enable PSTATE.ZA for the duration of the
function body.  It also tells the compiler to commit any lazy save
initiated by a caller.

The patch uses various abstract hard registers to track dataflow relating
to ZA.  See the comments in the patch for details.

The lazy save scheme is intended to be transparent to most normal functions,
so that they don't need to be recompiled for SME.  This is reflected in the
way that most normal functions ignore the new hard registers added in the
patch.

As with arm::streaming and arm::streaming_compatible, the attributes are
also available as __arm_* keyword macros.  This has two advantages: it
triggers an error on compilers that don't understand the attributes, and it
eases use in C, where [[...]] attributes were only added in C23.
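To make the attribute model above concrete, here is a minimal sketch of how a user might annotate functions.  All function names are hypothetical, and the `__arm_*` keyword macros only exist on an SME-aware compiler (such as GCC with this patch series), so this is illustrative rather than buildable with a stock toolchain:

```c
/* Hypothetical shared-ZA functions: PSTATE.ZA is 1 on entry and
   on return, and ZA is used to pass data between caller and callee.  */
void produce_tile (void) __arm_out ("za");      /* writes new ZA state */
void consume_tile (void) __arm_in ("za");       /* reads incoming ZA state */
void update_tile (void) __arm_inout ("za");     /* reads and writes ZA */
void measure_tile (void) __arm_preserves ("za");/* leaves ZA untouched */

/* A function that uses ZA purely internally.  The compiler enables
   PSTATE.ZA for the body and commits any lazy save set up by a caller
   before ZA is first used.  */
__arm_new ("za") void kernel (void)
{
  produce_tile ();
  update_tile ();
  consume_tile ();
}
```

A plain function with none of these attributes has a private ZA context: calling it from `kernel` would mark a point where the lazy-save machinery described below may need to save and later restore `kernel`'s ZA contents.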
gcc/
	* config/aarch64/aarch64-isa-modes.def (ZA_ON): New ISA mode.
	* config/aarch64/aarch64-protos.h (aarch64_rdsvl_immediate_p)
	(aarch64_output_rdsvl, aarch64_optimize_mode_switching)
	(aarch64_restore_za): Declare.
	* config/aarch64/constraints.md (UsR): New constraint.
	* config/aarch64/aarch64.md (LOWERING_REGNUM, TPIDR_BLOCK_REGNUM)
	(SME_STATE_REGNUM, TPIDR2_SETUP_REGNUM, ZA_FREE_REGNUM)
	(ZA_SAVED_REGNUM, ZA_REGNUM, FIRST_FAKE_REGNUM): New constants.
	(LAST_FAKE_REGNUM): Likewise.
	(UNSPEC_SAVE_NZCV, UNSPEC_RESTORE_NZCV, UNSPEC_SME_VQ): New unspecs.
	(arches): Add sme.
	(arch_enabled): Handle it.
	(*cb<optab><mode>1): Rename to...
	(aarch64_cb<optab><mode>1): ...this.
	(*movsi_aarch64): Add an alternative for RDSVL.
	(*movdi_aarch64): Likewise.
	(aarch64_save_nzcv, aarch64_restore_nzcv): New insns.
	* config/aarch64/aarch64-sme.md (UNSPEC_SMSTOP_ZA)
	(UNSPEC_INITIAL_ZERO_ZA, UNSPEC_TPIDR2_SAVE, UNSPEC_TPIDR2_RESTORE)
	(UNSPEC_READ_TPIDR2, UNSPEC_WRITE_TPIDR2, UNSPEC_SETUP_LOCAL_TPIDR2)
	(UNSPEC_RESTORE_ZA, UNSPEC_START_PRIVATE_ZA_CALL): New unspecs.
	(UNSPEC_END_PRIVATE_ZA_CALL, UNSPEC_COMMIT_LAZY_SAVE): Likewise.
	(UNSPECV_ASM_UPDATE_ZA): New unspecv.
	(aarch64_tpidr2_save, aarch64_smstart_za, aarch64_smstop_za)
	(aarch64_initial_zero_za, aarch64_setup_local_tpidr2)
	(aarch64_clear_tpidr2, aarch64_write_tpidr2, aarch64_read_tpidr2)
	(aarch64_tpidr2_restore, aarch64_restore_za, aarch64_asm_update_za)
	(aarch64_start_private_za_call, aarch64_end_private_za_call)
	(aarch64_commit_lazy_save): New patterns.
	* config/aarch64/aarch64.h (AARCH64_ISA_ZA_ON, TARGET_ZA): New macros.
	(FIXED_REGISTERS, REGISTER_NAMES): Add the new fake ZA registers.
	(CALL_USED_REGISTERS): Replace with...
	(CALL_REALLY_USED_REGISTERS): ...this and add the fake ZA registers.
	(FIRST_PSEUDO_REGISTER): Bump to include the fake ZA registers.
	(FAKE_REGS): New register class.
	(REG_CLASS_NAMES): Update accordingly.
	(REG_CLASS_CONTENTS): Likewise.
	(machine_function::tpidr2_block): New member variable.
	(machine_function::tpidr2_block_ptr): Likewise.
	(machine_function::za_save_buffer): Likewise.
	(machine_function::next_asm_update_za_id): Likewise.
	(CUMULATIVE_ARGS::shared_za_flags): Likewise.
	(aarch64_mode_entity, aarch64_local_sme_state): New enums.
	(aarch64_tristate_mode): Likewise.
	(OPTIMIZE_MODE_SWITCHING, NUM_MODES_FOR_MODE_SWITCHING): Define.
	* config/aarch64/aarch64.cc (AARCH64_STATE_SHARED, AARCH64_STATE_IN)
	(AARCH64_STATE_OUT): New constants.
	(aarch64_attribute_shared_state_flags): New function.
	(aarch64_lookup_shared_state_flags, aarch64_fndecl_has_new_state)
	(aarch64_check_state_string, cmp_string_csts): Likewise.
	(aarch64_merge_string_arguments, aarch64_check_arm_new_against_type)
	(handle_arm_new, handle_arm_shared): Likewise.
	(handle_arm_new_za_attribute): New.
	(aarch64_arm_attribute_table): Add new, preserves, in, out, and inout.
	(aarch64_hard_regno_nregs): Handle FAKE_REGS.
	(aarch64_hard_regno_mode_ok): Likewise.
	(aarch64_fntype_shared_flags, aarch64_fntype_pstate_za): New functions.
	(aarch64_fntype_isa_mode): Include aarch64_fntype_pstate_za.
	(aarch64_fndecl_has_state, aarch64_fndecl_pstate_za): New functions.
	(aarch64_fndecl_isa_mode): Include aarch64_fndecl_pstate_za.
	(aarch64_cfun_incoming_pstate_za, aarch64_cfun_shared_flags)
	(aarch64_cfun_has_new_state, aarch64_cfun_has_state): New functions.
	(aarch64_sme_vq_immediate, aarch64_sme_vq_unspec_p): Likewise.
	(aarch64_rdsvl_immediate_p, aarch64_output_rdsvl): Likewise.
	(aarch64_expand_mov_immediate): Handle RDSVL immediates.
	(aarch64_function_arg): Add the ZA sharing flags as a third limb
	of the PARALLEL.
	(aarch64_init_cumulative_args): Record the ZA sharing flags.
	(aarch64_extra_live_on_entry): New function.  Handle the new
	ZA-related fake registers.
	(aarch64_epilogue_uses): Handle the new ZA-related fake registers.
	(aarch64_cannot_force_const_mem): Handle UNSPEC_SME_VQ constants.
	(aarch64_get_tpidr2_block, aarch64_get_tpidr2_ptr): New functions.
	(aarch64_init_tpidr2_block, aarch64_restore_za): Likewise.
	(aarch64_layout_frame): Check whether the current function creates
	new ZA state.  Record that it clobbers LR if so.
	(aarch64_expand_prologue): Handle functions that create new ZA state.
	(aarch64_expand_epilogue): Likewise.
	(aarch64_create_tpidr2_block): New function.
	(aarch64_restore_za): Likewise.
	(aarch64_start_call_args): Disallow calls to shared-ZA functions
	from functions that have no ZA state.  Emit a marker instruction
	before calls to private-ZA functions from functions that have
	SME state.
	(aarch64_expand_call): Add return registers for state that is
	managed via attributes.  Record the use and clobber information
	for the ZA registers.
	(aarch64_end_call_args): New function.
	(aarch64_regno_regclass): Handle FAKE_REGS.
	(aarch64_class_max_nregs): Likewise.
	(aarch64_override_options_internal): Require TARGET_SME for
	functions that have ZA state.
	(aarch64_conditional_register_usage): Handle FAKE_REGS.
	(aarch64_mov_operand_p): Handle RDSVL immediates.
	(aarch64_comp_type_attributes): Check that the ZA sharing flags
	are equal.
	(aarch64_merge_decl_attributes): New function.
	(aarch64_optimize_mode_switching, aarch64_mode_emit_za_save_buffer)
	(aarch64_mode_emit_local_sme_state, aarch64_mode_emit): Likewise.
	(aarch64_insn_references_sme_state_p): Likewise.
	(aarch64_mode_needed_local_sme_state): Likewise.
	(aarch64_mode_needed_za_save_buffer, aarch64_mode_needed): Likewise.
	(aarch64_mode_after_local_sme_state, aarch64_mode_after): Likewise.
	(aarch64_local_sme_confluence, aarch64_mode_confluence): Likewise.
	(aarch64_one_shot_backprop, aarch64_local_sme_backprop): Likewise.
	(aarch64_mode_backprop, aarch64_mode_entry): Likewise.
	(aarch64_mode_exit, aarch64_mode_eh_handler): Likewise.
	(aarch64_mode_priority, aarch64_md_asm_adjust): Likewise.
	(TARGET_END_CALL_ARGS, TARGET_MERGE_DECL_ATTRIBUTES): Define.
	(TARGET_MODE_EMIT, TARGET_MODE_NEEDED, TARGET_MODE_AFTER): Likewise.
	(TARGET_MODE_CONFLUENCE, TARGET_MODE_BACKPROP): Likewise.
	(TARGET_MODE_ENTRY, TARGET_MODE_EXIT): Likewise.
	(TARGET_MODE_EH_HANDLER, TARGET_MODE_PRIORITY): Likewise.
	(TARGET_EXTRA_LIVE_ON_ENTRY): Likewise.
	(TARGET_MD_ASM_ADJUST): Use aarch64_md_asm_adjust.
	* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
	Define __arm_new, __arm_preserves, __arm_in, __arm_out, and
	__arm_inout.

gcc/testsuite/
	* gcc.target/aarch64/sme/za_state_1.c: New test.
	* gcc.target/aarch64/sme/za_state_2.c: Likewise.
	* gcc.target/aarch64/sme/za_state_3.c: Likewise.
	* gcc.target/aarch64/sme/za_state_4.c: Likewise.
	* gcc.target/aarch64/sme/za_state_5.c: Likewise.
	* gcc.target/aarch64/sme/za_state_6.c: Likewise.
	* g++.target/aarch64/sme/exceptions_1.C: Likewise.
	* gcc.target/aarch64/sme/keyword_macros_1.c: Add ZA macros.
	* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
---
 gcc/config/aarch64/aarch64-c.cc               |   32 +
 gcc/config/aarch64/aarch64-isa-modes.def      |    5 +
 gcc/config/aarch64/aarch64-protos.h           |    5 +
 gcc/config/aarch64/aarch64-sme.md             |  287 ++++
 gcc/config/aarch64/aarch64.cc                 | 1371 ++++++++++++++++-
 gcc/config/aarch64/aarch64.h                  |   98 +-
 gcc/config/aarch64/aarch64.md                 |   81 +-
 gcc/config/aarch64/constraints.md             |    6 +
 .../g++.target/aarch64/sme/exceptions_1.C     |  189 +++
 .../g++.target/aarch64/sme/keyword_macros_1.C |    5 +
 .../gcc.target/aarch64/sme/keyword_macros_1.c |    5 +
 .../gcc.target/aarch64/sme/za_state_1.c       |  154 ++
 .../gcc.target/aarch64/sme/za_state_2.c       |   73 +
 .../gcc.target/aarch64/sme/za_state_3.c       |   31 +
 .../gcc.target/aarch64/sme/za_state_4.c       |  585 +++++++
 .../gcc.target/aarch64/sme/za_state_5.c       |  593 +++++++
 .../gcc.target/aarch64/sme/za_state_6.c       |   23 +
 17 files changed, 3521 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sme/exceptions_1.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c
 create mode 100644
gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_6.c diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc index 1603621b30d..9494e560be0 100644 --- a/gcc/config/aarch64/aarch64-c.cc +++ b/gcc/config/aarch64/aarch64-c.cc @@ -73,6 +73,8 @@ aarch64_define_unconditional_macros (cpp_reader *pfile) builtin_define ("__GCC_ASM_FLAG_OUTPUTS__"); + builtin_define ("__ARM_STATE_ZA"); + /* Define keyword attributes like __arm_streaming as macros that expand to the associated [[...]] attribute. Use __extension__ in the attribute for C, since the [[...]] syntax was only added in C23. */ @@ -86,6 +88,36 @@ aarch64_define_unconditional_macros (cpp_reader *pfile) DEFINE_ARM_KEYWORD_MACRO ("streaming_compatible"); #undef DEFINE_ARM_KEYWORD_MACRO + + /* Same for the keyword attributes that take arguments. The snag here + is that some old modes warn about or reject variadic arguments. */ + auto *cpp_opts = cpp_get_options (parse_in); + if (!cpp_opts->traditional) + { + auto old_warn_variadic_macros = cpp_opts->warn_variadic_macros; + auto old_cpp_warn_c90_c99_compat = cpp_opts->cpp_warn_c90_c99_compat; + + cpp_opts->warn_variadic_macros = false; + cpp_opts->cpp_warn_c90_c99_compat = 0; + +#define DEFINE_ARM_KEYWORD_MACRO_ARGS(NAME) \ + builtin_define_with_value ("__arm_" NAME "(...)", \ + lang_GNU_CXX () \ + ? 
"[[arm::" NAME "(__VA_ARGS__)]]" \ + : "[[__extension__ arm::" NAME \ + "(__VA_ARGS__)]]", 0); + + DEFINE_ARM_KEYWORD_MACRO_ARGS ("new"); + DEFINE_ARM_KEYWORD_MACRO_ARGS ("preserves"); + DEFINE_ARM_KEYWORD_MACRO_ARGS ("in"); + DEFINE_ARM_KEYWORD_MACRO_ARGS ("out"); + DEFINE_ARM_KEYWORD_MACRO_ARGS ("inout"); + +#undef DEFINE_ARM_KEYWORD_MACRO_ARGS + + cpp_opts->warn_variadic_macros = old_warn_variadic_macros; + cpp_opts->cpp_warn_c90_c99_compat = old_cpp_warn_c90_c99_compat; + } } /* Undefine/redefine macros that depend on the current backend state and may diff --git a/gcc/config/aarch64/aarch64-isa-modes.def b/gcc/config/aarch64/aarch64-isa-modes.def index 5915c98a896..c0ada35bd19 100644 --- a/gcc/config/aarch64/aarch64-isa-modes.def +++ b/gcc/config/aarch64/aarch64-isa-modes.def @@ -32,4 +32,9 @@ DEF_AARCH64_ISA_MODE(SM_ON) DEF_AARCH64_ISA_MODE(SM_OFF) +/* Indicates that PSTATE.ZA is known to be 1. The converse is that + PSTATE.ZA might be 0 or 1, depending on whether there is an uncommitted + lazy save. 
*/ +DEF_AARCH64_ISA_MODE(ZA_ON) + #undef DEF_AARCH64_ISA_MODE diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index d3a2c693f85..e4aa7009c07 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -804,6 +804,8 @@ bool aarch64_sve_addvl_addpl_immediate_p (rtx); bool aarch64_sve_vector_inc_dec_immediate_p (rtx); int aarch64_add_offset_temporaries (rtx); void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx); +bool aarch64_rdsvl_immediate_p (const_rtx); +char *aarch64_output_rdsvl (const_rtx); bool aarch64_mov_operand_p (rtx, machine_mode); rtx aarch64_reverse_mask (machine_mode, unsigned int); bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64); @@ -1084,4 +1086,7 @@ extern void aarch64_output_patchable_area (unsigned int, bool); extern void aarch64_adjust_reg_alloc_order (); +bool aarch64_optimize_mode_switching (aarch64_mode_entity); +void aarch64_restore_za (rtx); + #endif /* GCC_AARCH64_PROTOS_H */ diff --git a/gcc/config/aarch64/aarch64-sme.md b/gcc/config/aarch64/aarch64-sme.md index 52427b4f17a..d4973098e66 100644 --- a/gcc/config/aarch64/aarch64-sme.md +++ b/gcc/config/aarch64/aarch64-sme.md @@ -23,6 +23,7 @@ ;; == State management ;; ---- Test current state ;; ---- PSTATE.SM management +;; ---- PSTATE.ZA management ;; ========================================================================= ;; == State management @@ -169,3 +170,289 @@ (define_insn "aarch64_smstop_sm" "" "smstop\tsm" ) + +;; ------------------------------------------------------------------------- +;; ---- PSTATE.ZA management +;; ------------------------------------------------------------------------- +;; Includes: +;; - SMSTART ZA +;; - SMSTOP ZA +;; plus calls to support routines. 
+;; ------------------------------------------------------------------------- + +(define_c_enum "unspec" [ + UNSPEC_SMSTOP_ZA + UNSPEC_INITIAL_ZERO_ZA + UNSPEC_TPIDR2_SAVE + UNSPEC_TPIDR2_RESTORE + UNSPEC_READ_TPIDR2 + UNSPEC_WRITE_TPIDR2 + UNSPEC_SETUP_LOCAL_TPIDR2 + UNSPEC_RESTORE_ZA + UNSPEC_START_PRIVATE_ZA_CALL + UNSPEC_END_PRIVATE_ZA_CALL + UNSPEC_COMMIT_LAZY_SAVE +]) + +(define_c_enum "unspecv" [ + UNSPECV_ASM_UPDATE_ZA +]) + +;; Use the ABI-defined routine to commit an uncommitted lazy save. +;; This relies on the current PSTATE.ZA, so depends on SME_STATE_REGNUM. +;; The fake TPIDR2_SETUP_REGNUM register initially holds the incoming +;; value of the architected TPIDR2_EL0. +(define_insn "aarch64_tpidr2_save" + [(set (reg:DI ZA_FREE_REGNUM) + (unspec:DI [(reg:DI SME_STATE_REGNUM) + (reg:DI TPIDR2_SETUP_REGNUM)] UNSPEC_TPIDR2_SAVE)) + (clobber (reg:DI R14_REGNUM)) + (clobber (reg:DI R15_REGNUM)) + (clobber (reg:DI R16_REGNUM)) + (clobber (reg:DI R17_REGNUM)) + (clobber (reg:DI R18_REGNUM)) + (clobber (reg:DI R30_REGNUM)) + (clobber (reg:CC CC_REGNUM))] + "" + "bl\t__arm_tpidr2_save" +) + +;; Set PSTATE.ZA to 1. If ZA was previously dormant or active, +;; it remains in the same state afterwards, with the same contents. +;; Otherwise, it goes from off to on with zeroed contents. +;; +;; Later writes of TPIDR2_EL0 to a nonzero value must not be moved +;; up past this instruction, since that could create an invalid +;; combination of having an active lazy save while ZA is off. +;; Create an anti-dependence by reading the current contents +;; of TPIDR2_SETUP_REGNUM. +;; +;; Making this depend on ZA_FREE_REGNUM ensures that contents belonging +;; to the caller have already been saved. That isn't necessary for this +;; instruction itself, since PSTATE.ZA is already 1 if it contains data. +;; But doing this here means that other uses of ZA can just depend on +;; SME_STATE_REGNUM, rather than both SME_STATE_REGNUM and ZA_FREE_REGNUM. 
+(define_insn "aarch64_smstart_za" + [(set (reg:DI SME_STATE_REGNUM) + (const_int 1)) + (use (reg:DI TPIDR2_SETUP_REGNUM)) + (use (reg:DI ZA_FREE_REGNUM))] + "" + "smstart\tza" +) + +;; Disable ZA and discard its current contents. +;; +;; The ABI says that the ZA save buffer must be null whenever PSTATE.ZA +;; is zero, so earlier writes to TPIDR2_EL0 must not be moved down past +;; this instruction. Depend on TPIDR2_SETUP_REGNUM to ensure this. +;; +;; We can only turn ZA off once we know that it is free (i.e. doesn't +;; contain data belonging to the caller). Depend on ZA_FREE_REGNUM +;; to ensure this. +;; +;; We only turn ZA off when the current function's ZA state is dead, +;; or perhaps if we're sure that the contents are saved. Either way, +;; we know whether ZA is saved or not. +(define_insn "aarch64_smstop_za" + [(set (reg:DI SME_STATE_REGNUM) + (const_int 0)) + (set (reg:DI ZA_SAVED_REGNUM) + (unspec:DI [(reg:DI TPIDR2_SETUP_REGNUM) + (reg:DI ZA_FREE_REGNUM)] UNSPEC_SMSTOP_ZA))] + "" + "smstop\tza" +) + +;; Zero ZA after committing a lazy save. The sequencing is enforced +;; by reading ZA_FREE_REGNUM. +(define_insn "aarch64_initial_zero_za" + [(set (reg:DI ZA_REGNUM) + (unspec:DI [(reg:DI SME_STATE_REGNUM) + (reg:DI ZA_FREE_REGNUM)] UNSPEC_INITIAL_ZERO_ZA))] + "" + "zero\t{ za }" +) + +;; Initialize the abstract TPIDR2_BLOCK_REGNUM from the contents of +;; the current function's TPIDR2 block. Other instructions can then +;; depend on TPIDR2_BLOCK_REGNUM rather than on the memory block. +(define_insn "aarch64_setup_local_tpidr2" + [(set (reg:DI TPIDR2_BLOCK_REGNUM) + (unspec:DI [(match_operand:V16QI 0 "memory_operand" "m")] + UNSPEC_SETUP_LOCAL_TPIDR2))] + "" + "" + [(set_attr "type" "no_insn")] +) + +;; Clear TPIDR2_EL0, cancelling any uncommitted lazy save. 
+(define_insn "aarch64_clear_tpidr2"
+  [(set (reg:DI TPIDR2_SETUP_REGNUM)
+	(const_int 0))]
+  ""
+  "msr\ttpidr2_el0, xzr"
+)
+
+;; Point TPIDR2_EL0 to the current function's TPIDR2 block, whose address
+;; is given by operand 0.  TPIDR2_BLOCK_REGNUM represents the contents of the
+;; pointed-to block.
+(define_insn "aarch64_write_tpidr2"
+  [(set (reg:DI TPIDR2_SETUP_REGNUM)
+	(unspec:DI [(match_operand 0 "pmode_register_operand" "r")
+		    (reg:DI TPIDR2_BLOCK_REGNUM)] UNSPEC_WRITE_TPIDR2))]
+  ""
+  "msr\ttpidr2_el0, %0"
+)
+
+;; Check whether ZA has been saved.  The system depends on the value that
+;; we wrote to TPIDR2_EL0 previously, so it depends on TPIDR2_SETUP_REGNUM.
+(define_insn "aarch64_read_tpidr2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(unspec:DI [(reg:DI TPIDR2_SETUP_REGNUM)
+		    (reg:DI ZA_SAVED_REGNUM)] UNSPEC_READ_TPIDR2))]
+  ""
+  "mrs\t%0, tpidr2_el0"
+)
+
+;; Use the ABI-defined routine to restore lazy-saved ZA contents
+;; from the TPIDR2 block pointed to by X0.  ZA must already be active.
+(define_insn "aarch64_tpidr2_restore"
+  [(set (reg:DI ZA_SAVED_REGNUM)
+	(unspec:DI [(reg:DI R0_REGNUM)] UNSPEC_TPIDR2_RESTORE))
+   (set (reg:DI SME_STATE_REGNUM)
+	(unspec:DI [(reg:DI SME_STATE_REGNUM)] UNSPEC_TPIDR2_RESTORE))
+   (clobber (reg:DI R14_REGNUM))
+   (clobber (reg:DI R15_REGNUM))
+   (clobber (reg:DI R16_REGNUM))
+   (clobber (reg:DI R17_REGNUM))
+   (clobber (reg:DI R18_REGNUM))
+   (clobber (reg:DI R30_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "bl\t__arm_tpidr2_restore"
+)
+
+;; Check whether a lazy save set up by aarch64_save_za was committed
+;; and restore the saved contents if so.
+;;
+;; Operand 0 is the address of the current function's TPIDR2 block.
+(define_insn_and_split "aarch64_restore_za" + [(set (reg:DI ZA_SAVED_REGNUM) + (unspec:DI [(match_operand 0 "pmode_register_operand" "r") + (reg:DI SME_STATE_REGNUM) + (reg:DI TPIDR2_SETUP_REGNUM) + (reg:DI ZA_SAVED_REGNUM)] UNSPEC_RESTORE_ZA)) + (clobber (reg:DI R0_REGNUM)) + (clobber (reg:DI R14_REGNUM)) + (clobber (reg:DI R15_REGNUM)) + (clobber (reg:DI R16_REGNUM)) + (clobber (reg:DI R17_REGNUM)) + (clobber (reg:DI R18_REGNUM)) + (clobber (reg:DI R30_REGNUM)) + (clobber (reg:CC CC_REGNUM))] + "" + "#" + "&& epilogue_completed" + [(const_int 0)] + { + auto label = gen_label_rtx (); + auto tpidr2 = gen_rtx_REG (DImode, R16_REGNUM); + emit_insn (gen_aarch64_read_tpidr2 (tpidr2)); + auto jump = emit_likely_jump_insn (gen_aarch64_cbnedi1 (tpidr2, label)); + JUMP_LABEL (jump) = label; + + aarch64_restore_za (operands[0]); + emit_label (label); + DONE; + } +) + +;; This instruction is emitted after asms that alter ZA, in order to model +;; the effect on dataflow. The asm itself can't have ZA as an input or +;; an output, since there is no associated data type. Instead it retains +;; the original "za" clobber, which on its own would indicate that ZA +;; is dead. +;; +;; The operand is a unique identifier. +(define_insn "aarch64_asm_update_za" + [(set (reg:VNx16QI ZA_REGNUM) + (unspec_volatile:VNx16QI + [(reg:VNx16QI ZA_REGNUM) + (reg:DI SME_STATE_REGNUM) + (match_operand 0 "const_int_operand")] + UNSPECV_ASM_UPDATE_ZA))] + "" + "" + [(set_attr "type" "no_insn")] +) + +;; This pseudo-instruction is emitted as part of a call to a private-ZA +;; function from a function with ZA state. It marks a natural place to set +;; up a lazy save, if that turns out to be necessary. The save itself +;; is managed by the mode-switching pass. 
+(define_insn "aarch64_start_private_za_call"
+  [(set (reg:DI LOWERING_REGNUM)
+	(unspec:DI [(reg:DI LOWERING_REGNUM)] UNSPEC_START_PRIVATE_ZA_CALL))]
+  ""
+  ""
+  [(set_attr "type" "no_insn")]
+)
+
+;; This pseudo-instruction is emitted as part of a call to a private-ZA
+;; function from a function with ZA state.  It marks a natural place to
+;; restore the current function's ZA contents from the lazy save buffer,
+;; if that turns out to be necessary.  The restore itself is managed by
+;; the mode-switching pass.
+(define_insn "aarch64_end_private_za_call"
+  [(set (reg:DI LOWERING_REGNUM)
+	(unspec:DI [(reg:DI LOWERING_REGNUM)] UNSPEC_END_PRIVATE_ZA_CALL))]
+  ""
+  ""
+  [(set_attr "type" "no_insn")]
+)
+
+;; This pseudo-instruction is emitted before a private-ZA function uses
+;; PSTATE.ZA state for the first time.  The instruction checks whether
+;; ZA currently contains data belonging to a caller and commits the
+;; lazy save if so.
+;;
+;; Operand 0 is the incoming value of TPIDR2_EL0.  Operand 1 is nonzero
+;; if ZA is live, and should therefore be zeroed after committing a save.
+;;
+;; The instruction is generated by the mode-switching pass.  It is a
+;; define_insn_and_split rather than a define_expand because of the
+;; internal control flow.
+(define_insn_and_split "aarch64_commit_lazy_save" + [(set (reg:DI ZA_FREE_REGNUM) + (unspec:DI [(match_operand 0 "pmode_register_operand" "r") + (match_operand 1 "const_int_operand") + (reg:DI SME_STATE_REGNUM) + (reg:DI TPIDR2_SETUP_REGNUM) + (reg:VNx16QI ZA_REGNUM)] UNSPEC_COMMIT_LAZY_SAVE)) + (set (reg:DI ZA_REGNUM) + (unspec:DI [(reg:DI SME_STATE_REGNUM) + (reg:DI ZA_FREE_REGNUM)] UNSPEC_INITIAL_ZERO_ZA)) + (clobber (reg:DI R14_REGNUM)) + (clobber (reg:DI R15_REGNUM)) + (clobber (reg:DI R16_REGNUM)) + (clobber (reg:DI R17_REGNUM)) + (clobber (reg:DI R18_REGNUM)) + (clobber (reg:DI R30_REGNUM)) + (clobber (reg:CC CC_REGNUM))] + "" + "#" + "true" + [(const_int 0)] + { + auto label = gen_label_rtx (); + auto jump = emit_jump_insn (gen_aarch64_cbeqdi1 (operands[0], label)); + JUMP_LABEL (jump) = label; + emit_insn (gen_aarch64_tpidr2_save ()); + emit_insn (gen_aarch64_clear_tpidr2 ()); + if (INTVAL (operands[1]) != 0) + emit_insn (gen_aarch64_initial_zero_za ()); + emit_label (label); + DONE; + } +) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 6d5e9056c65..2782feef0f3 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -94,6 +94,26 @@ /* Defined for convenience. */ #define POINTER_BYTES (POINTER_SIZE / BITS_PER_UNIT) +/* Flags that describe how a function shares certain architectural state + with its callers. + + - AARCH64_STATE_SHARED indicates that the function does share the state + with callers. + + - AARCH64_STATE_IN indicates that the function reads (or might read) the + incoming state. The converse is that the function ignores the incoming + state. + + - AARCH64_STATE_OUT indicates that the function returns new state. + The converse is that the state on return is the same as it was on entry. + + A function that partially modifies the state treats it as both IN + and OUT (because the value on return depends to some extent on the + value on input). 
*/
+constexpr auto AARCH64_STATE_SHARED = 1U << 0;
+constexpr auto AARCH64_STATE_IN = 1U << 1;
+constexpr auto AARCH64_STATE_OUT = 1U << 2;
+
 /* Information about a legitimate vector immediate operand.  */
 struct simd_immediate_info
 {
@@ -2812,6 +2832,151 @@ static const struct processor all_cores[] =
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* If NAME is the name of an arm:: attribute that describes shared state,
+   return its associated AARCH64_STATE_* flags, otherwise return 0.  */
+static unsigned int
+aarch64_attribute_shared_state_flags (const char *name)
+{
+  if (strcmp (name, "in") == 0)
+    return AARCH64_STATE_SHARED | AARCH64_STATE_IN;
+  if (strcmp (name, "inout") == 0)
+    return AARCH64_STATE_SHARED | AARCH64_STATE_IN | AARCH64_STATE_OUT;
+  if (strcmp (name, "out") == 0)
+    return AARCH64_STATE_SHARED | AARCH64_STATE_OUT;
+  if (strcmp (name, "preserves") == 0)
+    return AARCH64_STATE_SHARED;
+  return 0;
+}
+
+/* See whether attribute list ATTRS has any sharing information
+   for state STATE_NAME.  Return the associated state flags if so,
+   otherwise return 0.  */
+static unsigned int
+aarch64_lookup_shared_state_flags (tree attrs, const char *state_name)
+{
+  for (tree attr = attrs; attr; attr = TREE_CHAIN (attr))
+    {
+      if (!cxx11_attribute_p (attr))
+	continue;
+
+      auto ns = IDENTIFIER_POINTER (TREE_PURPOSE (TREE_PURPOSE (attr)));
+      if (strcmp (ns, "arm") != 0)
+	continue;
+
+      auto attr_name = IDENTIFIER_POINTER (TREE_VALUE (TREE_PURPOSE (attr)));
+      auto flags = aarch64_attribute_shared_state_flags (attr_name);
+      if (!flags)
+	continue;
+
+      for (tree arg = TREE_VALUE (attr); arg; arg = TREE_CHAIN (arg))
+	{
+	  tree value = TREE_VALUE (arg);
+	  if (TREE_CODE (value) == STRING_CST
+	      && strcmp (TREE_STRING_POINTER (value), state_name) == 0)
+	    return flags;
+	}
+    }
+  return 0;
+}
+
+/* Return true if DECL creates a new scope for state STATE_NAME.
*/ +static bool +aarch64_fndecl_has_new_state (const_tree decl, const char *state_name) +{ + if (tree attr = lookup_attribute ("arm", "new", DECL_ATTRIBUTES (decl))) + for (tree arg = TREE_VALUE (attr); arg; arg = TREE_CHAIN (arg)) + { + tree value = TREE_VALUE (arg); + if (TREE_CODE (value) == STRING_CST + && strcmp (TREE_STRING_POINTER (value), state_name) == 0) + return true; + } + return false; +} + +/* Return true if attribute argument VALUE is a recognized state string, + otherwise report an error. NAME is the name of the attribute to which + VALUE is being passed. */ +static bool +aarch64_check_state_string (tree name, tree value) +{ + if (TREE_CODE (value) != STRING_CST) + { + error ("the arguments to %qE must be constant strings", name); + return false; + } + + const char *state_name = TREE_STRING_POINTER (value); + if (strcmp (state_name, "za") != 0) + { + error ("unrecognized state string %qs", state_name); + return false; + } + + return true; +} + +/* qsort callback to compare two STRING_CSTs. */ +static int +cmp_string_csts (const void *a, const void *b) +{ + return strcmp (TREE_STRING_POINTER (*(const_tree const *) a), + TREE_STRING_POINTER (*(const_tree const *) b)); +} + +/* Canonicalize a list of state strings. ARGS contains the arguments to + a new attribute while OLD_ATTR, if nonnull, contains a previous attribute + of the same type. If CAN_MERGE_IN_PLACE, it is safe to adjust OLD_ATTR's + arguments and drop the new attribute. Otherwise, the new attribute must + be kept and ARGS must include the information in OLD_ATTR. + + In both cases, the new arguments must be a sorted list of state strings + with duplicates removed. + + Return true if new attribute should be kept, false if it should be + dropped. */ +static bool +aarch64_merge_string_arguments (tree args, tree old_attr, + bool can_merge_in_place) +{ + /* Get a sorted list of all state strings (including duplicates). 
*/ + auto add_args = [](vec &strings, const_tree args) + { + for (const_tree arg = args; arg; arg = TREE_CHAIN (arg)) + if (TREE_CODE (TREE_VALUE (arg)) == STRING_CST) + strings.safe_push (TREE_VALUE (arg)); + }; + auto_vec strings; + add_args (strings, args); + if (old_attr) + add_args (strings, TREE_VALUE (old_attr)); + strings.qsort (cmp_string_csts); + + /* The list can be empty if there was no previous attribute and if all + the new arguments are erroneous. Drop the attribute in that case. */ + if (strings.is_empty ()) + return false; + + /* Destructively modify one of the argument lists, removing duplicates + on the fly. */ + bool use_old_attr = old_attr && can_merge_in_place; + tree *end = use_old_attr ? &TREE_VALUE (old_attr) : &args; + tree prev = NULL_TREE; + for (tree arg : strings) + { + if (prev && simple_cst_equal (arg, prev)) + continue; + prev = arg; + if (!*end) + *end = tree_cons (NULL_TREE, arg, NULL_TREE); + else + TREE_VALUE (*end) = arg; + end = &TREE_CHAIN (*end); + } + *end = NULL_TREE; + return !use_old_attr; +} + /* Check whether an 'aarch64_vector_pcs' attribute is valid. */ static tree @@ -2840,6 +3005,101 @@ handle_aarch64_vector_pcs_attribute (tree *node, tree name, tree, gcc_unreachable (); } +/* Return true if arm::new(ARGS) is compatible with the type of decl DECL, + otherwise report an error. */ +static bool +aarch64_check_arm_new_against_type (tree args, tree decl) +{ + tree type_attrs = TYPE_ATTRIBUTES (TREE_TYPE (decl)); + for (tree arg = args; arg; arg = TREE_CHAIN (arg)) + { + tree value = TREE_VALUE (arg); + if (TREE_CODE (value) == STRING_CST) + { + const char *state_name = TREE_STRING_POINTER (value); + if (aarch64_lookup_shared_state_flags (type_attrs, state_name)) + { + error_at (DECL_SOURCE_LOCATION (decl), + "cannot create a new %qs scope since %qs is shared" + " with callers", state_name, state_name); + return false; + } + } + } + return true; +} + +/* Callback for arm::new attributes. 
*/ +static tree +handle_arm_new (tree *node, tree name, tree args, int, bool *no_add_attrs) +{ + tree decl = *node; + if (TREE_CODE (decl) != FUNCTION_DECL) + { + error ("%qE attribute applies only to function definitions", name); + *no_add_attrs = true; + return NULL_TREE; + } + if (TREE_TYPE (decl) == error_mark_node) + { + *no_add_attrs = true; + return NULL_TREE; + } + + for (tree arg = args; arg; arg = TREE_CHAIN (arg)) + aarch64_check_state_string (name, TREE_VALUE (arg)); + + if (!aarch64_check_arm_new_against_type (args, decl)) + { + *no_add_attrs = true; + return NULL_TREE; + } + + /* If there is an old attribute, we should try to update it in-place, + so that there is only one (definitive) arm::new attribute on the decl. */ + tree old_attr = lookup_attribute ("arm", "new", DECL_ATTRIBUTES (decl)); + if (!aarch64_merge_string_arguments (args, old_attr, true)) + *no_add_attrs = true; + + return NULL_TREE; +} + +/* Callback for arm::{in,out,inout,preserves} attributes. */ +static tree +handle_arm_shared (tree *node, tree name, tree args, + int, bool *no_add_attrs) +{ + tree type = *node; + tree old_attrs = TYPE_ATTRIBUTES (type); + auto flags = aarch64_attribute_shared_state_flags (IDENTIFIER_POINTER (name)); + for (tree arg = args; arg; arg = TREE_CHAIN (arg)) + { + tree value = TREE_VALUE (arg); + if (aarch64_check_state_string (name, value)) + { + const char *state_name = TREE_STRING_POINTER (value); + auto old_flags = aarch64_lookup_shared_state_flags (old_attrs, + state_name); + if (old_flags && old_flags != flags) + { + error ("inconsistent attributes for state %qs", state_name); + *no_add_attrs = true; + return NULL_TREE; + } + } + } + + /* We can't update an old attribute in-place, since types are shared. + Instead make sure that this new attribute contains all the + information, so that the old attribute becomes redundant. 
*/ + tree old_attr = lookup_attribute ("arm", IDENTIFIER_POINTER (name), + old_attrs); + if (!aarch64_merge_string_arguments (args, old_attr, false)) + *no_add_attrs = true; + + return NULL_TREE; +} + /* Mutually-exclusive function type attributes for controlling PSTATE.SM. */ static const struct attribute_spec::exclusions attr_streaming_exclusions[] = { @@ -2876,6 +3136,16 @@ static const attribute_spec aarch64_arm_attributes[] = NULL, attr_streaming_exclusions }, { "streaming_compatible", 0, 0, false, true, true, true, NULL, attr_streaming_exclusions }, + { "new", 1, -1, true, false, false, false, + handle_arm_new, NULL }, + { "preserves", 1, -1, false, true, true, true, + handle_arm_shared, NULL }, + { "in", 1, -1, false, true, true, true, + handle_arm_shared, NULL }, + { "out", 1, -1, false, true, true, true, + handle_arm_shared, NULL }, + { "inout", 1, -1, false, true, true, true, + handle_arm_shared, NULL } }; static const scoped_attribute_specs aarch64_arm_attribute_table = @@ -3990,6 +4260,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode) case PR_HI_REGS: case FFR_REGS: case PR_AND_FFR_REGS: + case FAKE_REGS: return 1; default: return CEIL (lowest_size, UNITS_PER_WORD); @@ -4020,6 +4291,10 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode) if (pr_or_ffr_regnum_p (regno)) return false; + /* These registers are abstract; their modes don't matter. */ + if (FAKE_REGNUM_P (regno)) + return true; + if (regno == SP_REGNUM) /* The purpose of comparing with ptr_mode is to support the global register variable associated with the stack pointer @@ -4140,12 +4415,34 @@ aarch64_fntype_pstate_sm (const_tree fntype) return AARCH64_FL_SM_OFF; } +/* Return state flags that describe whether and how functions of type + FNTYPE share state STATE_NAME with their callers. 
*/ + +static unsigned int +aarch64_fntype_shared_flags (const_tree fntype, const char *state_name) +{ + return aarch64_lookup_shared_state_flags (TYPE_ATTRIBUTES (fntype), + state_name); +} + +/* Return the state of PSTATE.ZA on entry to functions of type FNTYPE. */ + +static aarch64_feature_flags +aarch64_fntype_pstate_za (const_tree fntype) +{ + if (aarch64_fntype_shared_flags (fntype, "za")) + return AARCH64_FL_ZA_ON; + + return 0; +} + /* Return the ISA mode on entry to functions of type FNTYPE. */ static aarch64_feature_flags aarch64_fntype_isa_mode (const_tree fntype) { - return aarch64_fntype_pstate_sm (fntype); + return (aarch64_fntype_pstate_sm (fntype) + | aarch64_fntype_pstate_za (fntype)); } /* Return the state of PSTATE.SM when compiling the body of @@ -4158,13 +4455,37 @@ aarch64_fndecl_pstate_sm (const_tree fndecl) return aarch64_fntype_pstate_sm (TREE_TYPE (fndecl)); } +/* Return true if function FNDECL has state STATE_NAME, either by creating + new state itself or by sharing state with callers. */ + +static bool +aarch64_fndecl_has_state (tree fndecl, const char *state_name) +{ + return (aarch64_fndecl_has_new_state (fndecl, state_name) + || aarch64_fntype_shared_flags (TREE_TYPE (fndecl), + state_name) != 0); +} + +/* Return the state of PSTATE.ZA when compiling the body of function FNDECL. + This might be different from the state of PSTATE.ZA on entry. */ + +static aarch64_feature_flags +aarch64_fndecl_pstate_za (const_tree fndecl) +{ + if (aarch64_fndecl_has_new_state (fndecl, "za")) + return AARCH64_FL_ZA_ON; + + return aarch64_fntype_pstate_za (TREE_TYPE (fndecl)); +} + /* Return the ISA mode that should be used to compile the body of function FNDECL. */ static aarch64_feature_flags aarch64_fndecl_isa_mode (const_tree fndecl) { - return aarch64_fndecl_pstate_sm (fndecl); + return (aarch64_fndecl_pstate_sm (fndecl) + | aarch64_fndecl_pstate_za (fndecl)); } /* Return the state of PSTATE.SM on entry to the current function. 
@@ -4177,6 +4498,44 @@ aarch64_cfun_incoming_pstate_sm () return aarch64_fntype_pstate_sm (TREE_TYPE (cfun->decl)); } +/* Return the state of PSTATE.ZA on entry to the current function. + This might be different from the state of PSTATE.ZA in the function + body. */ + +static aarch64_feature_flags +aarch64_cfun_incoming_pstate_za () +{ + return aarch64_fntype_pstate_za (TREE_TYPE (cfun->decl)); +} + +/* Return state flags that describe whether and how the current function shares + state STATE_NAME with callers. */ + +static unsigned int +aarch64_cfun_shared_flags (const char *state_name) +{ + return aarch64_fntype_shared_flags (TREE_TYPE (cfun->decl), state_name); +} + +/* Return true if the current function creates new state of type STATE_NAME + (as opposed to sharing the state with its callers or ignoring the state + altogether). */ + +static bool +aarch64_cfun_has_new_state (const char *state_name) +{ + return aarch64_fndecl_has_new_state (cfun->decl, state_name); +} + +/* Return true if the current function has state STATE_NAME, either by + creating new state itself or by sharing state with callers. */ + +static bool +aarch64_cfun_has_state (const char *state_name) +{ + return aarch64_fndecl_has_state (cfun->decl, state_name); +} + /* Return true if a call from the current function to a function with ISA mode CALLEE_MODE would involve a change to PSTATE.SM around the BL instruction. */ @@ -5740,6 +6099,74 @@ aarch64_output_sve_vector_inc_dec (const char *operands, rtx x) factor, nelts_per_vq); } +/* Return a constant that represents FACTOR multiplied by the + number of 128-bit quadwords in an SME vector. ISA_MODE is the + ISA mode in which the calculation is being performed. */ + +static rtx +aarch64_sme_vq_immediate (machine_mode mode, HOST_WIDE_INT factor, + aarch64_feature_flags isa_mode) +{ + gcc_assert (aarch64_sve_rdvl_factor_p (factor)); + if (isa_mode & AARCH64_FL_SM_ON) + /* We're in streaming mode, so we can use normal poly-int values. 
*/ + return gen_int_mode ({ factor, factor }, mode); + + rtvec vec = gen_rtvec (1, gen_int_mode (factor, SImode)); + rtx unspec = gen_rtx_UNSPEC (mode, vec, UNSPEC_SME_VQ); + return gen_rtx_CONST (mode, unspec); +} + +/* Return true if X is a constant that represents some number Y + multiplied by the number of quadwords in an SME vector. Store this Y + in *FACTOR if so. */ + +static bool +aarch64_sme_vq_unspec_p (const_rtx x, HOST_WIDE_INT *factor) +{ + if (!TARGET_SME || GET_CODE (x) != CONST) + return false; + + x = XEXP (x, 0); + if (GET_CODE (x) != UNSPEC + || XINT (x, 1) != UNSPEC_SME_VQ + || XVECLEN (x, 0) != 1) + return false; + + x = XVECEXP (x, 0, 0); + if (!CONST_INT_P (x)) + return false; + + *factor = INTVAL (x); + return true; +} + +/* Return true if X is a constant that represents some number Y + multiplied by the number of quadwords in an SME vector, and if + that Y is in the range of RDSVL. */ + +bool +aarch64_rdsvl_immediate_p (const_rtx x) +{ + HOST_WIDE_INT factor; + return (aarch64_sme_vq_unspec_p (x, &factor) + && aarch64_sve_rdvl_factor_p (factor)); +} + +/* Return the asm string for an RDSVL instruction that calculates X, + which is a constant that satisfies aarch64_rdsvl_immediate_p. */ + +char * +aarch64_output_rdsvl (const_rtx x) +{ + gcc_assert (aarch64_rdsvl_immediate_p (x)); + static char buffer[sizeof ("rdsvl\t%x0, #-") + 3 * sizeof (int)]; + x = XVECEXP (XEXP (x, 0), 0, 0); + snprintf (buffer, sizeof (buffer), "rdsvl\t%%x0, #%d", + (int) INTVAL (x) / 16); + return buffer; +} + /* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2. */ static const unsigned HOST_WIDE_INT bitmask_imm_mul[] = @@ -7555,6 +7982,15 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) return; } + if (aarch64_rdsvl_immediate_p (base)) + { + /* We could handle non-constant offsets if they are ever + generated.
*/ + gcc_assert (const_offset == 0); + emit_insn (gen_rtx_SET (dest, imm)); + return; + } + sty = aarch64_classify_symbol (base, const_offset); switch (sty) { @@ -8701,8 +9137,10 @@ aarch64_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg) rtx abi_cookie = aarch64_gen_callee_cookie (pcum->isa_mode, pcum->pcs_variant); rtx sme_mode_switch_args = aarch64_finish_sme_mode_switch_args (pcum); - return gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, abi_cookie, - sme_mode_switch_args)); + rtx shared_za_flags = gen_int_mode (pcum->shared_za_flags, SImode); + return gen_rtx_PARALLEL (VOIDmode, gen_rtvec (3, abi_cookie, + sme_mode_switch_args, + shared_za_flags)); } aarch64_layout_arg (pcum_v, arg); @@ -8713,7 +9151,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum, const_tree fntype, rtx libname ATTRIBUTE_UNUSED, - const_tree fndecl ATTRIBUTE_UNUSED, + const_tree fndecl, unsigned n_named ATTRIBUTE_UNUSED, bool silent_p) { @@ -8738,6 +9176,8 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum, pcum->aapcs_stack_words = 0; pcum->aapcs_stack_size = 0; pcum->silent_p = silent_p; + pcum->shared_za_flags + = (fntype ? aarch64_fntype_shared_flags (fntype, "za") : 0U); pcum->num_sme_mode_switch_args = 0; if (!silent_p @@ -10830,14 +11270,31 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2, } } +/* Implement TARGET_EXTRA_LIVE_ON_ENTRY. */ + +void +aarch64_extra_live_on_entry (bitmap regs) +{ + if (TARGET_ZA) + { + bitmap_set_bit (regs, LOWERING_REGNUM); + bitmap_set_bit (regs, SME_STATE_REGNUM); + bitmap_set_bit (regs, TPIDR2_SETUP_REGNUM); + bitmap_set_bit (regs, ZA_FREE_REGNUM); + bitmap_set_bit (regs, ZA_SAVED_REGNUM); + + /* The only time ZA can't have live contents on entry is when + the function explicitly treats it as a pure output. 
*/ + auto za_flags = aarch64_cfun_shared_flags ("za"); + if (za_flags != (AARCH64_STATE_SHARED | AARCH64_STATE_OUT)) + bitmap_set_bit (regs, ZA_REGNUM); + } +} + /* Return 1 if the register is used by the epilogue. We need to say the return register is used, but only after epilogue generation is complete. Note that in the case of sibcalls, the values "used by the epilogue" are - considered live at the start of the called function. - - For SIMD functions we need to return 1 for FP registers that are saved and - restored by a function but are not zero in call_used_regs. If we do not do - this optimizations may remove the restore of the register. */ + considered live at the start of the called function. */ int aarch64_epilogue_uses (int regno) @@ -10847,6 +11304,18 @@ aarch64_epilogue_uses (int regno) if (regno == LR_REGNUM) return 1; } + if (regno == LOWERING_REGNUM && TARGET_ZA) + return 1; + if (regno == SME_STATE_REGNUM && TARGET_ZA) + return 1; + if (regno == TPIDR2_SETUP_REGNUM && TARGET_ZA) + return 1; + /* If the function shares SME state with its caller, ensure that that + data is not in the lazy save buffer on exit. */ + if (regno == ZA_SAVED_REGNUM && aarch64_cfun_incoming_pstate_za () != 0) + return 1; + if (regno == ZA_REGNUM && aarch64_cfun_shared_flags ("za") != 0) + return 1; return 0; } @@ -11528,8 +11997,10 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x) /* There's no way to calculate VL-based values using relocations. */ subrtx_iterator::array_type array; + HOST_WIDE_INT factor; FOR_EACH_SUBRTX (iter, array, x, ALL) - if (GET_CODE (*iter) == CONST_POLY_INT) + if (GET_CODE (*iter) == CONST_POLY_INT + || aarch64_sme_vq_unspec_p (x, &factor)) return true; poly_int64 offset; @@ -12392,6 +12863,72 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2) return true; } +/* Return a fresh memory reference to the current function's TPIDR2 block, + creating a block if necessary. 
*/ + +static rtx +aarch64_get_tpidr2_block () +{ + if (!cfun->machine->tpidr2_block) + /* The TPIDR2 block is 16 bytes in size and must be aligned to a 128-bit + boundary. */ + cfun->machine->tpidr2_block = assign_stack_local (V16QImode, 16, 128); + return copy_rtx (cfun->machine->tpidr2_block); +} + +/* Return a fresh register that points to the current function's + TPIDR2 block, creating a block if necessary. */ + +static rtx +aarch64_get_tpidr2_ptr () +{ + rtx block = aarch64_get_tpidr2_block (); + return force_reg (Pmode, XEXP (block, 0)); +} + +/* Emit instructions to allocate a ZA lazy save buffer and initialize the + current function's TPIDR2 block. */ + +static void +aarch64_init_tpidr2_block () +{ + rtx block = aarch64_get_tpidr2_block (); + + /* The ZA save buffer is SVL.B*SVL.B bytes in size. */ + rtx svl_bytes = aarch64_sme_vq_immediate (Pmode, 16, AARCH64_ISA_MODE); + rtx svl_bytes_reg = force_reg (DImode, svl_bytes); + rtx za_size = expand_simple_binop (Pmode, MULT, svl_bytes_reg, + svl_bytes_reg, NULL, 0, OPTAB_LIB_WIDEN); + rtx za_save_buffer = allocate_dynamic_stack_space (za_size, 128, + BITS_PER_UNIT, -1, true); + za_save_buffer = force_reg (Pmode, za_save_buffer); + cfun->machine->za_save_buffer = za_save_buffer; + + /* The first word of the block points to the save buffer and the second + word is the number of ZA slices to save. */ + rtx block_0 = adjust_address (block, DImode, 0); + rtx block_8 = adjust_address (block, DImode, 8); + emit_insn (gen_store_pair_dw_didi (block_0, za_save_buffer, + block_8, svl_bytes_reg)); + + if (!memory_operand (block, V16QImode)) + block = replace_equiv_address (block, force_reg (Pmode, XEXP (block, 0))); + emit_insn (gen_aarch64_setup_local_tpidr2 (block)); +} + +/* Restore the contents of ZA from the lazy save buffer, given that + register TPIDR2_BLOCK points to the current function's TPIDR2 block. + PSTATE.ZA is known to be 0 and TPIDR2_EL0 is known to be null. 
*/ + +void +aarch64_restore_za (rtx tpidr2_block) +{ + emit_insn (gen_aarch64_smstart_za ()); + if (REGNO (tpidr2_block) != R0_REGNUM) + emit_move_insn (gen_rtx_REG (Pmode, R0_REGNUM), tpidr2_block); + emit_insn (gen_aarch64_tpidr2_restore ()); +} + /* Implement TARGET_START_CALL_ARGS. */ static void @@ -12407,6 +12944,20 @@ aarch64_start_call_args (cumulative_args_t ca_v) " option %<-march%>, or by using the %" " attribute or pragma", "sme"); } + + if ((ca->shared_za_flags & (AARCH64_STATE_IN | AARCH64_STATE_OUT)) + && !aarch64_cfun_has_state ("za")) + error ("call to a function that shares %qs state from a function" + " that has no %qs state", "za", "za"); + else if (!TARGET_ZA && (ca->isa_mode & AARCH64_FL_ZA_ON)) + error ("call to a function that shares SME state from a function" + " that has no SME state"); + + /* If this is a call to a private ZA function, emit a marker to + indicate where any necessary set-up code could be inserted. + The code itself is inserted by the mode-switching pass. */ + if (TARGET_ZA && !(ca->isa_mode & AARCH64_FL_ZA_ON)) + emit_insn (gen_aarch64_start_private_za_call ()); } /* This function is used by the call expanders of the machine description. @@ -12419,6 +12970,8 @@ aarch64_start_call_args (cumulative_args_t ca_v) The second element is a PARALLEL that lists all the argument registers that need to be saved and restored around a change in PSTATE.SM, or const0_rtx if no such switch is needed. + The third element is a const_int that contains the sharing flags + for ZA. SIBCALL indicates whether this function call is normal call or sibling call. It will generate different pattern accordingly. 
*/ @@ -12431,10 +12984,12 @@ aarch64_expand_call (rtx result, rtx mem, rtx cookie, bool sibcall) rtx callee_abi = cookie; rtx sme_mode_switch_args = const0_rtx; + unsigned int shared_za_flags = 0; if (GET_CODE (cookie) == PARALLEL) { callee_abi = XVECEXP (cookie, 0, 0); sme_mode_switch_args = XVECEXP (cookie, 0, 1); + shared_za_flags = INTVAL (XVECEXP (cookie, 0, 2)); } gcc_assert (CONST_INT_P (callee_abi)); @@ -12454,6 +13009,41 @@ aarch64_expand_call (rtx result, rtx mem, rtx cookie, bool sibcall) : !REG_P (callee)) XEXP (mem, 0) = force_reg (mode, callee); + /* Accumulate the return values, including state that is shared via + attributes. */ + auto_vec return_values; + if (result) + { + if (GET_CODE (result) == PARALLEL) + for (int i = 0; i < XVECLEN (result, 0); ++i) + return_values.safe_push (XVECEXP (result, 0, i)); + else + return_values.safe_push (result); + } + unsigned int orig_num_return_values = return_values.length (); + if (shared_za_flags & AARCH64_STATE_OUT) + return_values.safe_push (gen_rtx_REG (VNx16BImode, ZA_REGNUM)); + /* When calling private-ZA functions from functions with ZA state, + we want to know whether the call committed a lazy save. */ + if (TARGET_ZA && !shared_za_flags) + return_values.safe_push (gen_rtx_REG (VNx16BImode, ZA_SAVED_REGNUM)); + + /* Create the new return value, if necessary. */ + if (orig_num_return_values != return_values.length ()) + { + if (return_values.length () == 1) + result = return_values[0]; + else + { + for (rtx &x : return_values) + if (GET_CODE (x) != EXPR_LIST) + x = gen_rtx_EXPR_LIST (VOIDmode, x, const0_rtx); + rtvec v = gen_rtvec_v (return_values.length (), + return_values.address ()); + result = gen_rtx_PARALLEL (VOIDmode, v); + } + } + call = gen_rtx_CALL (VOIDmode, mem, const0_rtx); if (result != NULL_RTX) @@ -12520,6 +13110,50 @@ aarch64_expand_call (rtx result, rtx mem, rtx cookie, bool sibcall) cfun->machine->call_switches_pstate_sm = true; } + + /* Add any ZA-related information. 
+ ZA_REGNUM represents the current function's ZA state, rather than + the contents of the ZA register itself. We ensure that the function's + ZA state is preserved by private-ZA call sequences, so the call itself + does not use or clobber ZA_REGNUM. */ + if (TARGET_ZA) + { + /* The callee requires ZA to be active if the callee is shared-ZA, + otherwise it requires ZA to be dormant or off. The state of ZA is + captured by a combination of SME_STATE_REGNUM, TPIDR2_SETUP_REGNUM, + and ZA_SAVED_REGNUM. */ + use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), + gen_rtx_REG (DImode, SME_STATE_REGNUM)); + use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), + gen_rtx_REG (DImode, TPIDR2_SETUP_REGNUM)); + use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), + gen_rtx_REG (VNx16BImode, ZA_SAVED_REGNUM)); + + /* Keep the aarch64_start/end_private_za_call markers live. */ + if (!(callee_isa_mode & AARCH64_FL_ZA_ON)) + use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), + gen_rtx_REG (VNx16BImode, LOWERING_REGNUM)); + + /* If the callee is a shared-ZA function, record whether it uses the + current value of ZA. */ + if (shared_za_flags & AARCH64_STATE_IN) + use_reg (&CALL_INSN_FUNCTION_USAGE (call_insn), + gen_rtx_REG (VNx16BImode, ZA_REGNUM)); + } +} + +/* Implement TARGET_END_CALL_ARGS. */ + +static void +aarch64_end_call_args (cumulative_args_t ca_v) +{ + CUMULATIVE_ARGS *ca = get_cumulative_args (ca_v); + + /* If this is a call to a private ZA function, emit a marker to + indicate where any necessary restoration code could be inserted. + The code itself is inserted by the mode-switching pass. */ + if (TARGET_ZA && !(ca->isa_mode & AARCH64_FL_ZA_ON)) + emit_insn (gen_aarch64_end_private_za_call ()); } /* Emit call insn with PAT and do aarch64-specific handling. 
*/ @@ -13757,6 +14391,9 @@ aarch64_regno_regclass (unsigned regno) if (regno == FFR_REGNUM || regno == FFRT_REGNUM) return FFR_REGS; + if (FAKE_REGNUM_P (regno)) + return FAKE_REGS; + return NO_REGS; } @@ -14112,12 +14749,14 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode) return (vec_flags & VEC_ADVSIMD ? CEIL (lowest_size, UNITS_PER_VREG) : CEIL (lowest_size, UNITS_PER_WORD)); + case STACK_REG: case PR_REGS: case PR_LO_REGS: case PR_HI_REGS: case FFR_REGS: case PR_AND_FFR_REGS: + case FAKE_REGS: return 1; case NO_REGS: @@ -19288,10 +19927,14 @@ aarch64_override_options_internal (struct gcc_options *opts) && !fixed_regs[R18_REGNUM]) error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>"); - if ((opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON) + if ((opts->x_aarch64_isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON)) && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME)) { - error ("streaming functions require the ISA extension %qs", "sme"); + if (opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON) + error ("streaming functions require the ISA extension %qs", "sme"); + else + error ("functions with SME state require the ISA extension %qs", + "sme"); inform (input_location, "you can enable %qs using the command-line" " option %<-march%>, or by using the %" " attribute or pragma", "sme"); @@ -21539,6 +22182,8 @@ aarch64_conditional_register_usage (void) CLEAR_HARD_REG_BIT (operand_reg_set, VG_REGNUM); CLEAR_HARD_REG_BIT (operand_reg_set, FFR_REGNUM); CLEAR_HARD_REG_BIT (operand_reg_set, FFRT_REGNUM); + for (int i = FIRST_FAKE_REGNUM; i <= LAST_FAKE_REGNUM; ++i) + CLEAR_HARD_REG_BIT (operand_reg_set, i); /* When tracking speculation, we need a couple of call-clobbered registers to track the speculation state. 
It would be nice to just use @@ -23003,6 +23648,9 @@ aarch64_mov_operand_p (rtx x, machine_mode mode) || aarch64_sve_rdvl_immediate_p (x))) return true; + if (aarch64_rdsvl_immediate_p (x)) + return true; + return aarch64_classify_symbolic_expression (x) == SYMBOL_TINY_ABSOLUTE; } @@ -28680,9 +29328,45 @@ aarch64_comp_type_attributes (const_tree type1, const_tree type2) return 0; if (!check_attr ("arm", "streaming_compatible")) return 0; + if (aarch64_lookup_shared_state_flags (TYPE_ATTRIBUTES (type1), "za") + != aarch64_lookup_shared_state_flags (TYPE_ATTRIBUTES (type2), "za")) + return 0; return 1; } +/* Implement TARGET_MERGE_DECL_ATTRIBUTES. */ + +static tree +aarch64_merge_decl_attributes (tree olddecl, tree newdecl) +{ + tree old_attrs = DECL_ATTRIBUTES (olddecl); + tree old_new = lookup_attribute ("arm", "new", old_attrs); + + tree new_attrs = DECL_ATTRIBUTES (newdecl); + tree new_new = lookup_attribute ("arm", "new", new_attrs); + + if (DECL_INITIAL (olddecl) && new_new) + { + error ("cannot apply attribute %qs to %q+D after the function" + " has been defined", "new", newdecl); + inform (DECL_SOURCE_LOCATION (olddecl), "%q+D defined here", + newdecl); + } + else + { + if (old_new && new_new) + { + old_attrs = remove_attribute ("arm", "new", old_attrs); + TREE_VALUE (new_new) = chainon (TREE_VALUE (new_new), + TREE_VALUE (old_new)); + } + if (new_new) + aarch64_check_arm_new_against_type (TREE_VALUE (new_new), newdecl); + } + + return merge_attributes (old_attrs, new_attrs); +} + /* Implement TARGET_GET_MULTILIB_ABI_NAME */ static const char * @@ -29107,6 +29791,629 @@ aarch64_pars_overlap_p (rtx par1, rtx par2) return false; } +/* Implement OPTIMIZE_MODE_SWITCHING. 
*/ + +bool +aarch64_optimize_mode_switching (aarch64_mode_entity entity) +{ + bool have_sme_state = (aarch64_cfun_incoming_pstate_za () != 0 + || (aarch64_cfun_has_new_state ("za") + && df_regs_ever_live_p (ZA_REGNUM))); + + if (have_sme_state && nonlocal_goto_handler_labels) + { + static bool reported; + if (!reported) + { + sorry ("non-local gotos in functions with SME state"); + reported = true; + } + } + + switch (entity) + { + case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER: + case aarch64_mode_entity::LOCAL_SME_STATE: + return have_sme_state && !nonlocal_goto_handler_labels; + } + gcc_unreachable (); +} + +/* Implement TARGET_MODE_EMIT for ZA_SAVE_BUFFER. */ + +static void +aarch64_mode_emit_za_save_buffer (aarch64_tristate_mode mode, + aarch64_tristate_mode prev_mode) +{ + if (mode == aarch64_tristate_mode::YES) + { + gcc_assert (prev_mode == aarch64_tristate_mode::NO); + aarch64_init_tpidr2_block (); + } + else + gcc_unreachable (); +} + +/* Implement TARGET_MODE_EMIT for LOCAL_SME_STATE. */ + +static void +aarch64_mode_emit_local_sme_state (aarch64_local_sme_state mode, + aarch64_local_sme_state prev_mode) +{ + /* Back-propagation should ensure that we're always starting from + a known mode. */ + gcc_assert (prev_mode != aarch64_local_sme_state::ANY); + + if (prev_mode == aarch64_local_sme_state::INACTIVE_CALLER) + { + /* Commit any uncommitted lazy save. This leaves ZA either active + and zero (lazy save case) or off (normal case). 
+ + The sequence is: + + mrs <temp>, tpidr2_el0 + cbz <temp>, no_save + bl __arm_tpidr2_save + msr tpidr2_el0, xzr + zero { za } // Only if ZA is live + no_save: */ + bool is_active = (mode == aarch64_local_sme_state::ACTIVE_LIVE + || mode == aarch64_local_sme_state::ACTIVE_DEAD); + auto tmp_reg = gen_reg_rtx (DImode); + auto active_flag = gen_int_mode (is_active, DImode); + emit_insn (gen_aarch64_read_tpidr2 (tmp_reg)); + emit_insn (gen_aarch64_commit_lazy_save (tmp_reg, active_flag)); + } + + if (mode == aarch64_local_sme_state::ACTIVE_LIVE + || mode == aarch64_local_sme_state::ACTIVE_DEAD) + { + if (prev_mode == aarch64_local_sme_state::INACTIVE_LOCAL) + { + /* Make ZA active after being inactive. + + First handle the case in which the lazy save we set up was + committed by a callee. If the function's source-level ZA state + is live then we must conditionally restore it from the lazy + save buffer. Otherwise we can just force PSTATE.ZA to 1. */ + if (mode == aarch64_local_sme_state::ACTIVE_LIVE) + emit_insn (gen_aarch64_restore_za (aarch64_get_tpidr2_ptr ())); + else + emit_insn (gen_aarch64_smstart_za ()); + + /* Now handle the case in which the lazy save was not committed. + In that case, ZA still contains the current function's ZA state, + and we just need to cancel the lazy save. */ + emit_insn (gen_aarch64_clear_tpidr2 ()); + return; + } + + if (prev_mode == aarch64_local_sme_state::SAVED_LOCAL) + { + /* Retrieve the current function's ZA state from the lazy save + buffer. */ + aarch64_restore_za (aarch64_get_tpidr2_ptr ()); + return; + } + + if (prev_mode == aarch64_local_sme_state::INACTIVE_CALLER + || prev_mode == aarch64_local_sme_state::OFF) + { + /* INACTIVE_CALLER means that we are enabling ZA for the first + time in this function. The code above means that ZA is either + active and zero (if we committed a lazy save) or off. Handle + the latter case by forcing ZA on. + + OFF means that PSTATE.ZA is guaranteed to be 0. We just need + to force it to 1.
+ + Both cases leave ZA zeroed. */ + emit_insn (gen_aarch64_smstart_za ()); + return; + } + + if (prev_mode == aarch64_local_sme_state::ACTIVE_DEAD + || prev_mode == aarch64_local_sme_state::ACTIVE_LIVE) + /* A simple change in liveness, such as in a CFG structure where + ZA is only conditionally defined. No code is needed. */ + return; + + gcc_unreachable (); + } + + if (mode == aarch64_local_sme_state::INACTIVE_LOCAL) + { + if (prev_mode == aarch64_local_sme_state::ACTIVE_LIVE + || prev_mode == aarch64_local_sme_state::ACTIVE_DEAD + || prev_mode == aarch64_local_sme_state::INACTIVE_CALLER) + { + /* A transition from ACTIVE_LIVE to INACTIVE_LOCAL is the usual + case of setting up a lazy save buffer before a call. + A transition from INACTIVE_CALLER is similar, except that + the contents of ZA are known to be zero. + + A transition from ACTIVE_DEAD means that ZA is live at the + point of the transition, but is dead on at least one incoming + edge. (That is, ZA is only conditionally initialized.) + For efficiency, we want to set up a lazy save even for + dead contents, since forcing ZA off would make later code + restore ZA from the lazy save buffer. */ + emit_insn (gen_aarch64_write_tpidr2 (aarch64_get_tpidr2_ptr ())); + return; + } + + if (prev_mode == aarch64_local_sme_state::SAVED_LOCAL + || prev_mode == aarch64_local_sme_state::OFF) + /* We're simply discarding the information about which inactive + state applies. */ + return; + + gcc_unreachable (); + } + + if (mode == aarch64_local_sme_state::INACTIVE_CALLER + || mode == aarch64_local_sme_state::OFF) + { + /* The transition to INACTIVE_CALLER is used before returning from + new("za") functions. Any state in ZA belongs to the current + function rather than a caller, but that state is no longer + needed. Clear any pending lazy save and turn ZA off. + + The transition to OFF is used before calling a private-ZA function. 
+ We committed any incoming lazy save above, so at this point any + contents in ZA belong to the current function. */ + if (prev_mode == aarch64_local_sme_state::INACTIVE_LOCAL) + emit_insn (gen_aarch64_clear_tpidr2 ()); + + if (prev_mode != aarch64_local_sme_state::OFF + && prev_mode != aarch64_local_sme_state::SAVED_LOCAL) + emit_insn (gen_aarch64_smstop_za ()); + + return; + } + + if (mode == aarch64_local_sme_state::SAVED_LOCAL) + { + /* This is a transition to an exception handler. */ + gcc_assert (prev_mode == aarch64_local_sme_state::OFF + || prev_mode == aarch64_local_sme_state::INACTIVE_LOCAL); + return; + } + + gcc_unreachable (); +} + +/* Implement TARGET_MODE_EMIT. */ + +static void +aarch64_mode_emit (int entity, int mode, int prev_mode, HARD_REG_SET live) +{ + if (mode == prev_mode) + return; + + start_sequence (); + switch (aarch64_mode_entity (entity)) + { + case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER: + aarch64_mode_emit_za_save_buffer (aarch64_tristate_mode (mode), + aarch64_tristate_mode (prev_mode)); + break; + + case aarch64_mode_entity::LOCAL_SME_STATE: + aarch64_mode_emit_local_sme_state (aarch64_local_sme_state (mode), + aarch64_local_sme_state (prev_mode)); + break; + } + rtx_insn *seq = get_insns (); + end_sequence (); + + /* Get the set of clobbered registers that are currently live. */ + HARD_REG_SET clobbers = {}; + for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn)) + { + vec_rtx_properties properties; + properties.add_insn (insn, false); + for (rtx_obj_reference ref : properties.refs ()) + if (ref.is_write () && HARD_REGISTER_NUM_P (ref.regno)) + SET_HARD_REG_BIT (clobbers, ref.regno); + } + clobbers &= live; + + /* Emit instructions to save clobbered registers to pseudos. Queue + instructions to restore the registers afterwards. + + This should only be needed in rare situations.
*/ + auto_vec after; + for (unsigned int regno = R0_REGNUM; regno < R30_REGNUM; ++regno) + if (TEST_HARD_REG_BIT (clobbers, regno)) + { + rtx hard_reg = gen_rtx_REG (DImode, regno); + rtx pseudo_reg = gen_reg_rtx (DImode); + emit_move_insn (pseudo_reg, hard_reg); + after.quick_push (gen_move_insn (hard_reg, pseudo_reg)); + } + if (TEST_HARD_REG_BIT (clobbers, CC_REGNUM)) + { + rtx pseudo_reg = gen_reg_rtx (DImode); + emit_insn (gen_aarch64_save_nzcv (pseudo_reg)); + after.quick_push (gen_aarch64_restore_nzcv (pseudo_reg)); + } + + /* Emit the transition instructions themselves. */ + emit_insn (seq); + + /* Restore the clobbered registers. */ + for (auto *insn : after) + emit_insn (insn); +} + +/* Return true if INSN references the SME state represented by hard register + REGNO. */ + +static bool +aarch64_insn_references_sme_state_p (rtx_insn *insn, unsigned int regno) +{ + df_ref ref; + FOR_EACH_INSN_DEF (ref, insn) + if (!DF_REF_FLAGS_IS_SET (ref, DF_REF_MUST_CLOBBER) + && DF_REF_REGNO (ref) == regno) + return true; + FOR_EACH_INSN_USE (ref, insn) + if (DF_REF_REGNO (ref) == regno) + return true; + return false; +} + +/* Implement TARGET_MODE_NEEDED for LOCAL_SME_STATE. */ + +static aarch64_local_sme_state +aarch64_mode_needed_local_sme_state (rtx_insn *insn, HARD_REG_SET live) +{ + if (!CALL_P (insn) + && find_reg_note (insn, REG_EH_REGION, NULL_RTX)) + { + static bool reported; + if (!reported) + { + sorry ("catching non-call exceptions in functions with SME state"); + reported = true; + } + /* Aim for graceful error recovery by picking the value that is + least likely to generate an ICE. */ + return aarch64_local_sme_state::INACTIVE_LOCAL; + } + + /* A non-local goto is equivalent to a return. We disallow non-local + receivers in functions with SME state, so we know that the target + expects ZA to be dormant or off. 
*/ + if (JUMP_P (insn) + && find_reg_note (insn, REG_NON_LOCAL_GOTO, NULL_RTX)) + return aarch64_local_sme_state::INACTIVE_CALLER; + + /* start_private_za_call and end_private_za_call bracket a sequence + that calls a private-ZA function. Force ZA to be turned off if the + function doesn't have any live ZA state, otherwise require ZA to be + inactive. */ + auto icode = recog_memoized (insn); + if (icode == CODE_FOR_aarch64_start_private_za_call + || icode == CODE_FOR_aarch64_end_private_za_call) + return (TEST_HARD_REG_BIT (live, ZA_REGNUM) + ? aarch64_local_sme_state::INACTIVE_LOCAL + : aarch64_local_sme_state::OFF); + + /* Force ZA to contain the current function's ZA state if INSN wants + to access it. */ + if (aarch64_insn_references_sme_state_p (insn, ZA_REGNUM)) + return (TEST_HARD_REG_BIT (live, ZA_REGNUM) + ? aarch64_local_sme_state::ACTIVE_LIVE + : aarch64_local_sme_state::ACTIVE_DEAD); + + return aarch64_local_sme_state::ANY; +} + +/* Implement TARGET_MODE_NEEDED for ZA_SAVE_BUFFER. */ + +static aarch64_tristate_mode +aarch64_mode_needed_za_save_buffer (rtx_insn *insn, HARD_REG_SET live) +{ + /* We need to set up a lazy save buffer no later than the first + transition to INACTIVE_LOCAL (which involves setting up a lazy save). */ + if (aarch64_mode_needed_local_sme_state (insn, live) + == aarch64_local_sme_state::INACTIVE_LOCAL) + return aarch64_tristate_mode::YES; + + /* Also make sure that the lazy save buffer is set up before the first + insn that throws internally. The exception handler will sometimes + load from it. */ + if (find_reg_note (insn, REG_EH_REGION, NULL_RTX)) + return aarch64_tristate_mode::YES; + + return aarch64_tristate_mode::MAYBE; +} + +/* Implement TARGET_MODE_NEEDED. 
*/ + +static int +aarch64_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET live) +{ + switch (aarch64_mode_entity (entity)) + { + case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER: + return int (aarch64_mode_needed_za_save_buffer (insn, live)); + + case aarch64_mode_entity::LOCAL_SME_STATE: + return int (aarch64_mode_needed_local_sme_state (insn, live)); + } + gcc_unreachable (); +} + +/* Implement TARGET_MODE_AFTER for LOCAL_SME_STATE. */ + +static aarch64_local_sme_state +aarch64_mode_after_local_sme_state (aarch64_local_sme_state mode, + HARD_REG_SET live) +{ + /* Note places where ZA dies, so that we can try to avoid saving and + restoring state that isn't needed. */ + if (mode == aarch64_local_sme_state::ACTIVE_LIVE + && !TEST_HARD_REG_BIT (live, ZA_REGNUM)) + return aarch64_local_sme_state::ACTIVE_DEAD; + + /* Note where ZA is born, e.g. when moving past an __arm_out("za") + function. */ + if (mode == aarch64_local_sme_state::ACTIVE_DEAD + && TEST_HARD_REG_BIT (live, ZA_REGNUM)) + return aarch64_local_sme_state::ACTIVE_LIVE; + + return mode; +} + +/* Implement TARGET_MODE_AFTER. */ + +static int +aarch64_mode_after (int entity, int mode, rtx_insn *, HARD_REG_SET live) +{ + switch (aarch64_mode_entity (entity)) + { + case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER: + return mode; + + case aarch64_mode_entity::LOCAL_SME_STATE: + return int (aarch64_mode_after_local_sme_state + (aarch64_local_sme_state (mode), live)); + } + gcc_unreachable (); +} + +/* Implement TARGET_MODE_CONFLUENCE for LOCAL_SME_STATE. */ + +static aarch64_local_sme_state +aarch64_local_sme_confluence (aarch64_local_sme_state mode1, + aarch64_local_sme_state mode2) +{ + /* Perform a symmetrical check for two values. */ + auto is_pair = [&](aarch64_local_sme_state val1, + aarch64_local_sme_state val2) + { + return ((mode1 == val1 && mode2 == val2) + || (mode1 == val2 && mode2 == val1)); + }; + + /* INACTIVE_CALLER means ZA is off or it has dormant contents belonging + to a caller. 
OFF is one of the options.  */
+  if (is_pair (aarch64_local_sme_state::INACTIVE_CALLER,
+               aarch64_local_sme_state::OFF))
+    return aarch64_local_sme_state::INACTIVE_CALLER;
+
+  /* Similarly for dormant contents belonging to the current function.  */
+  if (is_pair (aarch64_local_sme_state::INACTIVE_LOCAL,
+               aarch64_local_sme_state::OFF))
+    return aarch64_local_sme_state::INACTIVE_LOCAL;
+
+  /* Treat a conditionally-initialized value as a fully-initialized value.  */
+  if (is_pair (aarch64_local_sme_state::ACTIVE_LIVE,
+               aarch64_local_sme_state::ACTIVE_DEAD))
+    return aarch64_local_sme_state::ACTIVE_LIVE;
+
+  return aarch64_local_sme_state::ANY;
+}
+
+/* Implement TARGET_MODE_CONFLUENCE.  */
+
+static int
+aarch64_mode_confluence (int entity, int mode1, int mode2)
+{
+  gcc_assert (mode1 != mode2);
+  switch (aarch64_mode_entity (entity))
+    {
+    case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER:
+      return int (aarch64_tristate_mode::MAYBE);
+
+    case aarch64_mode_entity::LOCAL_SME_STATE:
+      return int (aarch64_local_sme_confluence
+                  (aarch64_local_sme_state (mode1),
+                   aarch64_local_sme_state (mode2)));
+    }
+  gcc_unreachable ();
+}
+
+/* Implement TARGET_MODE_BACKPROP for an entity that either stays
+   NO throughout, or makes one transition from NO to YES.  */
+
+static aarch64_tristate_mode
+aarch64_one_shot_backprop (aarch64_tristate_mode mode1,
+                           aarch64_tristate_mode mode2)
+{
+  /* Keep bringing the transition forward until it starts from NO.  */
+  if (mode1 == aarch64_tristate_mode::MAYBE
+      && mode2 == aarch64_tristate_mode::YES)
+    return mode2;
+
+  return aarch64_tristate_mode::MAYBE;
+}
+
+/* Implement TARGET_MODE_BACKPROP for LOCAL_SME_STATE.  */
+
+static aarch64_local_sme_state
+aarch64_local_sme_backprop (aarch64_local_sme_state mode1,
+                            aarch64_local_sme_state mode2)
+{
+  /* We always need to know what the current state is when transitioning
+     to a new state.  Force any location with indeterminate starting state
+     to be active.
*/ + if (mode1 == aarch64_local_sme_state::ANY) + switch (mode2) + { + case aarch64_local_sme_state::INACTIVE_CALLER: + case aarch64_local_sme_state::OFF: + case aarch64_local_sme_state::ACTIVE_DEAD: + /* The current function's ZA state is not live. */ + return aarch64_local_sme_state::ACTIVE_DEAD; + + case aarch64_local_sme_state::INACTIVE_LOCAL: + case aarch64_local_sme_state::ACTIVE_LIVE: + /* The current function's ZA state is live. */ + return aarch64_local_sme_state::ACTIVE_LIVE; + + case aarch64_local_sme_state::SAVED_LOCAL: + /* This is a transition to an exception handler. Since we don't + support non-call exceptions for SME functions, the source of + the transition must be known. We'll assert later if that's + not the case. */ + return aarch64_local_sme_state::ANY; + + case aarch64_local_sme_state::ANY: + return aarch64_local_sme_state::ANY; + } + + return aarch64_local_sme_state::ANY; +} + +/* Implement TARGET_MODE_BACKPROP. */ + +static int +aarch64_mode_backprop (int entity, int mode1, int mode2) +{ + switch (aarch64_mode_entity (entity)) + { + case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER: + return int (aarch64_one_shot_backprop (aarch64_tristate_mode (mode1), + aarch64_tristate_mode (mode2))); + + case aarch64_mode_entity::LOCAL_SME_STATE: + return int (aarch64_local_sme_backprop + (aarch64_local_sme_state (mode1), + aarch64_local_sme_state (mode2))); + } + gcc_unreachable (); +} + +/* Implement TARGET_MODE_ENTRY. */ + +static int +aarch64_mode_entry (int entity) +{ + switch (aarch64_mode_entity (entity)) + { + case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER: + return int (aarch64_tristate_mode::NO); + + case aarch64_mode_entity::LOCAL_SME_STATE: + return int (aarch64_cfun_shared_flags ("za") != 0 + ? aarch64_local_sme_state::ACTIVE_LIVE + : aarch64_local_sme_state::INACTIVE_CALLER); + } + gcc_unreachable (); +} + +/* Implement TARGET_MODE_EXIT. 
*/ + +static int +aarch64_mode_exit (int entity) +{ + switch (aarch64_mode_entity (entity)) + { + case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER: + return int (aarch64_tristate_mode::MAYBE); + + case aarch64_mode_entity::LOCAL_SME_STATE: + return int (aarch64_cfun_shared_flags ("za") != 0 + ? aarch64_local_sme_state::ACTIVE_LIVE + : aarch64_local_sme_state::INACTIVE_CALLER); + } + gcc_unreachable (); +} + +/* Implement TARGET_MODE_EH_HANDLER. */ + +static int +aarch64_mode_eh_handler (int entity) +{ + switch (aarch64_mode_entity (entity)) + { + case aarch64_mode_entity::HAVE_ZA_SAVE_BUFFER: + /* Require a lazy save buffer to be allocated before the first + insn that can throw. */ + return int (aarch64_tristate_mode::YES); + + case aarch64_mode_entity::LOCAL_SME_STATE: + return int (aarch64_local_sme_state::SAVED_LOCAL); + } + gcc_unreachable (); +} + +/* Implement TARGET_MODE_PRIORITY. */ + +static int +aarch64_mode_priority (int, int n) +{ + return n; +} + +/* Implement TARGET_MD_ASM_ADJUST. */ + +static rtx_insn * +aarch64_md_asm_adjust (vec &outputs, vec &inputs, + vec &input_modes, + vec &constraints, + vec &uses, vec &clobbers, + HARD_REG_SET &clobbered_regs, location_t loc) +{ + rtx_insn *seq = arm_md_asm_adjust (outputs, inputs, input_modes, constraints, + uses, clobbers, clobbered_regs, loc); + + /* "za" in the clobber list of a function with ZA state is defined to + mean that the asm can read from and write to ZA. We can model the + read using a USE, but unfortunately, it's not possible to model the + write directly. Use a separate insn to model the effect. + + We must ensure that ZA is active on entry, which is enforced by using + SME_STATE_REGNUM. The asm must ensure that ZA is active on return. 
*/ + if (TARGET_ZA) + for (unsigned int i = clobbers.length (); i-- > 0; ) + { + rtx x = clobbers[i]; + if (REG_P (x) && REGNO (x) == ZA_REGNUM) + { + auto id = cfun->machine->next_asm_update_za_id++; + + start_sequence (); + if (seq) + emit_insn (seq); + emit_insn (gen_aarch64_asm_update_za (gen_int_mode (id, SImode))); + seq = get_insns (); + end_sequence (); + + uses.safe_push (gen_rtx_REG (VNx16QImode, ZA_REGNUM)); + uses.safe_push (gen_rtx_REG (DImode, SME_STATE_REGNUM)); + + clobbers.ordered_remove (i); + CLEAR_HARD_REG_BIT (clobbered_regs, ZA_REGNUM); + } + } + return seq; +} + /* If CALL involves a change in PSTATE.SM, emit the instructions needed to switch to the new mode and the instructions needed to restore the original mode. Return true if something changed. */ @@ -29495,6 +30802,9 @@ aarch64_run_selftests (void) #undef TARGET_START_CALL_ARGS #define TARGET_START_CALL_ARGS aarch64_start_call_args +#undef TARGET_END_CALL_ARGS +#define TARGET_END_CALL_ARGS aarch64_end_call_args + #undef TARGET_GIMPLE_FOLD_BUILTIN #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin @@ -29863,6 +31173,9 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_COMP_TYPE_ATTRIBUTES #define TARGET_COMP_TYPE_ATTRIBUTES aarch64_comp_type_attributes +#undef TARGET_MERGE_DECL_ATTRIBUTES +#define TARGET_MERGE_DECL_ATTRIBUTES aarch64_merge_decl_attributes + #undef TARGET_GET_MULTILIB_ABI_NAME #define TARGET_GET_MULTILIB_ABI_NAME aarch64_get_multilib_abi_name @@ -29883,8 +31196,35 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_STRICT_ARGUMENT_NAMING #define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true +#undef TARGET_MODE_EMIT +#define TARGET_MODE_EMIT aarch64_mode_emit + +#undef TARGET_MODE_NEEDED +#define TARGET_MODE_NEEDED aarch64_mode_needed + +#undef TARGET_MODE_AFTER +#define TARGET_MODE_AFTER aarch64_mode_after + +#undef TARGET_MODE_CONFLUENCE +#define TARGET_MODE_CONFLUENCE aarch64_mode_confluence + +#undef TARGET_MODE_BACKPROP +#define 
TARGET_MODE_BACKPROP aarch64_mode_backprop + +#undef TARGET_MODE_ENTRY +#define TARGET_MODE_ENTRY aarch64_mode_entry + +#undef TARGET_MODE_EXIT +#define TARGET_MODE_EXIT aarch64_mode_exit + +#undef TARGET_MODE_EH_HANDLER +#define TARGET_MODE_EH_HANDLER aarch64_mode_eh_handler + +#undef TARGET_MODE_PRIORITY +#define TARGET_MODE_PRIORITY aarch64_mode_priority + #undef TARGET_MD_ASM_ADJUST -#define TARGET_MD_ASM_ADJUST arm_md_asm_adjust +#define TARGET_MD_ASM_ADJUST aarch64_md_asm_adjust #undef TARGET_ASM_FILE_END #define TARGET_ASM_FILE_END aarch64_asm_file_end @@ -29898,6 +31238,9 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_CONST_ANCHOR #define TARGET_CONST_ANCHOR 0x1000000 +#undef TARGET_EXTRA_LIVE_ON_ENTRY +#define TARGET_EXTRA_LIVE_ON_ENTRY aarch64_extra_live_on_entry + #undef TARGET_EMIT_EPILOGUE_FOR_SIBCALL #define TARGET_EMIT_EPILOGUE_FOR_SIBCALL aarch64_expand_epilogue diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 693acde7eb9..dc544273d32 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -207,6 +207,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; /* Macros to test ISA flags. */ #define AARCH64_ISA_SM_OFF (aarch64_isa_flags & AARCH64_FL_SM_OFF) +#define AARCH64_ISA_ZA_ON (aarch64_isa_flags & AARCH64_FL_ZA_ON) #define AARCH64_ISA_MODE (aarch64_isa_flags & AARCH64_FL_ISA_MODES) #define AARCH64_ISA_CRC (aarch64_isa_flags & AARCH64_FL_CRC) #define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO) @@ -260,6 +261,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; #define TARGET_STREAMING_COMPATIBLE \ ((aarch64_isa_flags & AARCH64_FL_SM_STATE) == 0) +/* PSTATE.ZA is enabled in the current function body. */ +#define TARGET_ZA (AARCH64_ISA_ZA_ON) + /* Crypto is an optional extension to AdvSIMD. 
*/ #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO) @@ -461,7 +465,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; 1, 1, 1, 1, /* SFP, AP, CC, VG */ \ 0, 0, 0, 0, 0, 0, 0, 0, /* P0 - P7 */ \ 0, 0, 0, 0, 0, 0, 0, 0, /* P8 - P15 */ \ - 1, 1 /* FFR and FFRT */ \ + 1, 1, /* FFR and FFRT */ \ + 1, 1, 1, 1, 1, 1, 1 /* Fake registers */ \ } /* X30 is marked as caller-saved which is in line with regular function call @@ -471,7 +476,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; true but not until function epilogues have been generated. This ensures that X30 is available for use in leaf functions if needed. */ -#define CALL_USED_REGISTERS \ +#define CALL_REALLY_USED_REGISTERS \ { \ 1, 1, 1, 1, 1, 1, 1, 1, /* R0 - R7 */ \ 1, 1, 1, 1, 1, 1, 1, 1, /* R8 - R15 */ \ @@ -484,7 +489,8 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; 1, 1, 1, 0, /* SFP, AP, CC, VG */ \ 1, 1, 1, 1, 1, 1, 1, 1, /* P0 - P7 */ \ 1, 1, 1, 1, 1, 1, 1, 1, /* P8 - P15 */ \ - 1, 1 /* FFR and FFRT */ \ + 1, 1, /* FFR and FFRT */ \ + 0, 0, 0, 0, 0, 0, 0 /* Fake registers */ \ } #define REGISTER_NAMES \ @@ -500,7 +506,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; "sfp", "ap", "cc", "vg", \ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7", \ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15", \ - "ffr", "ffrt" \ + "ffr", "ffrt", \ + "lowering", "tpidr2_block", "sme_state", "tpidr2_setup", \ + "za_free", "za_saved", "za" \ } /* Generate the register aliases for core register N */ @@ -549,7 +557,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; #define FRAME_POINTER_REGNUM SFP_REGNUM #define STACK_POINTER_REGNUM SP_REGNUM #define ARG_POINTER_REGNUM AP_REGNUM -#define FIRST_PSEUDO_REGISTER (FFRT_REGNUM + 1) +#define FIRST_PSEUDO_REGISTER (LAST_FAKE_REGNUM + 1) /* The number of argument registers available for each class. 
*/ #define NUM_ARG_REGS 8 @@ -669,6 +677,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; #define FP_SIMD_SAVED_REGNUM_P(REGNO) \ (((unsigned) (REGNO - V8_REGNUM)) <= (V23_REGNUM - V8_REGNUM)) + +#define FAKE_REGNUM_P(REGNO) \ + IN_RANGE (REGNO, FIRST_FAKE_REGNUM, LAST_FAKE_REGNUM) /* Register and constant classes. */ @@ -689,6 +700,7 @@ enum reg_class PR_REGS, FFR_REGS, PR_AND_FFR_REGS, + FAKE_REGS, ALL_REGS, LIM_REG_CLASSES /* Last */ }; @@ -712,6 +724,7 @@ enum reg_class "PR_REGS", \ "FFR_REGS", \ "PR_AND_FFR_REGS", \ + "FAKE_REGS", \ "ALL_REGS" \ } @@ -732,6 +745,7 @@ enum reg_class { 0x00000000, 0x00000000, 0x000ffff0 }, /* PR_REGS */ \ { 0x00000000, 0x00000000, 0x00300000 }, /* FFR_REGS */ \ { 0x00000000, 0x00000000, 0x003ffff0 }, /* PR_AND_FFR_REGS */ \ + { 0x00000000, 0x00000000, 0x1fc00000 }, /* FAKE_REGS */ \ { 0xffffffff, 0xffffffff, 0x000fffff } /* ALL_REGS */ \ } @@ -932,6 +946,15 @@ typedef struct GTY (()) machine_function bool reg_is_wrapped_separately[LAST_SAVED_REGNUM]; /* One entry for each general purpose register. */ rtx call_via[SP_REGNUM]; + + /* A pseudo register that points to the function's TPIDR2 block, or null + if the function doesn't have a TPIDR2 block. */ + rtx tpidr2_block; + + /* A pseudo register that points to the function's ZA save buffer, + or null if none. */ + rtx za_save_buffer; + bool label_is_assembled; /* True if we've expanded at least one call to a function that changes @@ -939,6 +962,10 @@ typedef struct GTY (()) machine_function guarantees that no such mode switch exists. */ bool call_switches_pstate_sm; + /* Used to generate unique identifiers for each update to ZA by an + asm statement. */ + unsigned int next_asm_update_za_id; + /* A set of all decls that have been passed to a vld1 intrinsic in the current function. This is used to help guide the vector cost model. 
*/ hash_set *vector_load_decls; @@ -1008,6 +1035,10 @@ typedef struct bool silent_p; /* True if we should act silently, rather than raise an error for invalid calls. */ + /* AARCH64_STATE_* flags that describe whether the function shares ZA + with its callers. */ + unsigned int shared_za_flags; + /* A list of registers that need to be saved and restored around a change to PSTATE.SM. An auto_vec would be more convenient, but those can't be copied. */ @@ -1379,4 +1410,61 @@ extern poly_uint16 aarch64_sve_vg; || ((T) == US_TRUNCATE && (S) == LSHIFTRT) \ || ((T) == SS_TRUNCATE && (S) == ASHIFTRT)) +#ifndef USED_FOR_TARGET + +/* Enumerates the mode-switching "entities" for AArch64. */ +enum class aarch64_mode_entity : int +{ + /* An aarch64_tristate_mode that says whether we have created a local + save buffer for the current function's ZA state. The only transition + is from NO to YES. */ + HAVE_ZA_SAVE_BUFFER, + + /* An aarch64_local_sme_state that reflects the state of all data + controlled by PSTATE.ZA. */ + LOCAL_SME_STATE +}; + +/* Describes the state of all data controlled by PSTATE.ZA */ +enum class aarch64_local_sme_state : int +{ + /* ZA is in the off or dormant state. If it is dormant, the contents + of ZA belong to a caller. */ + INACTIVE_CALLER, + + /* ZA is in the off state: PSTATE.ZA is 0 and TPIDR2_EL0 is null. */ + OFF, + + /* ZA is in the off or dormant state. If it is dormant, the contents + of ZA belong to the current function. */ + INACTIVE_LOCAL, + + /* ZA is in the off state and the current function's ZA contents are + stored in the lazy save buffer. This is the state on entry to + exception handlers. */ + SAVED_LOCAL, + + /* ZA is in the active state: PSTATE.ZA is 1 and TPIDR2_EL0 is null. + The contents of ZA are live. */ + ACTIVE_LIVE, + + /* ZA is in the active state: PSTATE.ZA is 1 and TPIDR2_EL0 is null. + The contents of ZA are dead. */ + ACTIVE_DEAD, + + /* ZA could be in multiple states. 
*/ + ANY +}; + +enum class aarch64_tristate_mode : int { NO, YES, MAYBE }; + +#define OPTIMIZE_MODE_SWITCHING(ENTITY) \ + aarch64_optimize_mode_switching (aarch64_mode_entity (ENTITY)) + +#define NUM_MODES_FOR_MODE_SWITCHING \ + { int (aarch64_tristate_mode::MAYBE), \ + int (aarch64_local_sme_state::ANY) } + +#endif + #endif /* GCC_AARCH64_H */ diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 9b586b5170b..02dade93fea 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -111,6 +111,56 @@ (define_constants ;; "FFR token": a fake register used for representing the scheduling ;; restrictions on FFR-related operations. (FFRT_REGNUM 85) + + ;; ---------------------------------------------------------------- + ;; Fake registers + ;; ---------------------------------------------------------------- + ;; These registers represent abstract things, rather than real + ;; architected registers. + + ;; Sometimes we use placeholder instructions to mark where later + ;; ABI-related lowering is needed. These placeholders read and + ;; write this register. Instructions that depend on the lowering + ;; read the register. + (LOWERING_REGNUM 86) + + ;; Represents the contents of the current function's TPIDR2 block, + ;; in abstract form. + (TPIDR2_BLOCK_REGNUM 87) + + ;; Holds the value that the current function wants PSTATE.ZA to be. + ;; The actual value can sometimes vary, because it does not track + ;; changes to PSTATE.ZA that happen during a lazy save and restore. + ;; Those effects are instead tracked by ZA_SAVED_REGNUM. + (SME_STATE_REGNUM 88) + + ;; Instructions write to this register if they set TPIDR2_EL0 to a + ;; well-defined value. Instructions read from the register if they + ;; depend on the result of such writes. + ;; + ;; The register does not model the architected TPIDR2_EL0, just the + ;; current function's management of it. 
+ (TPIDR2_SETUP_REGNUM 89) + + ;; Represents the property "has an incoming lazy save been committed?". + (ZA_FREE_REGNUM 90) + + ;; Represents the property "are the current function's ZA contents + ;; stored in the lazy save buffer, rather than in ZA itself?". + (ZA_SAVED_REGNUM 91) + + ;; Represents the contents of the current function's ZA state in + ;; abstract form. At various times in the function, these contents + ;; might be stored in ZA itself, or in the function's lazy save buffer. + ;; + ;; The contents persist even when the architected ZA is off. Private-ZA + ;; functions have no effect on its contents. + (ZA_REGNUM 92) + ;; ---------------------------------------------------------------- + (FIRST_FAKE_REGNUM LOWERING_REGNUM) + (LAST_FAKE_REGNUM ZA_REGNUM) + ;; ---------------------------------------------------------------- + ;; The pair of scratch registers used for stack probing with -fstack-check. ;; Leave R9 alone as a possible choice for the static chain. ;; Note that the use of these registers is mutually exclusive with the use @@ -294,7 +344,12 @@ (define_c_enum "unspec" [ UNSPEC_TAG_SPACE ; Translate address to MTE tag address space. UNSPEC_LD1RO UNSPEC_SALT_ADDR + UNSPEC_SAVE_NZCV + UNSPEC_RESTORE_NZCV UNSPECV_PATCHABLE_AREA + ;; Wraps a constant integer that should be multiplied by the number + ;; of quadwords in an SME vector. + UNSPEC_SME_VQ ]) (define_c_enum "unspecv" [ @@ -367,7 +422,7 @@ (define_constants ;; Q registers and is equivalent to "simd". 
(define_enum "arches" [any rcpc8_4 fp fp_q base_simd nobase_simd - simd nosimd sve fp16]) + simd nosimd sve fp16 sme]) (define_enum_attr "arch" "arches" (const_string "any")) @@ -411,7 +466,10 @@ (define_attr "arch_enabled" "no,yes" (match_test "TARGET_FP_F16INST")) (and (eq_attr "arch" "sve") - (match_test "TARGET_SVE"))) + (match_test "TARGET_SVE")) + + (and (eq_attr "arch" "sme") + (match_test "TARGET_SME"))) (const_string "yes") (const_string "no"))) @@ -914,7 +972,7 @@ (define_insn "simple_return" (set_attr "sls_length" "retbr")] ) -(define_insn "*cb1" +(define_insn "aarch64_cb1" [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") (const_int 0)) (label_ref (match_operand 1 "" "")) @@ -1298,6 +1356,7 @@ (define_insn_and_split "*movsi_aarch64" /* The "mov_imm" type for CNT is just a placeholder. */ [r , Usv; mov_imm , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); [r , Usr; mov_imm , sve, 4] << aarch64_output_sve_rdvl (operands[1]); + [r , UsR; mov_imm , sme, 4] << aarch64_output_rdsvl (operands[1]); [r , m ; load_4 , * , 4] ldr\t%w0, %1 [w , m ; load_4 , fp , 4] ldr\t%s0, %1 [m , r Z; store_4 , * , 4] str\t%w1, %0 @@ -1334,6 +1393,7 @@ (define_insn_and_split "*movdi_aarch64" /* The "mov_imm" type for CNT is just a placeholder. 
*/ [r, Usv; mov_imm , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); [r, Usr; mov_imm , sve, 4] << aarch64_output_sve_rdvl (operands[1]); + [r, UsR; mov_imm , sme, 4] << aarch64_output_rdsvl (operands[1]); [r, m ; load_8 , * , 4] ldr\t%x0, %1 [w, m ; load_8 , fp , 4] ldr\t%d0, %1 [m, r Z; store_8 , * , 4] str\t%x1, %0 @@ -8034,6 +8094,21 @@ (define_insn "patchable_area" [(set (attr "length") (symbol_ref "INTVAL (operands[0])"))] ) +(define_insn "aarch64_save_nzcv" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec:DI [(reg:CC CC_REGNUM)] UNSPEC_SAVE_NZCV))] + "" + "mrs\t%0, nzcv" +) + +(define_insn "aarch64_restore_nzcv" + [(set (reg:CC CC_REGNUM) + (unspec:CC [(match_operand:DI 0 "register_operand" "r")] + UNSPEC_RESTORE_NZCV))] + "" + "msr\tnzcv, %0" +) + ;; AdvSIMD Stuff (include "aarch64-simd.md") diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index 5c02d15c77a..5dd50218b9f 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -225,6 +225,12 @@ (define_constraint "Usr" (and (match_code "const_poly_int") (match_test "aarch64_sve_rdvl_immediate_p (op)"))) +(define_constraint "UsR" + "@internal + A constraint that matches a value produced by RDSVL." 
+ (and (match_code "const") + (match_test "aarch64_rdsvl_immediate_p (op)"))) + (define_constraint "Usv" "@internal A constraint that matches a VG-based constant that can be loaded by diff --git a/gcc/testsuite/g++.target/aarch64/sme/exceptions_1.C b/gcc/testsuite/g++.target/aarch64/sme/exceptions_1.C new file mode 100644 index 00000000000..a245546d8b1 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/sme/exceptions_1.C @@ -0,0 +1,189 @@ +// { dg-options "-O -fno-optimize-sibling-calls" } +// { dg-final { check-function-bodies "**" "" } } + +void callee_inout() __arm_inout("za"); +void callee_in() noexcept __arm_in("za"); +void callee_out() noexcept __arm_out("za"); +void callee_normal(); + +/* +** _Z5test1v: +** ... +** bl __arm_tpidr2_save +** ... +** bl __cxa_begin_catch +** bl __cxa_end_catch +** mov w0, #?2 +** ... +*/ +__arm_new("za") int +test1 () +{ + try + { + callee_inout(); + return 1; + } + catch (...) + { + return 2; + } +} + +/* +** _Z5test2v: +** ... +** bl __arm_tpidr2_save +** ... +** bl __cxa_begin_catch +** smstart za +** bl _Z10callee_outv +** bl _Z9callee_inv +** smstop za +** bl __cxa_end_catch +** mov w0, #?2 +** ... +*/ +__arm_new("za") int +test2 () +{ + try + { + callee_inout(); + return 1; + } + catch (...) + { + callee_out(); + callee_in(); + return 2; + } +} + +/* +** _Z5test3v: +** ... +** bl __arm_tpidr2_save +** ... +** smstop za +** ... +** bl _Z13callee_normalv +** ... +** bl __cxa_begin_catch +** smstart za +** bl _Z10callee_outv +** bl _Z9callee_inv +** smstop za +** bl __cxa_end_catch +** mov w0, #?2 +** ... +*/ +__arm_new("za") int +test3 () +{ + try + { + callee_normal(); + return 1; + } + catch (...) + { + callee_out(); + callee_in(); + return 2; + } +} + +__arm_new("za") int +test4 () +{ + try + { + // No lazy save set up because this is a shared-ZA function. + callee_inout(); + return 1; + } + catch (...) 
+ { + callee_inout(); + return 2; + } +} +// { dg-final { scan-assembler {_Z5test4v:(?:(?!msr\ttpidr2_el0, x[0-9]+).)*\tret} } } + +/* +** _Z5test5v: +** ... +** bl __arm_tpidr2_save +** ... +** smstart za +** ... +** bl _Z12callee_inoutv +** add (x[0-9]+), [^\n]+ +** msr tpidr2_el0, \1 +** bl _Z13callee_normalv +** msr tpidr2_el0, xzr +** smstop za +** ... +** bl __cxa_begin_catch +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** msr tpidr2_el0, xzr +** bl _Z12callee_inoutv +** smstop za +** bl __cxa_end_catch +** mov w0, #?2 +** ... +*/ +__arm_new("za") int +test5 () +{ + try + { + callee_inout(); + callee_normal(); + return 1; + } + catch (...) + { + callee_inout(); + return 2; + } +} + +/* +** _Z5test6v: +** ... +** msr tpidr2_el0, x[0-9]+ +** bl _Z13callee_normalv +** msr tpidr2_el0, xzr +** ... +** bl __cxa_begin_catch +** bl __cxa_end_catch +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** msr tpidr2_el0, xzr +** ... +*/ +int +test6 () __arm_inout("za") +{ + try + { + callee_normal(); + callee_out(); + return 1; + } + catch (...) 
+ { + return 2; + } +} diff --git a/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C b/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C index 032485adf95..8b0755014cc 100644 --- a/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C +++ b/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C @@ -2,3 +2,8 @@ void f1 () __arm_streaming; void f2 () __arm_streaming_compatible; +void f3 () __arm_in("za"); +void f4 () __arm_out("za"); +void f5 () __arm_inout("za"); +void f6 () __arm_preserves("za"); +__arm_new("za") void f7 () {} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c b/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c index 8f1b836764e..fcabe3edc55 100644 --- a/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c @@ -2,3 +2,8 @@ void f1 () __arm_streaming; void f2 () __arm_streaming_compatible; +void f3 () __arm_in("za"); +void f4 () __arm_out("za"); +void f5 () __arm_inout("za"); +void f6 () __arm_preserves("za"); +__arm_new("za") void f7 () {} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_1.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_1.c new file mode 100644 index 00000000000..856880e2109 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_1.c @@ -0,0 +1,154 @@ +// { dg-options "" } + +void shared_a () [[arm::inout("za")]]; +void shared_a (); // { dg-error "conflicting types" } + +void shared_b (); +void shared_b () [[arm::inout("za")]]; // { dg-error "conflicting types" } + +void shared_c () [[arm::inout("za")]]; +void shared_c () {} // Inherits attribute from declaration (confusingly). 
+ +void shared_d (); +void shared_d () [[arm::inout("za")]] {} // { dg-error "conflicting types" } + +void shared_e () [[arm::inout("za")]] {} +void shared_e (); // { dg-error "conflicting types" } + +void shared_f () {} +void shared_f () [[arm::inout("za")]]; // { dg-error "conflicting types" } + +extern void (*shared_g) (); +extern void (*shared_g) () [[arm::inout("za")]]; // { dg-error "conflicting types" } + +extern void (*shared_h) () [[arm::inout("za")]]; +extern void (*shared_h) (); // { dg-error "conflicting types" } + +//---------------------------------------------------------------------------- + +void preserved_a () [[arm::preserves("za")]]; +void preserved_a (); // { dg-error "conflicting types" } + +void preserved_b (); +void preserved_b () [[arm::preserves("za")]]; // { dg-error "conflicting types" } + +void preserved_c () [[arm::preserves("za")]]; +void preserved_c () {} // Inherits attribute from declaration (confusingly). + +void preserved_d (); +void preserved_d () [[arm::preserves("za")]] {} // { dg-error "conflicting types" } + +void preserved_e () [[arm::preserves("za")]] {} +void preserved_e (); // { dg-error "conflicting types" } + +void preserved_f () {} +void preserved_f () [[arm::preserves("za")]]; // { dg-error "conflicting types" } + +extern void (*preserved_g) (); +extern void (*preserved_g) () [[arm::preserves("za")]]; // { dg-error "conflicting types" } + +extern void (*preserved_h) () [[arm::preserves("za")]]; +extern void (*preserved_h) (); // { dg-error "conflicting types" } + +//---------------------------------------------------------------------------- + +void replicated_1 () [[arm::in("za", "za"), arm::in("za")]]; +void replicated_2 () [[arm::out("za", "za"), arm::out("za")]]; +void replicated_3 () [[arm::inout("za", "za"), arm::inout("za")]]; +void replicated_4 () [[arm::preserves("za", "za"), arm::preserves("za")]]; + +//---------------------------------------------------------------------------- + +void invalid_1 () 
[[arm::in]]; // { dg-error "wrong number of arguments" } +void invalid_2 () [[arm::in()]]; // { dg-error "parentheses must be omitted" } + // { dg-error "wrong number of arguments" "" { target *-*-* } .-1 } +void invalid_3 () [[arm::in("")]]; // { dg-error "unrecognized state string ''" } +void invalid_4 () [[arm::in("foo")]]; // { dg-error "unrecognized state string 'foo'" } +void invalid_5 () [[arm::in(42)]]; // { dg-error "the arguments to 'in' must be constant strings" } +void invalid_6 () [[arm::in(*(int *)0 ? "za" : "za")]]; // { dg-error "the arguments to 'in' must be constant strings" } + +//---------------------------------------------------------------------------- + +void mixed_a () [[arm::preserves("za")]]; +void mixed_a () [[arm::inout("za")]]; // { dg-error "conflicting types" } + +void mixed_b () [[arm::inout("za")]]; +void mixed_b () [[arm::preserves("za")]]; // { dg-error "conflicting types" } + +void mixed_c () [[arm::preserves("za")]]; +void mixed_c () [[arm::in("za")]] {} // { dg-error "conflicting types" } + +void mixed_d () [[arm::inout("za")]]; +void mixed_d () [[arm::in("za")]] {} // { dg-error "conflicting types" } + +void mixed_e () [[arm::out("za")]] {} +void mixed_e () [[arm::in("za")]]; // { dg-error "conflicting types" } + +void mixed_f () [[arm::inout("za")]] {} +void mixed_f () [[arm::out("za")]]; // { dg-error "conflicting types" } + +extern void (*mixed_g) () [[arm::in("za")]]; +extern void (*mixed_g) () [[arm::preserves("za")]]; // { dg-error "conflicting types" } + +extern void (*mixed_h) () [[arm::preserves("za")]]; +extern void (*mixed_h) () [[arm::out("za")]]; // { dg-error "conflicting types" } + +//---------------------------------------------------------------------------- + +void contradiction_1 () [[arm::preserves("za"), arm::inout("za")]]; // { dg-error "inconsistent attributes for state 'za'" } +void contradiction_2 () [[arm::inout("za"), arm::preserves("za")]]; // { dg-error "inconsistent attributes for state 'za'" } + 
+int [[arm::inout("za")]] int_attr; // { dg-warning "only applies to function types" } +void *[[arm::preserves("za")]] ptr_attr; // { dg-warning "only applies to function types" } + +typedef void preserved_callback () [[arm::preserves("za")]]; +typedef void shared_callback () [[arm::inout("za")]]; + +void (*preserved_callback_ptr) () [[arm::preserves("za")]]; +void (*shared_callback_ptr) () [[arm::inout("za")]]; + +typedef void contradiction_callback_1 () [[arm::preserves("za"), arm::inout("za")]]; // { dg-error "inconsistent attributes for state 'za'" } +typedef void contradiction_callback_2 () [[arm::inout("za"), arm::preserves("za")]]; // { dg-error "inconsistent attributes for state 'za'" } + +void (*contradiction_callback_ptr_1) () [[arm::preserves("za"), arm::inout("za")]]; // { dg-error "inconsistent attributes for state 'za'" } +void (*contradiction_callback_ptr_2) () [[arm::inout("za"), arm::preserves("za")]]; // { dg-error "inconsistent attributes for state 'za'" } + +struct s { + void (*contradiction_callback_ptr_1) () [[arm::preserves("za"), arm::inout("za")]]; // { dg-error "inconsistent attributes for state 'za'" } + void (*contradiction_callback_ptr_2) () [[arm::inout("za"), arm::preserves("za")]]; // { dg-error "inconsistent attributes for state 'za'" } +}; + +//---------------------------------------------------------------------------- + +void keyword_ok_1 () __arm_inout("za"); +void keyword_ok_1 () __arm_inout("za"); + +void keyword_ok_2 () __arm_in("za"); +void keyword_ok_2 () [[arm::in("za")]]; + +void keyword_ok_3 () [[arm::out("za")]]; +void keyword_ok_3 () __arm_out("za"); + +void keyword_ok_4 () __arm_inout("za") [[arm::inout("za")]]; + +void keyword_ok_5 () __arm_preserves("za"); +void keyword_ok_5 () [[arm::preserves("za")]]; + +__arm_new("za") void keyword_ok_6 () {} + +//---------------------------------------------------------------------------- + +void keyword_conflict_1 () __arm_inout("za"); +void keyword_conflict_1 (); // { dg-error 
"conflicting types" } + +void keyword_conflict_2 (); +void keyword_conflict_2 () __arm_inout("za"); // { dg-error "conflicting types" } + +void keyword_conflict_3 () __arm_inout("za"); +void keyword_conflict_3 () [[arm::preserves("za")]]; // { dg-error "conflicting types" } + +void keyword_conflict_4 () [[arm::preserves("za")]]; +void keyword_conflict_4 () __arm_inout("za"); // { dg-error "conflicting types" } + +__arm_new("za") void keyword_conflict_5 () __arm_inout("za") {} // { dg-error "cannot create a new 'za' scope since 'za' is shared with callers" } +__arm_new("za") void keyword_conflict_6 () __arm_preserves("za") {} // { dg-error "cannot create a new 'za' scope since 'za' is shared with callers" } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_2.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_2.c new file mode 100644 index 00000000000..572ff309f8d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_2.c @@ -0,0 +1,73 @@ +// { dg-options "" } + +[[arm::new("za")]] void new_za_a (); +void new_za_a (); + +void new_za_b (); +[[arm::new("za")]] void new_za_b (); + +[[arm::new("za")]] void new_za_c (); +void new_za_c () {} + +void new_za_d (); +[[arm::new("za")]] void new_za_d () {} + +[[arm::new("za")]] void new_za_e () {} +void new_za_e (); + +void new_za_f () {} +[[arm::new("za")]] void new_za_f (); // { dg-error "cannot apply attribute 'new' to 'new_za_f' after the function has been defined" } + +//---------------------------------------------------------------------------- + +[[arm::new("za")]] void shared_a (); +void shared_a () [[arm::inout("za")]]; // { dg-error "conflicting types" } + +void shared_b () [[arm::inout("za")]]; +[[arm::new("za")]] void shared_b (); // { dg-error "conflicting types" } + +[[arm::new("za")]] void shared_c (); +void shared_c () [[arm::in("za")]] {} // { dg-error "conflicting types" } + +void shared_d () [[arm::in("za")]]; +[[arm::new("za")]] void shared_d () {} // { dg-error "cannot create a new 'za' 
scope since 'za' is shared with callers" } + +[[arm::new("za")]] void shared_e () {} +void shared_e () [[arm::out("za")]]; // { dg-error "conflicting types" } + +void shared_f () [[arm::out("za")]] {} +[[arm::new("za")]] void shared_f (); // { dg-error "conflicting types" } + +[[arm::new("za")]] void shared_g () {} +void shared_g () [[arm::preserves("za")]]; // { dg-error "conflicting types" } + +void shared_h () [[arm::preserves("za")]] {} +[[arm::new("za")]] void shared_h (); // { dg-error "conflicting types" } + +//---------------------------------------------------------------------------- + +[[arm::new("za")]] void contradiction_1 () [[arm::inout("za")]]; // { dg-error "cannot create a new 'za' scope since 'za' is shared with callers" } +void contradiction_2 [[arm::new("za")]] () [[arm::inout("za")]]; // { dg-error "cannot create a new 'za' scope since 'za' is shared with callers" } +[[arm::new("za")]] void contradiction_3 () [[arm::preserves("za")]]; // { dg-error "cannot create a new 'za' scope since 'za' is shared with callers" } +void contradiction_4 [[arm::new("za")]] () [[arm::preserves("za")]]; // { dg-error "cannot create a new 'za' scope since 'za' is shared with callers" } + +int [[arm::new("za")]] int_attr; // { dg-warning "does not apply to types" } +[[arm::new("za")]] int int_var_attr; // { dg-error "applies only to function definitions" } +typedef void new_za_callback () [[arm::new("za")]]; // { dg-warning "does not apply to types" } +[[arm::new("za")]] void (*new_za_var_callback) (); // { dg-error "applies only to function definitions" } + +//---------------------------------------------------------------------------- + +[[arm::new("za")]] void complementary_1 () [[arm::streaming]] {} +void complementary_2 [[arm::new("za")]] () [[arm::streaming]] {} +[[arm::new("za")]] void complementary_3 () [[arm::streaming_compatible]] {} +void complementary_4 [[arm::new("za")]] () [[arm::streaming_compatible]] {} + 
+//---------------------------------------------------------------------------- + +#pragma GCC target "+nosme" + +[[arm::new("za")]] void bereft_1 (); +[[arm::new("za")]] void bereft_2 () {} // { dg-error "functions with SME state require the ISA extension 'sme'" } +void bereft_3 () [[arm::inout("za")]]; +void bereft_4 () [[arm::inout("za")]] {} // { dg-error "functions with SME state require the ISA extension 'sme'" } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_3.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_3.c new file mode 100644 index 00000000000..203f6ae8a07 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_3.c @@ -0,0 +1,31 @@ +// { dg-options "" } + +void normal_callee (); +void in_callee () [[arm::in("za")]]; +void out_callee () [[arm::out("za")]]; +void inout_callee () [[arm::inout("za")]]; +void preserves_callee () [[arm::preserves("za")]]; + +struct callbacks { + void (*normal_ptr) (); + void (*in_ptr) () [[arm::in("za")]]; + void (*out_ptr) () [[arm::out("za")]]; + void (*inout_ptr) () [[arm::inout("za")]]; + void (*preserves_ptr) () [[arm::preserves("za")]]; +}; + +void +normal_caller (struct callbacks *c) +{ + normal_callee (); + in_callee (); // { dg-error {call to a function that shares 'za' state from a function that has no 'za' state} } + out_callee (); // { dg-error {call to a function that shares 'za' state from a function that has no 'za' state} } + inout_callee (); // { dg-error {call to a function that shares 'za' state from a function that has no 'za' state} } + preserves_callee (); // { dg-error {call to a function that shares SME state from a function that has no SME state} } + + c->normal_ptr (); + c->in_ptr (); // { dg-error {call to a function that shares 'za' state from a function that has no 'za' state} } + c->out_ptr (); // { dg-error {call to a function that shares 'za' state from a function that has no 'za' state} } + c->inout_ptr (); // { dg-error {call to a function that shares 'za' state from 
a function that has no 'za' state} } + c->preserves_ptr (); // { dg-error {call to a function that shares SME state from a function that has no SME state} } +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c new file mode 100644 index 00000000000..cec0abf0ea9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_4.c @@ -0,0 +1,585 @@ +// { dg-options "-O -fno-optimize-sibling-calls" } +// { dg-final { check-function-bodies "**" "" } } + +void private_za(); +void out_za() __arm_out("za"); +void in_za() __arm_in("za"); +void inout_za() __arm_inout("za"); +void preserves_za() __arm_preserves("za"); + +/* +** test1: +** ret +*/ +__arm_new("za") void test1() +{ +} + +/* +** test2: +** ldr w0, \[x0\] +** ret +*/ +__arm_new("za") int test2(int *ptr) +{ + return *ptr; +} + +/* +** test3: +** stp [^\n]+ +** mov x29, sp +** bl private_za +** ( +** mov w0, 0 +** ldp [^\n]+ +** | +** ldp [^\n]+ +** mov w0, 0 +** ) +** ret +*/ +__arm_new("za") int test3() +{ + private_za(); + return 0; +} + +/* +** test4: +** ... +** mrs x0, tpidr2_el0 +** cbz x0, [^\n]+ +** bl __arm_tpidr2_save +** msr tpidr2_el0, xzr +** zero { za } +** smstart za +** bl in_za +** smstop za +** ldp [^\n]+ +** ret +*/ +__arm_new("za") void test4() +{ + in_za(); // Uses zeroed contents. +} + +/* +** test5: +** ... +** mrs x0, tpidr2_el0 +** cbz x0, [^\n]+ +** bl __arm_tpidr2_save +** msr tpidr2_el0, xzr +** smstop za +** bl private_za +** smstart za +** bl out_za +** bl in_za +** smstop za +** bl private_za +** ldp [^\n]+ +** ret +*/ +__arm_new("za") void test5() +{ + private_za(); + out_za(); + in_za(); + private_za(); +} + +// Despite the long test, there shouldn't be too much scope for variation +// here. The point is both to test correctness and code quality. 
+/* +** test6: +** stp [^\n]+ +** mov x29, sp +** mrs x0, tpidr2_el0 +** cbz x0, [^\n]+ +** bl __arm_tpidr2_save +** msr tpidr2_el0, xzr +** smstart za +** bl out_za +** rdsvl (x[0-9]+), #1 +** mul (x[0-9]+), \1, \1 +** sub sp, sp, \2 +** mov (x[0-9]+), sp +** stp \3, \1, \[x29, #?16\] +** add (x[0-9]+), x29, #?16 +** msr tpidr2_el0, \4 +** bl private_za +** ( +** add (x[0-9]+), x29, #?16 +** mrs (x[0-9]+), tpidr2_el0 +** cbnz \6, [^\n]+ +** smstart za +** mov x0, \5 +** | +** add x0, x29, #?16 +** mrs (x[0-9]+), tpidr2_el0 +** cbnz \6, [^\n]+ +** smstart za +** ) +** bl __arm_tpidr2_restore +** msr tpidr2_el0, xzr +** bl in_za +** smstop za +** mov sp, x29 +** ldp [^\n]+ +** ret +*/ +__arm_new("za") void test6() +{ + out_za(); + private_za(); + in_za(); +} + +// Rely on previous tests for the part leading up to the smstart. +/* +** test7: +** ... +** smstart za +** bl out_za +** bl in_za +** smstop za +** bl private_za +** smstart za +** bl out_za +** bl in_za +** smstop za +** ldp [^\n]+ +** ret +*/ +__arm_new("za") void test7() +{ + out_za(); + in_za(); + private_za(); + out_za(); + in_za(); +} + +/* +** test8: +** ... +** smstart za +** bl out_za +** bl in_za +** smstop za +** bl private_za +** smstart za +** bl out_za +** bl in_za +** smstop za +** bl private_za +** ldp [^\n]+ +** ret +*/ +__arm_new("za") void test8() +{ + out_za(); + in_za(); + private_za(); + out_za(); + in_za(); + private_za(); +} + +/* +** test9: +** ... +** msr tpidr2_el0, x[0-9]+ +** bl private_za +** bl private_za +** bl private_za +** bl private_za +** add x[0-9]+, x29, #?16 +** mrs x[0-9]+, tpidr2_el0 +** ... +*/ +__arm_new("za") void test9() +{ + out_za(); + private_za(); + private_za(); + private_za(); + private_za(); + in_za(); +} + +/* +** test10: +** ldr (w[0-9]+), \[x0\] +** cbz \1, [^\n]+ +** ldr [^\n]+ +** add [^\n]+ +** str [^\n]+ +** ret +** ... 
+*/ +__arm_new("za") void test10(volatile int *ptr) +{ + if (__builtin_expect (*ptr != 0, 1)) + *ptr = *ptr + 1; + else + inout_za(); +} + +/* +** test11: +** ... +** ldr w[0-9]+, [^\n]+ +** add (w[0-9]+), [^\n]+ +** str \1, [^\n]+ +** ... +** ret +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** bl inout_za +** ldr (w[0-9]+), [^\n]+ +** cbnz \2, [^\n]+ +** smstop za +** ... +*/ +__arm_new("za") void test11(volatile int *ptr) +{ + if (__builtin_expect (*ptr == 0, 0)) + do + inout_za(); + while (*ptr); + else + *ptr += 1; +} + +__arm_new("za") void test12(volatile int *ptr) +{ + do + { + inout_za(); + private_za(); + } + while (*ptr); + out_za(); + in_za(); +} + +/* +** test13: +** stp [^\n]+ +** ... +** stp [^\n]+ +** ... +** bl __arm_tpidr2_save +** ... +** msr tpidr2_el0, x[0-9]+ +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** bl inout_za +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** cbnz [^\n]+ +** smstart za +** msr tpidr2_el0, xzr +** bl out_za +** bl in_za +** ... +** smstop za +** ... +*/ +__arm_new("za") void test13(volatile int *ptr) +{ + do + { + private_za(); + inout_za(); + private_za(); + } + while (*ptr); + out_za(); + in_za(); +} + +/* +** test14: +** ... +** bl __arm_tpidr2_save +** ... +** smstart za +** bl inout_za +** ldr [^\n]+ +** cbnz [^\n]+ +** bl out_za +** bl in_za +** smstop za +** ... +*/ +__arm_new("za") void test14(volatile int *ptr) +{ + do + inout_za(); + while (*ptr); + out_za(); + in_za(); +} + +/* +** test15: +** ... +** bl __arm_tpidr2_save +** ... +** smstart za +** bl out_za +** bl in_za +** ldr [^\n]+ +** cbnz [^\n]+ +** smstop za +** bl private_za +** ldr [^\n]+ +** ldp [^\n]+ +** ret +*/ +__arm_new("za") void test15(volatile int *ptr) +{ + do + { + out_za(); + in_za(); + } + while (*ptr); + private_za(); +} + +/* +** test16: +** ... +** bl __arm_tpidr2_save +** ... +** smstart za +** b [^\n]+ +-- loop: +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... 
+** msr tpidr2_el0, xzr +-- loop_entry: +** bl inout_za +** ... +** msr tpidr2_el0, x[0-9]+ +** bl private_za +** ldr [^\n]+ +** cbnz [^\n]+ +** msr tpidr2_el0, xzr +** smstop za +** bl private_za +** ... +*/ +__arm_new("za") void test16(volatile int *ptr) +{ + do + { + inout_za(); + private_za(); + } + while (*ptr); + private_za(); +} + +/* +** test17: +** ... +** bl private_za +** ldr [^\n]+ +** cbnz [^\n]+ +** ... +** msr tpidr2_el0, xzr +** ... +** smstop za +** ... +*/ +__arm_new("za") void test17(volatile int *ptr) +{ + do + { + inout_za(); + private_za(); + } + while (*ptr); +} + +/* +** test18: +** ldr w[0-9]+, [^\n]+ +** cbnz w[0-9]+, [^\n]+ +** ret +** ... +** smstop za +** bl private_za +** ... +*/ +__arm_new("za") void test18(volatile int *ptr) +{ + if (__builtin_expect (*ptr, 0)) + { + out_za(); + in_za(); + private_za(); + } +} + +/* +** test19: +** ... +** ldr w[0-9]+, [^\n]+ +** cbz w[0-9]+, [^\n]+ +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstop za +** bl private_za +** ... +*/ +__arm_new("za") void test19(volatile int *ptr) +{ + if (__builtin_expect (*ptr != 0, 1)) + private_za(); + else + do + { + inout_za(); + private_za(); + } + while (*ptr); +} + +/* +** test20: +** ... +** bl a20 +** (?:(?!x0).)* +** bl b20 +** ... +** mov ([wx][0-9]+), [wx]0 +** ... +** bl __arm_tpidr2_restore +** ... +** mov [wx]0, \1 +** ... +** bl c20 +** ... +*/ +__arm_new("za") void test20() +{ + extern int a20() __arm_inout("za"); + extern int b20(int); + extern void c20(int) __arm_inout("za"); + c20(b20(a20())); +} + +/* +** test21: +** ... +** bl a21 +** (?:(?!x0).)* +** bl b21 +** ... +** mov (x[0-9]+), x0 +** ... +** bl __arm_tpidr2_restore +** ... +** mov x0, \1 +** ... +** bl c21 +** ... 
+*/ +__arm_new("za") void test21() +{ + extern __UINT64_TYPE__ a21() __arm_inout("za"); + extern __UINT64_TYPE__ b21(__UINT64_TYPE__); + extern void c21(__UINT64_TYPE__) __arm_inout("za"); + c21(b21(a21())); +} + +/* +** test22: +** (?:(?!rdsvl).)* +** rdsvl x[0-9]+, #1 +** (?:(?!rdsvl).)* +*/ +__arm_new("za") void test22(volatile int *ptr) +{ + inout_za(); + if (*ptr) + *ptr += 1; + else + private_za(); + private_za(); + in_za(); +} + +/* +** test23: +** (?:(?!__arm_tpidr2_save).)* +** bl __arm_tpidr2_save +** (?:(?!__arm_tpidr2_save).)* +*/ +__arm_new("za") void test23(volatile int *ptr) +{ + if (*ptr) + *ptr += 1; + else + inout_za(); + inout_za(); +} + +/* +** test24: +** ... +** bl in_za +** ... +** incb x1 +** ... +** bl out_za +** bl inout_za +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** incb x1 +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** incb x1 +** ... +** smstop za +** ... +** bl private_za +** ... 
+** ret +*/ +__arm_new("za") void test24() +{ + in_za(); + asm ("incb\tx1" ::: "x1", "za"); + out_za(); + inout_za(); + private_za(); + asm ("incb\tx1" ::: "x1", "za"); + private_za(); + asm ("incb\tx1" ::: "x1", "za"); + in_za(); + private_za(); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c new file mode 100644 index 00000000000..491a49acab8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_5.c @@ -0,0 +1,593 @@ +// { dg-options "-O2 -fno-optimize-sibling-calls" } +// { dg-final { check-function-bodies "**" "" } } + +void private_za(); +void out_za() __arm_out("za"); +void in_za() __arm_in("za"); +void inout_za() __arm_inout("za"); +void preserves_za() __arm_preserves("za"); + +/* +** test1: +** ret +*/ +void test1() __arm_inout("za") +{ +} + +/* +** test2: +** ldr w0, \[x0\] +** ret +*/ +int test2(int *ptr) __arm_inout("za") +{ + return *ptr; +} + +/* +** test3: +** ... +** sub sp, sp, x[0-9]+ +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +*/ +int test3() __arm_inout("za") +{ + private_za(); + return 0; +} + +/* +** test4: +** stp [^\n]+ +** [^\n]+ +** bl in_za +** ldp [^\n]+ +** ret +*/ +void test4() __arm_inout("za") +{ + in_za(); +} + +/* +** test5: +** ... +** smstop za +** ... +** bl private_za +** smstart za +** bl out_za +** bl in_za +** ... +** sub sp, sp, x[0-9]+ +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +*/ +void test5() __arm_inout("za") +{ + private_za(); + out_za(); + in_za(); + private_za(); +} + +/* +** test6: +** ... +** bl out_za +** ... +** sub sp, sp, x[0-9]+ +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... 
+** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +** bl in_za +** ... +*/ +void test6() __arm_inout("za") +{ + out_za(); + private_za(); + in_za(); +} + +/* +** test7: +** stp [^\n]+ +** [^\n]+ +** bl out_za +** bl in_za +** smstop za +** bl private_za +** smstart za +** bl out_za +** bl in_za +** ldp [^\n]+ +** ret +*/ +void test7() __arm_inout("za") +{ + out_za(); + in_za(); + private_za(); + out_za(); + in_za(); +} + +/* +** test8: +** stp [^\n]+ +** [^\n]+ +** bl out_za +** bl in_za +** smstop za +** bl private_za +** smstart za +** bl out_za +** bl in_za +** ... +** sub sp, sp, x[0-9]+ +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +** ret +*/ +void test8() __arm_inout("za") +{ + out_za(); + in_za(); + private_za(); + out_za(); + in_za(); + private_za(); +} + +/* +** test9: +** stp [^\n]+ +** [^\n]+ +** bl out_za +** ... +** msr tpidr2_el0, x[0-9]+ +** bl private_za +** bl private_za +** bl private_za +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +*/ +void test9() __arm_inout("za") +{ + out_za(); + private_za(); + private_za(); + private_za(); + private_za(); + in_za(); +} + +/* +** test10: +** ldr (w[0-9]+), \[x0\] +** cbz \1, [^\n]+ +** ldr [^\n]+ +** add [^\n]+ +** str [^\n]+ +** ret +** ... 
+*/ +void test10(volatile int *ptr) __arm_inout("za") +{ + if (__builtin_expect (*ptr != 0, 1)) + *ptr = *ptr + 1; + else + inout_za(); +} + +/* +** test11: +** (?!.*(\t__arm|\tza|tpidr2_el0)).* +*/ +void test11(volatile int *ptr) __arm_inout("za") +{ + if (__builtin_expect (*ptr == 0, 0)) + do + inout_za(); + while (*ptr); + else + *ptr += 1; +} + +void test12(volatile int *ptr) __arm_inout("za") +{ + do + { + inout_za(); + private_za(); + } + while (*ptr); + out_za(); + in_za(); +} + +/* +** test13: +** stp [^\n]+ +** ... +** stp [^\n]+ +** ... +-- loop: +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** bl inout_za +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ldr [^\n]+ +** cbnz [^\n]+ +** smstart za +** msr tpidr2_el0, xzr +** bl out_za +** bl in_za +** [^\n]+ +** [^\n]+ +** ldp [^\n]+ +** ret +*/ +void test13(volatile int *ptr) __arm_inout("za") +{ + do + { + private_za(); + inout_za(); + private_za(); + } + while (*ptr); + out_za(); + in_za(); +} + +/* +** test14: +** ... +** bl inout_za +** ldr [^\n]+ +** cbnz [^\n]+ +** bl out_za +** bl in_za +** ... +*/ +void test14(volatile int *ptr) __arm_inout("za") +{ + do + inout_za(); + while (*ptr); + out_za(); + in_za(); +} + +/* +** test15: +** ... +** bl out_za +** bl in_za +** ldr [^\n]+ +** cbnz [^\n]+ +** ... +** stp [^\n]+ +** ... +** msr tpidr2_el0, [^\n]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +*/ +void test15(volatile int *ptr) __arm_inout("za") +{ + do + { + out_za(); + in_za(); + } + while (*ptr); + private_za(); +} + +/* +** test16: +** stp [^\n]+ +** ... +** stp [^\n]+ +** ... +** b [^\n]+ +-- loop: +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** msr tpidr2_el0, xzr +-- loop_entry: +** bl inout_za +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** bl private_za +** ... 
+** mrs x[0-9]+, tpidr2_el0 +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +*/ +void test16(volatile int *ptr) __arm_inout("za") +{ + do + { + inout_za(); + private_za(); + } + while (*ptr); + private_za(); +} + +/* +** test17: +** ... +-- loop: +** bl inout_za +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** smstart za +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +** cbnz [^\n]+ +** [^\n]+ +** [^\n]+ +** ldp [^\n]+ +** ret +*/ +void test17(volatile int *ptr) __arm_inout("za") +{ + do + { + inout_za(); + private_za(); + } + while (*ptr); +} + +/* +** test18: +** ldr w[0-9]+, [^\n]+ +** cbnz w[0-9]+, [^\n]+ +** ret +** ... +** bl out_za +** bl in_za +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** bl __arm_tpidr2_restore +** ... +** msr tpidr2_el0, xzr +** ... +*/ +void test18(volatile int *ptr) __arm_inout("za") +{ + if (__builtin_expect (*ptr, 0)) + { + out_za(); + in_za(); + private_za(); + } +} + +void test19(volatile int *ptr) __arm_inout("za") +{ + if (__builtin_expect (*ptr != 0, 1)) + private_za(); + else + do + { + inout_za(); + private_za(); + } + while (*ptr); +} + +/* +** test20: +** ... +** bl a20 +** (?:(?!x0).)* +** bl b20 +** ... +** mov ([wx][0-9]+), [wx]0 +** ... +** bl __arm_tpidr2_restore +** ... +** mov [wx]0, \1 +** ... +** bl c20 +** ... +*/ +void test20() __arm_inout("za") +{ + extern int a20() __arm_inout("za"); + extern int b20(int); + extern void c20(int) __arm_inout("za"); + c20(b20(a20())); +} + +/* +** test21: +** ... +** bl a21 +** (?:(?!x0).)* +** bl b21 +** ... +** mov (x[0-9]+), x0 +** ... +** bl __arm_tpidr2_restore +** ... +** mov x0, \1 +** ... +** bl c21 +** ... 
+*/ +void test21() __arm_inout("za") +{ + extern __UINT64_TYPE__ a21() __arm_inout("za"); + extern __UINT64_TYPE__ b21(__UINT64_TYPE__); + extern void c21(__UINT64_TYPE__) __arm_inout("za"); + c21(b21(a21())); +} + +/* +** test22: +** (?:(?!rdsvl).)* +** rdsvl x[0-9]+, #1 +** (?:(?!rdsvl).)* +*/ +void test22(volatile int *ptr) __arm_inout("za") +{ + inout_za(); + if (*ptr) + *ptr += 1; + else + private_za(); + private_za(); + in_za(); +} + +void test23(volatile int *ptr) __arm_inout("za") +{ + if (*ptr) + *ptr += 1; + else + inout_za(); + inout_za(); +} + +/* +** test24: +** ... +** bl in_za +** ... +** incb x1 +** ... +** bl out_za +** bl inout_za +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** incb x1 +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** incb x1 +** ... +** msr tpidr2_el0, x[0-9]+ +** ... +** bl private_za +** ... +** mrs x[0-9]+, tpidr2_el0 +** ... +** ret +*/ +void test24() __arm_inout("za") +{ + in_za(); + asm ("incb\tx1" ::: "x1", "za"); + out_za(); + inout_za(); + private_za(); + asm ("incb\tx1" ::: "x1", "za"); + private_za(); + asm ("incb\tx1" ::: "x1", "za"); + in_za(); + private_za(); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_6.c b/gcc/testsuite/gcc.target/aarch64/sme/za_state_6.c new file mode 100644 index 00000000000..d5b226ae158 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_6.c @@ -0,0 +1,23 @@ +// { dg-options "-O -fno-optimize-sibling-calls -fomit-frame-pointer" } + +void private_za(); +void out_za() __arm_out("za"); +void in_za() __arm_in("za"); + +__arm_new("za") void test20(volatile int *ptr) +{ + if (*ptr) + out_za(); + else + *ptr += 1; + *ptr += 1; + if (*ptr) + in_za(); + else + *ptr += 1; +} + +// { dg-final { scan-assembler {\tbl\t__arm_tpidr2_save\n} } } +// { dg-final { scan-assembler {\tsmstart\tza\n} } } +// { dg-final { scan-assembler {\tsmstop\tza\n} } } +// { 
dg-final { scan-assembler-not {\tsub\tsp, sp, x[0-9]+\n} } }

From patchwork Fri Nov 17 17:26:53 2023
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 13/21] aarch64: Add a register class for w12-w15
Date: Fri, 17 Nov 2023 17:26:53 +0000

Some SME instructions use
w12-w15 to index ZA. This patch adds a register class for that range. gcc/ * config/aarch64/aarch64.h (W12_W15_REGNUM_P): New macro. (W12_W15_REGS): New register class. (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it. * config/aarch64/aarch64.cc (aarch64_regno_regclass) (aarch64_class_max_nregs, aarch64_register_move_cost): Handle W12_W15_REGS. --- gcc/config/aarch64/aarch64.cc | 12 +++++++----- gcc/config/aarch64/aarch64.h | 6 ++++++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 2782feef0f3..1e4d1b03c0a 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -14368,6 +14368,9 @@ aarch64_label_mentioned_p (rtx x) enum reg_class aarch64_regno_regclass (unsigned regno) { + if (W12_W15_REGNUM_P (regno)) + return W12_W15_REGS; + if (STUB_REGNUM_P (regno)) return STUB_REGS; @@ -14732,6 +14735,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode) unsigned int nregs, vec_flags; switch (regclass) { + case W12_W15_REGS: case STUB_REGS: case TAILCALL_ADDR_REGS: case POINTER_REGS: @@ -17090,13 +17094,11 @@ aarch64_register_move_cost (machine_mode mode, const struct cpu_regmove_cost *regmove_cost = aarch64_tune_params.regmove_cost; - /* Caller save and pointer regs are equivalent to GENERAL_REGS. */ - if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS || to == STUB_REGS) + /* Treat any subset of POINTER_REGS as though it were GENERAL_REGS. */ + if (reg_class_subset_p (to, POINTER_REGS)) to = GENERAL_REGS; - if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS - || from == STUB_REGS) + if (reg_class_subset_p (from, POINTER_REGS)) from = GENERAL_REGS; /* Make RDFFR very expensive.
In particular, if we know that the FFR diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index dc544273d32..83bd8ebdad7 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -660,6 +660,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; && (REGNO) != R17_REGNUM \ && (REGNO) != R30_REGNUM) \ +#define W12_W15_REGNUM_P(REGNO) \ + IN_RANGE (REGNO, R12_REGNUM, R15_REGNUM) + #define FP_REGNUM_P(REGNO) \ (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM)) @@ -686,6 +689,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; enum reg_class { NO_REGS, + W12_W15_REGS, TAILCALL_ADDR_REGS, STUB_REGS, GENERAL_REGS, @@ -710,6 +714,7 @@ enum reg_class #define REG_CLASS_NAMES \ { \ "NO_REGS", \ + "W12_W15_REGS", \ "TAILCALL_ADDR_REGS", \ "STUB_REGS", \ "GENERAL_REGS", \ @@ -731,6 +736,7 @@ enum reg_class #define REG_CLASS_CONTENTS \ { \ { 0x00000000, 0x00000000, 0x00000000 }, /* NO_REGS */ \ + { 0x0000f000, 0x00000000, 0x00000000 }, /* W12_W15_REGS */ \ { 0x00030000, 0x00000000, 0x00000000 }, /* TAILCALL_ADDR_REGS */\ { 0x3ffcffff, 0x00000000, 0x00000000 }, /* STUB_REGS */ \ { 0x7fffffff, 0x00000000, 0x00000003 }, /* GENERAL_REGS */ \ From patchwork Fri Nov 17 17:27:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1865166 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange 
X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SX3ls5kMXz1yS7 for ; Sat, 18 Nov 2023 04:28:41 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D03093858015 for ; Fri, 17 Nov 2023 17:28:38 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id F30893857C58 for ; Fri, 17 Nov 2023 17:27:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F30893857C58 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org F30893857C58 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700242031; cv=none; b=h7Tw9CCS06Qhsq9UJSfvRip9VC/HyiwRXqqHXQeFHm+i0z5dfJMYiOtTWrwovoihVtnolHltGLBGwPthGOrnzHxLJKhHXCyNWHFBn3l6VYpqIRvEdUm428NQdge7m/xmWuKh7mvVpu8CLz1sfyxbzzTINgtd8QQUtt0TmrMr9Jc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700242031; c=relaxed/simple; bh=624MomvQ7LB083fYrfoqRvL1vzdrCBvP8KfbEWDDAk8=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=cJKKuXiG//M6JXheoZbVGNJkockkV7RVMA8i7N4TaKeysJz9oBurJW+ETqaqORHHTH5tqZr5cF4sSrVTVIqlILUyDazY/ZUWzN8bhvzL/7LZRniGNcdn9jpVhiFuRFZYIc2Q1AaoXAIIyKymOVagqXqdTUVn4Vs6hiqUzvE9XaM= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D11FA1477 for ; Fri, 17 Nov 2023 09:27:54 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6CEF53F73F for ; Fri, 
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 14/21] aarch64: Add a VNx1TI mode
Date: Fri, 17 Nov 2023 17:27:07 +0000

Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others.  It's purely an RTL
convenience and isn't (yet) a valid storage mode.

gcc/
	* config/aarch64/aarch64-modes.def: Add VNx1TI.
---
 gcc/config/aarch64/aarch64-modes.def | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 6b4f4e17dd5..a3efc5b8484 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -156,7 +156,7 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
    for 8-bit, 16-bit, 32-bit and 64-bit elements respectively.  It isn't
    strictly necessary to set the alignment here, since the default would
    be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer.  */
-#define SVE_MODES(NVECS, VB, VH, VS, VD) \
+#define SVE_MODES(NVECS, VB, VH, VS, VD, VT) \
   VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS, NVECS == 1 ? 1 : 4); \
   VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS, NVECS == 1 ? 1 : 4); \
 \
@@ -164,6 +164,7 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
   ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \
   ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \
+  ADJUST_NUNITS (VT##TI, exact_div (aarch64_sve_vg * NVECS, 2)); \
   ADJUST_NUNITS (VH##BF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \
@@ -173,17 +174,23 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
   ADJUST_ALIGNMENT (VH##HI, 16); \
   ADJUST_ALIGNMENT (VS##SI, 16); \
   ADJUST_ALIGNMENT (VD##DI, 16); \
+  ADJUST_ALIGNMENT (VT##TI, 16); \
   ADJUST_ALIGNMENT (VH##BF, 16); \
   ADJUST_ALIGNMENT (VH##HF, 16); \
   ADJUST_ALIGNMENT (VS##SF, 16); \
   ADJUST_ALIGNMENT (VD##DF, 16);
 
-/* Give SVE vectors the names normally used for 256-bit vectors.
-   The actual number depends on command-line flags.  */
-SVE_MODES (1, VNx16, VNx8, VNx4, VNx2)
-SVE_MODES (2, VNx32, VNx16, VNx8, VNx4)
-SVE_MODES (3, VNx48, VNx24, VNx12, VNx6)
-SVE_MODES (4, VNx64, VNx32, VNx16, VNx8)
+/* Give SVE vectors names of the form VNxX, where X describes what is
+   stored in each 128-bit unit.  The actual size of the mode depends
+   on command-line flags.
+
+   VNx1TI isn't really a native SVE mode, but it can be useful in some
+   limited situations.  */
+VECTOR_MODE_WITH_PREFIX (VNx, INT, TI, 1, 1);
+SVE_MODES (1, VNx16, VNx8, VNx4, VNx2, VNx1)
+SVE_MODES (2, VNx32, VNx16, VNx8, VNx4, VNx2)
+SVE_MODES (3, VNx48, VNx24, VNx12, VNx6, VNx3)
+SVE_MODES (4, VNx64, VNx32, VNx16, VNx8, VNx4)
 
 /* Partial SVE vectors:

From patchwork Fri Nov 17 17:27:24 2023
X-Patchwork-Id: 1865162
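The ADJUST_NUNITS arithmetic above can be sanity-checked for fixed vector lengths. The sketch below is an editor's model, not GCC code: it uses plain `int` where GCC uses `poly_int`, and an `assert` where the patch uses `exact_div` (which likewise requires the division to be exact):

```cpp
#include <cassert>

// aarch64_sve_vg is the number of 64-bit granules in one SVE vector,
// i.e. vector_bits / 64 (a compile-time poly_int in GCC; a plain int
// suffices for a fixed-length sketch).
int sve_vg (int vector_bits) { return vector_bits / 64; }

// VNx2DI-style modes: one 64-bit element per granule, NVECS vectors.
int nunits_di (int vg, int nvecs) { return vg * nvecs; }

// VNx1TI-style modes: one 128-bit element per two granules.  The patch
// uses exact_div, which insists the division is exact; mirrored here.
int nunits_ti (int vg, int nvecs)
{
  int granules = vg * nvecs;
  assert (granules % 2 == 0);
  return granules / 2;
}
```

So at the architectural minimum of 128 bits, VNx1TI holds a single TI element per vector, which is what lets it single out .Q (128-bit element) ZA operations.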
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 15/21] aarch64: Generalise unspec_based_function_base
Date: Fri, 17 Nov 2023 17:27:24 +0000
Until now, SVE intrinsics that map directly to unspecs have always
used type suffix 0 to distinguish between signed integers, unsigned
integers, and floating-point values.  SME adds functions that need
to use type suffix 1 instead.  This patch generalises the classes
accordingly.

gcc/
	* config/aarch64/aarch64-sve-builtins-functions.h
	(unspec_based_function_base): Allow type suffix 1 to determine
	the mode of the operation.
	(unspec_based_function): Update accordingly.
	(unspec_based_fused_function): Likewise.
	(unspec_based_fused_lane_function): Likewise.
---
 .../aarch64/aarch64-sve-builtins-functions.h  | 29 ++++++++++++-------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
index 4a10102038a..be2561620f4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
@@ -234,18 +234,21 @@ class unspec_based_function_base : public function_base
 public:
   CONSTEXPR unspec_based_function_base (int unspec_for_sint,
 					int unspec_for_uint,
-					int unspec_for_fp)
+					int unspec_for_fp,
+					unsigned int suffix_index = 0)
     : m_unspec_for_sint (unspec_for_sint),
       m_unspec_for_uint (unspec_for_uint),
-      m_unspec_for_fp (unspec_for_fp)
+      m_unspec_for_fp (unspec_for_fp),
+      m_suffix_index (suffix_index)
   {}
 
   /* Return the unspec code to use for INSTANCE, based on type suffix 0.  */
   int
   unspec_for (const function_instance &instance) const
   {
-    return (!instance.type_suffix (0).integer_p ? m_unspec_for_fp
-	    : instance.type_suffix (0).unsigned_p ? m_unspec_for_uint
+    auto &suffix = instance.type_suffix (m_suffix_index);
+    return (!suffix.integer_p ? m_unspec_for_fp
+	    : suffix.unsigned_p ? m_unspec_for_uint
 	    : m_unspec_for_sint);
   }
 
@@ -254,6 +257,9 @@ public:
   int m_unspec_for_sint;
   int m_unspec_for_uint;
   int m_unspec_for_fp;
+
+  /* Which type suffix is used to choose between the unspecs.  */
+  unsigned int m_suffix_index;
 };
 
 /* A function_base for functions that have an associated unspec code.
@@ -306,7 +312,8 @@ public:
   rtx
   expand (function_expander &e) const override
   {
-    return e.use_exact_insn (CODE (unspec_for (e), e.vector_mode (0)));
+    return e.use_exact_insn (CODE (unspec_for (e),
+				   e.vector_mode (m_suffix_index)));
   }
 };
 
@@ -360,16 +367,16 @@ public:
   {
     int unspec = unspec_for (e);
     insn_code icode;
-    if (e.type_suffix (0).float_p)
+    if (e.type_suffix (m_suffix_index).float_p)
       {
	/* Put the operands in the normal (fma ...) order, with the accumulator
	   last.  This fits naturally since that's also the unprinted operand
	   in the asm output.  */
	e.rotate_inputs_left (0, e.pred != PRED_none ? 4 : 3);
-	icode = code_for_aarch64_sve (unspec, e.vector_mode (0));
+	icode = code_for_aarch64_sve (unspec, e.vector_mode (m_suffix_index));
       }
     else
-      icode = INT_CODE (unspec, e.vector_mode (0));
+      icode = INT_CODE (unspec, e.vector_mode (m_suffix_index));
     return e.use_exact_insn (icode);
   }
 };
 
@@ -390,16 +397,16 @@ public:
   {
     int unspec = unspec_for (e);
     insn_code icode;
-    if (e.type_suffix (0).float_p)
+    if (e.type_suffix (m_suffix_index).float_p)
       {
	/* Put the operands in the normal (fma ...) order, with the accumulator
	   last.  This fits naturally since that's also the unprinted operand
	   in the asm output.  */
	e.rotate_inputs_left (0, e.pred != PRED_none ? 5 : 4);
-	icode = code_for_aarch64_lane (unspec, e.vector_mode (0));
+	icode = code_for_aarch64_lane (unspec, e.vector_mode (m_suffix_index));
       }
     else
-      icode = INT_CODE (unspec, e.vector_mode (0));
+      icode = INT_CODE (unspec, e.vector_mode (m_suffix_index));
     return e.use_exact_insn (icode);
   }
 };

From patchwork Fri Nov 17 17:27:39 2023
X-Patchwork-Id: 1865164
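The generalisation in the preceding patch boils down to replacing a hard-coded `type_suffix (0)` with a stored index. A simplified stand-alone model of that dispatch (the member names mirror the patch, but `function_instance` and `type_suffix_info` here are editor's stand-ins, not GCC's classes):

```cpp
// Stand-in for GCC's per-suffix type information.
struct type_suffix_info { bool integer_p; bool unsigned_p; };

// Stand-in for a function_instance with two type suffixes.
struct function_instance
{
  type_suffix_info suffixes[2];
  const type_suffix_info &type_suffix (unsigned i) const { return suffixes[i]; }
};

// Mirrors unspec_based_function_base::unspec_for after the patch: the
// suffix used for dispatch is a member (default 0 in GCC) rather than
// a hard-coded 0.
struct unspec_based_function_base
{
  int m_unspec_for_sint, m_unspec_for_uint, m_unspec_for_fp;
  unsigned m_suffix_index;

  int unspec_for (const function_instance &instance) const
  {
    auto &suffix = instance.type_suffix (m_suffix_index);
    return (!suffix.integer_p ? m_unspec_for_fp
	    : suffix.unsigned_p ? m_unspec_for_uint
	    : m_unspec_for_sint);
  }
};
```

With `m_suffix_index == 1`, an SME function whose suffix 0 carries no sign/float information can still pick the right unspec from suffix 1.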
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 16/21] aarch64: Generalise _m rules for SVE intrinsics
Date: Fri, 17 Nov 2023 17:27:39 +0000
In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from the
first operand to the operation.

That rule began to break down in SVE2, and it continues to do so in
SME.  This patch therefore adds a virtual function to specify whether
the separate initial argument is present or not.  The old rule is
still the default.

gcc/
	* config/aarch64/aarch64-sve-builtins.h
	(function_shape::has_merge_argument_p): New member function.
	* config/aarch64/aarch64-sve-builtins.cc
	(function_resolver::check_gp_argument): Use it.
	(function_expander::get_fallback_value): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc
	(apply_predication): Likewise.
	(unary_convert_narrowt_def::has_merge_argument_p): New function.
---
 gcc/config/aarch64/aarch64-sve-builtins-shapes.cc | 10 ++++++++--
 gcc/config/aarch64/aarch64-sve-builtins.cc        |  4 ++--
 gcc/config/aarch64/aarch64-sve-builtins.h         | 13 +++++++++++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index aa5dbb5df9d..8f6c0515ed6 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -66,8 +66,8 @@ apply_predication (const function_instance &instance, tree return_type,
	 the same type as the result.  For unary_convert_narrowt it also
	 provides the "bottom" half of active elements, and is present for
	 all types of predication.  */
-      if ((argument_types.length () == 2 && instance.pred == PRED_m)
-	  || instance.shape == shapes::unary_convert_narrowt)
+      auto nargs = argument_types.length () - 1;
+      if (instance.shape->has_merge_argument_p (instance, nargs))
	argument_types.quick_insert (0, return_type);
     }
 }
@@ -3273,6 +3273,12 @@ SHAPE (unary_convert)
    predicate.  */
 struct unary_convert_narrowt_def : public overloaded_base<1>
 {
+  bool
+  has_merge_argument_p (const function_instance &, unsigned int) const override
+  {
+    return true;
+  }
+
   void
   build (function_builder &b, const function_group_info &group) const override
   {
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 41e7d88bffa..b2d16c318e9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2230,7 +2230,7 @@ function_resolver::check_gp_argument (unsigned int nops,
   if (pred != PRED_none)
     {
       /* Unary merge operations should use resolve_unary instead.  */
-      gcc_assert (nops != 1 || pred != PRED_m);
+      gcc_assert (!shape->has_merge_argument_p (*this, nops));
       nargs = nops + 1;
       if (!check_num_arguments (nargs)
	  || !require_vector_type (i, VECTOR_TYPE_svbool_t))
@@ -2874,7 +2874,7 @@ function_expander::get_fallback_value (machine_mode mode, unsigned int nops,
   gcc_assert (pred == PRED_m || pred == PRED_x);
   if (merge_argno == DEFAULT_MERGE_ARGNO)
-    merge_argno = nops == 1 && pred == PRED_m ? 0 : 1;
+    merge_argno = shape->has_merge_argument_p (*this, nops) ? 0 : 1;
 
   if (merge_argno == 0)
     return args[argno++];
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 981a57d82d2..c65c1f6e959 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -676,6 +676,9 @@ public:
 class function_shape
 {
 public:
+  virtual bool has_merge_argument_p (const function_instance &,
+				     unsigned int) const;
+
   virtual bool explicit_type_suffix_p (unsigned int) const = 0;
 
   /* True if the group suffix is present in overloaded names.
@@ -948,6 +951,16 @@ function_base::vectors_per_tuple (const function_instance &instance) const
   return instance.group_suffix ().vectors_per_tuple;
 }
 
+/* Return true if INSTANCE (which has NARGS arguments) has an initial
+   vector argument whose only purpose is to specify the values of
+   inactive lanes.  */
+inline bool
+function_shape::has_merge_argument_p (const function_instance &instance,
+				      unsigned int nargs) const
+{
+  return nargs == 1 && instance.pred == PRED_m;
+}
+
 /* Return the mode of the result of a call.  */
 inline machine_mode
 function_expander::result_mode () const

From patchwork Fri Nov 17 17:30:10 2023
X-Patchwork-Id: 1865168
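The new hook's default preserves the old unary-_m rule, while shapes such as `unary_convert_narrowt` opt in unconditionally. A minimal model of that virtual-dispatch arrangement (the names follow the preceding patch, but the classes are simplified editor's stand-ins, not GCC's):

```cpp
// Merging (_m) vs other predication, simplified.
enum predication_index { PRED_none, PRED_m, PRED_x };

// Stand-in for a function_instance: only the predication matters here.
struct function_instance { predication_index pred; };

struct function_shape
{
  // Default: the old SVE rule - only unary _m functions take a separate
  // initial argument giving the values of inactive lanes.
  virtual bool has_merge_argument_p (const function_instance &instance,
				     unsigned int nargs) const
  {
    return nargs == 1 && instance.pred == PRED_m;
  }
  virtual ~function_shape () = default;
};

// A shape like unary_convert_narrowt can now override the rule and
// always take the separate merge argument.
struct unary_convert_narrowt_def : function_shape
{
  bool has_merge_argument_p (const function_instance &,
			     unsigned int) const override
  {
    return true;
  }
};
```

Putting the rule behind a virtual function means later SME shapes only need an override, not another special case in `apply_predication`.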
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 18/21] aarch64: Add support for __arm_locally_streaming
Date: Fri, 17 Nov 2023 17:30:10 +0000

This patch adds support for the __arm_locally_streaming attribute,
which allows a function to use SME internally without changing
the function's ABI.  The attribute is valid but redundant for
__arm_streaming functions.

gcc/
	* config/aarch64/aarch64.cc (aarch64_arm_attribute_table): Add
	arm::locally_streaming.
	(aarch64_fndecl_is_locally_streaming): New function.
	(aarch64_fndecl_sm_state): Handle locally-streaming functions.
	(aarch64_cfun_enables_pstate_sm): New function.
	(aarch64_add_offset): Add an argument that specifies whether
	the streaming vector length should be used instead of the
	prevailing one.
	(aarch64_split_add_offset, aarch64_add_sp, aarch64_sub_sp): Likewise.
	(aarch64_allocate_and_probe_stack_space): Likewise.
	(aarch64_expand_mov_immediate): Update calls accordingly.
	(aarch64_need_old_pstate_sm): Return true for locally-streaming
	streaming-compatible functions.
	(aarch64_layout_frame): Force all call-preserved Z and P registers
	to be saved and restored if the function switches PSTATE.SM in
	the prologue.
	(aarch64_get_separate_components): Disable shrink-wrapping of
	such Z and P saves and restores.
	(aarch64_use_late_prologue_epilogue): New function.
	(aarch64_expand_prologue): Measure SVE lengths in the streaming
	vector length for locally-streaming functions, then emit code
	to enable streaming mode.
	(aarch64_expand_epilogue): Likewise in reverse.
	(TARGET_USE_LATE_PROLOGUE_EPILOGUE): Define.
	* config/aarch64/aarch64-c.cc
	(aarch64_define_unconditional_macros): Define
	__arm_locally_streaming.
gcc/testsuite/
	* gcc.target/aarch64/sme/locally_streaming_1.c: New test.
	* gcc.target/aarch64/sme/locally_streaming_2.c: Likewise.
	* gcc.target/aarch64/sme/locally_streaming_3.c: Likewise.
	* gcc.target/aarch64/sme/locally_streaming_4.c: Likewise.
	* gcc.target/aarch64/sme/keyword_macros_1.c: Add
	__arm_locally_streaming.
	* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
---
 gcc/config/aarch64/aarch64-c.cc               |   1 +
 gcc/config/aarch64/aarch64.cc                 | 233 +++++++--
 .../g++.target/aarch64/sme/keyword_macros_1.C |   1 +
 .../gcc.target/aarch64/sme/keyword_macros_1.c |   1 +
 .../aarch64/sme/locally_streaming_1.c         | 466 ++++++++++++++++++
 .../aarch64/sme/locally_streaming_2.c         | 177 +++++++
 .../aarch64/sme/locally_streaming_3.c         | 273 ++++++++++
 .../aarch64/sme/locally_streaming_4.c         | 145 ++++++
 8 files changed, 1259 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index f2fa5df1b82..2a8ca46987a 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -86,6 +86,7 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
 
   DEFINE_ARM_KEYWORD_MACRO ("streaming");
   DEFINE_ARM_KEYWORD_MACRO ("streaming_compatible");
+  DEFINE_ARM_KEYWORD_MACRO ("locally_streaming");
 
 #undef DEFINE_ARM_KEYWORD_MACRO
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 12753ac133e..6ad29a3a84f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3136,6 +3136,7 @@ static const attribute_spec aarch64_arm_attributes[] =
     NULL, attr_streaming_exclusions },
   { "streaming_compatible", 0, 0, false, true, true, true,
     NULL, attr_streaming_exclusions },
+  { "locally_streaming", 0, 0, true, false, false, false, NULL, NULL },
   { "new", 1, -1, true, false, false, false,
     handle_arm_new, NULL },
   { "preserves", 1, -1, false, true, true, true,
@@ -4445,6 +4446,16 @@ aarch64_fntype_isa_mode (const_tree fntype)
	  | aarch64_fntype_pstate_za (fntype));
 }
 
+/* Return true if FNDECL uses streaming mode internally, as an
+   implementation choice.  */
+
+static bool
+aarch64_fndecl_is_locally_streaming (const_tree fndecl)
+{
+  return lookup_attribute ("arm", "locally_streaming",
+			   DECL_ATTRIBUTES (fndecl));
+}
+
 /* Return the state of PSTATE.SM when compiling the body of
    function FNDECL.  This might be different from the state of
    PSTATE.SM on entry.  */
@@ -4452,6 +4463,9 @@ aarch64_fntype_isa_mode (const_tree fntype)
 static aarch64_feature_flags
 aarch64_fndecl_pstate_sm (const_tree fndecl)
 {
+  if (aarch64_fndecl_is_locally_streaming (fndecl))
+    return AARCH64_FL_SM_ON;
+
   return aarch64_fntype_pstate_sm (TREE_TYPE (fndecl));
 }
 
@@ -4527,6 +4541,16 @@ aarch64_cfun_has_new_state (const char *state_name)
   return aarch64_fndecl_has_new_state (cfun->decl, state_name);
 }
 
+/* Return true if PSTATE.SM is 1 in the body of the current function,
+   but is not guaranteed to be 1 on entry.  */
+
+static bool
+aarch64_cfun_enables_pstate_sm ()
+{
+  return (aarch64_fndecl_is_locally_streaming (cfun->decl)
+	  && aarch64_cfun_incoming_pstate_sm () != AARCH64_FL_SM_ON);
+}
+
 /* Return true if the current function has state STATE_NAME, either by
    creating new state itself or by sharing state with callers.  */
@@ -6768,6 +6792,10 @@ aarch64_add_offset_temporaries (rtx x)
    TEMP2, if nonnull, is a second temporary register that doesn't
    overlap either DEST or REG.
 
+   FORCE_ISA_MODE is AARCH64_FL_SM_ON if any variable component of OFFSET
+   is measured relative to the SME vector length instead of the current
+   prevailing vector length.  It is 0 otherwise.
+
    Since this function may be used to adjust the stack pointer, we must
    ensure that it cannot cause transient stack deallocation (for example
    by first incrementing SP and then decrementing when adjusting by a
@@ -6776,6 +6804,7 @@ aarch64_add_offset_temporaries (rtx x)
 static void
 aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
		    poly_int64 offset, rtx temp1, rtx temp2,
+		    aarch64_feature_flags force_isa_mode,
		    bool frame_related_p, bool emit_move_imm = true)
 {
   gcc_assert (emit_move_imm || temp1 != NULL_RTX);
@@ -6788,9 +6817,18 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
   /* Try using ADDVL or ADDPL to add the whole value.  */
   if (src != const0_rtx && aarch64_sve_addvl_addpl_immediate_p (offset))
     {
-      rtx offset_rtx = gen_int_mode (offset, mode);
+      gcc_assert (offset.coeffs[0] == offset.coeffs[1]);
+      rtx offset_rtx;
+      if (force_isa_mode == 0)
+	offset_rtx = gen_int_mode (offset, mode);
+      else
+	offset_rtx = aarch64_sme_vq_immediate (mode, offset.coeffs[0], 0);
       rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx));
       RTX_FRAME_RELATED_P (insn) = frame_related_p;
+      if (frame_related_p && (force_isa_mode & AARCH64_FL_SM_ON))
+	add_reg_note (insn, REG_CFA_ADJUST_CFA,
+		      gen_rtx_SET (dest, plus_constant (Pmode, src,
+							offset)));
       return;
     }
 
@@ -6806,11 +6844,19 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
   if (src != const0_rtx
       && aarch64_sve_addvl_addpl_immediate_p (poly_offset))
     {
-      rtx offset_rtx = gen_int_mode (poly_offset, mode);
+      rtx offset_rtx;
+      if (force_isa_mode == 0)
+	offset_rtx = gen_int_mode (poly_offset, mode);
+      else
+	offset_rtx = aarch64_sme_vq_immediate (mode, factor, 0);
       if (frame_related_p)
	{
	  rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx));
	  RTX_FRAME_RELATED_P (insn) = true;
+	  if (force_isa_mode & AARCH64_FL_SM_ON)
+	    add_reg_note (insn, REG_CFA_ADJUST_CFA,
+			  gen_rtx_SET (dest, plus_constant (Pmode, src,
+							    poly_offset)));
	  src = dest;
	}
       else
@@ -6841,9 +6887,19 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
       rtx val;
       if (IN_RANGE (rel_factor, -32, 31))
	{
+	  if (force_isa_mode & AARCH64_FL_SM_ON)
+	    {
+	      /* Try to use an unshifted RDSVL, otherwise fall back on
+		 a shifted RDSVL #1.  */
+	      if (aarch64_sve_rdvl_addvl_factor_p (factor))
+		shift = 0;
+	      else
+		factor = rel_factor * 16;
+	      val = aarch64_sme_vq_immediate (mode, factor, 0);
+	    }
	  /* Try to use an unshifted CNT[BHWD] or RDVL.  */
-	  if (aarch64_sve_cnt_factor_p (factor)
-	      || aarch64_sve_rdvl_addvl_factor_p (factor))
+	  else if (aarch64_sve_cnt_factor_p (factor)
+		   || aarch64_sve_rdvl_addvl_factor_p (factor))
	    {
	      val = gen_int_mode (poly_int64 (factor, factor), mode);
	      shift = 0;
@@ -6873,11 +6929,18 @@ aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
	     a shift and add sequence for the multiplication.
	     If CNTB << SHIFT is out of range, stick with the current
	     shift factor.  */
-	  if (IN_RANGE (low_bit, 2, 16 * 16))
+	  if (force_isa_mode == 0
+	      && IN_RANGE (low_bit, 2, 16 * 16))
	    {
	      val = gen_int_mode (poly_int64 (low_bit, low_bit), mode);
	      shift = 0;
	    }
+	  else if ((force_isa_mode & AARCH64_FL_SM_ON)
+		   && aarch64_sve_rdvl_addvl_factor_p (low_bit))
+	    {
+	      val = aarch64_sme_vq_immediate (mode, low_bit, 0);
+	      shift = 0;
+	    }
	  else
	    val = gen_int_mode (BYTES_PER_SVE_VECTOR, mode);
 
@@ -6965,30 +7028,34 @@ aarch64_split_add_offset (scalar_int_mode mode, rtx dest, rtx src,
			  rtx offset_rtx, rtx temp1, rtx temp2)
 {
   aarch64_add_offset (mode, dest, src, rtx_to_poly_int64 (offset_rtx),
-		      temp1, temp2, false);
+		      temp1, temp2, 0, false);
 }
 
 /* Add DELTA to the stack pointer, marking the instructions frame-related.
-   TEMP1 is available as a temporary if nonnull.  EMIT_MOVE_IMM is false
-   if TEMP1 already contains abs (DELTA).  */
+   TEMP1 is available as a temporary if nonnull.  FORCE_ISA_MODE is as
+   for aarch64_add_offset.  EMIT_MOVE_IMM is false if TEMP1 already
+   contains abs (DELTA).  */
 
 static inline void
-aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta, bool emit_move_imm)
+aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta,
+		aarch64_feature_flags force_isa_mode, bool emit_move_imm)
 {
   aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, delta,
-		      temp1, temp2, true, emit_move_imm);
+		      temp1, temp2, force_isa_mode, true, emit_move_imm);
 }
 
 /* Subtract DELTA from the stack pointer, marking the instructions
-   frame-related if FRAME_RELATED_P.  TEMP1 is available as a temporary
-   if nonnull.  */
+   frame-related if FRAME_RELATED_P.  FORCE_ISA_MODE is as for
+   aarch64_add_offset.  TEMP1 is available as a temporary if nonnull.  */
 
 static inline void
-aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p,
-		bool emit_move_imm = true)
+aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta,
+		aarch64_feature_flags force_isa_mode,
+		bool frame_related_p, bool emit_move_imm = true)
 {
   aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
-		      temp1, temp2, frame_related_p, emit_move_imm);
+		      temp1, temp2, force_isa_mode, frame_related_p,
+		      emit_move_imm);
 }
 
 /* A streaming-compatible function needs to switch temporarily to the known
@@ -8014,11 +8081,11 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
	    {
	      base = aarch64_force_temporary (int_mode, dest, base);
	      aarch64_add_offset (int_mode, dest, base, offset,
-				  NULL_RTX, NULL_RTX, false);
+				  NULL_RTX, NULL_RTX, 0, false);
	    }
	  else
	    aarch64_add_offset (int_mode, dest, base, offset,
-				dest, NULL_RTX, false);
+				dest, NULL_RTX, 0, false);
	}
      return;
    }
@@ -8045,7 +8112,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
	  gcc_assert (can_create_pseudo_p ());
	  base = aarch64_force_temporary (int_mode, dest, base);
	  aarch64_add_offset (int_mode, dest, base, const_offset,
-			      NULL_RTX, NULL_RTX, false);
+			      NULL_RTX, NULL_RTX, 0, false);
	  return;
	}
 
@@ -8085,7 +8152,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
	  gcc_assert (can_create_pseudo_p ());
	  base = aarch64_force_temporary (int_mode, dest, base);
	  aarch64_add_offset (int_mode, dest, base, const_offset,
-			      NULL_RTX, NULL_RTX, false);
+			      NULL_RTX, NULL_RTX, 0, false);
	  return;
	}
      /* FALLTHRU */
@@ -9728,6 +9795,9 @@ aarch64_need_old_pstate_sm ()
   if (aarch64_cfun_incoming_pstate_sm () != 0)
     return false;
 
+  if (aarch64_cfun_enables_pstate_sm ())
+    return true;
+
   if (cfun->machine->call_switches_pstate_sm)
     for (auto insn = get_insns (); insn; insn = NEXT_INSN (insn))
       if (auto *call = dyn_cast (insn))
@@ -9754,6 +9824,7 @@ aarch64_layout_frame (void)
   bool frame_related_fp_reg_p = false;
   aarch64_frame &frame = cfun->machine->frame;
   poly_int64 top_of_locals = -1;
+  bool enables_pstate_sm = aarch64_cfun_enables_pstate_sm ();
 
   vec_safe_truncate (frame.saved_gprs, 0);
   vec_safe_truncate (frame.saved_fprs, 0);
@@ -9791,7 +9862,7 @@ aarch64_layout_frame (void)
       frame.reg_offset[regno] = SLOT_REQUIRED;
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
-    if (df_regs_ever_live_p (regno)
+    if ((enables_pstate_sm || df_regs_ever_live_p (regno))
	&& !fixed_regs[regno]
	&& !crtl->abi->clobbers_full_reg_p (regno))
      {
@@ -9820,7 +9891,7 @@ aarch64_layout_frame (void)
    }
 
  for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
-    if (df_regs_ever_live_p (regno)
+    if ((enables_pstate_sm || df_regs_ever_live_p (regno))
	&& !fixed_regs[regno]
	&& !crtl->abi->clobbers_full_reg_p (regno))
      frame.reg_offset[regno] = SLOT_REQUIRED;
@@ -9937,7 +10008,8 @@ aarch64_layout_frame (void)
   /* If the current function changes the SVE vector length, ensure that the
      old value of the DWARF VG register is saved and available in the CFI,
      so that outer frames with VL-sized offsets can be processed correctly.
*/ - if (cfun->machine->call_switches_pstate_sm) + if (cfun->machine->call_switches_pstate_sm + || aarch64_cfun_enables_pstate_sm ()) { frame.reg_offset[VG_REGNUM] = offset; offset += UNITS_PER_WORD; @@ -10776,9 +10848,16 @@ aarch64_get_separate_components (void) bitmap_clear (components); /* The registers we need saved to the frame. */ + bool enables_pstate_sm = aarch64_cfun_enables_pstate_sm (); for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++) if (aarch64_register_saved_on_entry (regno)) { + /* Disallow shrink wrapping for registers that will be clobbered + by an SMSTART SM in the prologue. */ + if (enables_pstate_sm + && (FP_REGNUM_P (regno) || PR_REGNUM_P (regno))) + continue; + /* Punt on saves and restores that use ST1D and LD1D. We could try to be smarter, but it would involve making sure that the spare predicate register itself is safe to use at the save @@ -11097,11 +11176,16 @@ aarch64_emit_stack_tie (rtx reg) events, e.g. if we were to allow the stack to be dropped by more than a page and then have multiple probes up and we take a signal somewhere in between then the signal handler doesn't know the state of the stack and can make no - assumptions about which pages have been probed. */ + assumptions about which pages have been probed. + + FORCE_ISA_MODE is AARCH64_FL_SM_ON if any variable component of POLY_SIZE + is measured relative to the SME vector length instead of the current + prevailing vector length. It is 0 otherwise. 
*/ static void aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2, poly_int64 poly_size, + aarch64_feature_flags force_isa_mode, bool frame_related_p, bool final_adjustment_p) { @@ -11143,7 +11227,8 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2, if (known_lt (poly_size, min_probe_threshold) || !flag_stack_clash_protection) { - aarch64_sub_sp (temp1, temp2, poly_size, frame_related_p); + aarch64_sub_sp (temp1, temp2, poly_size, force_isa_mode, + frame_related_p); return; } @@ -11160,7 +11245,8 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2, /* First calculate the amount of bytes we're actually spilling. */ aarch64_add_offset (Pmode, temp1, CONST0_RTX (Pmode), - poly_size, temp1, temp2, false, true); + poly_size, temp1, temp2, force_isa_mode, + false, true); rtx_insn *insn = get_last_insn (); @@ -11218,7 +11304,7 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2, { for (HOST_WIDE_INT i = 0; i < rounded_size; i += guard_size) { - aarch64_sub_sp (NULL, temp2, guard_size, true); + aarch64_sub_sp (NULL, temp2, guard_size, force_isa_mode, true); emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx, guard_used_by_caller)); emit_insn (gen_blockage ()); @@ -11229,7 +11315,7 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2, { /* Compute the ending address. 
*/ aarch64_add_offset (Pmode, temp1, stack_pointer_rtx, -rounded_size, - temp1, NULL, false, true); + temp1, NULL, force_isa_mode, false, true); rtx_insn *insn = get_last_insn (); /* For the initial allocation, we don't have a frame pointer @@ -11295,7 +11381,7 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2, if (final_adjustment_p && rounded_size != 0) min_probe_threshold = 0; - aarch64_sub_sp (temp1, temp2, residual, frame_related_p); + aarch64_sub_sp (temp1, temp2, residual, force_isa_mode, frame_related_p); if (residual >= min_probe_threshold) { if (dump_file) @@ -11360,6 +11446,14 @@ aarch64_epilogue_uses (int regno) return 0; } +/* Implement TARGET_USE_LATE_PROLOGUE_EPILOGUE. */ + +static bool +aarch64_use_late_prologue_epilogue () +{ + return aarch64_cfun_enables_pstate_sm (); +} + /* The current function's frame has a save slot for the incoming state of SVCR. Return a legitimate memory for the slot, based on the hard frame pointer. */ @@ -11496,6 +11590,9 @@ aarch64_expand_prologue (void) unsigned reg2 = frame.wb_push_candidate2; bool emit_frame_chain = frame.emit_frame_chain; rtx_insn *insn; + aarch64_feature_flags force_isa_mode = 0; + if (aarch64_cfun_enables_pstate_sm ()) + force_isa_mode = AARCH64_FL_SM_ON; if (flag_stack_clash_protection && known_eq (callee_adjust, 0)) { @@ -11557,7 +11654,7 @@ aarch64_expand_prologue (void) less the amount of the guard reserved for use by the caller's outgoing args. 
*/ aarch64_allocate_and_probe_stack_space (tmp0_rtx, tmp1_rtx, initial_adjust, - true, false); + force_isa_mode, true, false); if (callee_adjust != 0) aarch64_push_regs (reg1, reg2, callee_adjust); @@ -11580,7 +11677,8 @@ aarch64_expand_prologue (void) gcc_assert (known_eq (chain_offset, 0)); aarch64_add_offset (Pmode, hard_frame_pointer_rtx, stack_pointer_rtx, chain_offset, - tmp1_rtx, tmp0_rtx, frame_pointer_needed); + tmp1_rtx, tmp0_rtx, force_isa_mode, + frame_pointer_needed); if (frame_pointer_needed && !frame_size.is_constant ()) { /* Variable-sized frames need to describe the save slot @@ -11627,6 +11725,7 @@ aarch64_expand_prologue (void) || known_eq (initial_adjust, 0)); aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, sve_callee_adjust, + force_isa_mode, !frame_pointer_needed, false); bytes_below_sp -= sve_callee_adjust; } @@ -11639,12 +11738,15 @@ aarch64_expand_prologue (void) that is assumed by the called. */ gcc_assert (known_eq (bytes_below_sp, final_adjust)); aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx, final_adjust, + force_isa_mode, !frame_pointer_needed, true); if (emit_frame_chain && maybe_ne (final_adjust, 0)) aarch64_emit_stack_tie (hard_frame_pointer_rtx); - /* Save the incoming value of PSTATE.SM, if required. */ - if (known_ge (frame.old_svcr_offset, 0)) + /* Save the incoming value of PSTATE.SM, if required. Code further + down does this for locally-streaming functions. */ + if (known_ge (frame.old_svcr_offset, 0) + && !aarch64_cfun_enables_pstate_sm ()) { rtx mem = aarch64_old_svcr_mem (); MEM_VOLATILE_P (mem) = 1; @@ -11676,6 +11778,34 @@ aarch64_expand_prologue (void) emit_move_insn (gen_rtx_REG (DImode, R1_REGNUM), old_r1); } } + + /* Enable PSTATE.SM, if required. */ + if (aarch64_cfun_enables_pstate_sm ()) + { + rtx_insn *guard_label = nullptr; + if (known_ge (cfun->machine->frame.old_svcr_offset, 0)) + { + /* The current function is streaming-compatible. Save the + original state of PSTATE.SM. 
*/ + rtx svcr = gen_rtx_REG (DImode, IP0_REGNUM); + emit_insn (gen_aarch64_read_svcr (svcr)); + emit_move_insn (aarch64_old_svcr_mem (), svcr); + guard_label = aarch64_guard_switch_pstate_sm (svcr, + aarch64_isa_flags); + } + aarch64_sme_mode_switch_regs args_switch; + auto &args = crtl->args.info; + for (unsigned int i = 0; i < args.num_sme_mode_switch_args; ++i) + { + rtx x = args.sme_mode_switch_args[i]; + args_switch.add_reg (GET_MODE (x), REGNO (x)); + } + args_switch.emit_prologue (); + emit_insn (gen_aarch64_smstart_sm ()); + args_switch.emit_epilogue (); + if (guard_label) + emit_label (guard_label); + } } /* Return TRUE if we can use a simple_return insn. @@ -11722,6 +11852,9 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall) HOST_WIDE_INT guard_size = 1 << param_stack_clash_protection_guard_size; HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD; + aarch64_feature_flags force_isa_mode = 0; + if (aarch64_cfun_enables_pstate_sm ()) + force_isa_mode = AARCH64_FL_SM_ON; /* We can re-use the registers when: @@ -11746,6 +11879,24 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall) = maybe_ne (get_frame_size () + frame.saved_varargs_size, 0); + /* Reset PSTATE.SM, if required. */ + if (aarch64_cfun_enables_pstate_sm ()) + { + rtx_insn *guard_label = nullptr; + if (known_ge (cfun->machine->frame.old_svcr_offset, 0)) + guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM, + aarch64_isa_flags); + aarch64_sme_mode_switch_regs return_switch; + if (crtl->return_rtx && REG_P (crtl->return_rtx)) + return_switch.add_reg (GET_MODE (crtl->return_rtx), + REGNO (crtl->return_rtx)); + return_switch.emit_prologue (); + emit_insn (gen_aarch64_smstop_sm ()); + return_switch.emit_epilogue (); + if (guard_label) + emit_label (guard_label); + } + /* Emit a barrier to prevent loads from a deallocated stack. 
*/ if (maybe_gt (final_adjust, crtl->outgoing_args_size) || cfun->calls_alloca @@ -11766,19 +11917,21 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall) aarch64_add_offset (Pmode, stack_pointer_rtx, hard_frame_pointer_rtx, -bytes_below_hard_fp + final_adjust, - tmp1_rtx, tmp0_rtx, callee_adjust == 0); + tmp1_rtx, tmp0_rtx, force_isa_mode, + callee_adjust == 0); else /* The case where we need to re-use the register here is very rare, so avoid the complicated condition and just always emit a move if the immediate doesn't fit. */ - aarch64_add_sp (tmp1_rtx, tmp0_rtx, final_adjust, true); + aarch64_add_sp (tmp1_rtx, tmp0_rtx, final_adjust, force_isa_mode, true); /* Restore the vector registers before the predicate registers, so that we can use P4 as a temporary for big-endian SVE frames. */ aarch64_restore_callee_saves (final_adjust, frame.saved_fprs, &cfi_ops); aarch64_restore_callee_saves (final_adjust, frame.saved_prs, &cfi_ops); if (maybe_ne (sve_callee_adjust, 0)) - aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true); + aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, + force_isa_mode, true); /* When shadow call stack is enabled, the scs_pop in the epilogue will restore x30, we don't need to restore x30 again in the traditional @@ -11808,7 +11961,7 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall) /* Liveness of EP0_REGNUM can not be trusted across function calls either, so add restriction on emit_move optimization to leaf functions. 
*/ - aarch64_add_sp (tmp0_rtx, tmp1_rtx, initial_adjust, + aarch64_add_sp (tmp0_rtx, tmp1_rtx, initial_adjust, force_isa_mode, (!can_inherit_p || !crtl->is_leaf || df_regs_ever_live_p (EP0_REGNUM))); @@ -11941,7 +12094,8 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED, temp1 = gen_rtx_REG (Pmode, EP1_REGNUM); if (vcall_offset == 0) - aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, temp0, false); + aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, temp0, + 0, false); else { gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0); @@ -11954,7 +12108,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED, plus_constant (Pmode, this_rtx, delta)); else aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, - temp1, temp0, false); + temp1, temp0, 0, false); } if (Pmode == ptr_mode) @@ -31355,6 +31509,9 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_EXTRA_LIVE_ON_ENTRY #define TARGET_EXTRA_LIVE_ON_ENTRY aarch64_extra_live_on_entry +#undef TARGET_USE_LATE_PROLOGUE_EPILOGUE +#define TARGET_USE_LATE_PROLOGUE_EPILOGUE aarch64_use_late_prologue_epilogue + #undef TARGET_EMIT_EPILOGUE_FOR_SIBCALL #define TARGET_EMIT_EPILOGUE_FOR_SIBCALL aarch64_expand_epilogue diff --git a/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C b/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C index 8b0755014cc..dc5c097bd52 100644 --- a/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C +++ b/gcc/testsuite/g++.target/aarch64/sme/keyword_macros_1.C @@ -7,3 +7,4 @@ void f4 () __arm_out("za"); void f5 () __arm_inout("za"); void f6 () __arm_preserves("za"); __arm_new("za") void f7 () {} +__arm_locally_streaming void f8 () {} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c b/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c index fcabe3edc55..22f5facfdf9 100644 --- a/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sme/keyword_macros_1.c @@ -7,3 
+7,4 @@ void f4 () __arm_out("za"); void f5 () __arm_inout("za"); void f6 () __arm_preserves("za"); __arm_new("za") void f7 () {} +__arm_locally_streaming void f8 () {} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c new file mode 100644 index 00000000000..20ff4b87d94 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c @@ -0,0 +1,466 @@ +// { dg-options "-O -fomit-frame-pointer" } +// { dg-final { check-function-bodies "**" "" } } + +void consume_za () [[arm::streaming, arm::inout("za")]]; + +/* +** n_ls: +** sub sp, sp, #?80 +** cntd x16 +** str x16, \[sp\] +** stp d8, d9, \[sp, #?16\] +** stp d10, d11, \[sp, #?32\] +** stp d12, d13, \[sp, #?48\] +** stp d14, d15, \[sp, #?64\] +** smstart sm +** smstop sm +** ldp d8, d9, \[sp, #?16\] +** ldp d10, d11, \[sp, #?32\] +** ldp d12, d13, \[sp, #?48\] +** ldp d14, d15, \[sp, #?64\] +** add sp, sp, #?80 +** ret +*/ +[[arm::locally_streaming]] void +n_ls () +{ + asm (""); +} + +/* +** s_ls: +** ret +*/ +[[arm::locally_streaming]] void +s_ls () [[arm::streaming]] +{ + asm (""); +} + +/* +** sc_ls: +** stp x29, x30, \[sp, #?-96\]! +** mov x29, sp +** cntd x16 +** str x16, \[sp, #?24\] +** stp d8, d9, \[sp, #?32\] +** stp d10, d11, \[sp, #?48\] +** stp d12, d13, \[sp, #?64\] +** stp d14, d15, \[sp, #?80\] +** mrs x16, svcr +** str x16, \[x29, #?16\] +** tbnz x16, 0, [^\n]+ +** smstart sm +** ldr x16, \[x29, #?16\] +** tbnz x16, 0, [^\n]+ +** smstop sm +** ldp d8, d9, \[sp, #?32\] +** ldp d10, d11, \[sp, #?48\] +** ldp d12, d13, \[sp, #?64\] +** ldp d14, d15, \[sp, #?80\] +** ldp x29, x30, \[sp\], #?96 +** ret +*/ +[[arm::locally_streaming]] void +sc_ls () [[arm::streaming_compatible]] +{ + asm (""); +} + +/* +** n_ls_new_za: +** str x30, \[sp, #?-80\]! 
+** cntd x16 +** str x16, \[sp, #?8\] +** stp d8, d9, \[sp, #?16\] +** stp d10, d11, \[sp, #?32\] +** stp d12, d13, \[sp, #?48\] +** stp d14, d15, \[sp, #?64\] +** smstart sm +** mrs (x[0-9]+), tpidr2_el0 +** cbz \1, [^\n]+ +** bl __arm_tpidr2_save +** msr tpidr2_el0, xzr +** zero { za } +** smstart za +** bl consume_za +** smstop za +** smstop sm +** ldp d8, d9, \[sp, #?16\] +** ldp d10, d11, \[sp, #?32\] +** ldp d12, d13, \[sp, #?48\] +** ldp d14, d15, \[sp, #?64\] +** ldr x30, \[sp\], #?80 +** ret +*/ +[[arm::locally_streaming, arm::new("za")]] void +n_ls_new_za () +{ + consume_za (); + asm (""); +} + +/* +** s_ls_new_za: +** str x30, \[sp, #?-16\]! +** mrs (x[0-9]+), tpidr2_el0 +** cbz \1, [^\n]+ +** bl __arm_tpidr2_save +** msr tpidr2_el0, xzr +** zero { za } +** smstart za +** bl consume_za +** smstop za +** ldr x30, \[sp\], #?16 +** ret +*/ +[[arm::locally_streaming, arm::new("za")]] void +s_ls_new_za () [[arm::streaming]] +{ + consume_za (); + asm (""); +} + +/* +** sc_ls_new_za: +** stp x29, x30, \[sp, #?-96\]! +** mov x29, sp +** cntd x16 +** str x16, \[sp, #?24\] +** stp d8, d9, \[sp, #?32\] +** stp d10, d11, \[sp, #?48\] +** stp d12, d13, \[sp, #?64\] +** stp d14, d15, \[sp, #?80\] +** mrs x16, svcr +** str x16, \[x29, #?16\] +** tbnz x16, 0, [^\n]+ +** smstart sm +** mrs (x[0-9]+), tpidr2_el0 +** cbz \1, [^\n]+ +** bl __arm_tpidr2_save +** msr tpidr2_el0, xzr +** zero { za } +** smstart za +** bl consume_za +** smstop za +** ldr x16, \[x29, #?16\] +** tbnz x16, 0, [^\n]+ +** smstop sm +** ldp d8, d9, \[sp, #?32\] +** ldp d10, d11, \[sp, #?48\] +** ldp d12, d13, \[sp, #?64\] +** ldp d14, d15, \[sp, #?80\] +** ldp x29, x30, \[sp\], #?96 +** ret +*/ +[[arm::locally_streaming, arm::new("za")]] void +sc_ls_new_za () [[arm::streaming_compatible]] +{ + consume_za (); + asm (""); +} + +/* +** n_ls_shared_za: +** str x30, \[sp, #?-80\]! 
+** cntd x16 +** str x16, \[sp, #?8\] +** stp d8, d9, \[sp, #?16\] +** stp d10, d11, \[sp, #?32\] +** stp d12, d13, \[sp, #?48\] +** stp d14, d15, \[sp, #?64\] +** smstart sm +** bl consume_za +** smstop sm +** ldp d8, d9, \[sp, #?16\] +** ldp d10, d11, \[sp, #?32\] +** ldp d12, d13, \[sp, #?48\] +** ldp d14, d15, \[sp, #?64\] +** ldr x30, \[sp\], #?80 +** ret +*/ +[[arm::locally_streaming]] void +n_ls_shared_za () [[arm::inout("za")]] +{ + consume_za (); + asm (""); +} + +/* +** s_ls_shared_za: +** str x30, \[sp, #?-16\]! +** bl consume_za +** ldr x30, \[sp\], #?16 +** ret +*/ +[[arm::locally_streaming]] void +s_ls_shared_za () [[arm::streaming, arm::inout("za")]] +{ + consume_za (); + asm (""); +} + +/* +** sc_ls_shared_za: +** stp x29, x30, \[sp, #?-96\]! +** mov x29, sp +** cntd x16 +** str x16, \[sp, #?24\] +** stp d8, d9, \[sp, #?32\] +** stp d10, d11, \[sp, #?48\] +** stp d12, d13, \[sp, #?64\] +** stp d14, d15, \[sp, #?80\] +** mrs x16, svcr +** str x16, \[x29, #?16\] +** tbnz x16, 0, [^\n]+ +** smstart sm +** bl consume_za +** ldr x16, \[x29, #?16\] +** tbnz x16, 0, [^\n]+ +** smstop sm +** ldp d8, d9, \[sp, #?32\] +** ldp d10, d11, \[sp, #?48\] +** ldp d12, d13, \[sp, #?64\] +** ldp d14, d15, \[sp, #?80\] +** ldp x29, x30, \[sp\], #?96 +** ret +*/ +[[arm::locally_streaming]] void +sc_ls_shared_za () [[arm::streaming_compatible, arm::inout("za")]] +{ + consume_za (); + asm (""); +} + +/* +** n_ls_vector_pcs: +** sub sp, sp, #?272 +** cntd x16 +** str x16, \[sp\] +** stp q8, q9, \[sp, #?16\] +** stp q10, q11, \[sp, #?48\] +** stp q12, q13, \[sp, #?80\] +** stp q14, q15, \[sp, #?112\] +** stp q16, q17, \[sp, #?144\] +** stp q18, q19, \[sp, #?176\] +** stp q20, q21, \[sp, #?208\] +** stp q22, q23, \[sp, #?240\] +** smstart sm +** smstop sm +** ldp q8, q9, \[sp, #?16\] +** ldp q10, q11, \[sp, #?48\] +** ldp q12, q13, \[sp, #?80\] +** ldp q14, q15, \[sp, #?112\] +** ldp q16, q17, \[sp, #?144\] +** ldp q18, q19, \[sp, #?176\] +** ldp q20, q21, \[sp, #?208\] +** 
ldp q22, q23, \[sp, #?240\] +** add sp, sp, #?272 +** ret +*/ +[[arm::locally_streaming]] void __attribute__((aarch64_vector_pcs)) +n_ls_vector_pcs () +{ + asm (""); +} + +/* +** n_ls_sve_pcs: +** sub sp, sp, #?16 +** cntd x16 +** str x16, \[sp\] +** addsvl sp, sp, #-18 +** str p4, \[sp\] +** str p5, \[sp, #1, mul vl\] +** str p6, \[sp, #2, mul vl\] +** str p7, \[sp, #3, mul vl\] +** str p8, \[sp, #4, mul vl\] +** str p9, \[sp, #5, mul vl\] +** str p10, \[sp, #6, mul vl\] +** str p11, \[sp, #7, mul vl\] +** str p12, \[sp, #8, mul vl\] +** str p13, \[sp, #9, mul vl\] +** str p14, \[sp, #10, mul vl\] +** str p15, \[sp, #11, mul vl\] +** str z8, \[sp, #2, mul vl\] +** str z9, \[sp, #3, mul vl\] +** str z10, \[sp, #4, mul vl\] +** str z11, \[sp, #5, mul vl\] +** str z12, \[sp, #6, mul vl\] +** str z13, \[sp, #7, mul vl\] +** str z14, \[sp, #8, mul vl\] +** str z15, \[sp, #9, mul vl\] +** str z16, \[sp, #10, mul vl\] +** str z17, \[sp, #11, mul vl\] +** str z18, \[sp, #12, mul vl\] +** str z19, \[sp, #13, mul vl\] +** str z20, \[sp, #14, mul vl\] +** str z21, \[sp, #15, mul vl\] +** str z22, \[sp, #16, mul vl\] +** str z23, \[sp, #17, mul vl\] +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstart sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** smstop sm +** ldr z8, \[sp, #2, mul vl\] +** ldr z9, \[sp, #3, mul vl\] +** ldr z10, \[sp, #4, mul vl\] +** ldr z11, \[sp, #5, mul vl\] +** ldr z12, \[sp, #6, mul vl\] +** ldr z13, \[sp, #7, mul vl\] +** ldr z14, \[sp, #8, mul vl\] +** ldr z15, \[sp, #9, mul vl\] +** ldr z16, \[sp, #10, mul vl\] +** ldr z17, \[sp, #11, mul vl\] +** ldr z18, \[sp, #12, mul vl\] +** ldr z19, \[sp, #13, mul vl\] +** ldr z20, \[sp, #14, mul vl\] +** ldr z21, \[sp, #15, mul vl\] +** ldr z22, \[sp, #16, mul vl\] +** ldr z23, \[sp, #17, mul vl\] +** ldr p4, \[sp\] +** ldr p5, \[sp, #1, mul vl\] +** ldr p6, \[sp, #2, mul vl\] +** ldr p7, \[sp, #3, mul vl\] +** ldr p8, \[sp, #4, mul vl\] +** ldr p9, \[sp, #5, mul vl\] +** ldr p10, \[sp, #6, mul vl\] +** ldr 
p11, \[sp, #7, mul vl\] +** ldr p12, \[sp, #8, mul vl\] +** ldr p13, \[sp, #9, mul vl\] +** ldr p14, \[sp, #10, mul vl\] +** ldr p15, \[sp, #11, mul vl\] +** addsvl sp, sp, #18 +** add sp, sp, #?16 +** ret +*/ +[[arm::locally_streaming]] void +n_ls_sve_pcs (__SVBool_t x) +{ + asm (""); +} + +/* +** n_ls_v0: +** addsvl sp, sp, #-1 +** ... +** smstart sm +** add x[0-9]+, [^\n]+ +** smstop sm +** ... +** addsvl sp, sp, #1 +** ... +*/ +#define TEST(VN) __SVInt32_t VN; asm ("" :: "r" (&VN)); +[[arm::locally_streaming]] void +n_ls_v0 () +{ + TEST (v0); +} + +/* +** n_ls_v32: +** addsvl sp, sp, #-32 +** ... +** smstart sm +** ... +** smstop sm +** ... +** rdsvl (x[0-9]+), #1 +** lsl (x[0-9]+), \1, #?5 +** add sp, sp, \2 +** ... +*/ +[[arm::locally_streaming]] void +n_ls_v32 () +{ + TEST (v0); + TEST (v1); + TEST (v2); + TEST (v3); + TEST (v4); + TEST (v5); + TEST (v6); + TEST (v7); + TEST (v8); + TEST (v9); + TEST (v10); + TEST (v11); + TEST (v12); + TEST (v13); + TEST (v14); + TEST (v15); + TEST (v16); + TEST (v17); + TEST (v18); + TEST (v19); + TEST (v20); + TEST (v21); + TEST (v22); + TEST (v23); + TEST (v24); + TEST (v25); + TEST (v26); + TEST (v27); + TEST (v28); + TEST (v29); + TEST (v30); + TEST (v31); +} + +/* +** n_ls_v33: +** rdsvl (x[0-9]+), #1 +** mov (x[0-9]+), #?33 +** mul (x[0-9]+), (?:\1, \2|\2, \1) +** sub sp, sp, \3 +** ... +** smstart sm +** ... +** smstop sm +** ... +** rdsvl (x[0-9]+), #1 +** mov (x[0-9]+), #?33 +** mul (x[0-9]+), (?:\4, \5|\5, \4) +** add sp, sp, \6 +** ... 
+*/ +[[arm::locally_streaming]] void +n_ls_v33 () +{ + TEST (v0); + TEST (v1); + TEST (v2); + TEST (v3); + TEST (v4); + TEST (v5); + TEST (v6); + TEST (v7); + TEST (v8); + TEST (v9); + TEST (v10); + TEST (v11); + TEST (v12); + TEST (v13); + TEST (v14); + TEST (v15); + TEST (v16); + TEST (v17); + TEST (v18); + TEST (v19); + TEST (v20); + TEST (v21); + TEST (v22); + TEST (v23); + TEST (v24); + TEST (v25); + TEST (v26); + TEST (v27); + TEST (v28); + TEST (v29); + TEST (v30); + TEST (v31); + TEST (v32); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_2.c b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_2.c new file mode 100644 index 00000000000..0eba993855f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_2.c @@ -0,0 +1,177 @@ +// { dg-options "-O -fomit-frame-pointer" } +// { dg-final { check-function-bodies "**" "" } } + +#include +#include + +/* +** test_d0: +** ... +** smstart sm +** ... +** fmov x10, d0 +** smstop sm +** fmov d0, x10 +** ... +*/ +[[arm::locally_streaming]] double +test_d0 () +{ + asm (""); + return 1.0f; +} + +/* +** test_d0_vec: +** ... +** smstart sm +** ... +** ( +** fmov x10, d0 +** | +** umov x10, v0.d\[0\] +** ) +** smstop sm +** fmov d0, x10 +** ... +*/ +[[arm::locally_streaming]] int8x8_t +test_d0_vec () +{ + asm (""); + return (int8x8_t) {}; +} + +/* +** test_q0: +** ... +** smstart sm +** ... +** str q0, \[sp, #?-16\]! +** smstop sm +** ldr q0, \[sp\], #?16 +** ... +*/ +[[arm::locally_streaming]] int8x16_t +test_q0 () +{ + asm (""); + return (int8x16_t) {}; +} + +/* +** test_q1: +** ... +** smstart sm +** ... +** stp q0, q1, \[sp, #?-32\]! +** smstop sm +** ldp q0, q1, \[sp\], #?32 +** ... +*/ +[[arm::locally_streaming]] int8x16x2_t +test_q1 () +{ + asm (""); + return (int8x16x2_t) {}; +} + +/* +** test_q2: +** ... +** smstart sm +** ... +** stp q0, q1, \[sp, #?-48\]! +** str q2, \[sp, #?32\] +** smstop sm +** ldr q2, \[sp, #?32\] +** ldp q0, q1, \[sp\], #?48 +** ... 
+*/ +[[arm::locally_streaming]] int8x16x3_t +test_q2 () +{ + asm (""); + return (int8x16x3_t) {}; +} + +/* +** test_q3: +** ... +** smstart sm +** ... +** stp q0, q1, \[sp, #?-64\]! +** stp q2, q3, \[sp, #?32\] +** smstop sm +** ldp q2, q3, \[sp, #?32\] +** ldp q0, q1, \[sp\], #?64 +** ... +*/ +[[arm::locally_streaming]] int8x16x4_t +test_q3 () +{ + asm (""); + return (int8x16x4_t) {}; +} + +/* +** test_z0: +** ... +** smstart sm +** mov z0\.b, #0 +** addvl sp, sp, #-1 +** str z0, \[sp\] +** smstop sm +** ldr z0, \[sp\] +** addvl sp, sp, #1 +** ... +*/ +[[arm::locally_streaming]] svint8_t +test_z0 () +{ + asm (""); + return (svint8_t) {}; +} + +/* +** test_z3: +** ... +** smstart sm +** ... +** addvl sp, sp, #-4 +** str z0, \[sp\] +** str z1, \[sp, #1, mul vl\] +** str z2, \[sp, #2, mul vl\] +** str z3, \[sp, #3, mul vl\] +** smstop sm +** ldr z0, \[sp\] +** ldr z1, \[sp, #1, mul vl\] +** ldr z2, \[sp, #2, mul vl\] +** ldr z3, \[sp, #3, mul vl\] +** ... +*/ +[[arm::locally_streaming]] svint8x4_t +test_z3 () +{ + asm (""); + return (svint8x4_t) {}; +} + +/* +** test_p0: +** ... +** smstart sm +** pfalse p0\.b +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstop sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** ... +*/ +[[arm::locally_streaming]] svbool_t +test_p0 () +{ + asm (""); + return (svbool_t) {}; +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_3.c b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_3.c new file mode 100644 index 00000000000..2bdea6ac631 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_3.c @@ -0,0 +1,273 @@ +// { dg-options "-O -fomit-frame-pointer" } +// { dg-final { check-function-bodies "**" "" } } + +#include +#include + +/* +** test_d0: +** ... +** fmov x10, d0 +** smstart sm +** fmov d0, x10 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_d0 (double d0) +{ + asm (""); +} + +/* +** test_d7: +** ... 
+** fmov x10, d0 +** fmov x11, d1 +** fmov x12, d2 +** fmov x13, d3 +** fmov x14, d4 +** fmov x15, d5 +** fmov x16, d6 +** fmov x17, d7 +** smstart sm +** fmov d0, x10 +** fmov d1, x11 +** fmov d2, x12 +** fmov d3, x13 +** fmov d4, x14 +** fmov d5, x15 +** fmov d6, x16 +** fmov d7, x17 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_d7 (double d0, double d1, double d2, double d3, + double d4, double d5, double d6, double d7) +{ + asm (""); +} + +/* +** test_d0_vec: +** ... +** ( +** fmov x10, d0 +** | +** umov x10, v0.d\[0\] +** ) +** smstart sm +** fmov d0, x10 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_d0_vec (int8x8_t d0) +{ + asm (""); +} + +/* +** test_d7_vec: +** ... +** ( +** fmov x10, d0 +** fmov x11, d1 +** fmov x12, d2 +** fmov x13, d3 +** fmov x14, d4 +** fmov x15, d5 +** fmov x16, d6 +** fmov x17, d7 +** | +** umov x10, v0.d\[0\] +** umov x11, v1.d\[0\] +** umov x12, v2.d\[0\] +** umov x13, v3.d\[0\] +** umov x14, v4.d\[0\] +** umov x15, v5.d\[0\] +** umov x16, v6.d\[0\] +** umov x17, v7.d\[0\] +** ) +** smstart sm +** fmov d0, x10 +** fmov d1, x11 +** fmov d2, x12 +** fmov d3, x13 +** fmov d4, x14 +** fmov d5, x15 +** fmov d6, x16 +** fmov d7, x17 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_d7_vec (int8x8_t d0, int8x8_t d1, int8x8_t d2, int8x8_t d3, + int8x8_t d4, int8x8_t d5, int8x8_t d6, int8x8_t d7) +{ + asm (""); +} + +/* +** test_q0: +** ... +** str q0, \[sp, #?-16\]! +** smstart sm +** ldr q0, \[sp\], #?16 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_q0 (int8x16_t q0) +{ + asm (""); +} + +/* +** test_q7: +** ... +** stp q0, q1, \[sp, #?-128\]! +** stp q2, q3, \[sp, #?32\] +** stp q4, q5, \[sp, #?64\] +** stp q6, q7, \[sp, #?96\] +** smstart sm +** ldp q2, q3, \[sp, #?32\] +** ldp q4, q5, \[sp, #?64\] +** ldp q6, q7, \[sp, #?96\] +** ldp q0, q1, \[sp\], #?128 +** smstop sm +** ... 
+*/ +[[arm::locally_streaming]] void +test_q7 (int8x16x4_t q0, int8x16x4_t q4) +{ + asm (""); +} + +/* +** test_z0: +** ... +** addvl sp, sp, #-1 +** str z0, \[sp\] +** smstart sm +** ldr z0, \[sp\] +** addvl sp, sp, #1 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_z0 (svint8_t z0) +{ + asm (""); +} + +/* +** test_z7: +** ... +** addvl sp, sp, #-8 +** str z0, \[sp\] +** str z1, \[sp, #1, mul vl\] +** str z2, \[sp, #2, mul vl\] +** str z3, \[sp, #3, mul vl\] +** str z4, \[sp, #4, mul vl\] +** str z5, \[sp, #5, mul vl\] +** str z6, \[sp, #6, mul vl\] +** str z7, \[sp, #7, mul vl\] +** smstart sm +** ldr z0, \[sp\] +** ldr z1, \[sp, #1, mul vl\] +** ldr z2, \[sp, #2, mul vl\] +** ldr z3, \[sp, #3, mul vl\] +** ldr z4, \[sp, #4, mul vl\] +** ldr z5, \[sp, #5, mul vl\] +** ldr z6, \[sp, #6, mul vl\] +** ldr z7, \[sp, #7, mul vl\] +** addvl sp, sp, #8 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_z7 (svint8x4_t z0, svint8x4_t z4) +{ + asm (""); +} + +/* +** test_p0: +** ... +** addvl sp, sp, #-1 +** str p0, \[sp\] +** smstart sm +** ldr p0, \[sp\] +** addvl sp, sp, #1 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_p0 (svbool_t p0) +{ + asm (""); +} + +/* +** test_p3: +** ... +** addvl sp, sp, #-1 +** str p0, \[sp\] +** str p1, \[sp, #1, mul vl\] +** str p2, \[sp, #2, mul vl\] +** str p3, \[sp, #3, mul vl\] +** smstart sm +** ldr p0, \[sp\] +** ldr p1, \[sp, #1, mul vl\] +** ldr p2, \[sp, #2, mul vl\] +** ldr p3, \[sp, #3, mul vl\] +** addvl sp, sp, #1 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_p3 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3) +{ + asm (""); +} + +/* +** test_mixed: +** ... +** addvl sp, sp, #-3 +** str p0, \[sp\] +** str p1, \[sp, #1, mul vl\] +** str p2, \[sp, #2, mul vl\] +** str p3, \[sp, #3, mul vl\] +** str z3, \[sp, #1, mul vl\] +** str z7, \[sp, #2, mul vl\] +** stp q2, q6, \[sp, #?-32\]! 
+** fmov w10, s0 +** fmov x11, d1 +** fmov w12, s4 +** fmov x13, d5 +** smstart sm +** fmov s0, w10 +** fmov d1, x11 +** fmov s4, w12 +** fmov d5, x13 +** ldp q2, q6, \[sp\], #?32 +** ldr p0, \[sp\] +** ldr p1, \[sp, #1, mul vl\] +** ldr p2, \[sp, #2, mul vl\] +** ldr p3, \[sp, #3, mul vl\] +** ldr z3, \[sp, #1, mul vl\] +** ldr z7, \[sp, #2, mul vl\] +** addvl sp, sp, #3 +** smstop sm +** ... +*/ +[[arm::locally_streaming]] void +test_mixed (float s0, double d1, float32x4_t q2, svfloat32_t z3, + float s4, double d5, float64x2_t q6, svfloat64_t z7, + svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3) +{ + asm (""); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c new file mode 100644 index 00000000000..42adeb152e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c @@ -0,0 +1,145 @@ +// { dg-options "-O -fomit-frame-pointer" } +/* { dg-final { check-function-bodies "**" "" } } */ + +#include +#include + +/* +** test_d0: +** ... +** smstart sm +** ... +** fmov x10, d0 +** smstop sm +** fmov d0, x10 +** ... +** smstart sm +** ... +** smstop sm +** ... +*/ +void consume_d0 (double d0); + +__arm_locally_streaming void +test_d0 () +{ + asm (""); + consume_d0 (1.0); + asm (""); +} + +/* +** test_d7: +** ... +** fmov x10, d0 +** fmov x11, d1 +** fmov x12, d2 +** fmov x13, d3 +** fmov x14, d4 +** fmov x15, d5 +** fmov x16, d6 +** fmov x17, d7 +** smstop sm +** fmov d0, x10 +** fmov d1, x11 +** fmov d2, x12 +** fmov d3, x13 +** fmov d4, x14 +** fmov d5, x15 +** fmov d6, x16 +** fmov d7, x17 +** ... +*/ +void consume_d7 (double d0, double d1, double d2, double d3, + double d4, double d5, double d6, double d7); +__arm_locally_streaming void +test_d7 () +{ + asm (""); + consume_d7 (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0); + asm (""); +} + +/* +** test_q7: +** ... +** stp q0, q1, \[sp, #?-128\]! 
+** stp q2, q3, \[sp, #?32\] +** stp q4, q5, \[sp, #?64\] +** stp q6, q7, \[sp, #?96\] +** smstop sm +** ldp q2, q3, \[sp, #?32\] +** ldp q4, q5, \[sp, #?64\] +** ldp q6, q7, \[sp, #?96\] +** ldp q0, q1, \[sp\], #?128 +** ... +*/ +void consume_q7 (int8x16x4_t q0, int8x16x4_t q4); + +__arm_locally_streaming void +test_q7 (int8x16x4_t *ptr) +{ + asm (""); + consume_q7 (ptr[0], ptr[1]); + asm (""); +} + +/* +** test_z7: +** ... +** addvl sp, sp, #-8 +** str z0, \[sp\] +** str z1, \[sp, #1, mul vl\] +** str z2, \[sp, #2, mul vl\] +** str z3, \[sp, #3, mul vl\] +** str z4, \[sp, #4, mul vl\] +** str z5, \[sp, #5, mul vl\] +** str z6, \[sp, #6, mul vl\] +** str z7, \[sp, #7, mul vl\] +** smstop sm +** ldr z0, \[sp\] +** ldr z1, \[sp, #1, mul vl\] +** ldr z2, \[sp, #2, mul vl\] +** ldr z3, \[sp, #3, mul vl\] +** ldr z4, \[sp, #4, mul vl\] +** ldr z5, \[sp, #5, mul vl\] +** ldr z6, \[sp, #6, mul vl\] +** ldr z7, \[sp, #7, mul vl\] +** addvl sp, sp, #8 +** ... +*/ +void consume_z7 (svint8x4_t z0, svint8x4_t z4); + +__arm_locally_streaming void +test_z7 (svint8x4_t *ptr1, svint8x4_t *ptr2) +{ + asm (""); + consume_z7 (*ptr1, *ptr2); + asm (""); +} + +/* +** test_p3: +** ... +** addvl sp, sp, #-1 +** str p0, \[sp\] +** str p1, \[sp, #1, mul vl\] +** str p2, \[sp, #2, mul vl\] +** str p3, \[sp, #3, mul vl\] +** smstop sm +** ldr p0, \[sp\] +** ldr p1, \[sp, #1, mul vl\] +** ldr p2, \[sp, #2, mul vl\] +** ldr p3, \[sp, #3, mul vl\] +** addvl sp, sp, #1 +** ... 
+*/
+void consume_p3 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3);
+
+__arm_locally_streaming void
+test_p3 (svbool_t *ptr1, svbool_t *ptr2, svbool_t *ptr3, svbool_t *ptr4)
+{
+  asm ("");
+  consume_p3 (*ptr1, *ptr2, *ptr3, *ptr4);
+  asm ("");
+}

From patchwork Fri Nov 17 17:30:22 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1865167
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 19/21] aarch64: Handle PSTATE.SM across abnormal edges
Date: Fri, 17 Nov 2023 17:30:22 +0000
PSTATE.SM is always off on entry to an exception handler, and on entry to a
nonlocal goto receiver.  Those entry points need to switch PSTATE.SM back
to the appropriate state for the current function.  In the case of
streaming-compatible functions, they need to restore the mode that the
caller was originally using.

The requirement on nonlocal goto receivers means that nonlocal jumps need
to ensure that PSTATE.SM is zero.

gcc/
	* config/aarch64/aarch64.cc: Include except.h
	(aarch64_sme_mode_switch_regs::add_call_preserved_reg): New function.
	(aarch64_sme_mode_switch_regs::add_call_preserved_regs): Likewise.
	(aarch64_need_old_pstate_sm): Return true if the function has
	a nonlocal-goto or exception receiver.
	(aarch64_switch_pstate_sm_for_landing_pad): New function.
	(aarch64_switch_pstate_sm_for_jump): Likewise.
	(pass_switch_pstate_sm::gate): Enable the pass for all streaming
	and streaming-compatible functions.
	(pass_switch_pstate_sm::execute): Handle non-local gotos and their
	receivers.  Handle exception handler entry points.

gcc/testsuite/
	* g++.target/aarch64/sme/exceptions_2.C: New test.
	* gcc.target/aarch64/sme/nonlocal_goto_1.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_2.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_3.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_4.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_5.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_6.c: Likewise.
	* gcc.target/aarch64/sme/nonlocal_goto_7.c: Likewise.
--- gcc/config/aarch64/aarch64.cc | 141 ++++++++++++++++- .../g++.target/aarch64/sme/exceptions_2.C | 148 ++++++++++++++++++ .../gcc.target/aarch64/sme/nonlocal_goto_1.c | 58 +++++++ .../gcc.target/aarch64/sme/nonlocal_goto_2.c | 44 ++++++ .../gcc.target/aarch64/sme/nonlocal_goto_3.c | 46 ++++++ .../gcc.target/aarch64/sme/nonlocal_goto_4.c | 25 +++ .../gcc.target/aarch64/sme/nonlocal_goto_5.c | 26 +++ .../gcc.target/aarch64/sme/nonlocal_goto_6.c | 31 ++++ .../gcc.target/aarch64/sme/nonlocal_goto_7.c | 25 +++ 9 files changed, 537 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/g++.target/aarch64/sme/exceptions_2.C create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_7.c diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 6ad29a3a84f..340aa438d49 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -85,6 +85,7 @@ #include "config/arm/aarch-common.h" #include "config/arm/aarch-common-protos.h" #include "ssa.h" +#include "except.h" #include "tree-pass.h" #include "cfgbuild.h" @@ -7132,6 +7133,8 @@ public: void add_reg (machine_mode, unsigned int); void add_call_args (rtx_call_insn *); void add_call_result (rtx_call_insn *); + void add_call_preserved_reg (unsigned int); + void add_call_preserved_regs (bitmap); void emit_prologue (); void emit_epilogue (); @@ -7264,6 +7267,46 @@ aarch64_sme_mode_switch_regs::add_call_result (rtx_call_insn *call_insn) add_reg (GET_MODE (dest), REGNO (dest)); } +/* REGNO is a register that is call-preserved under 
the current function's ABI. + Record that it must be preserved around the mode switch. */ + +void +aarch64_sme_mode_switch_regs::add_call_preserved_reg (unsigned int regno) +{ + if (FP_REGNUM_P (regno)) + switch (crtl->abi->id ()) + { + case ARM_PCS_SVE: + add_reg (VNx16QImode, regno); + break; + case ARM_PCS_SIMD: + add_reg (V16QImode, regno); + break; + case ARM_PCS_AAPCS64: + add_reg (DImode, regno); + break; + default: + gcc_unreachable (); + } + else if (PR_REGNUM_P (regno)) + add_reg (VNx16BImode, regno); +} + +/* The hard registers in REGS are call-preserved under the current function's + ABI. Record that they must be preserved around the mode switch. */ + +void +aarch64_sme_mode_switch_regs::add_call_preserved_regs (bitmap regs) +{ + bitmap_iterator bi; + unsigned int regno; + EXECUTE_IF_SET_IN_BITMAP (regs, 0, regno, bi) + if (HARD_REGISTER_NUM_P (regno)) + add_call_preserved_reg (regno); + else + break; +} + /* Emit code to save registers before the mode switch. */ void @@ -9798,6 +9841,23 @@ aarch64_need_old_pstate_sm () if (aarch64_cfun_enables_pstate_sm ()) return true; + /* Non-local goto receivers are entered with PSTATE.SM equal to 0, + but the function needs to return with PSTATE.SM unchanged. */ + if (nonlocal_goto_handler_labels) + return true; + + /* Likewise for exception handlers. */ + eh_landing_pad lp; + for (unsigned int i = 1; vec_safe_iterate (cfun->eh->lp_array, i, &lp); ++i) + if (lp && lp->post_landing_pad) + return true; + + /* Non-local gotos need to set PSTATE.SM to zero. It's possible to call + streaming-compatible functions without SME being available, so PSTATE.SM + should only be changed if it is currently set to one. 
*/
+  if (crtl->has_nonlocal_goto)
+    return true;
+
   if (cfun->machine->call_switches_pstate_sm)
     for (auto insn = get_insns (); insn; insn = NEXT_INSN (insn))
       if (auto *call = dyn_cast<rtx_call_insn *> (insn))
@@ -30682,6 +30742,59 @@ aarch64_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
   return seq;
 }
 
+/* BB is the target of an exception or nonlocal goto edge, which means
+   that PSTATE.SM is known to be 0 on entry.  Put it into the state that
+   the current function requires.  */
+
+static bool
+aarch64_switch_pstate_sm_for_landing_pad (basic_block bb)
+{
+  if (TARGET_NON_STREAMING)
+    return false;
+
+  start_sequence ();
+  rtx_insn *guard_label = nullptr;
+  if (TARGET_STREAMING_COMPATIBLE)
+    guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
+						  AARCH64_FL_SM_OFF);
+  aarch64_sme_mode_switch_regs args_switch;
+  args_switch.add_call_preserved_regs (df_get_live_in (bb));
+  args_switch.emit_prologue ();
+  aarch64_switch_pstate_sm (AARCH64_FL_SM_OFF, AARCH64_FL_SM_ON);
+  args_switch.emit_epilogue ();
+  if (guard_label)
+    emit_label (guard_label);
+  auto seq = get_insns ();
+  end_sequence ();
+
+  emit_insn_after (seq, bb_note (bb));
+  return true;
+}
+
+/* JUMP is a nonlocal goto.  Its target requires PSTATE.SM to be 0 on entry,
+   so arrange to make it so.  */
+
+static bool
+aarch64_switch_pstate_sm_for_jump (rtx_insn *jump)
+{
+  if (TARGET_NON_STREAMING)
+    return false;
+
+  start_sequence ();
+  rtx_insn *guard_label = nullptr;
+  if (TARGET_STREAMING_COMPATIBLE)
+    guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
+						  AARCH64_FL_SM_OFF);
+  aarch64_switch_pstate_sm (AARCH64_FL_SM_ON, AARCH64_FL_SM_OFF);
+  if (guard_label)
+    emit_label (guard_label);
+  auto seq = get_insns ();
+  end_sequence ();
+
+  emit_insn_before (seq, jump);
+  return true;
+}
+
 /* If CALL involves a change in PSTATE.SM, emit the instructions needed
    to switch to the new mode and the instructions needed to restore the
    original mode.  Return true if something changed.
*/
@@ -30765,9 +30878,10 @@ public:
 };
 
 bool
-pass_switch_pstate_sm::gate (function *)
+pass_switch_pstate_sm::gate (function *fn)
 {
-  return cfun->machine->call_switches_pstate_sm;
+  return (aarch64_fndecl_pstate_sm (fn->decl) != AARCH64_FL_SM_OFF
+	  || cfun->machine->call_switches_pstate_sm);
 }
 
 /* Emit any instructions needed to switch PSTATE.SM.  */
@@ -30780,11 +30894,24 @@ pass_switch_pstate_sm::execute (function *fn)
   bitmap_clear (blocks);
   FOR_EACH_BB_FN (bb, fn)
     {
-      rtx_insn *insn;
-      FOR_BB_INSNS (bb, insn)
-	if (auto *call = dyn_cast<rtx_call_insn *> (insn))
-	  if (aarch64_switch_pstate_sm_for_call (call))
-	    bitmap_set_bit (blocks, bb->index);
+      if (has_abnormal_call_or_eh_pred_edge_p (bb)
+	  && aarch64_switch_pstate_sm_for_landing_pad (bb))
+	bitmap_set_bit (blocks, bb->index);
+
+      if (cfun->machine->call_switches_pstate_sm)
+	{
+	  rtx_insn *insn;
+	  FOR_BB_INSNS (bb, insn)
+	    if (auto *call = dyn_cast<rtx_call_insn *> (insn))
+	      if (aarch64_switch_pstate_sm_for_call (call))
+		bitmap_set_bit (blocks, bb->index);
+	}
+
+      auto end = BB_END (bb);
+      if (JUMP_P (end)
+	  && find_reg_note (end, REG_NON_LOCAL_GOTO, NULL_RTX)
+	  && aarch64_switch_pstate_sm_for_jump (end))
+	bitmap_set_bit (blocks, bb->index);
     }
   find_many_sub_basic_blocks (blocks);
   clear_aux_for_blocks ();
diff --git a/gcc/testsuite/g++.target/aarch64/sme/exceptions_2.C b/gcc/testsuite/g++.target/aarch64/sme/exceptions_2.C
new file mode 100644
index 00000000000..f791b6ecc54
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sme/exceptions_2.C
@@ -0,0 +1,148 @@
+// { dg-options "-O -fno-optimize-sibling-calls" }
+// { dg-final { check-function-bodies "**" "" } }
+
+void n_callee();
+void s_callee() __arm_streaming;
+void sc_callee() __arm_streaming_compatible;
+
+void n_callee_ne() noexcept;
+void s_callee_ne() noexcept __arm_streaming;
+void sc_callee_ne() noexcept __arm_streaming_compatible;
+
+void n_caller1()
+{
+  try
+    {
+      n_callee();
+      sc_callee();
+    }
+  catch (...)
+ { + n_callee_ne(); + sc_callee_ne(); + } +} +// { dg-final { scan-assembler {_Z9n_caller1v:(?:(?!smstart|smstop).)*\tret} } } + +/* +** _Z9n_caller2v: +** ... +** cntd (x[0-9]+) +** str \1, [^\n]+ +** ... +** bl __cxa_begin_catch +** smstart sm +** bl _Z11s_callee_nev +** smstop sm +** bl __cxa_end_catch +** ... +*/ +void n_caller2() +{ + try + { + n_callee(); + sc_callee(); + } + catch (...) + { + s_callee_ne(); + } +} + +/* +** _Z9s_caller1v: +** ... +** bl __cxa_end_catch +** smstart sm +** ... +*/ +int s_caller1() __arm_streaming +{ + try + { + s_callee(); + return 1; + } + catch (...) + { + return 2; + } +} + +/* +** _Z9s_caller2v: +** ... +** bl __cxa_begin_catch +** smstart sm +** bl _Z11s_callee_nev +** smstop sm +** bl __cxa_end_catch +** smstart sm +** ... +*/ +int s_caller2() __arm_streaming +{ + try + { + n_callee(); + return 1; + } + catch (...) + { + s_callee_ne(); + return 2; + } +} + +/* +** _Z10sc_caller1v: +** ... +** cntd (x[0-9]+) +** str \1, [^\n]+ +** mrs (x[0-9]+), svcr +** str \2, ([^\n]+) +** ... +** bl __cxa_end_catch +** ldr (x[0-9]+), \3 +** tbz \4, 0, [^\n]+ +** smstart sm +** ... +*/ +int sc_caller1() __arm_streaming_compatible +{ + try + { + sc_callee(); + return 1; + } + catch (...) + { + return 2; + } +} + +/* +** _Z10ls_caller1v: +** ... +** cntd (x[0-9]+) +** str \1, [^\n]+ +** ... +** bl __cxa_begin_catch +** smstart sm +** bl _Z12sc_callee_nev +** smstop sm +** bl __cxa_end_catch +** ... +*/ +__arm_locally_streaming void ls_caller1() +{ + try + { + sc_callee(); + } + catch (...) 
+ { + sc_callee_ne(); + } +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_1.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_1.c new file mode 100644 index 00000000000..4e3869fcc9e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_1.c @@ -0,0 +1,58 @@ +/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +void run(void (*)()); + +/* +** foo: +** ... +** mrs x16, svcr +** ... +** str x16, (.*) +** ... +** ldr x16, \1 +** tbz x16, 0, .* +** smstop sm +** bl __clear_cache +** ldr x16, \1 +** tbz x16, 0, .* +** smstart sm +** add x0, .* +** ldr x16, \1 +** tbz x16, 0, .* +** smstop sm +** bl run +** ldr x16, \1 +** tbz x16, 0, .* +** smstart sm +** mov w0, 1 +** ... +** ret +** ldr x16, \1 +** tbz x16, 0, .* +** smstart sm +** mov w0, 0 +** ... +*/ +int +foo (int *ptr) __arm_streaming_compatible +{ + __label__ failure; + + void bar () { *ptr += 1; goto failure; } + run (bar); + return 1; + +failure: + return 0; +} + +// { dg-final { scan-assembler {\tstp\tx19, x20,} } } +// { dg-final { scan-assembler {\tstp\tx21, x22,} } } +// { dg-final { scan-assembler {\tstp\tx23, x24,} } } +// { dg-final { scan-assembler {\tstp\tx25, x26,} } } +// { dg-final { scan-assembler {\tstp\tx27, x28,} } } +// { dg-final { scan-assembler {\tstp\td8, d9,} } } +// { dg-final { scan-assembler {\tstp\td10, d11,} } } +// { dg-final { scan-assembler {\tstp\td12, d13,} } } +// { dg-final { scan-assembler {\tstp\td14, d15,} } } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_2.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_2.c new file mode 100644 index 00000000000..2a2db72c3a0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_2.c @@ -0,0 +1,44 @@ +/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +void run(void (*)()); + +/* +** foo: +** ... 
+** smstop sm +** bl __clear_cache +** smstart sm +** add x0, .* +** smstop sm +** bl run +** smstart sm +** mov w0, 1 +** ... +** ret +** smstart sm +** mov w0, 0 +** ... +*/ +int +foo (int *ptr) __arm_streaming +{ + __label__ failure; + + void bar () { *ptr += 1; goto failure; } + run (bar); + return 1; + +failure: + return 0; +} + +// { dg-final { scan-assembler {\tstp\tx19, x20,} } } +// { dg-final { scan-assembler {\tstp\tx21, x22,} } } +// { dg-final { scan-assembler {\tstp\tx23, x24,} } } +// { dg-final { scan-assembler {\tstp\tx25, x26,} } } +// { dg-final { scan-assembler {\tstp\tx27, x28,} } } +// { dg-final { scan-assembler {\tstp\td8, d9,} } } +// { dg-final { scan-assembler {\tstp\td10, d11,} } } +// { dg-final { scan-assembler {\tstp\td12, d13,} } } +// { dg-final { scan-assembler {\tstp\td14, d15,} } } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_3.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_3.c new file mode 100644 index 00000000000..022b04052c5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_3.c @@ -0,0 +1,46 @@ +/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +void run(void (*)()); + +/* +** foo: +** ... +** smstart sm +** ... +** smstop sm +** bl __clear_cache +** smstart sm +** add x0, .* +** smstop sm +** bl run +** smstart sm +** mov w0, 1 +** ... +** smstart sm +** mov w0, 0 +** smstop sm +** ... 
+*/ +__arm_locally_streaming int +foo (int *ptr) +{ + __label__ failure; + + void bar () { *ptr += 1; goto failure; } + run (bar); + return 1; + +failure: + return 0; +} + +// { dg-final { scan-assembler {\tstp\tx19, x20,} } } +// { dg-final { scan-assembler {\tstp\tx21, x22,} } } +// { dg-final { scan-assembler {\tstp\tx23, x24,} } } +// { dg-final { scan-assembler {\tstp\tx25, x26,} } } +// { dg-final { scan-assembler {\tstp\tx27, x28,} } } +// { dg-final { scan-assembler {\tstp\td8, d9,} } } +// { dg-final { scan-assembler {\tstp\td10, d11,} } } +// { dg-final { scan-assembler {\tstp\td12, d13,} } } +// { dg-final { scan-assembler {\tstp\td14, d15,} } } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c new file mode 100644 index 00000000000..0446076286b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c @@ -0,0 +1,25 @@ +/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +void run(void (*)()); + +/* +** bar.0: +** ... +** smstart sm +** ... +** smstop sm +** br x[0-9]+ +*/ +int +foo (int *ptr) +{ + __label__ failure; + + __arm_locally_streaming void bar () { *ptr += 1; goto failure; } + run (bar); + return 1; + +failure: + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c new file mode 100644 index 00000000000..4246aec8b2f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c @@ -0,0 +1,26 @@ +/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +void run(void (*)() __arm_streaming); + +/* +** bar.0: +** ... 
+** smstop sm +** br x[0-9]+ +*/ +int +foo (int *ptr) +{ + __label__ failure; + + void bar () __arm_streaming { *ptr += 1; goto failure; } + run (bar); + return 1; + +failure: + return 0; +} + +// { dg-final { scan-assembler-not {smstart\t} } } +// { dg-final { scan-assembler-not {mrs\t} } } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c new file mode 100644 index 00000000000..151e2f22dc7 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c @@ -0,0 +1,31 @@ +/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +void run(void (*)() __arm_streaming_compatible); + +/* +** bar.0: +** ... +** mrs x16, svcr +** ... +** str x16, (.*) +** ... +** ldr x16, \1 +** tbz x16, 0, .* +** smstop sm +** br x[0-9]+ +*/ +int +foo (int *ptr) +{ + __label__ failure; + + void bar () __arm_streaming_compatible { *ptr += 1; goto failure; } + run (bar); + return 1; + +failure: + return 0; +} + +// { dg-final { scan-assembler-not {smstart\t} } } diff --git a/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_7.c b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_7.c new file mode 100644 index 00000000000..9cc3ad5d236 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_7.c @@ -0,0 +1,25 @@ +/* { dg-options "-O2 -fno-schedule-insns -fno-schedule-insns2" } */ + +void run(void (*)() __arm_inout("za")); +void callee () __arm_inout("za"); + +int +foo (int *ptr) +{ + __label__ failure; + + void bar () __arm_inout("za") + { + callee (); + *ptr += 1; + goto failure; + } + run (bar); + return 1; + +failure: + return 0; +} + +// { dg-final { scan-assembler-not {\tsmstart\t} } } +// { dg-final { scan-assembler-not {\tsmstop\t} } } From patchwork Fri Nov 17 17:30:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford 
X-Patchwork-Id: 1865170
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 20/21] aarch64: Enforce inlining restrictions for SME
Date: Fri, 17 Nov 2023 17:30:35 +0000

A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.

A function whose body directly clobbers ZA state cannot be inlined
into a function with ZA state.
A function whose body requires a particular PSTATE.SM setting can only
be inlined into a function body that guarantees that PSTATE.SM setting.
The callee's function type doesn't matter here: one locally-streaming
function can be inlined into another.

gcc/
	* config/aarch64/aarch64.cc: Include symbol-summary.h, ipa-prop.h,
	and ipa-fnsummary.h
	(aarch64_function_attribute_inlinable_p): New function.
	(AARCH64_IPA_SM_FIXED, AARCH64_IPA_CLOBBERS_ZA): New constants.
	(aarch64_need_ipa_fn_target_info): New function.
	(aarch64_update_ipa_fn_target_info): Likewise.
	(aarch64_can_inline_p): Restrict the previous ISA flag checks
	to non-modal features.  Prevent callees that require a particular
	PSTATE.SM state from being inlined into callers that can't guarantee
	that state.  Also prevent callees that have ZA state from being
	inlined into callers that don't.  Finally, prevent callees that
	clobber ZA from being inlined into callers that have ZA state.
	(TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P): Define.
	(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
	(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sme/inlining_1.c: New test.
	* gcc.target/aarch64/sme/inlining_2.c: Likewise.
	* gcc.target/aarch64/sme/inlining_3.c: Likewise.
	* gcc.target/aarch64/sme/inlining_4.c: Likewise.
	* gcc.target/aarch64/sme/inlining_5.c: Likewise.
	* gcc.target/aarch64/sme/inlining_6.c: Likewise.
	* gcc.target/aarch64/sme/inlining_7.c: Likewise.
	* gcc.target/aarch64/sme/inlining_8.c: Likewise.
--- gcc/config/aarch64/aarch64.cc | 132 +++++++++++++++++- .../gcc.target/aarch64/sme/inlining_1.c | 47 +++++++ .../gcc.target/aarch64/sme/inlining_10.c | 57 ++++++++ .../gcc.target/aarch64/sme/inlining_11.c | 57 ++++++++ .../gcc.target/aarch64/sme/inlining_12.c | 15 ++ .../gcc.target/aarch64/sme/inlining_13.c | 15 ++ .../gcc.target/aarch64/sme/inlining_14.c | 15 ++ .../gcc.target/aarch64/sme/inlining_15.c | 27 ++++ .../gcc.target/aarch64/sme/inlining_2.c | 47 +++++++ .../gcc.target/aarch64/sme/inlining_3.c | 47 +++++++ .../gcc.target/aarch64/sme/inlining_4.c | 47 +++++++ .../gcc.target/aarch64/sme/inlining_5.c | 47 +++++++ .../gcc.target/aarch64/sme/inlining_6.c | 31 ++++ .../gcc.target/aarch64/sme/inlining_7.c | 31 ++++ .../gcc.target/aarch64/sme/inlining_8.c | 31 ++++ .../gcc.target/aarch64/sme/inlining_9.c | 55 ++++++++ 16 files changed, 696 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_10.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_11.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_12.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_13.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_14.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_15.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_5.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_6.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_7.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_8.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_9.c diff --git a/gcc/config/aarch64/aarch64.cc 
b/gcc/config/aarch64/aarch64.cc index 340aa438d49..6fa77d79dd7 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -88,6 +88,9 @@ #include "except.h" #include "tree-pass.h" #include "cfgbuild.h" +#include "symbol-summary.h" +#include "ipa-prop.h" +#include "ipa-fnsummary.h" /* This file should be included last. */ #include "target-def.h" @@ -21533,6 +21536,17 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, tree args, int) return ret; } +/* Implement TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P. Use an opt-out + rather than an opt-in list. */ + +static bool +aarch64_function_attribute_inlinable_p (const_tree fndecl) +{ + /* A function that has local ZA state cannot be inlined into its caller, + since we only support managing ZA switches at function scope. */ + return !aarch64_fndecl_has_new_state (fndecl, "za"); +} + /* Helper for aarch64_can_inline_p. In the case where CALLER and CALLEE are tri-bool options (yes, no, don't care) and the default value is DEF, determine whether to reject inlining. */ @@ -21554,6 +21568,60 @@ aarch64_tribools_ok_for_inlining_p (int caller, int callee, return (callee == caller || callee == def); } +/* Bit allocations for ipa_fn_summary::target_info. */ + +/* Set if the function contains a stmt that relies on the function's + choice of PSTATE.SM setting (0 for non-streaming, 1 for streaming). + Not meaningful for streaming-compatible functions. */ +constexpr auto AARCH64_IPA_SM_FIXED = 1U << 0; + +/* Set if the function clobbers ZA. Not meaningful for functions that + have ZA state. */ +constexpr auto AARCH64_IPA_CLOBBERS_ZA = 1U << 1; + +/* Implement TARGET_NEED_IPA_FN_TARGET_INFO. */ + +static bool +aarch64_need_ipa_fn_target_info (const_tree, unsigned int &) +{ + /* We could in principle skip this for streaming-compatible functions + that have ZA state, but that's a rare combination. */ + return true; +} + +/* Implement TARGET_UPDATE_IPA_FN_TARGET_INFO. 
*/ + +static bool +aarch64_update_ipa_fn_target_info (unsigned int &info, const gimple *stmt) + { + if (auto *ga = dyn_cast<const gasm *> (stmt)) + { + /* We don't know what the asm does, so conservatively assume that + it requires the function's current SM mode. */ + info |= AARCH64_IPA_SM_FIXED; + for (unsigned int i = 0; i < gimple_asm_nclobbers (ga); ++i) + { + tree op = gimple_asm_clobber_op (ga, i); + const char *clobber = TREE_STRING_POINTER (TREE_VALUE (op)); + if (strcmp (clobber, "za") == 0) + info |= AARCH64_IPA_CLOBBERS_ZA; + } + } + if (auto *call = dyn_cast<const gcall *> (stmt)) + { + if (gimple_call_builtin_p (call, BUILT_IN_MD)) + { + /* The attributes on AArch64 builtins are supposed to be accurate. + If the function isn't marked streaming-compatible then it + needs whichever SM mode it selects. */ + tree decl = gimple_call_fndecl (call); + if (aarch64_fndecl_pstate_sm (decl) != 0) + info |= AARCH64_IPA_SM_FIXED; + } + } + return true; +} + /* Implement TARGET_CAN_INLINE_P. Decide whether it is valid to inline CALLEE into CALLER based on target-specific info. Make sure that the caller and callee have compatible architectural @@ -21576,12 +21644,56 @@ aarch64_can_inline_p (tree caller, tree callee) : target_option_default_node); /* Callee's ISA flags should be a subset of the caller's.
*/ - if ((caller_opts->x_aarch64_asm_isa_flags - & callee_opts->x_aarch64_asm_isa_flags) - != callee_opts->x_aarch64_asm_isa_flags) + auto caller_asm_isa = (caller_opts->x_aarch64_asm_isa_flags + & ~AARCH64_FL_ISA_MODES); + auto callee_asm_isa = (callee_opts->x_aarch64_asm_isa_flags + & ~AARCH64_FL_ISA_MODES); + if (callee_asm_isa & ~caller_asm_isa) return false; - if ((caller_opts->x_aarch64_isa_flags & callee_opts->x_aarch64_isa_flags) - != callee_opts->x_aarch64_isa_flags) + + auto caller_isa = (caller_opts->x_aarch64_isa_flags + & ~AARCH64_FL_ISA_MODES); + auto callee_isa = (callee_opts->x_aarch64_isa_flags + & ~AARCH64_FL_ISA_MODES); + if (callee_isa & ~caller_isa) + return false; + + /* Return true if the callee might have target_info property PROPERTY. + The answer must be true unless we have positive proof to the contrary. */ + auto callee_has_property = [&](unsigned int property) + { + if (ipa_fn_summaries) + if (auto *summary = ipa_fn_summaries->get (cgraph_node::get (callee))) + if (!(summary->target_info & property)) + return false; + return true; + }; + + /* Streaming-compatible code can be inlined into functions with any + PSTATE.SM mode. Otherwise the caller and callee must agree on + PSTATE.SM mode, unless we can prove that the callee is naturally + streaming-compatible. */ + auto caller_sm = (caller_opts->x_aarch64_isa_flags & AARCH64_FL_SM_STATE); + auto callee_sm = (callee_opts->x_aarch64_isa_flags & AARCH64_FL_SM_STATE); + if (callee_sm + && caller_sm != callee_sm + && callee_has_property (AARCH64_IPA_SM_FIXED)) + return false; + + /* aarch64_function_attribute_inlinable_p prevents new-ZA functions + from being inlined into others. We also need to prevent inlining + of shared-ZA functions into functions without ZA state, since this + is an error condition. + + The only other problematic case for ZA is inlining a function that + directly clobbers ZA into a function that has ZA state. 
*/ + auto caller_za = (caller_opts->x_aarch64_isa_flags & AARCH64_FL_ZA_ON); + auto callee_za = (callee_opts->x_aarch64_isa_flags & AARCH64_FL_ZA_ON); + if (!caller_za && callee_za) + return false; + if (caller_za + && !callee_za + && callee_has_property (AARCH64_IPA_CLOBBERS_ZA)) return false; /* Allow non-strict aligned functions inlining into strict @@ -31119,6 +31231,16 @@ aarch64_run_selftests (void) #undef TARGET_CAN_ELIMINATE #define TARGET_CAN_ELIMINATE aarch64_can_eliminate +#undef TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P +#define TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P \ + aarch64_function_attribute_inlinable_p + +#undef TARGET_NEED_IPA_FN_TARGET_INFO +#define TARGET_NEED_IPA_FN_TARGET_INFO aarch64_need_ipa_fn_target_info + +#undef TARGET_UPDATE_IPA_FN_TARGET_INFO +#define TARGET_UPDATE_IPA_FN_TARGET_INFO aarch64_update_ipa_fn_target_info + #undef TARGET_CAN_INLINE_P #define TARGET_CAN_INLINE_P aarch64_can_inline_p diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_1.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_1.c new file mode 100644 index 00000000000..24dc2b34187 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_1.c @@ -0,0 +1,47 @@ +/* { dg-options "" } */ + +inline void __attribute__((always_inline)) +sc_callee () [[arm::streaming_compatible]] {} + +inline void __attribute__((always_inline)) +s_callee () [[arm::streaming]] {} + +inline void __attribute__((always_inline)) +n_callee () {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_callee () [[arm::streaming_compatible]] {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_callee () {} + +inline void __attribute__((always_inline)) +sc_asm_callee () [[arm::streaming_compatible]] { asm (""); } + +inline void __attribute__((always_inline)) +s_asm_callee () [[arm::streaming]] { asm (""); } // { dg-error "inlining failed" } + +inline void __attribute__((always_inline)) +n_asm_callee () { asm (""); } // { dg-error 
"inlining failed" } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_asm_callee () [[arm::streaming_compatible]] { asm (""); } // { dg-error "inlining failed" } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_asm_callee () { asm (""); } // { dg-error "inlining failed" } + +void +sc_caller () [[arm::streaming_compatible]] +{ + sc_callee (); + s_callee (); + n_callee (); + sc_ls_callee (); + n_ls_callee (); + + sc_asm_callee (); + s_asm_callee (); + n_asm_callee (); + sc_ls_asm_callee (); + n_ls_asm_callee (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_10.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_10.c new file mode 100644 index 00000000000..adfd45a872f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_10.c @@ -0,0 +1,57 @@ +/* { dg-options "" } */ + +#include <arm_neon.h> +#include <arm_sme.h> + +uint8x16_t *neon; +svint64_t *sve; +int64_t *ptr; + +// Gets expanded to addition early, so no error. An error would be +// more correct though.
+inline void __attribute__((always_inline)) +call_vadd () +{ + neon[4] = vaddq_u8 (neon[5], neon[6]); +} + +inline void __attribute__((always_inline)) +call_vbsl () // { dg-error "inlining failed" } +{ + neon[0] = vbslq_u8 (neon[1], neon[2], neon[3]); +} + +inline void __attribute__((always_inline)) +call_svadd () +{ + *sve = svadd_x (svptrue_b8 (), *sve, 1); +} + +inline void __attribute__((always_inline)) +call_svld1_gather () // { dg-error "inlining failed" } +{ + *sve = svld1_gather_offset (svptrue_b8 (), ptr, *sve); +} + +inline void __attribute__((always_inline)) +call_svzero () [[arm::inout("za")]] +{ + svzero_za (); +} + +inline void __attribute__((always_inline)) +call_svst1_za () [[arm::streaming, arm::inout("za")]] // { dg-error "inlining failed" } +{ + svst1_ver_za64 (0, 0, svptrue_b8 (), ptr); +} + +void +sc_caller () [[arm::inout("za"), arm::streaming_compatible]] +{ + call_vadd (); + call_vbsl (); + call_svadd (); + call_svld1_gather (); + call_svzero (); + call_svst1_za (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_11.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_11.c new file mode 100644 index 00000000000..d05a92c1c24 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_11.c @@ -0,0 +1,57 @@ +/* { dg-options "" } */ + +#include <arm_neon.h> +#include <arm_sme.h> + +uint8x16_t *neon; +svint64_t *sve; +int64_t *ptr; + +// Gets expanded to addition early, so no error. An error would be +// more correct though.
+inline void __attribute__((always_inline)) +call_vadd () +{ + neon[4] = vaddq_u8 (neon[5], neon[6]); +} + +inline void __attribute__((always_inline)) +call_vbsl () // { dg-error "inlining failed" } +{ + neon[0] = vbslq_u8 (neon[1], neon[2], neon[3]); +} + +inline void __attribute__((always_inline)) +call_svadd () +{ + *sve = svadd_x (svptrue_b8 (), *sve, 1); +} + +inline void __attribute__((always_inline)) +call_svld1_gather () // { dg-error "inlining failed" } +{ + *sve = svld1_gather_offset (svptrue_b8 (), ptr, *sve); +} + +inline void __attribute__((always_inline)) +call_svzero () [[arm::inout("za")]] +{ + svzero_za (); +} + +inline void __attribute__((always_inline)) +call_svst1_za () [[arm::streaming, arm::inout("za")]] +{ + svst1_ver_za64 (0, 0, svptrue_b8 (), ptr); +} + +void +sc_caller () [[arm::inout("za"), arm::streaming]] +{ + call_vadd (); + call_vbsl (); + call_svadd (); + call_svld1_gather (); + call_svzero (); + call_svst1_za (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_12.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_12.c new file mode 100644 index 00000000000..366f8b24ac2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_12.c @@ -0,0 +1,15 @@ +/* { dg-options "" } */ + +#include <arm_sme.h> + +inline void __attribute__((always_inline)) +call_svzero () [[arm::inout("za"), arm::streaming_compatible]] // { dg-error "inlining failed" } +{ + svzero_za (); +} + +void +n_caller () +{ + call_svzero (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_13.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_13.c new file mode 100644 index 00000000000..bdbd7408c33 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_13.c @@ -0,0 +1,15 @@ +/* { dg-options "" } */ + +#include <arm_sme.h> + +inline void __attribute__((always_inline)) +call_svzero () [[arm::inout("za"), arm::streaming_compatible]] // { dg-error "inlining failed" } +{ + svzero_za (); +} + +void +s_caller () +{ + call_svzero (); +} diff --git
a/gcc/testsuite/gcc.target/aarch64/sme/inlining_14.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_14.c new file mode 100644 index 00000000000..0ce4384f642 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_14.c @@ -0,0 +1,15 @@ +/* { dg-options "" } */ + +#include <arm_sme.h> + +inline void __attribute__((always_inline)) +call_svzero () [[arm::inout("za"), arm::streaming_compatible]] // { dg-error "inlining failed" } +{ + svzero_za (); +} + +void +sc_caller () +{ + call_svzero (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_15.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_15.c new file mode 100644 index 00000000000..06fc5d7f5e3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_15.c @@ -0,0 +1,27 @@ +/* { dg-options "" } */ + +#include <arm_sme.h> + +inline void +call_svzero () [[arm::inout("za"), arm::streaming_compatible]] +{ + svzero_za (); +} + +void +n_caller () +{ + call_svzero (); // { dg-error "call to a function that shares 'za' state from a function that has no 'za' state" } +} + +void +s_caller () +{ + call_svzero (); // { dg-error "call to a function that shares 'za' state from a function that has no 'za' state" } +} + +void +sc_caller () +{ + call_svzero (); // { dg-error "call to a function that shares 'za' state from a function that has no 'za' state" } +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_2.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_2.c new file mode 100644 index 00000000000..ea2a57049cd --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_2.c @@ -0,0 +1,47 @@ +/* { dg-options "" } */ + +inline void __attribute__((always_inline)) +sc_callee () [[arm::streaming_compatible]] {} + +inline void __attribute__((always_inline)) +s_callee () [[arm::streaming]] {} + +inline void __attribute__((always_inline)) +n_callee () {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_callee () [[arm::streaming_compatible]] {} + +[[arm::locally_streaming]] inline
void __attribute__((always_inline)) +n_ls_callee () {} + +inline void __attribute__((always_inline)) +sc_asm_callee () [[arm::streaming_compatible]] { asm (""); } + +inline void __attribute__((always_inline)) +s_asm_callee () [[arm::streaming]] { asm (""); } + +inline void __attribute__((always_inline)) +n_asm_callee () { asm (""); } // { dg-error "inlining failed" } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_asm_callee () [[arm::streaming_compatible]] { asm (""); } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_asm_callee () { asm (""); } + +void +s_caller () [[arm::streaming]] +{ + sc_callee (); + s_callee (); + n_callee (); + sc_ls_callee (); + n_ls_callee (); + + sc_asm_callee (); + s_asm_callee (); + n_asm_callee (); + sc_ls_asm_callee (); + n_ls_asm_callee (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_3.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_3.c new file mode 100644 index 00000000000..d7ffb381985 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_3.c @@ -0,0 +1,47 @@ +/* { dg-options "" } */ + +inline void __attribute__((always_inline)) +sc_callee () [[arm::streaming_compatible]] {} + +inline void __attribute__((always_inline)) +s_callee () [[arm::streaming]] {} + +inline void __attribute__((always_inline)) +n_callee () {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_callee () [[arm::streaming_compatible]] {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_callee () {} + +inline void __attribute__((always_inline)) +sc_asm_callee () [[arm::streaming_compatible]] { asm (""); } + +inline void __attribute__((always_inline)) +s_asm_callee () [[arm::streaming]] { asm (""); } // { dg-error "inlining failed" } + +inline void __attribute__((always_inline)) +n_asm_callee () { asm (""); } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_asm_callee () 
[[arm::streaming_compatible]] { asm (""); } // { dg-error "inlining failed" } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_asm_callee () { asm (""); } // { dg-error "inlining failed" } + +void +n_caller () +{ + sc_callee (); + s_callee (); + n_callee (); + sc_ls_callee (); + n_ls_callee (); + + sc_asm_callee (); + s_asm_callee (); + n_asm_callee (); + sc_ls_asm_callee (); + n_ls_asm_callee (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_4.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_4.c new file mode 100644 index 00000000000..78920372500 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_4.c @@ -0,0 +1,47 @@ +/* { dg-options "" } */ + +inline void __attribute__((always_inline)) +sc_callee () [[arm::streaming_compatible]] {} + +inline void __attribute__((always_inline)) +s_callee () [[arm::streaming]] {} + +inline void __attribute__((always_inline)) +n_callee () {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_callee () [[arm::streaming_compatible]] {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_callee () {} + +inline void __attribute__((always_inline)) +sc_asm_callee () [[arm::streaming_compatible]] { asm (""); } + +inline void __attribute__((always_inline)) +s_asm_callee () [[arm::streaming]] { asm (""); } + +inline void __attribute__((always_inline)) +n_asm_callee () { asm (""); } // { dg-error "inlining failed" } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_asm_callee () [[arm::streaming_compatible]] { asm (""); } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_asm_callee () { asm (""); } + +[[arm::locally_streaming]] void +sc_ls_caller () [[arm::streaming_compatible]] +{ + sc_callee (); + s_callee (); + n_callee (); + sc_ls_callee (); + n_ls_callee (); + + sc_asm_callee (); + s_asm_callee (); + n_asm_callee (); + sc_ls_asm_callee (); + n_ls_asm_callee (); +} 
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_5.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_5.c new file mode 100644 index 00000000000..d19cdc450d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_5.c @@ -0,0 +1,47 @@ +/* { dg-options "" } */ + +inline void __attribute__((always_inline)) +sc_callee () [[arm::streaming_compatible]] {} + +inline void __attribute__((always_inline)) +s_callee () [[arm::streaming]] {} + +inline void __attribute__((always_inline)) +n_callee () {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_callee () [[arm::streaming_compatible]] {} + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_callee () {} + +inline void __attribute__((always_inline)) +sc_asm_callee () [[arm::streaming_compatible]] { asm (""); } + +inline void __attribute__((always_inline)) +s_asm_callee () [[arm::streaming]] { asm (""); } + +inline void __attribute__((always_inline)) +n_asm_callee () { asm (""); } // { dg-error "inlining failed" } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +sc_ls_asm_callee () [[arm::streaming_compatible]] { asm (""); } + +[[arm::locally_streaming]] inline void __attribute__((always_inline)) +n_ls_asm_callee () { asm (""); } + +[[arm::locally_streaming]] void +n_ls_caller () +{ + sc_callee (); + s_callee (); + n_callee (); + sc_ls_callee (); + n_ls_callee (); + + sc_asm_callee (); + s_asm_callee (); + n_asm_callee (); + sc_ls_asm_callee (); + n_ls_asm_callee (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_6.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_6.c new file mode 100644 index 00000000000..a5eb399f10a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_6.c @@ -0,0 +1,31 @@ +/* { dg-options "" } */ + +inline void __attribute__((always_inline)) +shared_callee () [[arm::inout("za")]] {} + +[[arm::new("za")]] inline void __attribute__((always_inline)) +new_callee () {} // { dg-error 
"inlining failed" } + +inline void __attribute__((always_inline)) +normal_callee () {} + +inline void __attribute__((always_inline)) +shared_asm_callee () [[arm::inout("za")]] { asm volatile ("" ::: "za"); } + +[[arm::new("za")]] inline void __attribute__((always_inline)) +new_asm_callee () { asm volatile ("" ::: "za"); } // { dg-error "inlining failed" } + +inline void __attribute__((always_inline)) +normal_asm_callee () { asm volatile ("" ::: "za"); } // { dg-error "inlining failed" } + +void +shared_caller () [[arm::inout("za")]] +{ + shared_callee (); + new_callee (); + normal_callee (); + + shared_asm_callee (); + new_asm_callee (); + normal_asm_callee (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_7.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_7.c new file mode 100644 index 00000000000..0f046283f3d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_7.c @@ -0,0 +1,31 @@ +/* { dg-options "" } */ + +inline void __attribute__((always_inline)) +shared_callee () [[arm::inout("za")]] {} + +[[arm::new("za")]] inline void __attribute__((always_inline)) +new_callee () {} // { dg-error "inlining failed" } + +inline void __attribute__((always_inline)) +normal_callee () {} + +inline void __attribute__((always_inline)) +shared_asm_callee () [[arm::inout("za")]] { asm volatile ("" ::: "za"); } + +[[arm::new("za")]] inline void __attribute__((always_inline)) +new_asm_callee () { asm volatile ("" ::: "za"); } // { dg-error "inlining failed" } + +inline void __attribute__((always_inline)) +normal_asm_callee () { asm volatile ("" ::: "za"); } // { dg-error "inlining failed" } + +[[arm::new("za")]] void +new_caller () +{ + shared_callee (); + new_callee (); + normal_callee (); + + shared_asm_callee (); + new_asm_callee (); + normal_asm_callee (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_8.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_8.c new file mode 100644 index 00000000000..fd8a3a61e59 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/aarch64/sme/inlining_8.c @@ -0,0 +1,31 @@ +/* { dg-options "" } */ + +inline void __attribute__((always_inline)) +shared_callee () [[arm::inout("za")]] {} // { dg-error "inlining failed" } + +[[arm::new("za")]] inline void __attribute__((always_inline)) +new_callee () {} // { dg-error "inlining failed" } + +inline void __attribute__((always_inline)) +normal_callee () {} + +inline void __attribute__((always_inline)) +shared_asm_callee () [[arm::inout("za")]] { asm volatile ("" ::: "za"); } // { dg-error "inlining failed" } + +[[arm::new("za")]] inline void __attribute__((always_inline)) +new_asm_callee () { asm volatile ("" ::: "za"); } // { dg-error "inlining failed" } + +inline void __attribute__((always_inline)) +normal_asm_callee () { asm volatile ("" ::: "za"); } + +void +normal_caller () +{ + shared_callee (); + new_callee (); + normal_callee (); + + shared_asm_callee (); + new_asm_callee (); + normal_asm_callee (); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/inlining_9.c b/gcc/testsuite/gcc.target/aarch64/sme/inlining_9.c new file mode 100644 index 00000000000..91520e3787b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/inlining_9.c @@ -0,0 +1,55 @@ +/* { dg-options "" } */ + +#include <arm_neon.h> +#include <arm_sme.h> + +uint8x16_t *neon; +svint64_t *sve; +int64_t *ptr; + +inline void __attribute__((always_inline)) +call_vadd () +{ + neon[4] = vaddq_u8 (neon[5], neon[6]); +} + +inline void __attribute__((always_inline)) +call_vbsl () +{ + neon[0] = vbslq_u8 (neon[1], neon[2], neon[3]); +} + +inline void __attribute__((always_inline)) +call_svadd () +{ + *sve = svadd_x (svptrue_b8 (), *sve, 1); +} + +inline void __attribute__((always_inline)) +call_svld1_gather () +{ + *sve = svld1_gather_offset (svptrue_b8 (), ptr, *sve); +} + +inline void __attribute__((always_inline)) +call_svzero () [[arm::inout("za")]] +{ + svzero_za (); +} + +inline void __attribute__((always_inline)) +call_svst1_za () [[arm::streaming, arm::inout("za")]] // {
dg-error "inlining failed" } +{ + svst1_ver_za64 (0, 0, svptrue_b8 (), ptr); +} + +void +n_caller () [[arm::inout("za")]] +{ + call_vadd (); + call_vbsl (); + call_svadd (); + call_svld1_gather (); + call_svzero (); + call_svst1_za (); +} From patchwork Fri Nov 17 17:30:48 2023 X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1865169
From: Richard Sandiford To: gcc-patches@gcc.gnu.org Subject: [PATCH 21/21] aarch64: Update sibcall handling for SME Date: Fri, 17 Nov 2023 17:30:48 +0000 We only
support tail calls between functions with the same PSTATE.ZA setting
("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").
Only a normal non-streaming function can tail-call another
non-streaming function, and only a streaming function can tail-call
another streaming function.  Any function can tail-call a
streaming-compatible function.

gcc/
	* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall):
	Enforce PSTATE.SM and PSTATE.ZA restrictions.
	(aarch64_expand_epilogue): Save and restore the arguments
	to a sibcall around any change to PSTATE.SM.

gcc/testsuite/
	* gcc.target/aarch64/sme/sibcall_1.c: New test.
	* gcc.target/aarch64/sme/sibcall_2.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_3.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_4.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_5.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_6.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_7.c: Likewise.
	* gcc.target/aarch64/sme/sibcall_8.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc                |  9 +++-
 .../gcc.target/aarch64/sme/sibcall_1.c       | 45 +++++++++++++++++++
 .../gcc.target/aarch64/sme/sibcall_2.c       | 45 +++++++++++++++++++
 .../gcc.target/aarch64/sme/sibcall_3.c       | 45 +++++++++++++++++++
 .../gcc.target/aarch64/sme/sibcall_4.c       | 45 +++++++++++++++++++
 .../gcc.target/aarch64/sme/sibcall_5.c       | 45 +++++++++++++++++++
 .../gcc.target/aarch64/sme/sibcall_6.c       | 26 +++++++++++
 .../gcc.target/aarch64/sme/sibcall_7.c       | 26 +++++++++++
 .../gcc.target/aarch64/sme/sibcall_8.c       | 19 ++++++++
 9 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_6.c
 create mode 100644
gcc/testsuite/gcc.target/aarch64/sme/sibcall_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_8.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6fa77d79dd7..c8f99d5c991 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8498,6 +8498,11 @@ aarch64_function_ok_for_sibcall (tree, tree exp)
   if (crtl->abi->id () != expr_callee_abi (exp).id ())
     return false;
 
+  tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
+  if (aarch64_fntype_pstate_sm (fntype) & ~aarch64_cfun_incoming_pstate_sm ())
+    return false;
+  if (aarch64_fntype_pstate_za (fntype) != aarch64_cfun_incoming_pstate_za ())
+    return false;
   return true;
 }
 
@@ -11950,7 +11955,9 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
 	guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
 						      aarch64_isa_flags);
       aarch64_sme_mode_switch_regs return_switch;
-      if (crtl->return_rtx && REG_P (crtl->return_rtx))
+      if (sibcall)
+	return_switch.add_call_args (sibcall);
+      else if (crtl->return_rtx && REG_P (crtl->return_rtx))
 	return_switch.add_reg (GET_MODE (crtl->return_rtx),
 			       REGNO (crtl->return_rtx));
       return_switch.emit_prologue ();
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c
new file mode 100644
index 00000000000..c7530de5c37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c
@@ -0,0 +1,45 @@
+/* { dg-options "-O2" } */
+
+void sc_callee () [[arm::streaming_compatible]];
+void s_callee () [[arm::streaming]];
+void n_callee ();
+
+[[arm::locally_streaming]] __attribute__((noipa)) void
+sc_ls_callee () [[arm::streaming_compatible]] {}
+[[arm::locally_streaming]] __attribute__((noipa)) void
+n_ls_callee () {}
+
+void
+sc_to_sc () [[arm::streaming_compatible]]
+{
+  sc_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_callee} } } */
+
+void
+sc_to_s () [[arm::streaming_compatible]]
+{
+  s_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\ts_callee} } } */
+
+void
+sc_to_n () [[arm::streaming_compatible]]
+{
+  n_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tn_callee} } } */
+
+void
+sc_to_sc_ls () [[arm::streaming_compatible]]
+{
+  sc_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_ls_callee} } } */
+
+void
+sc_to_n_ls () [[arm::streaming_compatible]]
+{
+  n_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tn_ls_callee} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c
new file mode 100644
index 00000000000..8d1c8a9f901
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c
@@ -0,0 +1,45 @@
+/* { dg-options "-O2" } */
+
+void sc_callee () [[arm::streaming_compatible]];
+void s_callee () [[arm::streaming]];
+void n_callee ();
+
+[[arm::locally_streaming]] __attribute__((noipa)) void
+sc_ls_callee () [[arm::streaming_compatible]] {}
+[[arm::locally_streaming]] __attribute__((noipa)) void
+n_ls_callee () {}
+
+void
+s_to_sc () [[arm::streaming]]
+{
+  sc_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_callee} } } */
+
+void
+s_to_s () [[arm::streaming]]
+{
+  s_callee ();
+}
+/* { dg-final { scan-assembler {\tb\ts_callee} } } */
+
+void
+s_to_n () [[arm::streaming]]
+{
+  n_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tn_callee} } } */
+
+void
+s_to_sc_ls () [[arm::streaming]]
+{
+  sc_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_ls_callee} } } */
+
+void
+s_to_n_ls () [[arm::streaming]]
+{
+  n_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tn_ls_callee} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_3.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_3.c
new file mode 100644
index 00000000000..2ae937fc5dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_3.c
@@ -0,0 +1,45 @@
+/* { dg-options "-O2" } */
+
+void sc_callee () [[arm::streaming_compatible]];
+void s_callee () [[arm::streaming]];
+void n_callee ();
+
+[[arm::locally_streaming]] __attribute__((noipa)) void
+sc_ls_callee () [[arm::streaming_compatible]] {}
+[[arm::locally_streaming]] __attribute__((noipa)) void
+n_ls_callee () {}
+
+void
+n_to_sc ()
+{
+  sc_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_callee} } } */
+
+void
+n_to_s ()
+{
+  s_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\ts_callee} } } */
+
+void
+n_to_n ()
+{
+  n_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tn_callee} } } */
+
+void
+n_to_sc_ls ()
+{
+  sc_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_ls_callee} } } */
+
+void
+n_to_n_ls ()
+{
+  n_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tn_ls_callee} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_4.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_4.c
new file mode 100644
index 00000000000..6935a1bd740
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_4.c
@@ -0,0 +1,45 @@
+/* { dg-options "-O2" } */
+
+void sc_callee () [[arm::streaming_compatible]];
+void s_callee () [[arm::streaming]];
+void n_callee ();
+
+[[arm::locally_streaming]] __attribute__((noipa)) void
+sc_ls_callee () [[arm::streaming_compatible]] {}
+[[arm::locally_streaming]] __attribute__((noipa)) void
+n_ls_callee () {}
+
+[[arm::locally_streaming]] void
+sc_to_sc () [[arm::streaming_compatible]]
+{
+  sc_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_callee} } } */
+
+[[arm::locally_streaming]] void
+sc_to_s () [[arm::streaming_compatible]]
+{
+  s_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\ts_callee} } } */
+
+[[arm::locally_streaming]] void
+sc_to_n () [[arm::streaming_compatible]]
+{
+  n_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tn_callee} } } */
+
+[[arm::locally_streaming]] void
+sc_to_sc_ls () [[arm::streaming_compatible]]
+{
+  sc_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_ls_callee} } } */
+
+[[arm::locally_streaming]] void
+sc_to_n_ls () [[arm::streaming_compatible]]
+{
+  n_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tn_ls_callee} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_5.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_5.c
new file mode 100644
index 00000000000..7aaf58dfa22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_5.c
@@ -0,0 +1,45 @@
+/* { dg-options "-O2" } */
+
+void sc_callee () [[arm::streaming_compatible]];
+void s_callee () [[arm::streaming]];
+void n_callee ();
+
+[[arm::locally_streaming]] __attribute__((noipa)) void
+sc_ls_callee () [[arm::streaming_compatible]] {}
+[[arm::locally_streaming]] __attribute__((noipa)) void
+n_ls_callee () {}
+
+[[arm::locally_streaming]] void
+n_to_sc ()
+{
+  sc_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_callee} } } */
+
+[[arm::locally_streaming]] void
+n_to_s ()
+{
+  s_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\ts_callee} } } */
+
+[[arm::locally_streaming]] void
+n_to_n ()
+{
+  n_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tn_callee} } } */
+
+[[arm::locally_streaming]] void
+n_to_sc_ls ()
+{
+  sc_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_ls_callee} } } */
+
+[[arm::locally_streaming]] void
+n_to_n_ls ()
+{
+  n_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tn_ls_callee} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_6.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_6.c
new file mode 100644
index 00000000000..e568edb17dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_6.c
@@ -0,0 +1,26 @@
+/* { dg-options "-O2" } */
+
+void shared_callee () [[arm::inout("za")]];
+[[arm::new("za")]] __attribute__((noipa)) void new_callee () {}
+void normal_callee ();
+
+void
+shared_to_shared () [[arm::inout("za")]]
+{
+  shared_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tshared_callee} } } */
+
+void
+shared_to_new () [[arm::inout("za")]]
+{
+  new_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tnew_callee} } } */
+
+void
+shared_to_normal () [[arm::inout("za")]]
+{
+  normal_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tnormal_callee} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_7.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_7.c
new file mode 100644
index 00000000000..a5f576d2044
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_7.c
@@ -0,0 +1,26 @@
+/* { dg-options "-O2" } */
+
+void shared_callee () [[arm::inout("za")]];
+[[arm::new("za")]] __attribute__((noipa)) void new_callee () {}
+void normal_callee ();
+
+[[arm::new("za")]] void
+new_to_shared ()
+{
+  shared_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tshared_callee} } } */
+
+[[arm::new("za")]] void
+new_to_new ()
+{
+  new_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tnew_callee} } } */
+
+[[arm::new("za")]] void
+new_to_normal ()
+{
+  normal_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tnormal_callee} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_8.c b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_8.c
new file mode 100644
index 00000000000..33370f7a87f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_8.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O2" } */
+
+void shared_callee () [[arm::inout("za")]];
+[[arm::new("za")]] __attribute__((noipa)) void new_callee () {}
+void normal_callee ();
+
+void
+normal_to_new ()
+{
+  new_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tnew_callee} } } */
+
+void
+normal_to_normal ()
+{
+  normal_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tnormal_callee} } } */