From patchwork Wed Feb 9 17:00:30 2022
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1590575
To: gcc-patches@gcc.gnu.org
Subject: [pushed 1/8] aarch64: Tighten general_operand predicates
Date: Wed, 09 Feb 2022 17:00:30 +0000
From: Richard Sandiford

This patch fixes some cases in which *general_operand was used instead
of *nonimmediate_operand by patterns that do not accept immediates.
This avoids some complications with later patches.

gcc/
	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set): Use
	aarch64_simd_nonimmediate_operand instead of
	aarch64_simd_general_operand.
	(@aarch64_combinez): Use nonimmediate_operand instead of
	general_operand.
	(@aarch64_combinez_be): Likewise.
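As an editorial aside, the distinction the patch relies on can be sketched in plain C. This is a simplified model of the operand categories, not GCC's implementation: `general_operand` accepts registers, memory and immediates, while `nonimmediate_operand` accepts only registers and memory, matching the "w,?r,Utv" alternatives of the patterns being tightened. The enum and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (not GCC's code) of the two predicate classes.  */
enum operand_kind { OP_REG, OP_MEM, OP_IMM };

/* general_operand: registers, memory, and immediates all match.  */
static bool model_general_operand (enum operand_kind k)
{
  return k == OP_REG || k == OP_MEM || k == OP_IMM;
}

/* nonimmediate_operand: registers and memory only.  A pattern whose
   alternatives cover only registers and memory should use this,
   otherwise an immediate can slip through to an insn that cannot
   handle it.  */
static bool model_nonimmediate_operand (enum operand_kind k)
{
  return k == OP_REG || k == OP_MEM;
}
```

The point of the patch is exactly the gap between the two: an immediate satisfies the first predicate but not the second.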
---
 gcc/config/aarch64/aarch64-simd.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6646e069ad2..9529bdb4997 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1039,7 +1039,7 @@ (define_insn "aarch64_simd_vec_set"
   [(set (match_operand:VALL_F16 0 "register_operand" "=w,w,w")
 	(vec_merge:VALL_F16
 	    (vec_duplicate:VALL_F16
-	      (match_operand: 1 "aarch64_simd_general_operand" "w,?r,Utv"))
+	      (match_operand: 1 "aarch64_simd_nonimmediate_operand" "w,?r,Utv"))
 	    (match_operand:VALL_F16 3 "register_operand" "0,0,0")
 	    (match_operand:SI 2 "immediate_operand" "i,i,i")))]
   "TARGET_SIMD"
@@ -4380,7 +4380,7 @@ (define_insn "store_pair_lanes"
 (define_insn "@aarch64_combinez"
   [(set (match_operand: 0 "register_operand" "=w,w,w")
 	(vec_concat:
-	  (match_operand:VDC 1 "general_operand" "w,?r,m")
+	  (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")
 	  (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")))]
   "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "@
@@ -4395,7 +4395,7 @@ (define_insn "@aarch64_combinez_be"
   [(set (match_operand: 0 "register_operand" "=w,w,w")
 	(vec_concat:
 	  (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")
-	  (match_operand:VDC 1 "general_operand" "w,?r,m")))]
+	  (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")))]
   "TARGET_SIMD && BYTES_BIG_ENDIAN"
   "@
   mov\\t%0.8b, %1.8b

From patchwork Wed Feb 9 17:00:45 2022
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1590580
To: gcc-patches@gcc.gnu.org
Subject: [pushed 2/8] aarch64: Generalise vec_set predicate
Date: Wed, 09 Feb 2022 17:00:45 +0000
From: Richard Sandiford

The aarch64_simd_vec_set define_insn takes memory operands, so this
patch makes the vec_set optab expander do the same.

gcc/
	* config/aarch64/aarch64-simd.md (vec_set): Allow the element
	to be an aarch64_simd_nonimmediate_operand.
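For illustration, this is the kind of source the change targets: inserting a value loaded from memory into one lane of a vector. The sketch below uses GCC's generic vector extension rather than arm_neon.h so it runs on any host; the function name is hypothetical. With the expander accepting a memory element, the load can feed the lane insert (LD1 on aarch64) directly instead of going through a general register first.

```c
#include <assert.h>

/* 128-bit vector of two 64-bit lanes, via GCC's generic vector
   extension (host-runnable stand-in for int64x2_t).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Insert *ptr into lane 1 of V.  The element assignment expands
   through the vec_set optab, which after the patch may take the
   memory operand directly.  */
v2di set_high_lane (v2di v, const long long *ptr)
{
  v[1] = *ptr;
  return v;
}
```

On aarch64 this is the shape that can now become a single `ld1 {v0.d}[1], [x0]`.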
---
 gcc/config/aarch64/aarch64-simd.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9529bdb4997..872a3d78269 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1378,7 +1378,7 @@ (define_insn "vec_shr_"
 (define_expand "vec_set"
   [(match_operand:VALL_F16 0 "register_operand")
-   (match_operand: 1 "register_operand")
+   (match_operand: 1 "aarch64_simd_nonimmediate_operand")
    (match_operand:SI 2 "immediate_operand")]
   "TARGET_SIMD"
 {

From patchwork Wed Feb 9 17:00:58 2022
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1590586
To: gcc-patches@gcc.gnu.org
Subject: [pushed 3/8] aarch64: Generalise adjacency check for load_pair_lanes
Date: Wed, 09 Feb 2022 17:00:58 +0000
From: Richard Sandiford
This patch generalises the load_pair_lanes guard so that it uses
aarch64_check_consecutive_mems to check for consecutive mems.  It also
allows the pattern to be used for STRICT_ALIGNMENT targets if the
alignment is high enough.

The main aim is to avoid an inline test, for the sake of a later patch
that needs to repeat it.  Reusing aarch64_check_consecutive_mems seemed
simpler than writing an entirely new function.

gcc/
	* config/aarch64/aarch64-protos.h (aarch64_mergeable_load_pair_p):
	Declare.
	* config/aarch64/aarch64-simd.md (load_pair_lanes): Use
	aarch64_mergeable_load_pair_p instead of inline check.
	* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Likewise.
	(aarch64_check_consecutive_mems): Allow the reversed parameter
	to be null.
	(aarch64_mergeable_load_pair_p): New function.
---
 gcc/config/aarch64/aarch64-protos.h           |  1 +
 gcc/config/aarch64/aarch64-simd.md            |  7 +--
 gcc/config/aarch64/aarch64.cc                 | 54 ++++++++++++-------
 gcc/testsuite/gcc.target/aarch64/vec-init-6.c | 12 +++++
 gcc/testsuite/gcc.target/aarch64/vec-init-7.c | 12 +++++
 5 files changed, 62 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-7.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 26368538a55..b75ed35635b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1000,6 +1000,7 @@ void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
 int aarch64_ccmp_mode_to_code (machine_mode mode);
 
 bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
+bool aarch64_mergeable_load_pair_p (machine_mode, rtx, rtx);
 bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode);
 bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, machine_mode);
 void aarch64_swap_ldrstr_operands (rtx *, bool);
diff --git
a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 872a3d78269..c5bc2ea658b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4353,11 +4353,8 @@ (define_insn "load_pair_lanes"
 	(vec_concat:
 	  (match_operand:VDC 1 "memory_operand" "Utq")
 	  (match_operand:VDC 2 "memory_operand" "m")))]
-  "TARGET_SIMD && !STRICT_ALIGNMENT
-   && rtx_equal_p (XEXP (operands[2], 0),
-		   plus_constant (Pmode,
-				  XEXP (operands[1], 0),
-				  GET_MODE_SIZE (mode)))"
+  "TARGET_SIMD
+   && aarch64_mergeable_load_pair_p (mode, operands[1], operands[2])"
   "ldr\\t%q0, %1"
   [(set_attr "type" "neon_load1_1reg_q")]
 )
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 296145e6008..c47543aebf3 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -21063,11 +21063,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      for store_pair_lanes.  */
   if (memory_operand (x0, inner_mode)
       && memory_operand (x1, inner_mode)
-      && !STRICT_ALIGNMENT
-      && rtx_equal_p (XEXP (x1, 0),
-		      plus_constant (Pmode,
-				     XEXP (x0, 0),
-				     GET_MODE_SIZE (inner_mode))))
+      && aarch64_mergeable_load_pair_p (mode, x0, x1))
     {
       rtx t;
       if (inner_mode == DFmode)
@@ -24687,14 +24683,20 @@ aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
   return priority;
 }
 
-/* Check if *MEM1 and *MEM2 are consecutive memory references and,
+/* If REVERSED is null, return true if memory reference *MEM2 comes
+   immediately after memory reference *MEM1.  Do not change the references
+   in this case.
+
+   Otherwise, check if *MEM1 and *MEM2 are consecutive memory references and,
    if they are, try to make them use constant offsets from the same base
    register.  Return true on success.  When returning true, set *REVERSED
    to true if *MEM1 comes after *MEM2, false if *MEM1 comes before *MEM2.
 */
 static bool
 aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
 {
-  *reversed = false;
+  if (reversed)
+    *reversed = false;
+
   if (GET_RTX_CLASS (GET_CODE (XEXP (*mem1, 0))) == RTX_AUTOINC
       || GET_RTX_CLASS (GET_CODE (XEXP (*mem2, 0))) == RTX_AUTOINC)
     return false;
@@ -24723,7 +24725,7 @@ aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
       if (known_eq (UINTVAL (offset1) + size1, UINTVAL (offset2)))
 	return true;
 
-      if (known_eq (UINTVAL (offset2) + size2, UINTVAL (offset1)))
+      if (known_eq (UINTVAL (offset2) + size2, UINTVAL (offset1)) && reversed)
 	{
 	  *reversed = true;
 	  return true;
@@ -24756,22 +24758,25 @@ aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
       if (known_eq (expr_offset1 + size1, expr_offset2))
 	;
-      else if (known_eq (expr_offset2 + size2, expr_offset1))
+      else if (known_eq (expr_offset2 + size2, expr_offset1) && reversed)
 	*reversed = true;
       else
 	return false;
 
-      if (base2)
+      if (reversed)
 	{
-	  rtx addr1 = plus_constant (Pmode, XEXP (*mem2, 0),
-				     expr_offset1 - expr_offset2);
-	  *mem1 = replace_equiv_address_nv (*mem1, addr1);
-	}
-      else
-	{
-	  rtx addr2 = plus_constant (Pmode, XEXP (*mem1, 0),
-				     expr_offset2 - expr_offset1);
-	  *mem2 = replace_equiv_address_nv (*mem2, addr2);
+	  if (base2)
+	    {
+	      rtx addr1 = plus_constant (Pmode, XEXP (*mem2, 0),
+					 expr_offset1 - expr_offset2);
+	      *mem1 = replace_equiv_address_nv (*mem1, addr1);
+	    }
+	  else
+	    {
+	      rtx addr2 = plus_constant (Pmode, XEXP (*mem1, 0),
+					 expr_offset2 - expr_offset1);
+	      *mem2 = replace_equiv_address_nv (*mem2, addr2);
+	    }
 	}
       return true;
     }
@@ -24779,6 +24784,17 @@ aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
   return false;
 }
 
+/* Return true if MEM1 and MEM2 can be combined into a single access
+   of mode MODE, with the combined access having the same address as MEM1.
 */
+
+bool
+aarch64_mergeable_load_pair_p (machine_mode mode, rtx mem1, rtx mem2)
+{
+  if (STRICT_ALIGNMENT && MEM_ALIGN (mem1) < GET_MODE_ALIGNMENT (mode))
+    return false;
+  return aarch64_check_consecutive_mems (&mem1, &mem2, nullptr);
+}
+
 /* Given OPERANDS of consecutive load/store, check if we can merge
    them into ldp/stp.  LOAD is true if they are load instructions.
    MODE is the mode of memory operands.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-6.c b/gcc/testsuite/gcc.target/aarch64/vec-init-6.c
new file mode 100644
index 00000000000..96450157498
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+int64_t s64[2];
+float64_t f64[2];
+
+int64x2_t test_s64() { return (int64x2_t) { s64[0], s64[1] }; }
+float64x2_t test_f64() { return (float64x2_t) { f64[0], f64[1] }; }
+
+/* { dg-final { scan-assembler-not {\tins\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-7.c b/gcc/testsuite/gcc.target/aarch64/vec-init-7.c
new file mode 100644
index 00000000000..795895286db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mstrict-align" } */
+
+#include <arm_neon.h>
+
+int64_t s64[2] __attribute__((aligned(16)));
+float64_t f64[2] __attribute__((aligned(16)));
+
+int64x2_t test_s64() { return (int64x2_t) { s64[0], s64[1] }; }
+float64x2_t test_f64() { return (float64x2_t) { f64[0], f64[1] }; }
+
+/* { dg-final { scan-assembler-not {\tins\t} } } */

From patchwork Wed Feb 9 17:01:12 2022
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1590591
To: gcc-patches@gcc.gnu.org
Subject: [pushed 4/8] aarch64: Remove redundant vec_concat patterns
Date: Wed, 09 Feb 2022 17:01:12 +0000
From: Richard Sandiford

move_lo_quad_internal_ and move_lo_quad_internal_be_ partially
duplicate the later aarch64_combinez{,_be} patterns.  The duplication
itself is a regression.

The only substantive differences between the two are:

* combinez uses vector MOV (ORR) instead of element MOV (DUP).  The
  former seems more likely to be handled via renaming.

* combinez disparages the GPR->FPR alternative whereas move_lo_quad
  gave it equal cost.

The new test gives a token example of a case in which the combinez
behaviour helps.

gcc/
	* config/aarch64/aarch64-simd.md (move_lo_quad_internal_)
	(move_lo_quad_internal_be_): Delete.
	(move_lo_quad_): Use aarch64_combine instead of the above.

gcc/testsuite/
	* gcc.target/aarch64/vec-init-8.c: New test.
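For readers without the testcase in front of them, the operation move_lo_quad performs can be sketched in host-runnable C using GCC's generic vector extension instead of arm_neon.h (the function name is hypothetical): build a 128-bit vector whose low half is a scalar and whose high half is zero. After this patch such a construction goes through the aarch64_combinez patterns, whose vector-MOV/LDR forms leave the zeroing of the high half implicit.

```c
#include <assert.h>

/* Generic-vector stand-in for int64x2_t.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Combine a scalar with zero: lane 0 = x, lane 1 = 0.  On
   little-endian aarch64 this is the { operand, zeroes } case that
   move_lo_quad / aarch64_combinez handle.  */
v2di low_half_only (long long x)
{
  return (v2di) { x, 0 };
}
```

When x comes straight from memory, the ideal code is a single `ldr d0, [...]`, since a 64-bit FP load zeroes the upper 64 bits of the vector register.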
---
 gcc/config/aarch64/aarch64-simd.md            | 37 +------------------
 gcc/testsuite/gcc.target/aarch64/vec-init-8.c | 15 ++++++++
 2 files changed, 17 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-8.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c5bc2ea658b..d6cd4c70fe7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1584,46 +1584,13 @@ (define_insn "aarch64_p"
 ;; On little-endian this is { operand, zeroes }
 ;; On big-endian this is { zeroes, operand }
 
-(define_insn "move_lo_quad_internal_"
-  [(set (match_operand:VQMOV 0 "register_operand" "=w,w,w")
-	(vec_concat:VQMOV
-	  (match_operand: 1 "register_operand" "w,r,r")
-	  (match_operand: 2 "aarch64_simd_or_scalar_imm_zero")))]
-  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
-  "@
-   dup\\t%d0, %1.d[0]
-   fmov\\t%d0, %1
-   dup\\t%d0, %1"
-  [(set_attr "type" "neon_dup,f_mcr,neon_dup")
-   (set_attr "length" "4")
-   (set_attr "arch" "simd,fp,simd")]
-)
-
-(define_insn "move_lo_quad_internal_be_"
-  [(set (match_operand:VQMOV 0 "register_operand" "=w,w,w")
-	(vec_concat:VQMOV
-	  (match_operand: 2 "aarch64_simd_or_scalar_imm_zero")
-	  (match_operand: 1 "register_operand" "w,r,r")))]
-  "TARGET_SIMD && BYTES_BIG_ENDIAN"
-  "@
-   dup\\t%d0, %1.d[0]
-   fmov\\t%d0, %1
-   dup\\t%d0, %1"
-  [(set_attr "type" "neon_dup,f_mcr,neon_dup")
-   (set_attr "length" "4")
-   (set_attr "arch" "simd,fp,simd")]
-)
-
 (define_expand "move_lo_quad_"
   [(match_operand:VQMOV 0 "register_operand")
    (match_operand: 1 "register_operand")]
   "TARGET_SIMD"
 {
-  rtx zs = CONST0_RTX (mode);
-  if (BYTES_BIG_ENDIAN)
-    emit_insn (gen_move_lo_quad_internal_be_ (operands[0], operands[1], zs));
-  else
-    emit_insn (gen_move_lo_quad_internal_ (operands[0], operands[1], zs));
+  emit_insn (gen_aarch64_combine (operands[0], operands[1],
+				  CONST0_RTX (mode)));
   DONE;
 }
 )
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-8.c
b/gcc/testsuite/gcc.target/aarch64/vec-init-8.c
new file mode 100644
index 00000000000..18f8afe10f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-8.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+int64x2_t f1(int64_t *ptr) {
+  int64_t x = *ptr;
+  asm volatile ("" ::: "memory");
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int64x2_t) { 0, x };
+  else
+    return (int64x2_t) { x, 0 };
+}
+
+/* { dg-final { scan-assembler {\tldr\td0, \[x0\]\n} } } */

From patchwork Wed Feb 9 17:01:27 2022
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1590598
To: gcc-patches@gcc.gnu.org
Subject: [pushed 5/8] aarch64: Add more vec_combine patterns
Date: Wed, 09 Feb 2022 17:01:27 +0000
From: Richard Sandiford
vec_combine is really one instruction on aarch64, provided that the
lowpart element is in the same register as the destination vector.
This patch adds patterns for that.

The patch fixes a regression from GCC 8.  Before the patch:

int64x2_t s64q_1(int64_t a0, int64_t a1) {
  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return (int64x2_t) { a1, a0 };
  else
    return (int64x2_t) { a0, a1 };
}

generated:

	fmov	d0, x0
	ins	v0.d[1], x1
	ins	v0.d[1], x1
	ret

whereas GCC 8 generated the more respectable:

	dup	v0.2d, x0
	ins	v0.d[1], x1
	ret

gcc/
	* config/aarch64/predicates.md (aarch64_reg_or_mem_pair_operand):
	New predicate.
	* config/aarch64/aarch64-simd.md (*aarch64_combine_internal)
	(*aarch64_combine_internal_be): New patterns.

gcc/testsuite/
	* gcc.target/aarch64/vec-init-9.c: New test.
	* gcc.target/aarch64/vec-init-10.c: Likewise.
	* gcc.target/aarch64/vec-init-11.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md            |  62 ++++
 gcc/config/aarch64/predicates.md              |   4 +
 .../gcc.target/aarch64/vec-init-10.c          |  15 +
 .../gcc.target/aarch64/vec-init-11.c          |  12 +
 gcc/testsuite/gcc.target/aarch64/vec-init-9.c | 267 ++++++++++++++++++
 5 files changed, 360 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-11.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-9.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d6cd4c70fe7..ead80396e70 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4326,6 +4326,25 @@ (define_insn "load_pair_lanes"
   [(set_attr "type" "neon_load1_1reg_q")]
 )
 
+;; This STP pattern is a partial duplicate of the general vec_concat patterns
+;; below.
The reason for having both of them is that the alternatives of +;; the later patterns do not have consistent register preferences: the STP +;; alternatives have no preference between GPRs and FPRs (and if anything, +;; the GPR form is more natural for scalar integers) whereas the other +;; alternatives *require* an FPR for operand 1 and prefer one for operand 2. +;; +;; Using "*" to hide the STP alternatives from the RA penalizes cases in +;; which the destination was always memory. On the other hand, expressing +;; the true preferences makes GPRs seem more palatable than they really are +;; for register destinations. +;; +;; Despite that, we do still want the general form to have STP alternatives, +;; in order to handle cases where a register destination is spilled. +;; +;; The best compromise therefore seemed to be to have a dedicated STP +;; pattern to catch cases in which the destination was always memory. +;; This dedicated pattern must come first. + (define_insn "store_pair_lanes" [(set (match_operand: 0 "aarch64_mem_pair_lanes_operand" "=Umn, Umn") (vec_concat: @@ -4338,6 +4357,49 @@ (define_insn "store_pair_lanes" [(set_attr "type" "neon_stp, store_16")] ) +;; Form a vector whose least significant half comes from operand 1 and whose +;; most significant half comes from operand 2. The register alternatives +;; tie the least significant half to the same register as the destination, +;; so that only the other half needs to be handled explicitly. For the +;; reasons given above, the STP alternatives use ? for constraints that +;; the register alternatives either don't accept or themselves disparage. 
+
+(define_insn "*aarch64_combine_internal"
+  [(set (match_operand: 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
+	(vec_concat:
+	  (match_operand:VDC 1 "register_operand" "0, 0, 0, ?w, ?r")
+	  (match_operand:VDC 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, w, ?r")))]
+  "TARGET_SIMD
+   && !BYTES_BIG_ENDIAN
+   && (register_operand (operands[0], mode)
+       || register_operand (operands[2], mode))"
+  "@
+   ins\t%0.d[1], %2.d[0]
+   ins\t%0.d[1], %2
+   ld1\t{%0.d}[1], %2
+   stp\t%d1, %d2, %y0
+   stp\t%x1, %x2, %y0"
+  [(set_attr "type" "neon_ins_q, neon_from_gp_q, neon_load1_one_lane_q, neon_stp, store_16")]
+)
+
+(define_insn "*aarch64_combine_internal_be"
+  [(set (match_operand: 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
+	(vec_concat:
+	  (match_operand:VDC 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, ?w, ?r")
+	  (match_operand:VDC 1 "register_operand" "0, 0, 0, ?w, ?r")))]
+  "TARGET_SIMD
+   && BYTES_BIG_ENDIAN
+   && (register_operand (operands[0], mode)
+       || register_operand (operands[2], mode))"
+  "@
+   ins\t%0.d[1], %2.d[0]
+   ins\t%0.d[1], %2
+   ld1\t{%0.d}[1], %2
+   stp\t%d2, %d1, %y0
+   stp\t%x2, %x1, %y0"
+  [(set_attr "type" "neon_ins_q, neon_from_gp_q, neon_load1_one_lane_q, neon_stp, store_16")]
+)
+
 ;; In this insn, operand 1 should be low, and operand 2 the high part of the
 ;; dest vector.
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 7dc4c155ea8..c308015ac2c 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -254,6 +254,10 @@ (define_predicate "aarch64_mem_pair_lanes_operand" false, ADDR_QUERY_LDP_STP_N)"))) +(define_predicate "aarch64_reg_or_mem_pair_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_mem_pair_lanes_operand"))) + (define_predicate "aarch64_prefetch_operand" (match_test "aarch64_address_valid_for_prefetch_p (op, false)")) diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-10.c b/gcc/testsuite/gcc.target/aarch64/vec-init-10.c new file mode 100644 index 00000000000..f5dd83b94b5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-10.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O" } */ + +#include + +int64x2_t f1(int64_t *x, int c) { + return c ? (int64x2_t) { x[0], x[2] } : (int64x2_t) { 0, 0 }; +} + +int64x2_t f2(int64_t *x, int i0, int i1, int c) { + return c ? (int64x2_t) { x[i0], x[i1] } : (int64x2_t) { 0, 0 }; +} + +/* { dg-final { scan-assembler-times {\t(?:ldr\td[0-9]+|ld1\t)} 4 } } */ +/* { dg-final { scan-assembler-not {\tldr\tx} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-11.c b/gcc/testsuite/gcc.target/aarch64/vec-init-11.c new file mode 100644 index 00000000000..df242702c0c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-11.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O" } */ + +#include + +void f1(int64x2_t *res, int64_t *x, int c0, int c1) { + res[0] = (int64x2_t) { c0 ? x[0] : 0, c1 ? 
x[2] : 0 }; +} + +/* { dg-final { scan-assembler-times {\tldr\tx[0-9]+} 2 } } */ +/* { dg-final { scan-assembler {\tstp\tx[0-9]+, x[0-9]+} } } */ +/* { dg-final { scan-assembler-not {\tldr\td} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-9.c b/gcc/testsuite/gcc.target/aarch64/vec-init-9.c new file mode 100644 index 00000000000..8f68e06a559 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-9.c @@ -0,0 +1,267 @@ +/* { dg-do compile } */ +/* { dg-options "-O" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#include + +void ext(); + +/* +** s64q_1: +** fmov d0, x0 +** ins v0\.d\[1\], x1 +** ret +*/ +int64x2_t s64q_1(int64_t a0, int64_t a1) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + return (int64x2_t) { a1, a0 }; + else + return (int64x2_t) { a0, a1 }; +} +/* +** s64q_2: +** fmov d0, x0 +** ld1 {v0\.d}\[1\], \[x1\] +** ret +*/ +int64x2_t s64q_2(int64_t a0, int64_t *ptr) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + return (int64x2_t) { ptr[0], a0 }; + else + return (int64x2_t) { a0, ptr[0] }; +} +/* +** s64q_3: +** ldr d0, \[x0\] +** ins v0\.d\[1\], x1 +** ret +*/ +int64x2_t s64q_3(int64_t *ptr, int64_t a1) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + return (int64x2_t) { a1, ptr[0] }; + else + return (int64x2_t) { ptr[0], a1 }; +} +/* +** s64q_4: +** stp x1, x2, \[x0\] +** ret +*/ +void s64q_4(int64x2_t *res, int64_t a0, int64_t a1) { + res[0] = (int64x2_t) { a0, a1 }; +} +/* +** s64q_5: +** stp x1, x2, \[x0, #?8\] +** ret +*/ +void s64q_5(uintptr_t res, int64_t a0, int64_t a1) { + *(int64x2_t *)(res + 8) = (int64x2_t) { a0, a1 }; +} +/* +** s64q_6: +** ... +** stp x0, x1, .* +** ... +** ldr q0, .* +** ... 
+** ret +*/ +int64x2_t s64q_6(int64_t a0, int64_t a1) { + int64x2_t res = { a0, a1 }; + ext (); + return res; +} + +/* +** f64q_1: +** ins v0\.d\[1\], v1\.d\[0\] +** ret +*/ +float64x2_t f64q_1(float64_t a0, float64_t a1) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + return (float64x2_t) { a1, a0 }; + else + return (float64x2_t) { a0, a1 }; +} +/* +** f64q_2: +** ld1 {v0\.d}\[1\], \[x0\] +** ret +*/ +float64x2_t f64q_2(float64_t a0, float64_t *ptr) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + return (float64x2_t) { ptr[0], a0 }; + else + return (float64x2_t) { a0, ptr[0] }; +} +/* +** f64q_3: +** ldr d0, \[x0\] +** ins v0\.d\[1\], v1\.d\[0\] +** ret +*/ +float64x2_t f64q_3(float64_t a0, float64_t a1, float64_t *ptr) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + return (float64x2_t) { a1, ptr[0] }; + else + return (float64x2_t) { ptr[0], a1 }; +} +/* +** f64q_4: +** stp d0, d1, \[x0\] +** ret +*/ +void f64q_4(float64x2_t *res, float64_t a0, float64_t a1) { + res[0] = (float64x2_t) { a0, a1 }; +} +/* +** f64q_5: +** stp d0, d1, \[x0, #?8\] +** ret +*/ +void f64q_5(uintptr_t res, float64_t a0, float64_t a1) { + *(float64x2_t *)(res + 8) = (float64x2_t) { a0, a1 }; +} +/* +** f64q_6: +** ... +** stp d0, d1, .* +** ... +** ldr q0, .* +** ... 
+** ret +*/ +float64x2_t f64q_6(float64_t a0, float64_t a1) { + float64x2_t res = { a0, a1 }; + ext (); + return res; +} + +/* +** s32q_1: +** ins v0\.d\[1\], v1\.d\[0\] +** ret +*/ +int32x4_t s32q_1(int32x2_t a0, int32x2_t a1) { + return vcombine_s32 (a0, a1); +} +/* +** s32q_2: +** ld1 {v0\.d}\[1\], \[x0\] +** ret +*/ +int32x4_t s32q_2(int32x2_t a0, int32x2_t *ptr) { + return vcombine_s32 (a0, ptr[0]); +} +/* +** s32q_3: +** ldr d0, \[x0\] +** ins v0\.d\[1\], v1\.d\[0\] +** ret +*/ +int32x4_t s32q_3(int32x2_t a0, int32x2_t a1, int32x2_t *ptr) { + return vcombine_s32 (ptr[0], a1); +} +/* +** s32q_4: +** stp d0, d1, \[x0\] +** ret +*/ +void s32q_4(int32x4_t *res, int32x2_t a0, int32x2_t a1) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + res[0] = vcombine_s32 (a1, a0); + else + res[0] = vcombine_s32 (a0, a1); +} +/* +** s32q_5: +** stp d0, d1, \[x0, #?8\] +** ret +*/ +void s32q_5(uintptr_t res, int32x2_t a0, int32x2_t a1) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + *(int32x4_t *)(res + 8) = vcombine_s32 (a1, a0); + else + *(int32x4_t *)(res + 8) = vcombine_s32 (a0, a1); +} +/* +** s32q_6: +** ... +** stp d0, d1, .* +** ... +** ldr q0, .* +** ... +** ret +*/ +int32x4_t s32q_6(int32x2_t a0, int32x2_t a1) { + int32x4_t res = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ + ? 
vcombine_s32 (a1, a0) + : vcombine_s32 (a0, a1)); + ext (); + return res; +} + +/* +** f32q_1: +** ins v0\.d\[1\], v1\.d\[0\] +** ret +*/ +float32x4_t f32q_1(float32x2_t a0, float32x2_t a1) { + return vcombine_f32 (a0, a1); +} +/* +** f32q_2: +** ld1 {v0\.d}\[1\], \[x0\] +** ret +*/ +float32x4_t f32q_2(float32x2_t a0, float32x2_t *ptr) { + return vcombine_f32 (a0, ptr[0]); +} +/* +** f32q_3: +** ldr d0, \[x0\] +** ins v0\.d\[1\], v1\.d\[0\] +** ret +*/ +float32x4_t f32q_3(float32x2_t a0, float32x2_t a1, float32x2_t *ptr) { + return vcombine_f32 (ptr[0], a1); +} +/* +** f32q_4: +** stp d0, d1, \[x0\] +** ret +*/ +void f32q_4(float32x4_t *res, float32x2_t a0, float32x2_t a1) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + res[0] = vcombine_f32 (a1, a0); + else + res[0] = vcombine_f32 (a0, a1); +} +/* +** f32q_5: +** stp d0, d1, \[x0, #?8\] +** ret +*/ +void f32q_5(uintptr_t res, float32x2_t a0, float32x2_t a1) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + *(float32x4_t *)(res + 8) = vcombine_f32 (a1, a0); + else + *(float32x4_t *)(res + 8) = vcombine_f32 (a0, a1); +} +/* +** f32q_6: +** ... +** stp d0, d1, .* +** ... +** ldr q0, .* +** ... +** ret +*/ +float32x4_t f32q_6(float32x2_t a0, float32x2_t a1) { + float32x4_t res = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ + ? 
vcombine_f32 (a1, a0)
+		     : vcombine_f32 (a0, a1));
+  ext ();
+  return res;
+}

From patchwork Wed Feb 9 17:01:46 2022
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [pushed 6/8] aarch64: Add a general vec_concat expander
Date: Wed, 09 Feb 2022 17:01:46 +0000

After previous patches, we have a (mostly new) group of vec_concat
patterns as well as vestiges of the old move_lo/hi_quad patterns.
(A previous patch removed the move_lo_quad insns, but we still have
the move_hi_quad insns and both sets of expanders.)

This patch is the first of two to remove the old move_lo/hi_quad stuff.
It isn't technically a regression fix, but it seemed better to make
the changes now rather than leave things in a half-finished and
inconsistent state.

This patch defines an aarch64_vec_concat expander that coerces the
element operands into a valid form, including the ones added by the
previous patch.  This in turn lets us get rid of one move_lo/hi_quad
pair.

As a side-effect, it also means that vcombines of 2 vectors make
better use of the available forms, like vec_inits of 2 scalars
already do.

gcc/
	* config/aarch64/aarch64-protos.h (aarch64_split_simd_combine):
	Delete.
	* config/aarch64/aarch64-simd.md (@aarch64_combinez): Rename to...
	(*aarch64_combinez): ...this.
	(@aarch64_combinez_be): Rename to...
	(*aarch64_combinez_be): ...this.
	(@aarch64_vec_concat): New expander.
	(aarch64_combine): Use it.
	(@aarch64_simd_combine): Delete.
	* config/aarch64/aarch64.cc (aarch64_split_simd_combine): Delete.
	(aarch64_expand_vector_init): Use aarch64_vec_concat.

gcc/testsuite/
	* gcc.target/aarch64/vec-init-12.c: New test.
---
 gcc/config/aarch64/aarch64-protos.h           |  2 -
 gcc/config/aarch64/aarch64-simd.md            | 76 ++++++++++++-------
 gcc/config/aarch64/aarch64.cc                 | 55 ++-------------
 .../gcc.target/aarch64/vec-init-12.c          | 65 ++++++++++++++++
 4 files changed, 122 insertions(+), 76 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-12.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b75ed35635b..392efa0b74d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -925,8 +925,6 @@ bool aarch64_split_128bit_move_p (rtx, rtx);
 
 bool aarch64_mov128_immediate (rtx);
 
-void aarch64_split_simd_combine (rtx, rtx, rtx);
-
 void aarch64_split_simd_move (rtx, rtx);
 
 /* Check for a legitimate floating point constant for FMOV.
*/ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index ead80396e70..7acde0dd099 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4403,7 +4403,7 @@ (define_insn "*aarch64_combine_internal_be" ;; In this insn, operand 1 should be low, and operand 2 the high part of the ;; dest vector. -(define_insn "@aarch64_combinez" +(define_insn "*aarch64_combinez" [(set (match_operand: 0 "register_operand" "=w,w,w") (vec_concat: (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m") @@ -4417,7 +4417,7 @@ (define_insn "@aarch64_combinez" (set_attr "arch" "simd,fp,simd")] ) -(define_insn "@aarch64_combinez_be" +(define_insn "*aarch64_combinez_be" [(set (match_operand: 0 "register_operand" "=w,w,w") (vec_concat: (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero") @@ -4431,38 +4431,62 @@ (define_insn "@aarch64_combinez_be" (set_attr "arch" "simd,fp,simd")] ) -(define_expand "aarch64_combine" - [(match_operand: 0 "register_operand") - (match_operand:VDC 1 "register_operand") - (match_operand:VDC 2 "aarch64_simd_reg_or_zero")] +;; Form a vector whose first half (in array order) comes from operand 1 +;; and whose second half (in array order) comes from operand 2. +;; This operand order follows the RTL vec_concat operation. +(define_expand "@aarch64_vec_concat" + [(set (match_operand: 0 "register_operand") + (vec_concat: + (match_operand:VDC 1 "general_operand") + (match_operand:VDC 2 "general_operand")))] "TARGET_SIMD" { - if (operands[2] == CONST0_RTX (mode)) + int lo = BYTES_BIG_ENDIAN ? 2 : 1; + int hi = BYTES_BIG_ENDIAN ? 1 : 2; + + if (MEM_P (operands[1]) + && MEM_P (operands[2]) + && aarch64_mergeable_load_pair_p (mode, operands[1], operands[2])) + /* Use load_pair_lanes. 
*/ + ; + else if (operands[hi] == CONST0_RTX (mode)) { - if (BYTES_BIG_ENDIAN) - emit_insn (gen_aarch64_combinez_be (operands[0], operands[1], - operands[2])); - else - emit_insn (gen_aarch64_combinez (operands[0], operands[1], - operands[2])); + /* Use *aarch64_combinez. */ + if (!nonimmediate_operand (operands[lo], mode)) + operands[lo] = force_reg (mode, operands[lo]); } else - aarch64_split_simd_combine (operands[0], operands[1], operands[2]); - DONE; -} -) + { + /* Use *aarch64_combine_general. */ + operands[lo] = force_reg (mode, operands[lo]); + if (!aarch64_simd_nonimmediate_operand (operands[hi], mode)) + { + if (MEM_P (operands[hi])) + { + rtx addr = force_reg (Pmode, XEXP (operands[hi], 0)); + operands[hi] = replace_equiv_address (operands[hi], addr); + } + else + operands[hi] = force_reg (mode, operands[hi]); + } + } +}) -(define_expand "@aarch64_simd_combine" +;; Form a vector whose least significant half comes from operand 1 and whose +;; most significant half comes from operand 2. This operand order follows +;; arm_neon.h vcombine* intrinsics. +(define_expand "aarch64_combine" [(match_operand: 0 "register_operand") - (match_operand:VDC 1 "register_operand") - (match_operand:VDC 2 "register_operand")] + (match_operand:VDC 1 "general_operand") + (match_operand:VDC 2 "general_operand")] "TARGET_SIMD" - { - emit_insn (gen_move_lo_quad_ (operands[0], operands[1])); - emit_insn (gen_move_hi_quad_ (operands[0], operands[2])); - DONE; - } -[(set_attr "type" "multiple")] +{ + if (BYTES_BIG_ENDIAN) + std::swap (operands[1], operands[2]); + emit_insn (gen_aarch64_vec_concat (operands[0], operands[1], + operands[2])); + DONE; +} ) ;; l. diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index c47543aebf3..af42d1bedfe 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -4239,23 +4239,6 @@ aarch64_split_128bit_move_p (rtx dst, rtx src) return true; } -/* Split a complex SIMD combine. 
*/ - -void -aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2) -{ - machine_mode src_mode = GET_MODE (src1); - machine_mode dst_mode = GET_MODE (dst); - - gcc_assert (VECTOR_MODE_P (dst_mode)); - gcc_assert (register_operand (dst, dst_mode) - && register_operand (src1, src_mode) - && register_operand (src2, src_mode)); - - emit_insn (gen_aarch64_simd_combine (src_mode, dst, src1, src2)); - return; -} - /* Split a complex SIMD move. */ void @@ -20941,37 +20924,13 @@ aarch64_expand_vector_init (rtx target, rtx vals) of mode N in VALS and we must put their concatentation into TARGET. */ if (XVECLEN (vals, 0) == 2 && VECTOR_MODE_P (GET_MODE (XVECEXP (vals, 0, 0)))) { - gcc_assert (known_eq (GET_MODE_SIZE (mode), - 2 * GET_MODE_SIZE (GET_MODE (XVECEXP (vals, 0, 0))))); - rtx lo = XVECEXP (vals, 0, 0); - rtx hi = XVECEXP (vals, 0, 1); - machine_mode narrow_mode = GET_MODE (lo); - gcc_assert (GET_MODE_INNER (narrow_mode) == inner_mode); - gcc_assert (narrow_mode == GET_MODE (hi)); - - /* When we want to concatenate a half-width vector with zeroes we can - use the aarch64_combinez[_be] patterns. Just make sure that the - zeroes are in the right half. */ - if (BYTES_BIG_ENDIAN - && aarch64_simd_imm_zero (lo, narrow_mode) - && general_operand (hi, narrow_mode)) - emit_insn (gen_aarch64_combinez_be (narrow_mode, target, hi, lo)); - else if (!BYTES_BIG_ENDIAN - && aarch64_simd_imm_zero (hi, narrow_mode) - && general_operand (lo, narrow_mode)) - emit_insn (gen_aarch64_combinez (narrow_mode, target, lo, hi)); - else - { - /* Else create the two half-width registers and combine them. 
*/ - if (!REG_P (lo)) - lo = force_reg (GET_MODE (lo), lo); - if (!REG_P (hi)) - hi = force_reg (GET_MODE (hi), hi); - - if (BYTES_BIG_ENDIAN) - std::swap (lo, hi); - emit_insn (gen_aarch64_simd_combine (narrow_mode, target, lo, hi)); - } + machine_mode narrow_mode = GET_MODE (XVECEXP (vals, 0, 0)); + gcc_assert (GET_MODE_INNER (narrow_mode) == inner_mode + && known_eq (GET_MODE_SIZE (mode), + 2 * GET_MODE_SIZE (narrow_mode))); + emit_insn (gen_aarch64_vec_concat (narrow_mode, target, + XVECEXP (vals, 0, 0), + XVECEXP (vals, 0, 1))); return; } diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-12.c b/gcc/testsuite/gcc.target/aarch64/vec-init-12.c new file mode 100644 index 00000000000..c287478e2d8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-12.c @@ -0,0 +1,65 @@ +/* { dg-do compile } */ +/* { dg-options "-O" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#include + +/* +** s32_1: +** ldr q0, \[x0\] +** ret +*/ +int32x4_t s32_1(int32x2_t *ptr) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + return vcombine_s32 (ptr[1], ptr[0]); + else + return vcombine_s32 (ptr[0], ptr[1]); +} +/* +** s32_2: +** add x([0-9])+, x0, #?8 +** ld1 {v0\.d}\[1\], \[x\1\] +** ret +*/ +int32x4_t s32_2(int32x2_t a0, int32x2_t *ptr) { + return vcombine_s32 (a0, ptr[1]); +} +/* +** s32_3: +** ldr d0, \[x0\], #?16 +** ld1 {v0\.d}\[1\], \[x0\] +** ret +*/ +int32x4_t s32_3(int32x2_t *ptr) { + return vcombine_s32 (ptr[0], ptr[2]); +} + +/* +** f32_1: +** ldr q0, \[x0\] +** ret +*/ +float32x4_t f32_1(float32x2_t *ptr) { + if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) + return vcombine_f32 (ptr[1], ptr[0]); + else + return vcombine_f32 (ptr[0], ptr[1]); +} +/* +** f32_2: +** add x([0-9])+, x0, #?8 +** ld1 {v0\.d}\[1\], \[x\1\] +** ret +*/ +float32x4_t f32_2(float32x2_t a0, float32x2_t *ptr) { + return vcombine_f32 (a0, ptr[1]); +} +/* +** f32_3: +** ldr d0, \[x0\], #?16 +** ld1 {v0\.d}\[1\], \[x0\] +** ret +*/ +float32x4_t f32_3(float32x2_t 
*ptr) {
+  return vcombine_f32 (ptr[0], ptr[2]);
+}

From patchwork Wed Feb 9 17:01:59 2022
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [pushed 7/8] aarch64: Remove move_lo/hi_quad expanders
Date: Wed, 09 Feb 2022 17:01:59 +0000

This patch is the second of two to remove the old move_lo/hi_quad
expanders and move_hi_quad insns.

gcc/
	* config/aarch64/aarch64-simd.md (@aarch64_split_simd_mov): Use
	aarch64_combine instead of move_lo/hi_quad.  Tabify.
	(move_lo_quad_, aarch64_simd_move_hi_quad_): Delete.
	(aarch64_simd_move_hi_quad_be_, move_hi_quad_): Delete.
(vec_pack_trunc_): Take general_operand elements and use aarch64_combine rather than move_lo/hi_quad to combine them. (vec_pack_trunc_df): Likewise. --- gcc/config/aarch64/aarch64-simd.md | 111 +++++------------------------ 1 file changed, 18 insertions(+), 93 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 7acde0dd099..ef6e772503d 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -272,7 +272,7 @@ (define_split (define_expand "@aarch64_split_simd_mov" [(set (match_operand:VQMOV 0) - (match_operand:VQMOV 1))] + (match_operand:VQMOV 1))] "TARGET_SIMD" { rtx dst = operands[0]; @@ -280,23 +280,22 @@ (define_expand "@aarch64_split_simd_mov" if (GP_REGNUM_P (REGNO (src))) { - rtx src_low_part = gen_lowpart (mode, src); - rtx src_high_part = gen_highpart (mode, src); + rtx src_low_part = gen_lowpart (mode, src); + rtx src_high_part = gen_highpart (mode, src); + rtx dst_low_part = gen_lowpart (mode, dst); - emit_insn - (gen_move_lo_quad_ (dst, src_low_part)); - emit_insn - (gen_move_hi_quad_ (dst, src_high_part)); + emit_move_insn (dst_low_part, src_low_part); + emit_insn (gen_aarch64_combine (dst, dst_low_part, + src_high_part)); } - else { - rtx dst_low_part = gen_lowpart (mode, dst); - rtx dst_high_part = gen_highpart (mode, dst); + rtx dst_low_part = gen_lowpart (mode, dst); + rtx dst_high_part = gen_highpart (mode, dst); rtx lo = aarch64_simd_vect_par_cnst_half (mode, , false); rtx hi = aarch64_simd_vect_par_cnst_half (mode, , true); - emit_insn (gen_aarch64_get_half (dst_low_part, src, lo)); - emit_insn (gen_aarch64_get_half (dst_high_part, src, hi)); + emit_insn (gen_aarch64_get_half (dst_low_part, src, lo)); + emit_insn (gen_aarch64_get_half (dst_high_part, src, hi)); } DONE; } @@ -1580,69 +1579,6 @@ (define_insn "aarch64_p" ;; What that means, is that the RTL descriptions of the below patterns ;; need to change depending on endianness. 
-;; Move to the low architectural bits of the register. -;; On little-endian this is { operand, zeroes } -;; On big-endian this is { zeroes, operand } - -(define_expand "move_lo_quad_" - [(match_operand:VQMOV 0 "register_operand") - (match_operand: 1 "register_operand")] - "TARGET_SIMD" -{ - emit_insn (gen_aarch64_combine (operands[0], operands[1], - CONST0_RTX (mode))); - DONE; -} -) - -;; Move operand1 to the high architectural bits of the register, keeping -;; the low architectural bits of operand2. -;; For little-endian this is { operand2, operand1 } -;; For big-endian this is { operand1, operand2 } - -(define_insn "aarch64_simd_move_hi_quad_" - [(set (match_operand:VQMOV 0 "register_operand" "+w,w") - (vec_concat:VQMOV - (vec_select: - (match_dup 0) - (match_operand:VQMOV 2 "vect_par_cnst_lo_half" "")) - (match_operand: 1 "register_operand" "w,r")))] - "TARGET_SIMD && !BYTES_BIG_ENDIAN" - "@ - ins\\t%0.d[1], %1.d[0] - ins\\t%0.d[1], %1" - [(set_attr "type" "neon_ins")] -) - -(define_insn "aarch64_simd_move_hi_quad_be_" - [(set (match_operand:VQMOV 0 "register_operand" "+w,w") - (vec_concat:VQMOV - (match_operand: 1 "register_operand" "w,r") - (vec_select: - (match_dup 0) - (match_operand:VQMOV 2 "vect_par_cnst_lo_half" ""))))] - "TARGET_SIMD && BYTES_BIG_ENDIAN" - "@ - ins\\t%0.d[1], %1.d[0] - ins\\t%0.d[1], %1" - [(set_attr "type" "neon_ins")] -) - -(define_expand "move_hi_quad_" - [(match_operand:VQMOV 0 "register_operand") - (match_operand: 1 "register_operand")] - "TARGET_SIMD" -{ - rtx p = aarch64_simd_vect_par_cnst_half (mode, , false); - if (BYTES_BIG_ENDIAN) - emit_insn (gen_aarch64_simd_move_hi_quad_be_ (operands[0], - operands[1], p)); - else - emit_insn (gen_aarch64_simd_move_hi_quad_ (operands[0], - operands[1], p)); - DONE; -}) - ;; Narrowing operations. 
(define_insn "aarch64_xtn_insn_le" @@ -1743,16 +1679,12 @@ (define_insn "*aarch64_narrow_trunc" (define_expand "vec_pack_trunc_" [(match_operand: 0 "register_operand") - (match_operand:VDN 1 "register_operand") - (match_operand:VDN 2 "register_operand")] + (match_operand:VDN 1 "general_operand") + (match_operand:VDN 2 "general_operand")] "TARGET_SIMD" { rtx tempreg = gen_reg_rtx (mode); - int lo = BYTES_BIG_ENDIAN ? 2 : 1; - int hi = BYTES_BIG_ENDIAN ? 1 : 2; - - emit_insn (gen_move_lo_quad_ (tempreg, operands[lo])); - emit_insn (gen_move_hi_quad_ (tempreg, operands[hi])); + emit_insn (gen_aarch64_vec_concat (tempreg, operands[1], operands[2])); emit_insn (gen_trunc2 (operands[0], tempreg)); DONE; }) @@ -3402,20 +3334,13 @@ (define_expand "vec_pack_trunc_v2df" (define_expand "vec_pack_trunc_df" [(set (match_operand:V2SF 0 "register_operand") - (vec_concat:V2SF - (float_truncate:SF - (match_operand:DF 1 "register_operand")) - (float_truncate:SF - (match_operand:DF 2 "register_operand")) - ))] + (vec_concat:V2SF + (float_truncate:SF (match_operand:DF 1 "general_operand")) + (float_truncate:SF (match_operand:DF 2 "general_operand"))))] "TARGET_SIMD" { rtx tmp = gen_reg_rtx (V2SFmode); - int lo = BYTES_BIG_ENDIAN ? 2 : 1; - int hi = BYTES_BIG_ENDIAN ? 
1 : 2;
-
-  emit_insn (gen_move_lo_quad_v2df (tmp, operands[lo]));
-  emit_insn (gen_move_hi_quad_v2df (tmp, operands[hi]));
+  emit_insn (gen_aarch64_vec_concatdf (tmp, operands[1], operands[2]));
   emit_insn (gen_aarch64_float_truncate_lo_v2sf (operands[0], tmp));
   DONE;
 }

From patchwork Wed Feb 9 17:02:19 2022
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1590612
To: gcc-patches@gcc.gnu.org
Subject: [pushed 8/8] aarch64: Extend vec_concat patterns to 8-byte vectors
Date: Wed, 09 Feb 2022 17:02:19 +0000
In-Reply-To: (Richard Sandiford's message of "Wed, 09 Feb 2022 17:00:03 +0000")
From: Richard Sandiford

This patch extends the previous support for 16-byte vec_concat
so that it supports pairs of 4-byte elements.
This too isn't strictly a regression fix, since the 8-byte forms
weren't affected by the same problems as the 16-byte forms, but it
leaves things in a more consistent state.

gcc/
	* config/aarch64/iterators.md (VDCSIF): New mode iterator.
	(VDBL): Handle SF.
	(single_wx, single_type, single_dtype, dblq): New mode attributes.
	* config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Extend
	from VDC to VDCSIF.
	(store_pair_lanes<mode>): Likewise.
	(*aarch64_combine_internal<mode>): Likewise.
	(*aarch64_combine_internal_be<mode>): Likewise.
	(*aarch64_combinez<mode>): Likewise.
	(*aarch64_combinez_be<mode>): Likewise.
	* config/aarch64/aarch64.cc (aarch64_classify_address): Handle
	8-byte modes for ADDR_QUERY_LDP_STP_N.
	(aarch64_print_operand): Likewise for %y.

gcc/testsuite/
	* gcc.target/aarch64/vec-init-13.c: New test.
	* gcc.target/aarch64/vec-init-14.c: Likewise.
	* gcc.target/aarch64/vec-init-15.c: Likewise.
	* gcc.target/aarch64/vec-init-16.c: Likewise.
	* gcc.target/aarch64/vec-init-17.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md            |  72 +++++-----
 gcc/config/aarch64/aarch64.cc                 |  16 ++-
 gcc/config/aarch64/iterators.md               |  38 +++++-
 .../gcc.target/aarch64/vec-init-13.c          | 123 ++++++++++++++++++
 .../gcc.target/aarch64/vec-init-14.c          | 123 ++++++++++++++++++
 .../gcc.target/aarch64/vec-init-15.c          |  15 +++
 .../gcc.target/aarch64/vec-init-16.c          |  12 ++
 .../gcc.target/aarch64/vec-init-17.c          |  73 +++++++++++
 8 files changed, 430 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-13.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-14.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-15.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-17.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ef6e772503d..18733428f3f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4243,12 +4243,12 @@
(define_insn_and_split "aarch64_get_lane<mode>"
 (define_insn "load_pair_lanes<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w")
 	(vec_concat:<VDBL>
-	   (match_operand:VDC 1 "memory_operand" "Utq")
-	   (match_operand:VDC 2 "memory_operand" "m")))]
+	   (match_operand:VDCSIF 1 "memory_operand" "Utq")
+	   (match_operand:VDCSIF 2 "memory_operand" "m")))]
   "TARGET_SIMD
    && aarch64_mergeable_load_pair_p (<VDBL>mode, operands[1], operands[2])"
-  "ldr\\t%q0, %1"
-  [(set_attr "type" "neon_load1_1reg_q")]
+  "ldr\\t%<single_dtype>0, %1"
+  [(set_attr "type" "neon_load1_1reg<dblq>")]
 )

 ;; This STP pattern is a partial duplicate of the general vec_concat patterns
@@ -4273,12 +4273,12 @@ (define_insn "load_pair_lanes<mode>"
 (define_insn "store_pair_lanes<mode>"
   [(set (match_operand:<VDBL> 0 "aarch64_mem_pair_lanes_operand" "=Umn, Umn")
 	(vec_concat:<VDBL>
-	   (match_operand:VDC 1 "register_operand" "w, r")
-	   (match_operand:VDC 2 "register_operand" "w, r")))]
+	   (match_operand:VDCSIF 1 "register_operand" "w, r")
+	   (match_operand:VDCSIF 2 "register_operand" "w, r")))]
   "TARGET_SIMD"
   "@
-   stp\\t%d1, %d2, %y0
-   stp\\t%x1, %x2, %y0"
+   stp\t%<single_type>1, %<single_type>2, %y0
+   stp\t%<single_wx>1, %<single_wx>2, %y0"
   [(set_attr "type" "neon_stp, store_16")]
 )

@@ -4292,37 +4292,37 @@ (define_insn "store_pair_lanes<mode>"
 (define_insn "*aarch64_combine_internal<mode>"
   [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
 	(vec_concat:<VDBL>
-	  (match_operand:VDC 1 "register_operand" "0, 0, 0, ?w, ?r")
-	  (match_operand:VDC 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, w, ?r")))]
+	  (match_operand:VDCSIF 1 "register_operand" "0, 0, 0, ?w, ?r")
+	  (match_operand:VDCSIF 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, w, ?r")))]
   "TARGET_SIMD
    && !BYTES_BIG_ENDIAN
    && (register_operand (operands[0], <VDBL>mode)
       || register_operand (operands[2], <MODE>mode))"
   "@
-   ins\t%0.d[1], %2.d[0]
-   ins\t%0.d[1], %2
-   ld1\t{%0.d}[1], %2
-   stp\t%d1, %d2, %y0
-   stp\t%x1, %x2, %y0"
-  [(set_attr "type" "neon_ins_q, neon_from_gp_q, neon_load1_one_lane_q, neon_stp, store_16")]
+   ins\t%0.<single_type>[1], %2.<single_type>[0]
+   ins\t%0.<single_type>[1], %<single_wx>2
+   ld1\t{%0.<single_type>}[1], %2
+   stp\t%<single_type>1, %<single_type>2, %y0
+   stp\t%<single_wx>1, %<single_wx>2, %y0"
+  [(set_attr "type" "neon_ins<dblq>, neon_from_gp<dblq>, neon_load1_one_lane<dblq>, neon_stp, store_16")]
 )

 (define_insn "*aarch64_combine_internal_be<mode>"
   [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
 	(vec_concat:<VDBL>
-	  (match_operand:VDC 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, ?w, ?r")
-	  (match_operand:VDC 1 "register_operand" "0, 0, 0, ?w, ?r")))]
+	  (match_operand:VDCSIF 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, ?w, ?r")
+	  (match_operand:VDCSIF 1 "register_operand" "0, 0, 0, ?w, ?r")))]
   "TARGET_SIMD
    && BYTES_BIG_ENDIAN
    && (register_operand (operands[0], <VDBL>mode)
       || register_operand (operands[2], <MODE>mode))"
   "@
-   ins\t%0.d[1], %2.d[0]
-   ins\t%0.d[1], %2
-   ld1\t{%0.d}[1], %2
-   stp\t%d2, %d1, %y0
-   stp\t%x2, %x1, %y0"
-  [(set_attr "type" "neon_ins_q, neon_from_gp_q, neon_load1_one_lane_q, neon_stp, store_16")]
+   ins\t%0.<single_type>[1], %2.<single_type>[0]
+   ins\t%0.<single_type>[1], %<single_wx>2
+   ld1\t{%0.<single_type>}[1], %2
+   stp\t%<single_type>2, %<single_type>1, %y0
+   stp\t%<single_wx>2, %<single_wx>1, %y0"
+  [(set_attr "type" "neon_ins<dblq>, neon_from_gp<dblq>, neon_load1_one_lane<dblq>, neon_stp, store_16")]
 )

 ;; In this insn, operand 1 should be low, and operand 2 the high part of the
@@ -4331,13 +4331,13 @@ (define_insn "*aarch64_combine_internal_be<mode>"
 (define_insn "*aarch64_combinez<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
 	(vec_concat:<VDBL>
-	  (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")
-	  (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")))]
+	  (match_operand:VDCSIF 1 "nonimmediate_operand" "w,?r,m")
+	  (match_operand:VDCSIF 2 "aarch64_simd_or_scalar_imm_zero")))]
   "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "@
-   mov\\t%0.8b, %1.8b
-   fmov\t%d0, %1
-   ldr\\t%d0, %1"
+   fmov\\t%<single_type>0, %<single_type>1
+   fmov\t%<single_type>0, %<single_wx>1
+   ldr\\t%<single_type>0, %1"
   [(set_attr "type" "neon_move, neon_from_gp, neon_load1_1reg")
    (set_attr "arch" "simd,fp,simd")]
 )
@@ -4345,13 +4345,13 @@ (define_insn "*aarch64_combinez<mode>"
 (define_insn "*aarch64_combinez_be<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
 	(vec_concat:<VDBL>
-	  (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")
-	  (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")))]
+	  (match_operand:VDCSIF 2 "aarch64_simd_or_scalar_imm_zero")
+	  (match_operand:VDCSIF 1 "nonimmediate_operand" "w,?r,m")))]
   "TARGET_SIMD && BYTES_BIG_ENDIAN"
   "@
-   mov\\t%0.8b, %1.8b
-   fmov\t%d0, %1
-   ldr\\t%d0, %1"
+   fmov\\t%<single_type>0, %<single_type>1
+   fmov\t%<single_type>0, %<single_wx>1
+   ldr\\t%<single_type>0, %1"
   [(set_attr "type" "neon_move, neon_from_gp, neon_load1_1reg")
    (set_attr "arch" "simd,fp,simd")]
 )
@@ -4362,8 +4362,8 @@ (define_insn "*aarch64_combinez_be<mode>"
 (define_expand "@aarch64_vec_concat<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand")
 	(vec_concat:<VDBL>
-	  (match_operand:VDC 1 "general_operand")
-	  (match_operand:VDC 2 "general_operand")))]
+	  (match_operand:VDCSIF 1 "general_operand")
+	  (match_operand:VDCSIF 2 "general_operand")))]
   "TARGET_SIMD"
 {
   int lo = BYTES_BIG_ENDIAN ? 2 : 1;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index af42d1bedfe..7bb97bd48e4 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -9922,9 +9922,15 @@ aarch64_classify_address (struct aarch64_address_info *info,
   /* If we are dealing with ADDR_QUERY_LDP_STP_N that means the incoming
      mode corresponds to the actual size of the memory being loaded/stored and
      the mode of the corresponding addressing mode is half of that.
 */
-  if (type == ADDR_QUERY_LDP_STP_N
-      && known_eq (GET_MODE_SIZE (mode), 16))
-    mode = DFmode;
+  if (type == ADDR_QUERY_LDP_STP_N)
+    {
+      if (known_eq (GET_MODE_SIZE (mode), 16))
+	mode = DFmode;
+      else if (known_eq (GET_MODE_SIZE (mode), 8))
+	mode = SFmode;
+      else
+	return false;
+    }

   bool allow_reg_index_p = (!load_store_pair_p
			     && ((vec_flags == 0
@@ -11404,7 +11410,9 @@ aarch64_print_operand (FILE *f, rtx x, int code)
	machine_mode mode = GET_MODE (x);

	if (!MEM_P (x)
-	    || (code == 'y' && maybe_ne (GET_MODE_SIZE (mode), 16)))
+	    || (code == 'y'
+		&& maybe_ne (GET_MODE_SIZE (mode), 8)
+		&& maybe_ne (GET_MODE_SIZE (mode), 16)))
	  {
	    output_operand_lossage ("invalid operand for '%%%c'", code);
	    return;
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a0c02e4ac15..88067a3536a 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -236,6 +236,9 @@ (define_mode_iterator VQW [V16QI V8HI V4SI])
 ;; Double vector modes for combines.
 (define_mode_iterator VDC [V8QI V4HI V4BF V4HF V2SI V2SF DI DF])

+;; VDC plus SI and SF.
+(define_mode_iterator VDCSIF [V8QI V4HI V4BF V4HF V2SI V2SF SI SF DI DF])
+
 ;; Polynomial modes for vector combines.
 (define_mode_iterator VDC_P [V8QI V4HI DI])

@@ -1436,8 +1439,8 @@ (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
 (define_mode_attr VDBL [(V8QI "V16QI") (V4HI "V8HI")
			 (V4HF "V8HF")  (V4BF "V8BF")
			 (V2SI "V4SI")  (V2SF "V4SF")
-			 (SI "V2SI")    (DI "V2DI")
-			 (DF "V2DF")])
+			 (SI "V2SI")    (SF "V2SF")
+			 (DI "V2DI")    (DF "V2DF")])

 ;; Register suffix for double-length mode.
 (define_mode_attr Vdtype [(V4HF "8h") (V2SF "4s")])
@@ -1557,6 +1560,30 @@ (define_mode_attr Vhalftype [(V16QI "8b") (V8HI "4h")
					     (V4SI "2s") (V8HF "4h")
					     (V4SF "2s")])

+;; Whether a mode fits in W or X registers (i.e. "w" for 32-bit modes
+;; and "x" for 64-bit modes).
+(define_mode_attr single_wx [(SI   "w") (SF   "w")
+			     (V8QI "x") (V4HI "x")
+			     (V4HF "x") (V4BF "x")
+			     (V2SI "x") (V2SF "x")
+			     (DI   "x") (DF   "x")])
+
+;; Whether a mode fits in S or D registers (i.e. "s" for 32-bit modes
+;; and "d" for 64-bit modes).
+(define_mode_attr single_type [(SI   "s") (SF   "s")
+			       (V8QI "d") (V4HI "d")
+			       (V4HF "d") (V4BF "d")
+			       (V2SI "d") (V2SF "d")
+			       (DI   "d") (DF   "d")])
+
+;; Whether a double-width mode fits in D or Q registers (i.e. "d" for
+;; 32-bit modes and "q" for 64-bit modes).
+(define_mode_attr single_dtype [(SI   "d") (SF   "d")
+				(V8QI "q") (V4HI "q")
+				(V4HF "q") (V4BF "q")
+				(V2SI "q") (V2SF "q")
+				(DI   "q") (DF   "q")])
+
 ;; Define corresponding core/FP element mode for each vector mode.
 (define_mode_attr vw [(V8QI "w") (V16QI "w")
		       (V4HI "w") (V8HI "w")
@@ -1849,6 +1876,13 @@ (define_mode_attr q [(V8QI "") (V16QI "_q")
			 (V4x1DF "") (V4x2DF "_q")
			 (V4x4BF "") (V4x8BF "_q")])

+;; Equivalent of the "q" attribute for the <VDBL> mode.
+(define_mode_attr dblq [(SI   "") (SF   "")
+			(V8QI "_q") (V4HI "_q")
+			(V4HF "_q") (V4BF "_q")
+			(V2SI "_q") (V2SF "_q")
+			(DI   "_q") (DF   "_q")])
+
 (define_mode_attr vp [(V8QI "v") (V16QI "v")
		       (V4HI "v") (V8HI "v")
		       (V2SI "p") (V4SI "v")
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-13.c b/gcc/testsuite/gcc.target/aarch64/vec-init-13.c
new file mode 100644
index 00000000000..d0f88cbe71a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-13.c
@@ -0,0 +1,123 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include <arm_neon.h>
+
+/*
+** s64q_1:
+**	fmov	d0, x0
+**	ret
+*/
+int64x2_t s64q_1(int64_t a0) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int64x2_t) { 0, a0 };
+  else
+    return (int64x2_t) { a0, 0 };
+}
+/*
+** s64q_2:
+**	ldr	d0, \[x0\]
+**	ret
+*/
+int64x2_t s64q_2(int64_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int64x2_t) { 0, ptr[0] };
+  else
+    return (int64x2_t) { ptr[0], 0 };
+}
+/*
+** s64q_3:
+**	ldr	d0, \[x0, #?8\]
+**	ret
+*/
+int64x2_t s64q_3(int64_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int64x2_t) { 0, ptr[1] };
+  else
+    return (int64x2_t) { ptr[1], 0 };
+}
+
+/*
+** f64q_1:
+**	fmov	d0, d0
+**	ret
+*/
+float64x2_t f64q_1(float64_t a0) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float64x2_t) { 0, a0 };
+  else
+    return (float64x2_t) { a0, 0 };
+}
+/*
+** f64q_2:
+**	ldr	d0, \[x0\]
+**	ret
+*/
+float64x2_t f64q_2(float64_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float64x2_t) { 0, ptr[0] };
+  else
+    return (float64x2_t) { ptr[0], 0 };
+}
+/*
+** f64q_3:
+**	ldr	d0, \[x0, #?8\]
+**	ret
+*/
+float64x2_t f64q_3(float64_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float64x2_t) { 0, ptr[1] };
+  else
+    return (float64x2_t) { ptr[1], 0 };
+}
+
+/*
+** s32q_1:
+**	fmov	d0, d0
+**	ret
+*/
+int32x4_t s32q_1(int32x2_t a0, int32x2_t a1) {
+  return vcombine_s32 (a0, (int32x2_t) { 0, 0 });
+}
+/*
+** s32q_2:
+**	ldr	d0, \[x0\]
+**	ret
+*/
+int32x4_t s32q_2(int32x2_t *ptr) {
+  return vcombine_s32 (ptr[0], (int32x2_t) { 0, 0 });
+}
+/*
+** s32q_3:
+**	ldr	d0, \[x0, #?8\]
+**	ret
+*/
+int32x4_t s32q_3(int32x2_t *ptr) {
+  return vcombine_s32 (ptr[1], (int32x2_t) { 0, 0 });
+}
+
+/*
+** f32q_1:
+**	fmov	d0, d0
+**	ret
+*/
+float32x4_t f32q_1(float32x2_t a0, float32x2_t a1) {
+  return vcombine_f32 (a0, (float32x2_t) { 0, 0 });
+}
+/*
+** f32q_2:
+**	ldr	d0, \[x0\]
+**	ret
+*/
+float32x4_t f32q_2(float32x2_t *ptr) {
+  return vcombine_f32 (ptr[0], (float32x2_t) { 0, 0 });
+}
+/*
+** f32q_3:
+**	ldr	d0, \[x0, #?8\]
+**	ret
+*/
+float32x4_t f32q_3(float32x2_t *ptr) {
+  return vcombine_f32 (ptr[1], (float32x2_t) { 0, 0 });
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-14.c b/gcc/testsuite/gcc.target/aarch64/vec-init-14.c
new file mode 100644
index 00000000000..02875088cd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-14.c
@@ -0,0 +1,123 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include <arm_neon.h>
+
+void ext();
+
+/*
+** s32_1:
+**	fmov	s0, w0
+**	ins	v0\.s\[1\], w1
+**	ret
+*/
+int32x2_t s32_1(int32_t a0, int32_t a1) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int32x2_t) { a1, a0 };
+  else
+    return (int32x2_t) { a0, a1 };
+}
+/*
+** s32_2:
+**	fmov	s0, w0
+**	ld1	{v0\.s}\[1\], \[x1\]
+**	ret
+*/
+int32x2_t s32_2(int32_t a0, int32_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int32x2_t) { ptr[0], a0 };
+  else
+    return (int32x2_t) { a0, ptr[0] };
+}
+/*
+** s32_3:
+**	ldr	s0, \[x0\]
+**	ins	v0\.s\[1\], w1
+**	ret
+*/
+int32x2_t s32_3(int32_t *ptr, int32_t a1) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int32x2_t) { a1, ptr[0] };
+  else
+    return (int32x2_t) { ptr[0], a1 };
+}
+/*
+** s32_4:
+**	stp	w1, w2, \[x0\]
+**	ret
+*/
+void s32_4(int32x2_t *res, int32_t a0, int32_t a1) {
+  res[0] = (int32x2_t) { a0, a1 };
+}
+/*
+** s32_5:
+**	stp	w1, w2, \[x0, #?4\]
+**	ret
+*/
+void s32_5(uintptr_t res, int32_t a0, int32_t a1) {
+  *(int32x2_t *)(res + 4) = (int32x2_t) { a0, a1 };
+}
+/* Currently uses d8 to hold res across the call.
 */
+int32x2_t s32_6(int32_t a0, int32_t a1) {
+  int32x2_t res = { a0, a1 };
+  ext ();
+  return res;
+}
+
+/*
+** f32_1:
+**	ins	v0\.s\[1\], v1\.s\[0\]
+**	ret
+*/
+float32x2_t f32_1(float32_t a0, float32_t a1) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float32x2_t) { a1, a0 };
+  else
+    return (float32x2_t) { a0, a1 };
+}
+/*
+** f32_2:
+**	ld1	{v0\.s}\[1\], \[x0\]
+**	ret
+*/
+float32x2_t f32_2(float32_t a0, float32_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float32x2_t) { ptr[0], a0 };
+  else
+    return (float32x2_t) { a0, ptr[0] };
+}
+/*
+** f32_3:
+**	ldr	s0, \[x0\]
+**	ins	v0\.s\[1\], v1\.s\[0\]
+**	ret
+*/
+float32x2_t f32_3(float32_t a0, float32_t a1, float32_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float32x2_t) { a1, ptr[0] };
+  else
+    return (float32x2_t) { ptr[0], a1 };
+}
+/*
+** f32_4:
+**	stp	s0, s1, \[x0\]
+**	ret
+*/
+void f32_4(float32x2_t *res, float32_t a0, float32_t a1) {
+  res[0] = (float32x2_t) { a0, a1 };
+}
+/*
+** f32_5:
+**	stp	s0, s1, \[x0, #?4\]
+**	ret
+*/
+void f32_5(uintptr_t res, float32_t a0, float32_t a1) {
+  *(float32x2_t *)(res + 4) = (float32x2_t) { a0, a1 };
+}
+/* Currently uses d8 to hold res across the call.  */
+float32x2_t f32_6(float32_t a0, float32_t a1) {
+  float32x2_t res = { a0, a1 };
+  ext ();
+  return res;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-15.c b/gcc/testsuite/gcc.target/aarch64/vec-init-15.c
new file mode 100644
index 00000000000..82f0a8f55ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-15.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+int32x2_t f1(int32_t *x, int c) {
+  return c ? (int32x2_t) { x[0], x[2] } : (int32x2_t) { 0, 0 };
+}
+
+int32x2_t f2(int32_t *x, int i0, int i1, int c) {
+  return c ?
(int32x2_t) { x[i0], x[i1] } : (int32x2_t) { 0, 0 };
+}
+
+/* { dg-final { scan-assembler-times {\t(?:ldr\ts[0-9]+|ld1\t)} 4 } } */
+/* { dg-final { scan-assembler-not {\tldr\tw} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-16.c b/gcc/testsuite/gcc.target/aarch64/vec-init-16.c
new file mode 100644
index 00000000000..e00aec7a32c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-16.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+void f1(int32x2_t *res, int32_t *x, int c0, int c1) {
+  res[0] = (int32x2_t) { c0 ? x[0] : 0, c1 ? x[2] : 0 };
+}
+
+/* { dg-final { scan-assembler-times {\tldr\tw[0-9]+} 2 } } */
+/* { dg-final { scan-assembler {\tstp\tw[0-9]+, w[0-9]+} } } */
+/* { dg-final { scan-assembler-not {\tldr\ts} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-17.c b/gcc/testsuite/gcc.target/aarch64/vec-init-17.c
new file mode 100644
index 00000000000..86191b3ca1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-17.c
@@ -0,0 +1,73 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include <arm_neon.h>
+
+/*
+** s32_1:
+**	fmov	s0, w0
+**	ret
+*/
+int32x2_t s32_1(int32_t a0) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int32x2_t) { 0, a0 };
+  else
+    return (int32x2_t) { a0, 0 };
+}
+/*
+** s32_2:
+**	ldr	s0, \[x0\]
+**	ret
+*/
+int32x2_t s32_2(int32_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int32x2_t) { 0, ptr[0] };
+  else
+    return (int32x2_t) { ptr[0], 0 };
+}
+/*
+** s32_3:
+**	ldr	s0, \[x0, #?4\]
+**	ret
+*/
+int32x2_t s32_3(int32_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (int32x2_t) { 0, ptr[1] };
+  else
+    return (int32x2_t) { ptr[1], 0 };
+}
+
+/*
+** f32_1:
+**	fmov	s0, s0
+**	ret
+*/
+float32x2_t f32_1(float32_t a0) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float32x2_t) { 0, a0 };
+  else
+    return (float32x2_t) { a0, 0 };
+}
+/*
+** f32_2:
+**	ldr	s0, \[x0\]
+**	ret
+*/
+float32x2_t f32_2(float32_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float32x2_t) { 0, ptr[0] };
+  else
+    return (float32x2_t) { ptr[0], 0 };
+}
+/*
+** f32_3:
+**	ldr	s0, \[x0, #?4\]
+**	ret
+*/
+float32x2_t f32_3(float32_t *ptr) {
+  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+    return (float32x2_t) { 0, ptr[1] };
+  else
+    return (float32x2_t) { ptr[1], 0 };
+}
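
As a mental model of the lane ordering these vec_concat patterns implement, here is a small endian-agnostic C sketch. It is not part of the patch and not the arm_neon.h intrinsic; the `model_*` names are invented for illustration. It shows the invariant the little- and big-endian pattern pairs both preserve: the "low" half of a combine always occupies the lowest-numbered lanes of the double-width result, which is why the `_be` patterns swap the two vec_concat operands rather than changing the semantics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-ins for int32x2_t / int32x4_t.  */
typedef struct { int32_t lane[2]; } model_s32x2;
typedef struct { int32_t lane[4]; } model_s32x4;

/* Scalar model of vcombine_s32: low half -> lanes 0-1,
   high half -> lanes 2-3, independent of host endianness.  */
model_s32x4
model_vcombine_s32 (model_s32x2 lo, model_s32x2 hi)
{
  model_s32x4 result;
  memcpy (&result.lane[0], lo.lane, sizeof lo.lane);
  memcpy (&result.lane[2], hi.lane, sizeof hi.lane);
  return result;
}
```

In RTL, `(vec_concat x y)` puts `x` in the low-numbered lanes, but on big-endian targets GCC's lane numbering is reversed relative to memory order; swapping the operands in the `*_be` patterns keeps the model above true for both byte orders.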