X-Patchwork-Id: 1196094
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Subject: [committed][AArch64] Pattern-match SVE extending gather loads
Date: Sat, 16 Nov 2019 11:27:21 +0000

This patch pattern-matches a partial gather load followed by a sign or
zero extension into an extending gather load.  (The partial gather load
is already an extending load; we just don't rely on the upper bits of
the elements.)
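To illustrate, here is a minimal sketch of the kind of loop this
affects (the function name and flags are illustrative, not from the
patch; compile with something like -O2 -ftree-vectorize and an
SVE-enabled -march, as the new tests effectively do):

#include <stdint.h>

/* Each src element is a 16-bit value that is zero-extended to
   64 bits.  Previously the vectorizer emitted a partial gather load
   followed by a separate UXTH; with this patch the pair is expected
   to combine into a single extending gather load such as:
     ld1h    z0.d, p0/z, [x1, z1.d, lsl 1]
   (compare the scan-assembler patterns in gather_load_extend_5.c
   below).  */
void
f (uint64_t *restrict dst, uint16_t *restrict src,
   uint64_t *restrict index, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] += src[index[i]];
}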
Tested on aarch64-linux-gnu and applied as r278346.

Richard


2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/iterators.md (SVE_2BHSI, SVE_2HSDI, SVE_4BHI)
	(SVE_4HSI): New mode iterators.
	(ANY_EXTEND2): New code iterator.
	* config/aarch64/aarch64-sve.md (@aarch64_gather_load_): Extend
	to...
	(@aarch64_gather_load_): ...this, handling extension to partial
	modes as well as full modes.  Describe the extension as a
	predicated rather than unpredicated extension.
	(@aarch64_gather_load_): Likewise extend to...
	(@aarch64_gather_load_): ...this, making the same adjustments.
	(*aarch64_gather_load__sxtw): Likewise extend to...
	(*aarch64_gather_load__sxtw): ...this, making the same
	adjustments.
	(*aarch64_gather_load__uxtw): Likewise extend to...
	(*aarch64_gather_load__uxtw): ...this, making the same
	adjustments.
	(*aarch64_gather_load__xtw_unpacked): New pattern.
	(*aarch64_ldff1_gather_sxtw): Canonicalize to a constant
	extension predicate.
	(@aarch64_ldff1_gather_)
	(@aarch64_ldff1_gather_)
	(*aarch64_ldff1_gather__uxtw): Describe the extension as a
	predicated rather than unpredicated extension.
	(*aarch64_ldff1_gather__sxtw): Likewise.  Canonicalize to a
	constant extension predicate.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svld1_gather_extend_impl::expand): Add an extra predicate for
	the extension.
	(svldff1_gather_extend_impl::expand): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sve/gather_load_extend_1.c: New test.
	* gcc.target/aarch64/sve/gather_load_extend_2.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_3.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_4.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_5.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_6.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_7.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_8.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_9.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_10.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_11.c: Likewise.
	* gcc.target/aarch64/sve/gather_load_extend_12.c: Likewise.
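The aarch64-sve-builtins-base.cc hunks below feed the same patterns
from the ACLE intrinsics path.  As a rough sketch of what goes through
them (assuming the usual <arm_sve.h> spellings and an SVE-enabled
compiler; these examples are mine, not part of the patch):

#include <arm_sve.h>

/* For each active lane, load a byte from base + offsets[i] and
   zero-extend it to 64 bits; this expands through
   code_for_aarch64_gather_load in the builtins change below.  */
svuint64_t
gather_bytes (svbool_t pg, const uint8_t *base, svuint64_t offsets)
{
  return svld1ub_gather_u64offset_u64 (pg, base, offsets);
}

/* First-faulting, sign-extending equivalent, expanded through
   code_for_aarch64_ldff1_gather.  */
svint64_t
gather_halfwords (svbool_t pg, const int16_t *base, svuint64_t offsets)
{
  return svldff1sh_gather_u64offset_s64 (pg, base, offsets);
}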
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2019-11-16 11:20:23.477584640 +0000
+++ gcc/config/aarch64/iterators.md	2019-11-16 11:23:58.060071392 +0000
@@ -371,9 +371,21 @@ (define_mode_iterator SVE_24 [VNx2QI VNx
 ;; SVE modes with 2 elements.
 (define_mode_iterator SVE_2 [VNx2QI VNx2HI VNx2HF VNx2SI VNx2SF VNx2DI VNx2DF])
 
+;; SVE integer modes with 2 elements, excluding the widest element.
+(define_mode_iterator SVE_2BHSI [VNx2QI VNx2HI VNx2SI])
+
+;; SVE integer modes with 2 elements, excluding the narrowest element.
+(define_mode_iterator SVE_2HSDI [VNx2HI VNx2SI VNx2DI])
+
 ;; SVE modes with 4 elements.
 (define_mode_iterator SVE_4 [VNx4QI VNx4HI VNx4HF VNx4SI VNx4SF])
 
+;; SVE integer modes with 4 elements, excluding the widest element.
+(define_mode_iterator SVE_4BHI [VNx4QI VNx4HI])
+
+;; SVE integer modes with 4 elements, excluding the narrowest element.
+(define_mode_iterator SVE_4HSI [VNx4HI VNx4SI])
+
 ;; Modes involved in extending or truncating SVE data, for 8 elements per
 ;; 128-bit block.
 (define_mode_iterator VNx8_NARROW [VNx8QI])
@@ -1443,6 +1455,7 @@ (define_code_iterator NEG_NOT [neg not])
 
 ;; Code iterator for sign/zero extension
 (define_code_iterator ANY_EXTEND [sign_extend zero_extend])
+(define_code_iterator ANY_EXTEND2 [sign_extend zero_extend])
 
 ;; All division operations (signed/unsigned)
 (define_code_iterator ANY_DIV [div udiv])
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2019-11-16 11:20:23.477584640 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2019-11-16 11:23:58.060071392 +0000
@@ -1446,93 +1446,150 @@ (define_insn "*mask_gather_load"
-(define_insn "@aarch64_gather_load_"
-  [(set (match_operand:VNx4_WIDE 0 "register_operand" "=w, w, w, w, w, w")
-       (ANY_EXTEND:VNx4_WIDE
-         (unspec:VNx4_NARROW
-           [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
-            (match_operand:DI 1 "aarch64_sve_gather_offset_" "Z, vg, rk, rk, rk, rk")
-            (match_operand:VNx4_WIDE 2 "register_operand" "w, w, w, w, w, w")
-            (match_operand:DI 3 "const_int_operand" "Ui1, Ui1, Z, Ui1, Z, Ui1")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, Ui1, Ui1, Ui1, i, i")
-            (mem:BLK (scratch))]
-           UNSPEC_LD1_GATHER)))]
-  "TARGET_SVE"
-  "@
-   ld1\t%0.s, %5/z, [%2.s]
-   ld1\t%0.s, %5/z, [%2.s, #%1]
-   ld1\t%0.s, %5/z, [%1, %2.s, sxtw]
-   ld1\t%0.s, %5/z, [%1, %2.s, uxtw]
-   ld1\t%0.s, %5/z, [%1, %2.s, sxtw %p4]
-   ld1\t%0.s, %5/z, [%1, %2.s, uxtw %p4]"
+(define_insn_and_rewrite "@aarch64_gather_load_"
+  [(set (match_operand:SVE_4HSI 0 "register_operand" "=w, w, w, w, w, w")
+       (unspec:SVE_4HSI
+         [(match_operand:VNx4BI 6 "general_operand" "UplDnm, UplDnm, UplDnm, UplDnm, UplDnm, UplDnm")
+          (ANY_EXTEND:SVE_4HSI
+            (unspec:SVE_4BHI
+              [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
+               (match_operand:DI 1 "aarch64_sve_gather_offset_" "Z, vg, rk, rk, rk, rk")
+               (match_operand:VNx4SI 2 "register_operand" "w, w, w, w, w, w")
+               (match_operand:DI 3 "const_int_operand" "Ui1, Ui1, Z, Ui1, Z, Ui1")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, Ui1, Ui1, Ui1, i, i")
+               (mem:BLK (scratch))]
+              UNSPEC_LD1_GATHER))]
+         UNSPEC_PRED_X))]
+  "TARGET_SVE && (~ & ) == 0"
+  "@
+   ld1\t%0.s, %5/z, [%2.s]
+   ld1\t%0.s, %5/z, [%2.s, #%1]
+   ld1\t%0.s, %5/z, [%1, %2.s, sxtw]
+   ld1\t%0.s, %5/z, [%1, %2.s, uxtw]
+   ld1\t%0.s, %5/z, [%1, %2.s, sxtw %p4]
+   ld1\t%0.s, %5/z, [%1, %2.s, uxtw %p4]"
+  "&& !CONSTANT_P (operands[6])"
+  {
+    operands[6] = CONSTM1_RTX (VNx4BImode);
+  }
 )
 
 ;; Predicated extending gather loads for 64-bit elements.  The value of
 ;; operand 3 doesn't matter in this case.
-(define_insn "@aarch64_gather_load_" - [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w, w, w") - (ANY_EXTEND:VNx2_WIDE - (unspec:VNx2_NARROW - [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl") - (match_operand:DI 1 "aarch64_sve_gather_offset_" "Z, vg, rk, rk") - (match_operand:VNx2_WIDE 2 "register_operand" "w, w, w, w") - (match_operand:DI 3 "const_int_operand") - (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, Ui1, Ui1, i") - (mem:BLK (scratch))] - UNSPEC_LD1_GATHER)))] - "TARGET_SVE" - "@ - ld1\t%0.d, %5/z, [%2.d] - ld1\t%0.d, %5/z, [%2.d, #%1] - ld1\t%0.d, %5/z, [%1, %2.d] - ld1\t%0.d, %5/z, [%1, %2.d, lsl %p4]" +(define_insn_and_rewrite "@aarch64_gather_load_" + [(set (match_operand:SVE_2HSDI 0 "register_operand" "=w, w, w, w") + (unspec:SVE_2HSDI + [(match_operand:VNx2BI 6 "general_operand" "UplDnm, UplDnm, UplDnm, UplDnm") + (ANY_EXTEND:SVE_2HSDI + (unspec:SVE_2BHSI + [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl") + (match_operand:DI 1 "aarch64_sve_gather_offset_" "Z, vg, rk, rk") + (match_operand:VNx2DI 2 "register_operand" "w, w, w, w") + (match_operand:DI 3 "const_int_operand") + (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, Ui1, Ui1, i") + (mem:BLK (scratch))] + UNSPEC_LD1_GATHER))] + UNSPEC_PRED_X))] + "TARGET_SVE && (~ & ) == 0" + "@ + ld1\t%0.d, %5/z, [%2.d] + ld1\t%0.d, %5/z, [%2.d, #%1] + ld1\t%0.d, %5/z, [%1, %2.d] + ld1\t%0.d, %5/z, [%1, %2.d, lsl %p4]" + "&& !CONSTANT_P (operands[6])" + { + operands[6] = CONSTM1_RTX (VNx2BImode); + } ) -;; Likewise, but with the offset being sign-extended from 32 bits. -(define_insn_and_rewrite "*aarch64_gather_load__sxtw" - [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w") - (ANY_EXTEND:VNx2_WIDE - (unspec:VNx2_NARROW - [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") - (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") - (unspec:VNx2DI - [(match_operand 6) - (sign_extend:VNx2DI - (truncate:VNx2SI - (match_operand:VNx2DI 2 "register_operand" "w, w")))] - UNSPEC_PRED_X) - (match_operand:DI 3 "const_int_operand") - (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") - (mem:BLK (scratch))] - UNSPEC_LD1_GATHER)))] - "TARGET_SVE" - "@ - ld1\t%0.d, %5/z, [%1, %2.d, sxtw] - ld1\t%0.d, %5/z, [%1, %2.d, sxtw %p4]" - "&& !rtx_equal_p (operands[5], operands[6])" +;; Likewise, but with the offset being extended from 32 bits. +(define_insn_and_rewrite "*aarch64_gather_load__xtw_unpacked" + [(set (match_operand:SVE_2HSDI 0 "register_operand" "=w, w") + (unspec:SVE_2HSDI + [(match_operand 6) + (ANY_EXTEND:SVE_2HSDI + (unspec:SVE_2BHSI + [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") + (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") + (unspec:VNx2DI + [(match_operand 7) + (ANY_EXTEND2:VNx2DI + (match_operand:VNx2SI 2 "register_operand" "w, w"))] + UNSPEC_PRED_X) + (match_operand:DI 3 "const_int_operand") + (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") + (mem:BLK (scratch))] + UNSPEC_LD1_GATHER))] + UNSPEC_PRED_X))] + "TARGET_SVE && (~ & ) == 0" + "@ + ld1\t%0.d, %5/z, [%1, %2.d, xtw] + ld1\t%0.d, %5/z, [%1, %2.d, xtw %p4]" + "&& (!CONSTANT_P (operands[6]) || !CONSTANT_P (operands[7]))" { - operands[6] = copy_rtx (operands[5]); + operands[6] = CONSTM1_RTX (VNx2BImode); + operands[7] = CONSTM1_RTX (VNx2BImode); } ) -;; Likewise, but with the offset being zero-extended from 32 bits. 
-(define_insn "*aarch64_gather_load__uxtw" - [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w") - (ANY_EXTEND:VNx2_WIDE - (unspec:VNx2_NARROW - [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") - (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") - (and:VNx2DI - (match_operand:VNx2DI 2 "register_operand" "w, w") - (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate")) - (match_operand:DI 3 "const_int_operand") - (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") - (mem:BLK (scratch))] - UNSPEC_LD1_GATHER)))] - "TARGET_SVE" - "@ - ld1\t%0.d, %5/z, [%1, %2.d, uxtw] - ld1\t%0.d, %5/z, [%1, %2.d, uxtw %p4]" +;; Likewise, but with the offset being truncated to 32 bits and then +;; sign-extended. +(define_insn_and_rewrite "*aarch64_gather_load__sxtw" + [(set (match_operand:SVE_2HSDI 0 "register_operand" "=w, w") + (unspec:SVE_2HSDI + [(match_operand 6) + (ANY_EXTEND:SVE_2HSDI + (unspec:SVE_2BHSI + [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") + (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") + (unspec:VNx2DI + [(match_operand 7) + (sign_extend:VNx2DI + (truncate:VNx2SI + (match_operand:VNx2DI 2 "register_operand" "w, w")))] + UNSPEC_PRED_X) + (match_operand:DI 3 "const_int_operand") + (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") + (mem:BLK (scratch))] + UNSPEC_LD1_GATHER))] + UNSPEC_PRED_X))] + "TARGET_SVE && (~ & ) == 0" + "@ + ld1\t%0.d, %5/z, [%1, %2.d, sxtw] + ld1\t%0.d, %5/z, [%1, %2.d, sxtw %p4]" + "&& (!CONSTANT_P (operands[6]) || !CONSTANT_P (operands[7]))" + { + operands[6] = CONSTM1_RTX (VNx2BImode); + operands[7] = CONSTM1_RTX (VNx2BImode); + } +) + +;; Likewise, but with the offset being truncated to 32 bits and then +;; zero-extended. +(define_insn_and_rewrite "*aarch64_gather_load__uxtw" + [(set (match_operand:SVE_2HSDI 0 "register_operand" "=w, w") + (unspec:SVE_2HSDI + [(match_operand 7) + (ANY_EXTEND:SVE_2HSDI + (unspec:SVE_2BHSI + [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") + (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") + (and:VNx2DI + (match_operand:VNx2DI 2 "register_operand" "w, w") + (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate")) + (match_operand:DI 3 "const_int_operand") + (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") + (mem:BLK (scratch))] + UNSPEC_LD1_GATHER))] + UNSPEC_PRED_X))] + "TARGET_SVE && (~ & ) == 0" + "@ + ld1\t%0.d, %5/z, [%1, %2.d, uxtw] + ld1\t%0.d, %5/z, [%1, %2.d, uxtw %p4]" + "&& !CONSTANT_P (operands[7])" + { + operands[7] = CONSTM1_RTX (VNx2BImode); + } ) ;; ------------------------------------------------------------------------- @@ -1608,9 +1665,9 @@ (define_insn_and_rewrite "*aarch64_ldff1 "@ ldff1d\t%0.d, %5/z, [%1, %2.d, sxtw] ldff1d\t%0.d, %5/z, [%1, %2.d, sxtw %p4]" - "&& !rtx_equal_p (operands[5], operands[6])" + "&& !CONSTANT_P (operands[6])" { - operands[6] = copy_rtx (operands[5]); + operands[6] = CONSTM1_RTX (VNx2BImode); } ) @@ -1648,18 +1705,21 @@ (define_insn "*aarch64_ldff1_gather" +(define_insn_and_rewrite "@aarch64_ldff1_gather_" [(set (match_operand:VNx4_WIDE 0 "register_operand" "=w, w, w, w, w, w") - (ANY_EXTEND:VNx4_WIDE - (unspec:VNx4_NARROW - [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl") - (match_operand:DI 1 "aarch64_sve_gather_offset_" "Z, vg, rk, rk, rk, rk") - (match_operand:VNx4_WIDE 2 "register_operand" "w, w, w, w, w, w") - (match_operand:DI 3 "const_int_operand" "i, i, Z, Ui1, Z, Ui1") - (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, Ui1, Ui1, Ui1, i, i") - (mem:BLK (scratch)) - 
-            (reg:VNx16BI FFRT_REGNUM)]
-           UNSPEC_LDFF1_GATHER)))]
+       (unspec:VNx4_WIDE
+         [(match_operand:VNx4BI 6 "general_operand" "UplDnm, UplDnm, UplDnm, UplDnm, UplDnm, UplDnm")
+          (ANY_EXTEND:VNx4_WIDE
+            (unspec:VNx4_NARROW
+              [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, Upl, Upl")
+               (match_operand:DI 1 "aarch64_sve_gather_offset_" "Z, vg, rk, rk, rk, rk")
+               (match_operand:VNx4_WIDE 2 "register_operand" "w, w, w, w, w, w")
+               (match_operand:DI 3 "const_int_operand" "i, i, Z, Ui1, Z, Ui1")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, Ui1, Ui1, Ui1, i, i")
+               (mem:BLK (scratch))
+               (reg:VNx16BI FFRT_REGNUM)]
+              UNSPEC_LDFF1_GATHER))]
+         UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    ldff1\t%0.s, %5/z, [%2.s]
@@ -1668,77 +1728,99 @@ (define_insn "@aarch64_ldff1_gather_
    ldff1\t%0.s, %5/z, [%1, %2.s, uxtw]
    ldff1\t%0.s, %5/z, [%1, %2.s, sxtw %p4]
    ldff1\t%0.s, %5/z, [%1, %2.s, uxtw %p4]"
+  "&& !CONSTANT_P (operands[6])"
+  {
+    operands[6] = CONSTM1_RTX (VNx4BImode);
+  }
 )
 
 ;; Predicated extending first-faulting gather loads for 64-bit elements.
 ;; The value of operand 3 doesn't matter in this case.
-(define_insn "@aarch64_ldff1_gather_"
+(define_insn_and_rewrite "@aarch64_ldff1_gather_"
   [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w, w, w")
-       (ANY_EXTEND:VNx2_WIDE
-         (unspec:VNx2_NARROW
-           [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl")
-            (match_operand:DI 1 "aarch64_sve_gather_offset_" "Z, vg, rk, rk")
-            (match_operand:VNx2_WIDE 2 "register_operand" "w, w, w, w")
-            (match_operand:DI 3 "const_int_operand")
-            (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, Ui1, Ui1, i")
-            (mem:BLK (scratch))
-            (reg:VNx16BI FFRT_REGNUM)]
-           UNSPEC_LDFF1_GATHER)))]
+       (unspec:VNx2_WIDE
+         [(match_operand:VNx2BI 6 "general_operand" "UplDnm, UplDnm, UplDnm, UplDnm")
+          (ANY_EXTEND:VNx2_WIDE
+            (unspec:VNx2_NARROW
+              [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl")
+               (match_operand:DI 1 "aarch64_sve_gather_offset_" "Z, vg, rk, rk")
+               (match_operand:VNx2_WIDE 2 "register_operand" "w, w, w, w")
+               (match_operand:DI 3 "const_int_operand")
+               (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, Ui1, Ui1, i")
+               (mem:BLK (scratch))
+               (reg:VNx16BI FFRT_REGNUM)]
+              UNSPEC_LDFF1_GATHER))]
+         UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    ldff1\t%0.d, %5/z, [%2.d]
   ldff1\t%0.d, %5/z, [%2.d, #%1]
    ldff1\t%0.d, %5/z, [%1, %2.d]
    ldff1\t%0.d, %5/z, [%1, %2.d, lsl %p4]"
+  "&& !CONSTANT_P (operands[6])"
+  {
+    operands[6] = CONSTM1_RTX (VNx2BImode);
+  }
 )
 
 ;; Likewise, but with the offset being sign-extended from 32 bits.
(define_insn_and_rewrite "*aarch64_ldff1_gather__sxtw" [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w") - (ANY_EXTEND:VNx2_WIDE - (unspec:VNx2_NARROW - [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") - (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") - (unspec:VNx2DI - [(match_operand 6) - (sign_extend:VNx2DI - (truncate:VNx2SI - (match_operand:VNx2DI 2 "register_operand" "w, w")))] - UNSPEC_PRED_X) - (match_operand:DI 3 "const_int_operand") - (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") - (mem:BLK (scratch)) - (reg:VNx16BI FFRT_REGNUM)] - UNSPEC_LDFF1_GATHER)))] + (unspec:VNx2_WIDE + [(match_operand 6) + (ANY_EXTEND:VNx2_WIDE + (unspec:VNx2_NARROW + [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") + (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") + (unspec:VNx2DI + [(match_operand 7) + (sign_extend:VNx2DI + (truncate:VNx2SI + (match_operand:VNx2DI 2 "register_operand" "w, w")))] + UNSPEC_PRED_X) + (match_operand:DI 3 "const_int_operand") + (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") + (mem:BLK (scratch)) + (reg:VNx16BI FFRT_REGNUM)] + UNSPEC_LDFF1_GATHER))] + UNSPEC_PRED_X))] "TARGET_SVE" "@ ldff1\t%0.d, %5/z, [%1, %2.d, sxtw] ldff1\t%0.d, %5/z, [%1, %2.d, sxtw %p4]" - "&& !rtx_equal_p (operands[5], operands[6])" + "&& (!CONSTANT_P (operands[6]) || !CONSTANT_P (operands[7]))" { - operands[6] = copy_rtx (operands[5]); + operands[6] = CONSTM1_RTX (VNx2BImode); + operands[7] = CONSTM1_RTX (VNx2BImode); } ) ;; Likewise, but with the offset being zero-extended from 32 bits. -(define_insn "*aarch64_ldff1_gather__uxtw" +(define_insn_and_rewrite "*aarch64_ldff1_gather__uxtw" [(set (match_operand:VNx2_WIDE 0 "register_operand" "=w, w") - (ANY_EXTEND:VNx2_WIDE - (unspec:VNx2_NARROW - [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") - (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") - (and:VNx2DI - (match_operand:VNx2DI 2 "register_operand" "w, w") - (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate")) - (match_operand:DI 3 "const_int_operand") - (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") - (mem:BLK (scratch)) - (reg:VNx16BI FFRT_REGNUM)] - UNSPEC_LDFF1_GATHER)))] + (unspec:VNx2_WIDE + [(match_operand 7) + (ANY_EXTEND:VNx2_WIDE + (unspec:VNx2_NARROW + [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") + (match_operand:DI 1 "aarch64_reg_or_zero" "rk, rk") + (and:VNx2DI + (match_operand:VNx2DI 2 "register_operand" "w, w") + (match_operand:VNx2DI 6 "aarch64_sve_uxtw_immediate")) + (match_operand:DI 3 "const_int_operand") + (match_operand:DI 4 "aarch64_gather_scale_operand_" "Ui1, i") + (mem:BLK (scratch)) + (reg:VNx16BI FFRT_REGNUM)] + UNSPEC_LDFF1_GATHER))] + UNSPEC_PRED_X))] "TARGET_SVE" "@ ldff1\t%0.d, %5/z, [%1, %2.d, uxtw] ldff1\t%0.d, %5/z, [%1, %2.d, uxtw %p4]" + "&& !CONSTANT_P (operands[7])" + { + operands[7] = CONSTM1_RTX (VNx2BImode); + } ) ;; ========================================================================= Index: gcc/config/aarch64/aarch64-sve-builtins-base.cc =================================================================== --- gcc/config/aarch64/aarch64-sve-builtins-base.cc 2019-11-08 08:32:05.949440451 +0000 +++ gcc/config/aarch64/aarch64-sve-builtins-base.cc 2019-11-16 11:23:58.060071392 +0000 @@ -1097,6 +1097,8 @@ public: /* Put the predicate last, since the extending gathers use the same operand order as mask_gather_load_optab. */ e.rotate_inputs_left (0, 5); + /* Add a constant predicate for the extension rtx. 
+    e.args.quick_push (CONSTM1_RTX (VNx16BImode));
     insn_code icode = code_for_aarch64_gather_load (extend_rtx_code (),
                                                     e.vector_mode (0),
                                                     e.memory_vector_mode ());
@@ -1234,6 +1236,8 @@ public:
     /* Put the predicate last, since ldff1_gather uses the same operand
        order as mask_gather_load_optab.  */
     e.rotate_inputs_left (0, 5);
+    /* Add a constant predicate for the extension rtx.  */
+    e.args.quick_push (CONSTM1_RTX (VNx16BImode));
     insn_code icode = code_for_aarch64_ldff1_gather (extend_rtx_code (),
                                                      e.vector_mode (0),
                                                      e.memory_vector_mode ());
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_1.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_1.c	2019-11-16 11:23:58.060071392 +0000
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)					\
+  void								\
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,	\
+		       TYPE2 *restrict src2, uint32_t *restrict index,	\
+		       int n)						\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dst[i] += src1[i] + src2[index[i]];			\
+  }
+
+#define TEST_ALL(T)			\
+  T (uint16_t, uint8_t)			\
+  T (uint32_t, uint8_t)			\
+  T (uint64_t, uint8_t)			\
+  T (uint32_t, uint16_t)		\
+  T (uint64_t, uint16_t)		\
+  T (uint64_t, uint32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
+
+/* { dg-final { scan-assembler-not {\tuxt.\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_2.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_2.c	2019-11-16 11:23:58.060071392 +0000
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)					\
+  void								\
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,	\
+		       TYPE2 *restrict src2, uint32_t *restrict index,	\
+		       int n)						\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dst[i] += src1[i] + src2[index[i]];			\
+  }
+
+#define TEST_ALL(T)			\
+  T (int16_t, int8_t)			\
+  T (int32_t, int8_t)			\
+  T (int64_t, int8_t)			\
+  T (int32_t, int16_t)			\
+  T (int64_t, int16_t)			\
+  T (int64_t, int32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, uxtw 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
+
+/* { dg-final { scan-assembler-not {\tsxt.\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_3.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_3.c	2019-11-16 11:23:58.064071363 +0000
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)					\
+  void								\
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,	\
+		       TYPE2 *restrict src2, int32_t *restrict index,	\
+		       int n)						\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dst[i] += src1[i] + src2[index[i]];			\
+  }
+
+#define TEST_ALL(T)			\
+  T (uint16_t, uint8_t)			\
+  T (uint32_t, uint8_t)			\
+  T (uint64_t, uint8_t)			\
+  T (uint32_t, uint16_t)		\
+  T (uint64_t, uint16_t)		\
+  T (uint64_t, uint32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
+
+/* { dg-final { scan-assembler-not {\tuxt.\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_4.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_4.c	2019-11-16 11:23:58.064071363 +0000
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)					\
+  void								\
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,	\
+		       TYPE2 *restrict src2, int32_t *restrict index,	\
+		       int n)						\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dst[i] += src1[i] + src2[index[i]];			\
+  }
+
+#define TEST_ALL(T)			\
+  T (int16_t, int8_t)			\
+  T (int32_t, int8_t)			\
+  T (int64_t, int8_t)			\
+  T (int32_t, int16_t)			\
+  T (int64_t, int16_t)			\
+  T (int64_t, int32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+\.s, sxtw 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 7 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 3 } } */
+
+/* { dg-final { scan-assembler-not {\tsxt.\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_5.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_5.c	2019-11-16 11:23:58.064071363 +0000
@@ -0,0 +1,29 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)					\
+  void								\
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,	\
+		       TYPE2 *restrict src2, uint64_t *restrict index,	\
+		       int n)						\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dst[i] += src1[i] + src2[index[i]];			\
+  }
+
+#define TEST_ALL(T)			\
+  T (uint16_t, uint8_t)			\
+  T (uint32_t, uint8_t)			\
+  T (uint64_t, uint8_t)			\
+  T (uint32_t, uint16_t)		\
+  T (uint64_t, uint16_t)		\
+  T (uint64_t, uint32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-not {\tuxt.\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_6.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_6.c	2019-11-16 11:23:58.064071363 +0000
@@ -0,0 +1,29 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE1, TYPE2)					\
+  void								\
+  f_##TYPE1##_##TYPE2 (TYPE1 *restrict dst, TYPE1 *restrict src1,	\
+		       TYPE2 *restrict src2, uint64_t *restrict index,	\
+		       int n)						\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dst[i] += src1[i] + src2[index[i]];			\
+  }
+
+#define TEST_ALL(T)			\
+  T (int16_t, int8_t)			\
+  T (int32_t, int8_t)			\
+  T (int64_t, int8_t)			\
+  T (int32_t, int16_t)			\
+  T (int64_t, int16_t)			\
+  T (int64_t, int32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 1\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sw\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+\.d, lsl 2\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-not {\tsxt.\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_7.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_7.c	2019-11-16 11:23:58.064071363 +0000
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (uint64_t *restrict dst, uint16_t *src1, uint8_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint16_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (uint64_t *restrict dst, uint16_t *src1, uint8_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint16_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (uint64_t *restrict dst, uint16_t *src1, uint8_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint16_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tuxth\tz[0-9]+\.d,} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_8.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_8.c	2019-11-16 11:23:58.064071363 +0000
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (uint64_t *restrict dst, uint32_t *src1, uint8_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (uint64_t *restrict dst, uint32_t *src1, uint8_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (uint64_t *restrict dst, uint32_t *src1, uint8_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tuxtw\tz[0-9]+\.d,} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_9.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_9.c	2019-11-16 11:23:58.064071363 +0000
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (uint64_t *restrict dst, uint32_t *src1, uint16_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (uint64_t *restrict dst, uint32_t *src1, uint16_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (uint64_t *restrict dst, uint32_t *src1, uint16_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (uint32_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d, lsl 1\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tuxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tuxtw\tz[0-9]+\.d,} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_10.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_10.c	2019-11-16 11:23:58.060071392 +0000
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (int64_t *restrict dst, int16_t *src1, int8_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int16_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (int64_t *restrict dst, int16_t *src1, int8_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int16_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (int64_t *restrict dst, int16_t *src1, int8_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int16_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsxth\tz[0-9]+\.d,} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_11.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_11.c	2019-11-16 11:23:58.060071392 +0000
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (int64_t *restrict dst, int32_t *src1, int8_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (int64_t *restrict dst, int32_t *src1, int8_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (int64_t *restrict dst, int32_t *src1, int8_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sb\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsxtw\tz[0-9]+\.d,} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_12.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/gather_load_extend_12.c	2019-11-16 11:23:58.060071392 +0000
@@ -0,0 +1,39 @@
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=512" } */
+
+#include <stdint.h>
+
+void
+f1 (int64_t *restrict dst, int32_t *src1, int16_t *src2, uint32_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f2 (int64_t *restrict dst, int32_t *src1, int16_t *src2, uint64_t *index)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + src2[index[i]]);
+}
+
+void
+f3 (int64_t *restrict dst, int32_t *src1, int16_t **src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] += (int32_t) (src1[i] + *src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[x2, z[0-9]+\.d, lsl 1\]\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1sh\tz[0-9]+\.d, p[0-7]/z, \[z[0-9]+\.d\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x0\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x2\]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x3\]\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tadd\tz} 6 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tsxt.\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsxtw\tz[0-9]+\.d,} 3 } } */
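
For anyone wanting to rerun just these tests, the usual DejaGnu
invocation from the gcc build directory should work, along the lines
of (assuming a toolchain configured for aarch64-linux-gnu, as in the
testing above):

make check-gcc RUNTESTFLAGS="aarch64-sve.exp=gather_load_extend_*.c"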