From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [committed][AArch64] Rework SVE INC/DEC handling
Date: Thu, 15 Aug 2019 09:48:07 +0100

The scalar addition patterns allowed all the VL constants that
ADDVL and ADDPL allow, but wrote the instructions as INC or DEC
if possible (i.e. adding or subtracting a number of elements * [1, 16]
when the source and target registers are the same).  That works for the
cases that the autovectoriser needs, but there are a few constants
that INC and DEC can handle but ADDPL and ADDVL can't.  E.g.:

        inch    x0, all, mul #9

adds a value that is not a multiple of the number of bytes in an SVE
register, and so can't use ADDVL.  It represents 36 times the number
of bytes in an SVE predicate, putting it outside the range of ADDPL.
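The arithmetic behind that example can be checked with a small standalone model. This is only a sketch, not code from the patch: the function names are hypothetical, the constant is expressed as an amount added per 128 bits of vector length (so 16 is the number of bytes in one vector and 2 the number of bytes in one predicate), and the ADDVL/ADDPL multiplier range of [-32, 31] is an assumption taken from the instruction encodings rather than from this message.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model, not GCC code.  FACTOR is the amount added per
   128 bits of vector length: 16 is one vector's worth of bytes and
   2 is one predicate's worth of bytes.  */

/* ADDVL adds a multiple of the vector size in bytes.  The multiplier
   range [-32, 31] is an assumption (signed 6-bit immediate).  */
static bool
fits_addvl (int factor)
{
  return factor != 0 && factor % 16 == 0
         && factor / 16 >= -32 && factor / 16 <= 31;
}

/* ADDPL adds a multiple of the predicate size in bytes, with the
   same assumed multiplier range.  */
static bool
fits_addpl (int factor)
{
  return factor != 0 && factor % 2 == 0
         && factor / 2 >= -32 && factor / 2 <= 31;
}

/* INC and DEC add or subtract a number of elements * [1, 16], where
   128 bits hold 2 (D), 4 (W), 8 (H) or 16 (B) elements.  */
static bool
fits_inc_dec (int factor)
{
  int magnitude = abs (factor);
  for (int elts = 2; elts <= 16; elts *= 2)
    if (magnitude % elts == 0
        && magnitude / elts >= 1 && magnitude / elts <= 16)
      return true;
  return false;
}
```

Under these assumptions, "inch x0, all, mul #9" corresponds to a factor of 8 * 9 = 72: fits_inc_dec (72) holds, fits_addvl (72) fails because 72 is not a multiple of 16, and fits_addpl (72) fails because 72 / 2 = 36 is greater than 31.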
This patch therefore adds separate alternatives for INC and DEC,
tied to a new Uai constraint.  It also adds an explicit "scalar"
or "vector" to the function names, to avoid a clash with the
existing support for vector INC and DEC.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274518.

Richard


2019-08-15  Richard Sandiford

gcc/
	* config/aarch64/aarch64-protos.h
	(aarch64_sve_scalar_inc_dec_immediate_p): Declare.
	(aarch64_sve_inc_dec_immediate_p): Rename to...
	(aarch64_sve_vector_inc_dec_immediate_p): ...this.
	(aarch64_output_sve_addvl_addpl): Take a single rtx argument.
	(aarch64_output_sve_scalar_inc_dec): Declare.
	(aarch64_output_sve_inc_dec_immediate): Rename to...
	(aarch64_output_sve_vector_inc_dec): ...this.
	* config/aarch64/aarch64.c (aarch64_sve_scalar_inc_dec_immediate_p)
	(aarch64_output_sve_scalar_inc_dec): New functions.
	(aarch64_output_sve_addvl_addpl): Remove the base and offset
	arguments.  Only handle true ADDVL and ADDPL instructions;
	don't emit an INC or DEC.
	(aarch64_sve_inc_dec_immediate_p): Rename to...
	(aarch64_sve_vector_inc_dec_immediate_p): ...this.
	(aarch64_output_sve_inc_dec_immediate): Rename to...
	(aarch64_output_sve_vector_inc_dec): ...this.  Update call to
	aarch64_sve_vector_inc_dec_immediate_p.
	* config/aarch64/predicates.md (aarch64_sve_scalar_inc_dec_immediate)
	(aarch64_sve_plus_immediate): New predicates.
	(aarch64_pluslong_operand): Accept aarch64_sve_plus_immediate
	rather than aarch64_sve_addvl_addpl_immediate.
	(aarch64_sve_inc_dec_immediate): Rename to...
	(aarch64_sve_vector_inc_dec_immediate): ...this.  Update call to
	aarch64_sve_vector_inc_dec_immediate_p.
	(aarch64_sve_add_operand): Update accordingly.
	* config/aarch64/constraints.md (Uai): New constraint.
	(vsi): Update call to aarch64_sve_vector_inc_dec_immediate_p.
	* config/aarch64/aarch64.md (add3): Don't force the second
	operand into a register if it satisfies aarch64_sve_plus_immediate.
	(*add3_aarch64, *add3_poly_1): Add an alternative for Uai.
	Update calls to aarch64_output_sve_addvl_addpl.
	* config/aarch64/aarch64-sve.md (add3): Call
	aarch64_output_sve_vector_inc_dec instead of
	aarch64_output_sve_inc_dec_immediate.

Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h 2019-08-15 09:22:03.039558220 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-08-15 09:47:06.552458841 +0100
@@ -476,8 +476,9 @@ bool aarch64_zero_extend_const_eq (machi
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
 bool aarch64_sve_cnt_immediate_p (rtx);
+bool aarch64_sve_scalar_inc_dec_immediate_p (rtx);
 bool aarch64_sve_addvl_addpl_immediate_p (rtx);
-bool aarch64_sve_inc_dec_immediate_p (rtx);
+bool aarch64_sve_vector_inc_dec_immediate_p (rtx);
 int aarch64_add_offset_temporaries (rtx);
 void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
 bool aarch64_mov_operand_p (rtx, machine_mode);
@@ -485,8 +486,9 @@ rtx aarch64_reverse_mask (machine_mode,
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
 bool aarch64_offset_9bit_signed_unscaled_p (machine_mode, poly_int64);
 char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
-char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx);
-char *aarch64_output_sve_inc_dec_immediate (const char *, rtx);
+char *aarch64_output_sve_scalar_inc_dec (rtx);
+char *aarch64_output_sve_addvl_addpl (rtx);
+char *aarch64_output_sve_vector_inc_dec (const char *, rtx);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
 char *aarch64_output_simd_mov_immediate (rtx, unsigned,
                         enum simd_immediate_check w = AARCH64_CHECK_MOV);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2019-08-15 09:43:30.758050951 +0100
+++ gcc/config/aarch64/aarch64.c 2019-08-15 09:47:06.560458781 +0100
@@ -2950,6 +2950,33 @@ aarch64_output_sve_cnt_immediate (const
                                            value.coeffs[1], 0);
 }
 
+/* Return true if we can add X using a single SVE INC or DEC instruction.  */
+
+bool
+aarch64_sve_scalar_inc_dec_immediate_p (rtx x)
+{
+  poly_int64 value;
+  return (poly_int_rtx_p (x, &value)
+          && (aarch64_sve_cnt_immediate_p (value)
+              || aarch64_sve_cnt_immediate_p (-value)));
+}
+
+/* Return the asm string for adding SVE INC/DEC immediate OFFSET to
+   operand 0.  */
+
+char *
+aarch64_output_sve_scalar_inc_dec (rtx offset)
+{
+  poly_int64 offset_value = rtx_to_poly_int64 (offset);
+  gcc_assert (offset_value.coeffs[0] == offset_value.coeffs[1]);
+  if (offset_value.coeffs[1] > 0)
+    return aarch64_output_sve_cnt_immediate ("inc", "%x0",
+                                             offset_value.coeffs[1], 0);
+  else
+    return aarch64_output_sve_cnt_immediate ("dec", "%x0",
+                                             -offset_value.coeffs[1], 0);
+}
+
 /* Return true if we can add VALUE to a register using a single ADDVL
    or ADDPL instruction.  */
 
@@ -2975,27 +3002,16 @@ aarch64_sve_addvl_addpl_immediate_p (rtx
           && aarch64_sve_addvl_addpl_immediate_p (value));
 }
 
-/* Return the asm string for adding ADDVL or ADDPL immediate X to operand 1
-   and storing the result in operand 0.  */
+/* Return the asm string for adding ADDVL or ADDPL immediate OFFSET
+   to operand 1 and storing the result in operand 0.  */
 
 char *
-aarch64_output_sve_addvl_addpl (rtx dest, rtx base, rtx offset)
+aarch64_output_sve_addvl_addpl (rtx offset)
 {
   static char buffer[sizeof ("addpl\t%x0, %x1, #-") + 3 * sizeof (int)];
   poly_int64 offset_value = rtx_to_poly_int64 (offset);
   gcc_assert (aarch64_sve_addvl_addpl_immediate_p (offset_value));
 
-  /* Use INC or DEC if possible.  */
-  if (rtx_equal_p (dest, base) && GP_REGNUM_P (REGNO (dest)))
-    {
-      if (aarch64_sve_cnt_immediate_p (offset_value))
-        return aarch64_output_sve_cnt_immediate ("inc", "%x0",
-                                                 offset_value.coeffs[1], 0);
-      if (aarch64_sve_cnt_immediate_p (-offset_value))
-        return aarch64_output_sve_cnt_immediate ("dec", "%x0",
-                                                 -offset_value.coeffs[1], 0);
-    }
-
   int factor = offset_value.coeffs[1];
   if ((factor & 15) == 0)
     snprintf (buffer, sizeof (buffer), "addvl\t%%x0, %%x1, #%d", factor / 16);
@@ -3010,8 +3026,8 @@ aarch64_output_sve_addvl_addpl (rtx dest
    factor in *FACTOR_OUT (if nonnull).  */
 
 bool
-aarch64_sve_inc_dec_immediate_p (rtx x, int *factor_out,
-                                 unsigned int *nelts_per_vq_out)
+aarch64_sve_vector_inc_dec_immediate_p (rtx x, int *factor_out,
+                                        unsigned int *nelts_per_vq_out)
 {
   rtx elt;
   poly_int64 value;
@@ -3045,9 +3061,9 @@ aarch64_sve_inc_dec_immediate_p (rtx x,
    instruction.  */
 
 bool
-aarch64_sve_inc_dec_immediate_p (rtx x)
+aarch64_sve_vector_inc_dec_immediate_p (rtx x)
 {
-  return aarch64_sve_inc_dec_immediate_p (x, NULL, NULL);
+  return aarch64_sve_vector_inc_dec_immediate_p (x, NULL, NULL);
 }
 
 /* Return the asm template for an SVE vector INC or DEC instruction.
@@ -3055,11 +3071,11 @@ aarch64_sve_inc_dec_immediate_p (rtx x)
    value of the vector count operand itself.  */
 
 char *
-aarch64_output_sve_inc_dec_immediate (const char *operands, rtx x)
+aarch64_output_sve_vector_inc_dec (const char *operands, rtx x)
 {
   int factor;
   unsigned int nelts_per_vq;
-  if (!aarch64_sve_inc_dec_immediate_p (x, &factor, &nelts_per_vq))
+  if (!aarch64_sve_vector_inc_dec_immediate_p (x, &factor, &nelts_per_vq))
     gcc_unreachable ();
   if (factor < 0)
     return aarch64_output_sve_cnt_immediate ("dec", operands, -factor,
Index: gcc/config/aarch64/predicates.md
===================================================================
--- gcc/config/aarch64/predicates.md 2019-08-15 09:17:59.073360556 +0100
+++ gcc/config/aarch64/predicates.md 2019-08-15 09:47:06.560458781 +0100
@@ -144,10 +144,18 @@ (define_predicate "aarch64_pluslong_stri
   (and (match_operand 0 "aarch64_pluslong_immediate")
        (not (match_operand 0 "aarch64_plus_immediate"))))
 
+(define_predicate "aarch64_sve_scalar_inc_dec_immediate"
+  (and (match_code "const_poly_int")
+       (match_test "aarch64_sve_scalar_inc_dec_immediate_p (op)")))
+
 (define_predicate "aarch64_sve_addvl_addpl_immediate"
   (and (match_code "const_poly_int")
        (match_test "aarch64_sve_addvl_addpl_immediate_p (op)")))
 
+(define_predicate "aarch64_sve_plus_immediate"
+  (ior (match_operand 0 "aarch64_sve_scalar_inc_dec_immediate")
+       (match_operand 0 "aarch64_sve_addvl_addpl_immediate")))
+
 (define_predicate "aarch64_split_add_offset_immediate"
   (and (match_code "const_poly_int")
        (match_test "aarch64_add_offset_temporaries (op) == 1")))
@@ -155,7 +163,8 @@ (define_predicate "aarch64_split_add_off
 (define_predicate "aarch64_pluslong_operand"
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "aarch64_pluslong_immediate")
-       (match_operand 0 "aarch64_sve_addvl_addpl_immediate")))
+       (and (match_test "TARGET_SVE")
+            (match_operand 0 "aarch64_sve_plus_immediate"))))
 
 (define_predicate "aarch64_pluslong_or_poly_operand"
   (ior (match_operand 0 "aarch64_pluslong_operand")
@@ -602,9 +611,9 @@ (define_predicate "aarch64_sve_sub_arith
   (and (match_code "const,const_vector")
        (match_test "aarch64_sve_arith_immediate_p (op, true)")))
 
-(define_predicate "aarch64_sve_inc_dec_immediate"
+(define_predicate "aarch64_sve_vector_inc_dec_immediate"
   (and (match_code "const,const_vector")
-       (match_test "aarch64_sve_inc_dec_immediate_p (op)")))
+       (match_test "aarch64_sve_vector_inc_dec_immediate_p (op)")))
 
 (define_predicate "aarch64_sve_uxtb_immediate"
   (and (match_code "const_vector")
@@ -687,7 +696,7 @@ (define_predicate "aarch64_sve_arith_ope
 (define_predicate "aarch64_sve_add_operand"
   (ior (match_operand 0 "aarch64_sve_arith_operand")
        (match_operand 0 "aarch64_sve_sub_arith_immediate")
-       (match_operand 0 "aarch64_sve_inc_dec_immediate")))
+       (match_operand 0 "aarch64_sve_vector_inc_dec_immediate")))
 
 (define_predicate "aarch64_sve_pred_and_operand"
   (ior (match_operand 0 "register_operand")
Index: gcc/config/aarch64/constraints.md
===================================================================
--- gcc/config/aarch64/constraints.md 2019-08-15 09:17:59.073360556 +0100
+++ gcc/config/aarch64/constraints.md 2019-08-15 09:47:06.560458781 +0100
@@ -49,6 +49,12 @@ (define_constraint "Uaa"
   (and (match_code "const_int")
        (match_test "aarch64_pluslong_strict_immedate (op, VOIDmode)")))
 
+(define_constraint "Uai"
+  "@internal
+   A constraint that matches a VG-based constant that can be added by
+   a single INC or DEC."
+  (match_operand 0 "aarch64_sve_scalar_inc_dec_immediate"))
+
 (define_constraint "Uav"
   "@internal
    A constraint that matches a VG-based constant that can be added by
@@ -416,7 +422,7 @@ (define_constraint "vsi"
   "@internal
    A constraint that matches a vector count operand valid for SVE INC and
    DEC instructions."
-  (match_operand 0 "aarch64_sve_inc_dec_immediate"))
+  (match_operand 0 "aarch64_sve_vector_inc_dec_immediate"))
 
 (define_constraint "vsn"
   "@internal
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md 2019-08-14 09:50:03.682705602 +0100
+++ gcc/config/aarch64/aarch64.md 2019-08-15 09:47:06.560458781 +0100
@@ -1753,6 +1753,7 @@ (define_expand "add3"
   /* If the constant is too large for a single instruction and isn't frame
      based, split off the immediate so it is available for CSE.  */
   if (!aarch64_plus_immediate (operands[2], mode)
+      && !(TARGET_SVE && aarch64_sve_plus_immediate (operands[2], mode))
       && can_create_pseudo_p ()
       && (!REG_P (op1)
           || !REGNO_PTR_FRAME_P (REGNO (op1))))
@@ -1770,10 +1771,10 @@ (define_expand "add3"
 
 (define_insn "*add3_aarch64"
   [(set
-    (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r,rk")
+    (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r,r,rk")
     (plus:GPI
-     (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk,rk")
-     (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa,Uav")))]
+     (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk,0,rk")
+     (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa,Uai,Uav")))]
   ""
   "@
   add\\t%0, %1, %2
@@ -1781,10 +1782,11 @@ (define_insn "*add3_aarch64"
   add\\t%0, %1, %2
   sub\\t%0, %1, #%n2
   #
-  * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);"
-  ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder.
-  [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple,alu_imm")
-   (set_attr "arch" "*,*,simd,*,*,*")]
+  * return aarch64_output_sve_scalar_inc_dec (operands[2]);
+  * return aarch64_output_sve_addvl_addpl (operands[2]);"
+  ;; The "alu_imm" types for INC/DEC and ADDVL/ADDPL are just placeholders.
+  [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple,alu_imm,alu_imm")
+   (set_attr "arch" "*,*,simd,*,*,sve,sve")]
 )
 
 ;; zero_extend version of above
@@ -1863,17 +1865,18 @@ (define_split
 ;; this pattern.
 (define_insn_and_split "*add3_poly_1"
   [(set
-    (match_operand:GPI 0 "register_operand" "=r,r,r,r,r,&r")
+    (match_operand:GPI 0 "register_operand" "=r,r,r,r,r,r,&r")
     (plus:GPI
-     (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,rk,rk")
-     (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "I,r,J,Uaa,Uav,Uat")))]
+     (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,rk,0,rk")
+     (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "I,r,J,Uaa,Uav,Uai,Uat")))]
   "TARGET_SVE && operands[0] != stack_pointer_rtx"
   "@
   add\\t%0, %1, %2
   add\\t%0, %1, %2
   sub\\t%0, %1, #%n2
   #
-  * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);
+  * return aarch64_output_sve_scalar_inc_dec (operands[2]);
+  * return aarch64_output_sve_addvl_addpl (operands[2]);
   #"
   "&& epilogue_completed
    && !reg_overlap_mentioned_p (operands[0], operands[1])
@@ -1884,8 +1887,8 @@ (define_insn_and_split "*add3_poly
                               operands[2], operands[0], NULL_RTX);
     DONE;
   }
-  ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder.
-  [(set_attr "type" "alu_imm,alu_sreg,alu_imm,multiple,alu_imm,multiple")]
+  ;; The "alu_imm" types for INC/DEC and ADDVL/ADDPL are just placeholders.
+  [(set_attr "type" "alu_imm,alu_sreg,alu_imm,multiple,alu_imm,alu_imm,multiple")]
 )
 
 (define_split
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md 2019-08-15 09:43:30.754050982 +0100
+++ gcc/config/aarch64/aarch64-sve.md 2019-08-15 09:47:06.556458811 +0100
@@ -1971,7 +1971,7 @@ (define_insn "add3"
   "@
    add\t%0., %0., #%D2
    sub\t%0., %0., #%N2
-   * return aarch64_output_sve_inc_dec_immediate (\"%0.\", operands[2]);
+   * return aarch64_output_sve_vector_inc_dec (\"%0.\", operands[2]);
    movprfx\t%0, %1\;add\t%0., %0., #%D2
    movprfx\t%0, %1\;sub\t%0., %0., #%N2
    add\t%0., %1., %2."
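
For readers who want to see what kind of string the new scalar alternative produces, here is a minimal standalone sketch. It is not GCC's aarch64_output_sve_cnt_immediate, whose body is not part of this patch. The function name format_scalar_inc_dec, the use of a plain register name instead of "%x0", and the rule of choosing the largest element count that divides the constant are all assumptions made for illustration. As in the earlier model, FACTOR is the amount added per 128 bits of vector length.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not GCC code.  Write an INC or DEC instruction
   that adds FACTOR (an amount per 128 bits of vector length) to REG.
   Return 0 on success, or -1 if no single INC or DEC can do it.  */
static int
format_scalar_inc_dec (char *buf, size_t size, const char *reg, int factor)
{
  const char *op = factor < 0 ? "dec" : "inc";
  int magnitude = abs (factor);
  if (magnitude == 0 || magnitude % 2 != 0)
    return -1;

  /* Assumed policy: use the largest element count per 128 bits
     (at most 16) that divides the constant, so that the multiplier
     is as small as possible.  */
  int elts = magnitude & -magnitude;
  if (elts > 16)
    elts = 16;
  int mul = magnitude / elts;
  if (mul > 16)
    return -1;

  /* 2, 4, 8 and 16 elements per 128 bits are D, W, H and B.  */
  char suffix = elts == 2 ? 'd' : elts == 4 ? 'w' : elts == 8 ? 'h' : 'b';
  if (mul == 1)
    snprintf (buf, size, "%s%c\t%s", op, suffix, reg);
  else
    snprintf (buf, size, "%s%c\t%s, all, mul #%d", op, suffix, reg, mul);
  return 0;
}
```

With a factor of 72 and register "x0" this sketch writes "inch", a tab, then "x0, all, mul #9", matching the example in the commit message. A factor of 62 is rejected, which corresponds to a constant that only ADDPL can handle.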