From patchwork Tue May 23 06:08:04 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 1784807
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, palmer@dabbelt.com, palmer@rivosinc.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization
Date: Tue, 23 May 2023 14:08:04 +0800
Message-Id: <20230523060804.61556-1-juzhe.zhong@rivai.ai>

From: Juzhe-Zhong

This patch refactors the framework of RVV auto-vectorization.

We keep adding helpers and wrappers as we implement auto-vectorization, which will make the RVV auto-vectorization code very messy. I double-checked my downstream RVV GCC and collected all the auto-vectorization patterns we are going to have. Based on this information, I refactored the RVV framework to make it easier to use and more flexible for future work.

For example, we will definitely implement len_mask_load/len_mask_store patterns, which have both a length operand and a mask operand and use an undefined merge operand.
len_cond_div or cond_div will have a length or mask operand and use a real merge operand instead of an undefined merge operand. We will also have patterns that use tail undisturbed and mask any, and so on. We will definitely have various features.

Based on these circumstances, we add the following private members:

  int m_op_num;
  /* It's true when the pattern has a dest operand.  Most of the patterns have
     a dest operand whereas some patterns like STOREs do not have a dest
     operand.  */
  bool m_has_dest_p;
  /* It's true if the pattern uses an all-trues mask operand.  */
  bool m_use_all_trues_mask_p;
  /* It's true if the pattern uses an undefined merge operand.  */
  bool m_use_undef_merge_p;
  bool m_has_avl_p;
  bool m_vlmax_p;
  bool m_has_tail_policy_p;
  bool m_has_mask_policy_p;
  enum tail_policy m_tail_policy;
  enum mask_policy m_mask_policy;
  machine_mode m_dest_mode;
  machine_mode m_mask_mode;

I believe these variables can cover all potential situations. The instruction generator wrapper is "emit_insn", which adds operands and emits the instruction according to the variables mentioned above.

Once this is done, we can easily add helpers without changing the base class "insn_expander". Currently, we have "emit_vlmax_tany_many" and "emit_nonvlmax_tany_many".

For example, when we want to emit a binary operation, we have:

  #define RVV_BINOP_NUM 3 (number including the output)

Then just use emit_vlmax_tany_many (...RVV_BINOP_NUM...)

So, if we support ternary operations in the future, it's quite simple:

  #define RVV_TERNOP_NUM 4 (number including the output)
  emit_vlmax_tany_many (...RVV_TERNOP_NUM...)

"*_tany_many" means we are using tail any and mask any.

We will definitely need tail undisturbed or mask undisturbed when we support these patterns in the middle-end.
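As a rough illustration of how the flags above drive the operand list, here is a toy C++ model of the ordering that "emit_insn" follows in this patch. It is not GCC code: `expander_config` and `operand_order` are hypothetical names, operands are plain strings instead of rtx, and the const_vlmax_p shortcut (which turns a VLMAX operation into a NONVLMAX one with a constant length) is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the configuration stored in insn_expander.
// Only the fields that influence operand ordering are modelled.
struct expander_config
{
  int op_num;                 // dest + sources, e.g. RVV_BINOP_NUM == 3
  bool has_dest_p;            // pattern has a destination operand
  bool use_all_trues_mask_p;  // add an all-ones mask operand
  bool use_undef_merge_p;     // add an undefined merge operand
  bool has_avl_p;             // pattern takes a length (AVL) operand
  bool vlmax_p;               // length is VLMAX
  bool has_tail_policy_p;     // pattern takes a tail policy operand
  bool has_mask_policy_p;     // pattern takes a mask policy operand
};

// Return operand labels in the order the patch's emit_insn appends them.
// ALL_MASK_P models the case where every operand has a mask mode, in which
// case no policy operands are added.
std::vector<std::string>
operand_order (const expander_config &cfg, bool all_mask_p = false)
{
  std::vector<std::string> out;
  int opno = 0;

  if (cfg.has_dest_p)
    {
      out.push_back ("dest");
      opno++;
    }
  if (cfg.use_all_trues_mask_p)
    out.push_back ("all-ones mask");
  if (cfg.use_undef_merge_p)
    out.push_back ("vundef merge");

  // Remaining entries of the caller's ops array are the sources.
  for (; opno < cfg.op_num; opno++)
    out.push_back ("src" + std::to_string (opno));

  if (cfg.has_avl_p)
    out.push_back ("avl");
  if (!all_mask_p)
    {
      if (cfg.has_tail_policy_p)
        out.push_back ("tail policy");
      if (cfg.has_mask_policy_p)
        out.push_back ("mask policy");
    }
  if (cfg.has_avl_p)
    out.push_back (cfg.vlmax_p ? "VLMAX" : "NONVLMAX");
  return out;
}
```

In this model, a binary operation (op_num 3) with every flag set yields nine operands: dest, all-ones mask, vundef merge, src1, src2, avl, tail policy, mask policy, VLMAX. A helper for a different policy combination would only change the configuration, not the ordering logic.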
It's very simple to extend such helpers based on the current framework; we can do that in the future like this:

void
emit_nonvlmax_tu_mu (unsigned icode, int op_num, rtx *ops)
{
  machine_mode data_mode = GET_MODE (ops[0]);
  machine_mode mask_mode = get_mask_mode (data_mode).require ();
  /* The number = 11 is because we have maximum 11 operands for
     RVV instruction patterns according to vector.md.  */
  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
		       /*USE_ALL_TRUES_MASK_P*/ true,
		       /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true,
		       /*VLMAX_P*/ false,
		       /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true,
		       /*TAIL_POLICY*/ TAIL_UNDISTURBED,
		       /*MASK_POLICY*/ MASK_UNDISTURBED,
		       /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
  e.emit_insn ((enum insn_code) icode, ops);
}

That's enough (I have tested it fully in my downstream RVV GCC). I didn't add it in this patch.

Thanks.

gcc/ChangeLog:

	* config/riscv/autovec.md: Refactor the framework of RVV auto-vectorization.
	* config/riscv/riscv-protos.h (RVV_MISC_OP_NUM): Ditto.
	(RVV_UNOP_NUM): New macro.
	(RVV_BINOP_NUM): Ditto.
	(legitimize_move): Refactor the framework of RVV auto-vectorization.
	(emit_vlmax_op): Ditto.
	(emit_vlmax_reg_op): Ditto.
	(emit_len_op): Ditto.
	(emit_len_binop): Ditto.
	(emit_vlmax_tany_many): Ditto.
	(emit_nonvlmax_tany_many): Ditto.
	(sew64_scalar_helper): Ditto.
	(expand_tuple_move): Ditto.
	* config/riscv/riscv-v.cc (emit_pred_op): Ditto.
	(emit_pred_binop): Ditto.
	(emit_vlmax_op): Ditto.
	(emit_vlmax_tany_many): New function.
	(emit_len_op): Remove.
	(emit_nonvlmax_tany_many): New function.
	(emit_vlmax_reg_op): Remove.
	(emit_len_binop): Ditto.
	(emit_index_op): Ditto.
	(expand_vec_series): Refactor the framework of RVV auto-vectorization.
	(expand_const_vector): Ditto.
	(legitimize_move): Ditto.
	(sew64_scalar_helper): Ditto.
	(expand_tuple_move): Ditto.
	(expand_vector_init_insert_elems): Ditto.
	* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
	* config/riscv/vector.md: Ditto.
--- gcc/config/riscv/autovec.md | 40 ++-- gcc/config/riscv/riscv-protos.h | 16 +- gcc/config/riscv/riscv-v.cc | 341 +++++++++++++++++--------------- gcc/config/riscv/riscv.cc | 8 +- gcc/config/riscv/vector.md | 40 +--- 5 files changed, 216 insertions(+), 229 deletions(-) diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index ce0b46537ad..24405b869fa 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -31,8 +31,8 @@ (match_operand 3 "const_0_operand")] "TARGET_VECTOR" { - riscv_vector::emit_len_op (code_for_pred_mov (mode), operands[0], - operands[1], operands[2], mode); + riscv_vector::emit_nonvlmax_tany_many (code_for_pred_mov (mode), + RVV_UNOP_NUM, operands); DONE; }) @@ -43,8 +43,8 @@ (match_operand 3 "const_0_operand")] "TARGET_VECTOR" { - riscv_vector::emit_len_op (code_for_pred_mov (mode), operands[0], - operands[1], operands[2], mode); + riscv_vector::emit_nonvlmax_tany_many (code_for_pred_mov (mode), + RVV_UNOP_NUM, operands); DONE; }) @@ -118,21 +118,8 @@ (match_operand:VI 2 "")))] "TARGET_VECTOR" { - if (!register_operand (operands[2], mode)) - { - rtx cst; - gcc_assert (const_vec_duplicate_p(operands[2], &cst)); - riscv_vector::emit_len_binop (code_for_pred_scalar - (, mode), - operands[0], operands[1], cst, - NULL, mode, - mode); - } - else - riscv_vector::emit_len_binop (code_for_pred - (, mode), - operands[0], operands[1], operands[2], - NULL, mode); + riscv_vector::emit_vlmax_tany_many (code_for_pred (, mode), + RVV_BINOP_NUM, operands); DONE; }) @@ -151,12 +138,9 @@ (match_operand: 2 "csr_operand")))] "TARGET_VECTOR" { - if (!CONST_SCALAR_INT_P (operands[2])) - operands[2] = gen_lowpart (Pmode, operands[2]); - riscv_vector::emit_len_binop (code_for_pred_scalar - (, mode), - operands[0], operands[1], operands[2], - NULL_RTX, mode, Pmode); + operands[2] = gen_lowpart (Pmode, operands[2]); + riscv_vector::emit_vlmax_tany_many (code_for_pred_scalar (, mode), + RVV_BINOP_NUM, operands); DONE; }) @@ -174,9 
+158,7 @@ (match_operand:VI 2 "vector_shift_operand")))] "TARGET_VECTOR" { - riscv_vector::emit_len_binop (code_for_pred - (, mode), - operands[0], operands[1], operands[2], - NULL_RTX, mode); + riscv_vector::emit_vlmax_tany_many (code_for_pred (, mode), + RVV_BINOP_NUM, operands); DONE; }) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 12634d0ac1a..ba6d56517d3 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -132,6 +132,9 @@ namespace riscv_vector { #define RVV_VUNDEF(MODE) \ gen_rtx_UNSPEC (MODE, gen_rtvec (1, gen_rtx_REG (SImode, X0_REGNUM)), \ UNSPEC_VUNDEF) +#define RVV_MISC_OP_NUM 1 +#define RVV_UNOP_NUM 2 +#define RVV_BINOP_NUM 3 enum vlmul_type { LMUL_1 = 0, @@ -163,14 +166,11 @@ rtx expand_builtin (unsigned int, tree, rtx); bool check_builtin_call (location_t, vec, unsigned int, tree, unsigned int, tree *); bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT); -bool legitimize_move (rtx, rtx, machine_mode); +bool legitimize_move (rtx, rtx); void emit_vlmax_vsetvl (machine_mode, rtx); void emit_hard_vlmax_vsetvl (machine_mode, rtx); -void emit_vlmax_op (unsigned, rtx, rtx, machine_mode); -void emit_vlmax_reg_op (unsigned, rtx, rtx, rtx, machine_mode); -void emit_len_op (unsigned, rtx, rtx, rtx, machine_mode); -void emit_len_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode, - machine_mode = VOIDmode); +void emit_vlmax_tany_many (unsigned, int, rtx *); +void emit_nonvlmax_tany_many (unsigned, int, rtx *); enum vlmul_type get_vlmul (machine_mode); unsigned int get_ratio (machine_mode); unsigned int get_nf (machine_mode); @@ -202,7 +202,7 @@ bool neg_simm5_p (rtx); #ifdef RTX_CODE bool has_vi_variant_p (rtx_code, rtx); #endif -bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, machine_mode, +bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode, bool, void (*)(rtx *, rtx)); rtx gen_scalar_move_mask (machine_mode); @@ -218,7 +218,7 @@ enum vlen_enum bool 
slide1_sew64_helper (int, machine_mode, machine_mode, machine_mode, rtx *); rtx gen_avl_for_scalar_move (rtx); -void expand_tuple_move (machine_mode, rtx *); +void expand_tuple_move (rtx *); machine_mode preferred_simd_mode (scalar_mode); opt_machine_mode get_mask_mode (machine_mode); void expand_vec_series (rtx, rtx, rtx); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index e0b19bc1754..980928c8aff 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -66,7 +66,29 @@ const_vlmax_p (machine_mode mode) template class insn_expander { public: - insn_expander () : m_opno (0), m_has_dest_p(false) {} + insn_expander () + : m_opno (0), m_op_num (0), m_has_dest_p (false), + m_use_all_trues_mask_p (false), m_use_undef_merge_p (false), + m_has_avl_p (false), m_vlmax_p (false), m_has_tail_policy_p (false), + m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY), + m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode (VOIDmode) + {} + + /* Initializer for various configurations. 
*/ + insn_expander (int op_num, bool has_dest_p, bool use_all_trues_mask_p, + bool use_undef_merge_p, bool has_avl_p, bool vlmax_p, + bool has_tail_policy_p, bool has_mask_policy_p, + enum tail_policy tail_policy, enum mask_policy mask_policy, + machine_mode dest_mode, machine_mode mask_mode) + : m_opno (0), m_op_num (op_num), m_has_dest_p (has_dest_p), + m_use_all_trues_mask_p (use_all_trues_mask_p), + m_use_undef_merge_p (use_undef_merge_p), m_has_avl_p (has_avl_p), + m_vlmax_p (vlmax_p), m_has_tail_policy_p (has_tail_policy_p), + m_has_mask_policy_p (has_mask_policy_p), m_tail_policy (tail_policy), + m_mask_policy (mask_policy), m_dest_mode (dest_mode), + m_mask_mode (mask_mode) + {} + void add_output_operand (rtx x, machine_mode mode) { create_output_operand (&m_ops[m_opno++], x, mode); @@ -77,67 +99,94 @@ public: create_input_operand (&m_ops[m_opno++], x, mode); gcc_assert (m_opno <= MAX_OPERANDS); } - void add_all_one_mask_operand (machine_mode mode) + void add_all_one_mask_operand () { - add_input_operand (CONSTM1_RTX (mode), mode); + add_input_operand (CONSTM1_RTX (m_mask_mode), m_mask_mode); } - void add_vundef_operand (machine_mode mode) + void add_vundef_operand () { - add_input_operand (RVV_VUNDEF (mode), mode); + add_input_operand (RVV_VUNDEF (m_dest_mode), m_dest_mode); } - void add_policy_operand (enum tail_policy vta, enum mask_policy vma) + void add_policy_operand () { - rtx tail_policy_rtx = gen_int_mode (vta, Pmode); - rtx mask_policy_rtx = gen_int_mode (vma, Pmode); - add_input_operand (tail_policy_rtx, Pmode); - add_input_operand (mask_policy_rtx, Pmode); + if (m_has_tail_policy_p) + { + rtx tail_policy_rtx = gen_int_mode (m_tail_policy, Pmode); + add_input_operand (tail_policy_rtx, Pmode); + } + if (m_has_mask_policy_p) + { + rtx mask_policy_rtx = gen_int_mode (m_mask_policy, Pmode); + add_input_operand (mask_policy_rtx, Pmode); + } } void add_avl_type_operand (avl_type type) { add_input_operand (gen_int_mode (type, Pmode), Pmode); } - void 
set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode) + void emit_insn (enum insn_code icode, rtx *ops) { - m_dest_mode = GET_MODE (dest); - m_has_dest_p = true; - - add_output_operand (dest, m_dest_mode); - - if (mask) - add_input_operand (mask, GET_MODE (mask)); - else - add_all_one_mask_operand (mask_mode); - - add_vundef_operand (m_dest_mode); - } + int opno = 0; + /* It's true if any operand is memory operand. */ + bool any_mem_p = false; + /* It's true if all operands are mask operand. */ + bool all_mask_p = true; + if (m_has_dest_p) + { + any_mem_p |= MEM_P (ops[opno]); + all_mask_p &= GET_MODE_CLASS (GET_MODE (ops[opno])) == MODE_VECTOR_BOOL; + add_output_operand (ops[opno++], m_dest_mode); + } - void set_len_and_policy (rtx len, bool force_vlmax = false) - { - bool vlmax_p = force_vlmax || !len; - gcc_assert (m_has_dest_p); + if (m_use_all_trues_mask_p) + add_all_one_mask_operand (); - if (vlmax_p && const_vlmax_p (m_dest_mode)) - { - /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of the - vsetvli to obtain the value of vlmax. */ - poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode); - len = gen_int_mode (nunits, Pmode); - vlmax_p = false; /* It has became NONVLMAX now. */ - } - else if (!len) - { - len = gen_reg_rtx (Pmode); - emit_vlmax_vsetvl (m_dest_mode, len); - } + if (m_use_undef_merge_p) + add_vundef_operand (); - add_input_operand (len, Pmode); + for (; opno < m_op_num; opno++) + { + any_mem_p |= MEM_P (ops[opno]); + all_mask_p &= GET_MODE_CLASS (GET_MODE (ops[opno])) == MODE_VECTOR_BOOL; + machine_mode mode = insn_data[(int) icode].operand[m_opno].mode; + /* 'create_input_operand doesn't allow VOIDmode. + According to vector.md, we may have some patterns that do not have + explicit machine mode specifying the operand. Such operands are + always Pmode. 
*/ + if (mode == VOIDmode) + mode = Pmode; + add_input_operand (ops[opno], mode); + } - if (GET_MODE_CLASS (m_dest_mode) != MODE_VECTOR_BOOL) - add_policy_operand (get_prefer_tail_policy (), get_prefer_mask_policy ()); + if (m_has_avl_p) + { + rtx len = ops[m_op_num]; + if (m_vlmax_p) + { + if (const_vlmax_p (m_dest_mode)) + { + /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of + the vsetvli to obtain the value of vlmax. */ + poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode); + len = gen_int_mode (nunits, Pmode); + m_vlmax_p = false; /* It has became NONVLMAX now. */ + } + else if (can_create_pseudo_p ()) + { + len = gen_reg_rtx (Pmode); + emit_vlmax_vsetvl (m_dest_mode, len); + } + } + add_input_operand (len, Pmode); + } - add_avl_type_operand (vlmax_p ? avl_type::VLMAX : avl_type::NONVLMAX); - } + if (!all_mask_p) + add_policy_operand (); + if (m_has_avl_p) + add_avl_type_operand (m_vlmax_p ? avl_type::VLMAX : avl_type::NONVLMAX); + expand (icode, any_mem_p); + } void expand (enum insn_code icode, bool temporary_volatile_p = false) { @@ -152,8 +201,23 @@ public: private: int m_opno; + int m_op_num; + /* It't true when the pattern has a dest operand. Most of the patterns have + dest operand wheras some patterns like STOREs does not have dest operand. + */ bool m_has_dest_p; + /* It't true if the pattern uses all trues mask operand. */ + bool m_use_all_trues_mask_p; + /* It's true if the pattern uses undefined merge operand. */ + bool m_use_undef_merge_p; + bool m_has_avl_p; + bool m_vlmax_p; + bool m_has_tail_policy_p; + bool m_has_mask_policy_p; + enum tail_policy m_tail_policy; + enum mask_policy m_mask_policy; machine_mode m_dest_mode; + machine_mode m_mask_mode; expand_operand m_ops[MAX_OPERANDS]; }; @@ -246,49 +310,6 @@ autovec_use_vlmax_p (void) || riscv_autovec_preference == RVV_FIXED_VLMAX); } -/* Emit an RVV unmask && vl mov from SRC to DEST. 
*/ -static void -emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len, - machine_mode mask_mode, bool force_vlmax = false) -{ - insn_expander<8> e; - e.set_dest_and_mask (mask, dest, mask_mode); - - e.add_input_operand (src, GET_MODE (src)); - - e.set_len_and_policy (len, force_vlmax); - - e.expand ((enum insn_code) icode, MEM_P (dest) || MEM_P (src)); -} - -/* Emit an RVV binop. If one of SRC1 and SRC2 is a scalar operand, its mode is - specified using SCALAR_MODE. */ -static void -emit_pred_binop (unsigned icode, rtx mask, rtx dest, rtx src1, rtx src2, - rtx len, machine_mode mask_mode, - machine_mode scalar_mode = VOIDmode) -{ - insn_expander<9> e; - e.set_dest_and_mask (mask, dest, mask_mode); - - gcc_assert (VECTOR_MODE_P (GET_MODE (src1)) - || VECTOR_MODE_P (GET_MODE (src2))); - - if (VECTOR_MODE_P (GET_MODE (src1))) - e.add_input_operand (src1, GET_MODE (src1)); - else - e.add_input_operand (src1, scalar_mode); - - if (VECTOR_MODE_P (GET_MODE (src2))) - e.add_input_operand (src2, GET_MODE (src2)); - else - e.add_input_operand (src2, scalar_mode); - - e.set_len_and_policy (len); - - e.expand ((enum insn_code) icode, MEM_P (dest) || MEM_P (src1) || MEM_P (src2)); -} - /* The RISC-V vsetvli pass uses "known vlmax" operations for optimization. Whether or not an instruction actually is a vlmax operation is not recognizable from the length operand alone but the avl_type operand @@ -305,52 +326,42 @@ emit_pred_binop (unsigned icode, rtx mask, rtx dest, rtx src1, rtx src2, For that case we also allow to set the avl_type to VLMAX. */ -/* This function emits a VLMAX vsetvli followed by the actual operation. */ +/* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the + * actual operation. 
*/ void -emit_vlmax_op (unsigned icode, rtx dest, rtx src, machine_mode mask_mode) +emit_vlmax_tany_many (unsigned icode, int op_num, rtx *ops) { - emit_pred_op (icode, NULL_RTX, dest, src, NULL_RTX, mask_mode); + machine_mode data_mode = GET_MODE (ops[0]); + machine_mode mask_mode = get_mask_mode (data_mode).require (); + /* The number = 11 is because we have maximum 11 operands for + RVV instruction patterns according to vector.md. */ + insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true, + /*USE_ALL_TRUES_MASK_P*/ true, + /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true, + /*VLMAX_P*/ true, + /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true, + /*TAIL_POLICY*/ TAIL_ANY, /*MASK_POLICY*/ MASK_ANY, + /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode); + e.emit_insn ((enum insn_code) icode, ops); } -/* This function emits an operation with a given LEN that is determined - by a previously emitted VLMAX vsetvli. */ +/* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the + * actual operation. */ void -emit_len_op (unsigned icode, rtx dest, rtx src, rtx len, - machine_mode mask_mode) +emit_nonvlmax_tany_many (unsigned icode, int op_num, rtx *ops) { - emit_pred_op (icode, NULL_RTX, dest, src, len, mask_mode); -} - -/* This function emits an operation with a given LEN that is known to be - a preceding VLMAX. It also sets the VLMAX flag which allows further - optimization in the vsetvli pass. */ -void -emit_vlmax_reg_op (unsigned icode, rtx dest, rtx src, rtx len, - machine_mode mask_mode) -{ - emit_pred_op (icode, NULL_RTX, dest, src, len, mask_mode, - /* Force VLMAX */ true); -} - -void -emit_len_binop (unsigned icode, rtx dest, rtx src1, rtx src2, rtx len, - machine_mode mask_mode, machine_mode scalar_mode) -{ - emit_pred_binop (icode, NULL_RTX, dest, src1, src2, len, - mask_mode, scalar_mode); -} - -/* Emit vid.v instruction. 
*/ - -static void -emit_index_op (rtx dest, machine_mode mask_mode) -{ - insn_expander<7> e; - e.set_dest_and_mask (NULL, dest, mask_mode); - - e.set_len_and_policy (NULL, true); - - e.expand (code_for_pred_series (GET_MODE (dest)), false); + machine_mode data_mode = GET_MODE (ops[0]); + machine_mode mask_mode = get_mask_mode (data_mode).require (); + /* The number = 11 is because we have maximum 11 operands for + RVV instruction patterns according to vector.md. */ + insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true, + /*USE_ALL_TRUES_MASK_P*/ true, + /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true, + /*VLMAX_P*/ false, + /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true, + /*TAIL_POLICY*/ TAIL_ANY, /*MASK_POLICY*/ MASK_ANY, + /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode); + e.emit_insn ((enum insn_code) icode, ops); } /* Expand series const vector. */ @@ -359,7 +370,6 @@ void expand_vec_series (rtx dest, rtx base, rtx step) { machine_mode mode = GET_MODE (dest); - machine_mode inner_mode = GET_MODE_INNER (mode); machine_mode mask_mode; gcc_assert (get_mask_mode (mode).exists (&mask_mode)); @@ -367,7 +377,8 @@ expand_vec_series (rtx dest, rtx base, rtx step) /* Step 1: Generate I = { 0, 1, 2, ... } by vid.v. */ rtx vid = gen_reg_rtx (mode); - emit_index_op (vid, mask_mode); + rtx op[1] = {vid}; + emit_vlmax_tany_many (code_for_pred_series (mode), RVV_MISC_OP_NUM, op); /* Step 2: Generate I * STEP. - STEP is 1, we don't emit any instructions. 
@@ -385,14 +396,14 @@ expand_vec_series (rtx dest, rtx base, rtx step) int shift = exact_log2 (INTVAL (step)); rtx shift_amount = gen_int_mode (shift, Pmode); insn_code icode = code_for_pred_scalar (ASHIFT, mode); - emit_len_binop (icode, step_adj, vid, shift_amount, - NULL, mask_mode, Pmode); + rtx ops[3] = {step_adj, vid, shift_amount}; + emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops); } else { insn_code icode = code_for_pred_scalar (MULT, mode); - emit_len_binop (icode, step_adj, vid, step, - NULL, mask_mode, inner_mode); + rtx ops[3] = {step_adj, vid, step}; + emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops); } } @@ -407,14 +418,14 @@ expand_vec_series (rtx dest, rtx base, rtx step) { rtx result = gen_reg_rtx (mode); insn_code icode = code_for_pred_scalar (PLUS, mode); - emit_len_binop (icode, result, step_adj, base, - NULL, mask_mode, inner_mode); + rtx ops[3] = {result, step_adj, base}; + emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops); emit_move_insn (dest, result); } } static void -expand_const_vector (rtx target, rtx src, machine_mode mask_mode) +expand_const_vector (rtx target, rtx src) { machine_mode mode = GET_MODE (target); scalar_mode elt_mode = GET_MODE_INNER (mode); @@ -424,7 +435,8 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode) gcc_assert ( const_vec_duplicate_p (src, &elt) && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx))); - emit_vlmax_op (code_for_pred_mov (mode), target, src, mask_mode); + rtx ops[2] = {target, src}; + emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops); return; } @@ -435,10 +447,16 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode) /* Element in range -16 ~ 15 integer or 0.0 floating-point, we use vmv.v.i instruction. 
*/ if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src)) - emit_vlmax_op (code_for_pred_mov (mode), tmp, src, mask_mode); + { + rtx ops[2] = {tmp, src}; + emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops); + } else - emit_vlmax_op (code_for_pred_broadcast (mode), tmp, - force_reg (elt_mode, elt), mask_mode); + { + elt = force_reg (elt_mode, elt); + rtx ops[2] = {tmp, elt}; + emit_vlmax_tany_many (code_for_pred_broadcast (mode), RVV_UNOP_NUM, ops); + } if (tmp != target) emit_move_insn (target, tmp); @@ -463,12 +481,12 @@ expand_const_vector (rtx target, rtx src, machine_mode mask_mode) /* Expand a pre-RA RVV data move from SRC to DEST. It expands move for RVV fractional vector modes. */ bool -legitimize_move (rtx dest, rtx src, machine_mode mask_mode) +legitimize_move (rtx dest, rtx src) { machine_mode mode = GET_MODE (dest); if (CONST_VECTOR_P (src)) { - expand_const_vector (dest, src, mask_mode); + expand_const_vector (dest, src); return true; } @@ -505,7 +523,10 @@ legitimize_move (rtx dest, rtx src, machine_mode mask_mode) { rtx tmp = gen_reg_rtx (mode); if (MEM_P (src)) - emit_vlmax_op (code_for_pred_mov (mode), tmp, src, mask_mode); + { + rtx ops[2] = {tmp, src}; + emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops); + } else emit_move_insn (tmp, src); src = tmp; @@ -514,7 +535,8 @@ legitimize_move (rtx dest, rtx src, machine_mode mask_mode) if (satisfies_constraint_vu (src)) return false; - emit_vlmax_op (code_for_pred_mov (mode), dest, src, mask_mode); + rtx ops[2] = {dest, src}; + emit_vlmax_tany_many (code_for_pred_mov (mode), RVV_UNOP_NUM, ops); return true; } @@ -748,8 +770,7 @@ has_vi_variant_p (rtx_code code, rtx x) bool sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl, - machine_mode vector_mode, machine_mode mask_mode, - bool has_vi_variant_p, + machine_mode vector_mode, bool has_vi_variant_p, void (*emit_vector_func) (rtx *, rtx)) { machine_mode scalar_mode = GET_MODE_INNER (vector_mode); @@ 
-779,8 +800,9 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl, *scalar_op = force_reg (scalar_mode, *scalar_op); rtx tmp = gen_reg_rtx (vector_mode); - riscv_vector::emit_len_op (code_for_pred_broadcast (vector_mode), tmp, - *scalar_op, vl, mask_mode); + rtx ops[3] = {tmp, *scalar_op, vl}; + riscv_vector::emit_nonvlmax_tany_many (code_for_pred_broadcast (vector_mode), + RVV_UNOP_NUM, ops); emit_vector_func (operands, tmp); return true; @@ -990,7 +1012,7 @@ gen_avl_for_scalar_move (rtx avl) /* Expand tuple modes data movement for. */ void -expand_tuple_move (machine_mode mask_mode, rtx *ops) +expand_tuple_move (rtx *ops) { unsigned int i; machine_mode tuple_mode = GET_MODE (ops[0]); @@ -1086,8 +1108,11 @@ expand_tuple_move (machine_mode mask_mode, rtx *ops) rtx mem = gen_rtx_MEM (subpart_mode, ops[3]); if (fractional_p) - emit_vlmax_reg_op (code_for_pred_mov (subpart_mode), subreg, mem, - ops[4], mask_mode); + { + rtx operands[3] = {subreg, mem, ops[4]}; + emit_vlmax_tany_many (code_for_pred_mov (subpart_mode), + RVV_UNOP_NUM, operands); + } else emit_move_insn (subreg, mem); } @@ -1108,8 +1133,11 @@ expand_tuple_move (machine_mode mask_mode, rtx *ops) rtx mem = gen_rtx_MEM (subpart_mode, ops[3]); if (fractional_p) - emit_vlmax_reg_op (code_for_pred_mov (subpart_mode), mem, subreg, - ops[4], mask_mode); + { + rtx operands[3] = {mem, subreg, ops[4]}; + emit_vlmax_tany_many (code_for_pred_mov (subpart_mode), + RVV_UNOP_NUM, operands); + } else emit_move_insn (mem, subreg); } @@ -1230,7 +1258,6 @@ expand_vector_init_insert_elems (rtx target, const rvv_builder &builder, int nelts_reqd) { machine_mode mode = GET_MODE (target); - scalar_mode elem_mode = GET_MODE_INNER (mode); machine_mode mask_mode; gcc_assert (get_mask_mode (mode).exists (&mask_mode)); rtx dup = expand_vector_broadcast (mode, builder.elt (0)); @@ -1241,8 +1268,8 @@ expand_vector_init_insert_elems (rtx target, const rvv_builder &builder, unsigned int unspec = FLOAT_MODE_P (mode) ? 
UNSPEC_VFSLIDE1DOWN : UNSPEC_VSLIDE1DOWN;
       insn_code icode = code_for_pred_slide (unspec, mode);
-      emit_len_binop (icode, target, target, builder.elt (i), NULL, mask_mode,
-                      elem_mode);
+      rtx ops[3] = {target, target, builder.elt (i)};
+      emit_vlmax_tany_many (icode, RVV_BINOP_NUM, ops);
     }
 }
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5ac187c1b1b..109483c8b1c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7389,9 +7389,6 @@ vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
         {
           rtx target = regno_reg_rtx[regno];
           machine_mode mode = GET_MODE (target);
-          poly_uint16 nunits = GET_MODE_NUNITS (mode);
-          machine_mode mask_mode
-            = riscv_vector::get_vector_mode (BImode, nunits).require ();
 
           if (!emitted_vlmax_vsetvl)
             {
@@ -7399,8 +7396,9 @@ vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
               emitted_vlmax_vsetvl = true;
             }
 
-          riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (mode), target,
-                                           CONST0_RTX (mode), vl, mask_mode);
+          rtx ops[3] = {target, CONST0_RTX (mode), vl};
+          riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (mode),
+                                              RVV_UNOP_NUM, ops);
 
           SET_HARD_REG_BIT (zeroed_hardregs, regno);
         }
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e8bb7c5dec1..b6663973ba1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -662,7 +662,7 @@
      before spilling. The clobber scratch is used by spilling fractional
      registers in IRA/LRA so it's too early. */
 
-  if (riscv_vector::legitimize_move (operands[0], operands[1], mode))
+  if (riscv_vector::legitimize_move (operands[0], operands[1]))
     DONE;
 })
 
@@ -718,7 +718,7 @@
         (match_operand:VB 1 "general_operand"))]
   "TARGET_VECTOR"
 {
-  if (riscv_vector::legitimize_move (operands[0], operands[1], mode))
+  if (riscv_vector::legitimize_move (operands[0], operands[1]))
     DONE;
 })
 
@@ -760,9 +760,8 @@
   else
     {
       riscv_vector::emit_vlmax_vsetvl (mode, operands[2]);
-      riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (mode),
-                                       operands[0], operands[1], operands[2],
-                                       mode);
+      riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (mode),
+                                          RVV_UNOP_NUM, operands);
     }
   DONE;
 })
@@ -781,9 +780,8 @@
   else
     {
       riscv_vector::emit_vlmax_vsetvl (mode, operands[2]);
-      riscv_vector::emit_vlmax_reg_op (code_for_pred_mov (mode),
-                                       operands[0], operands[1], operands[2],
-                                       mode);
+      riscv_vector::emit_vlmax_tany_many (code_for_pred_mov (mode),
+                                          RVV_UNOP_NUM, operands);
     }
   DONE;
 })
@@ -806,7 +804,7 @@
 
   if (GET_CODE (operands[1]) == CONST_VECTOR)
     {
-      riscv_vector::expand_tuple_move (mode, operands);
+      riscv_vector::expand_tuple_move (operands);
       DONE;
     }
 
@@ -826,7 +824,7 @@
   "&& reload_completed"
   [(const_int 0)]
   {
-    riscv_vector::expand_tuple_move (mode, operands);
+    riscv_vector::expand_tuple_move (operands);
     DONE;
   }
   [(set_attr "type" "vmov,vlde,vste")
@@ -846,8 +844,8 @@
           (match_operand: 1 "direct_broadcast_operand")))]
   "TARGET_VECTOR"
   {
-    riscv_vector::emit_vlmax_op (code_for_pred_broadcast (mode),
-                                 operands[0], operands[1], mode);
+    riscv_vector::emit_vlmax_tany_many (code_for_pred_broadcast (mode),
+                                        RVV_UNOP_NUM, operands);
     DONE;
   }
 )
@@ -1272,7 +1270,6 @@
         /* scalar op */&operands[3],
         /* vl */operands[5],
         mode,
-        mode,
         riscv_vector::simm5_p (operands[3]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_merge (operands[0], operands[1],
@@ -1983,7 +1980,6 @@
         /* scalar op */&operands[4],
         /* vl */operands[5],
         mode,
-        mode,
         riscv_vector::has_vi_variant_p (, operands[4]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_ (operands[0], operands[1],
@@ -2059,7 +2055,6 @@
         /* scalar op */&operands[4],
         /* vl */operands[5],
         mode,
-        mode,
         riscv_vector::has_vi_variant_p (, operands[4]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_ (operands[0], operands[1],
@@ -2135,7 +2130,6 @@
         /* scalar op */&operands[4],
         /* vl */operands[5],
         mode,
-        mode,
         riscv_vector::neg_simm5_p (operands[4]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_sub (operands[0], operands[1],
@@ -2253,7 +2247,6 @@
         /* scalar op */&operands[4],
         /* vl */operands[5],
         mode,
-        mode,
         false,
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_mulh (operands[0], operands[1],
@@ -2428,7 +2421,6 @@
         /* scalar op */&operands[3],
         /* vl */operands[5],
         mode,
-        mode,
         riscv_vector::simm5_p (operands[3]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_adc (operands[0], operands[1],
@@ -2512,7 +2504,6 @@
         /* scalar op */&operands[3],
         /* vl */operands[5],
         mode,
-        mode,
         false,
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_sbc (operands[0], operands[1],
@@ -2671,7 +2662,6 @@
         /* scalar op */&operands[2],
         /* vl */operands[4],
         mode,
-        mode,
         riscv_vector::simm5_p (operands[2]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_madc (operands[0], operands[1],
@@ -2741,7 +2731,6 @@
         /* scalar op */&operands[2],
         /* vl */operands[4],
         mode,
-        mode,
         false,
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_msbc (operands[0], operands[1],
@@ -2884,7 +2873,6 @@
         /* scalar op */&operands[2],
         /* vl */operands[3],
         mode,
-        mode,
         riscv_vector::simm5_p (operands[2]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_madc_overflow (operands[0], operands[1],
@@ -2951,7 +2939,6 @@
         /* scalar op */&operands[2],
         /* vl */operands[3],
         mode,
-        mode,
         false,
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_msbc_overflow (operands[0], operands[1],
@@ -3449,7 +3436,6 @@
         /* scalar op */&operands[4],
         /* vl */operands[5],
         mode,
-        mode,
         riscv_vector::has_vi_variant_p (, operands[4]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_ (operands[0], operands[1],
@@ -3531,7 +3517,6 @@
         /* scalar op */&operands[4],
         /* vl */operands[5],
         mode,
-        mode,
         riscv_vector::has_vi_variant_p (, operands[4]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_ (operands[0], operands[1],
@@ -3681,7 +3666,6 @@
         /* scalar op */&operands[4],
         /* vl */operands[5],
         mode,
-        mode,
         false,
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_ (operands[0], operands[1],
@@ -4141,7 +4125,6 @@
         /* scalar op */&operands[5],
         /* vl */operands[6],
         mode,
-        mode,
         riscv_vector::has_vi_variant_p (code, operands[5]),
         code == LT || code == LTU ?
           [] (rtx *operands, rtx boardcast_scalar) {
@@ -4181,7 +4164,6 @@
         /* scalar op */&operands[5],
         /* vl */operands[6],
         mode,
-        mode,
         riscv_vector::has_vi_variant_p (code, operands[5]),
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_cmp (operands[0], operands[1],
@@ -4880,7 +4862,6 @@
         /* scalar op */&operands[2],
         /* vl */operands[6],
         mode,
-        mode,
         false,
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_mul_plus (operands[0], operands[1],
@@ -5301,7 +5282,6 @@
         /* scalar op */&operands[2],
         /* vl */operands[6],
         mode,
-        mode,
         false,
         [] (rtx *operands, rtx boardcast_scalar) {
           emit_insn (gen_pred_minus_mul (operands[0], operands[1],