From patchwork Fri Apr 5 08:23:24 2024
X-Patchwork-Submitter: Ajit Agarwal
X-Patchwork-Id: 1920075
Date: Fri, 5 Apr 2024 13:53:24 +0530
To: Alex Coplan, Richard Sandiford, Kewen.Lin, Segher Boessenkool, Michael Meissner, David Edelsohn, Peter Bergner, gcc-patches
From: Ajit Agarwal
Subject: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

Hello Alex/Richard:

All review comments are incorporated.

Common infrastructure of load store pair fusion is divided into target independent and target dependent changed code.
Target independent code is the generic code, with pure virtual functions to interface between target independent and dependent code. Target dependent code is the implementation of these pure virtual functions for the aarch64 target and the call into the target independent code.

Thanks & Regards
Ajit

aarch64: Place target independent and dependent changed code in one file

Common infrastructure of load store pair fusion is divided into target independent and target dependent changed code. Target independent code is the generic code, with pure virtual functions to interface between target independent and dependent code. Target dependent code is the implementation of these pure virtual functions for the aarch64 target and the call into the target independent code.

2024-04-06  Ajit Kumar Agarwal

gcc/ChangeLog:

	* config/aarch64/aarch64-ldp-fusion.cc: Place target independent and dependent changed code.

---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++++++++++++++--------
 1 file changed, 249 insertions(+), 122 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 22ed95eb743..cb21b514ef7 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -138,8 +138,122 @@ struct alt_base poly_int64 offset; }; +// Virtual base class for load/store walkers used in alias analysis. +struct alias_walker +{ + virtual bool conflict_p (int &budget) const = 0; + virtual insn_info *insn () const = 0; + virtual bool valid () const = 0; + virtual void advance () = 0; +}; + +struct pair_fusion { + + pair_fusion () {}; + virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode, + bool load_p) = 0; + + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; + virtual bool pair_trailing_writeback_p () = 0; + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, + machine_mode mem_mode) = 0; + virtual int pair_mem_alias_check_limit () = 0; + virtual bool handle_writeback_opportunities () = 0 ; + virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p, + machine_mode mode) = 0; + virtual rtx gen_mem_pair (rtx *pats, rtx writeback, + bool load_p) = 0; + virtual bool pair_mem_promote_writeback_p (rtx pat) = 0; + virtual bool track_load_p () = 0; + virtual bool track_store_p () = 0; + virtual bool cand_insns_empty_p (std::list &insns) = 0; + virtual bool pair_mem_in_range_p (HOST_WIDE_INT off) = 0; + void ldp_fusion_bb (bb_info *bb); + + ~pair_fusion() { } +}; + +struct aarch64_pair_fusion : public pair_fusion +{ +public: + aarch64_pair_fusion () : pair_fusion () {}; + bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode, + bool load_p) override final + { + const bool fpsimd_op_p + = reload_completed + ?
(REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op))) + : (GET_MODE_CLASS (mem_mode) != MODE_INT + && (load_p || !aarch64_const_zero_rtx_p (reg_op))); + return fpsimd_op_p; + } + + bool pair_mem_promote_writeback_p (rtx pat) + { + if (reload_completed + && aarch64_ldp_writeback > 1 + && GET_CODE (pat) == PARALLEL + && XVECLEN (pat, 0) == 2) + return true; + + return false; + } + + bool pair_mem_ok_with_policy (rtx first_mem, bool load_p, + machine_mode mode) + { + return aarch64_mem_ok_with_ldpstp_policy_model (first_mem, + load_p, + mode); + } + bool pair_operand_mode_ok_p (machine_mode mode); + + rtx gen_mem_pair (rtx *pats, rtx writeback, bool load_p); + + bool pair_trailing_writeback_p () + { + return aarch64_ldp_writeback > 1; + } + bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, + machine_mode mem_mode) + { + return (load_p + ? aarch64_ldp_reg_operand (reg_op, mem_mode) + : aarch64_stp_reg_operand (reg_op, mem_mode)); + } + int pair_mem_alias_check_limit () + { + return aarch64_ldp_alias_check_limit; + } + bool handle_writeback_opportunities () + { + return aarch64_ldp_writeback; + } + bool track_load_p () + { + const bool track_loads + = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; + return track_loads; + } + bool track_store_p () + { + const bool track_stores + = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; + return track_stores; + } + bool cand_insns_empty_p (std::list &insns) + { + return insns.empty(); + } + bool pair_mem_in_range_p (HOST_WIDE_INT off) + { + return (off < LDP_MIN_IMM || off > LDP_MAX_IMM); + } +}; + + // State used by the pass for a given basic block. -struct ldp_bb_info +struct pair_fusion_bb_info { using def_hash = nofree_ptr_hash; using expr_key_t = pair_hash>; @@ -160,14 +274,17 @@ struct ldp_bb_info static const size_t obstack_alignment = sizeof (void *); bb_info *m_bb; + pair_fusion *bb_state; - ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false) + pair_fusion_bb_info (bb_info *bb, + aarch64_pair_fusion *d) : m_bb (bb), + bb_state (d), m_emitted_tombstone (false) { obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE, obstack_alignment, obstack_chunk_alloc, obstack_chunk_free); } - ~ldp_bb_info () + ~pair_fusion_bb_info () { obstack_free (&m_obstack, nullptr); @@ -177,10 +294,32 @@ struct ldp_bb_info bitmap_obstack_release (&m_bitmap_obstack); } } + void track_access (insn_info *, bool load, rtx mem); + void transform (); + void cleanup_tombstones (); + void merge_pairs (insn_list_t &, insn_list_t &, + bool load_p, unsigned access_size); + void transform_for_base (int load_size, access_group &group); + + bool try_fuse_pair (bool load_p, unsigned access_size, + insn_info *i1, insn_info *i2); + + bool fuse_pair (bool load_p, unsigned access_size, + int writeback, + insn_info *i1, insn_info *i2, + base_cand &base, + const insn_range_info &move_range); - inline void track_access (insn_info *, bool load, rtx mem); - inline void transform (); - inline void cleanup_tombstones (); + void do_alias_analysis (insn_info *alias_hazards[4], + alias_walker *walkers[4], + bool load_p); + + void track_tombstone (int uid); + + bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs); + + template + void traverse_base_map (Map &map); private: obstack m_obstack; @@ -191,30 +330,32 @@ private: bool m_emitted_tombstone; inline splay_tree_node *node_alloc (access_record *); +}; - template - inline void traverse_base_map (Map &map); - inline void transform_for_base (int load_size, access_group &group); - 
- inline void merge_pairs (insn_list_t &, insn_list_t &, - bool load_p, unsigned access_size); - - inline bool try_fuse_pair (bool load_p, unsigned access_size, - insn_info *i1, insn_info *i2); - - inline bool fuse_pair (bool load_p, unsigned access_size, - int writeback, - insn_info *i1, insn_info *i2, - base_cand &base, - const insn_range_info &move_range); - - inline void track_tombstone (int uid); +rtx aarch64_pair_fusion::gen_mem_pair (rtx *pats, + rtx writeback, + bool load_p) + { + rtx pair_pat; - inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs); -}; + if (writeback) + { + auto patvec = gen_rtvec (3, writeback, pats[0], pats[1]); + pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec); + } + else if (load_p) + pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0), + XEXP (pats[1], 0), + XEXP (pats[0], 1)); + else + pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0), + XEXP (pats[0], 1), + XEXP (pats[1], 1)); + return pair_pat; + } splay_tree_node * -ldp_bb_info::node_alloc (access_record *access) +pair_fusion_bb_info::node_alloc (access_record *access) { using T = splay_tree_node; void *addr = obstack_alloc (&m_obstack, sizeof (T)); @@ -262,7 +403,7 @@ drop_writeback (rtx mem) // RTX_AUTOINC addresses. The interface is like strip_offset except we take a // MEM so that we know the mode of the access. static rtx -ldp_strip_offset (rtx mem, poly_int64 *offset) +pair_mem_strip_offset (rtx mem, poly_int64 *offset) { rtx addr = XEXP (mem, 0); @@ -332,6 +473,12 @@ ldp_operand_mode_ok_p (machine_mode mode) return reload_completed || mode != TImode; } +bool +aarch64_pair_fusion::pair_operand_mode_ok_p (machine_mode mode) +{ + return ldp_operand_mode_ok_p (mode); +} + // Given LFS (load_p, fpsimd_p, size) fields in FIELDS, encode these // into an integer for use as a hash table key. static int @@ -396,7 +543,7 @@ access_group::track (Alloc alloc_node, poly_int64 offset, insn_info *insn) // MEM_EXPR base (i.e. a tree decl) relative to which we can track the access. // LFS is used as part of the key to the hash table, see track_access. bool -ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs) +pair_fusion_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs) { if (!MEM_EXPR (mem) || !MEM_OFFSET_KNOWN_P (mem)) return false; @@ -412,9 +559,10 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs) const machine_mode mem_mode = GET_MODE (mem); const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant (); - // Punt on misaligned offsets. LDP/STP instructions require offsets to be a - // multiple of the access size, and we believe that misaligned offsets on - // MEM_EXPR bases are likely to lead to misaligned offsets w.r.t. RTL bases. + // Punt on misaligned offsets. Paired memory access instructions require + // offsets to be a multiple of the access size, and we believe that + // misaligned offsets on MEM_EXPR bases are likely to lead to misaligned + // offsets w.r.t. RTL bases. if (!multiple_p (offset, mem_size)) return false; @@ -438,46 +586,38 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs) } // Main function to begin pair discovery. Given a memory access INSN, -// determine whether it could be a candidate for fusing into an ldp/stp, +// determine whether it could be a candidate for fusing into an pair mem, // and if so, track it in the appropriate data structure for this basic // block. LOAD_P is true if the access is a load, and MEM is the mem // rtx that occurs in INSN. 
void -ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) +pair_fusion_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) { // We can't combine volatile MEMs, so punt on these. if (MEM_VOLATILE_P (mem)) return; - // Ignore writeback accesses if the param says to do so. - if (!aarch64_ldp_writeback + // Ignore writeback accesses if the param says to do so + if (!bb_state->handle_writeback_opportunities () && GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC) return; const machine_mode mem_mode = GET_MODE (mem); - if (!ldp_operand_mode_ok_p (mem_mode)) + + if (!bb_state->pair_operand_mode_ok_p (mem_mode)) return; rtx reg_op = XEXP (PATTERN (insn->rtl ()), !load_p); - // Ignore the access if the register operand isn't suitable for ldp/stp. - if (load_p - ? !aarch64_ldp_reg_operand (reg_op, mem_mode) - : !aarch64_stp_reg_operand (reg_op, mem_mode)) + if (!bb_state->pair_reg_operand_ok_p (load_p, reg_op, mem_mode)) return; - // We want to segregate FP/SIMD accesses from GPR accesses. // // Before RA, we use the modes, noting that stores of constant zero // operands use GPRs (even in non-integer modes). After RA, we use // the hard register numbers. - const bool fpsimd_op_p - = reload_completed - ? (REG_P (reg_op) && FP_REGNUM_P (REGNO (reg_op))) - : (GET_MODE_CLASS (mem_mode) != MODE_INT - && (load_p || !aarch64_const_zero_rtx_p (reg_op))); - - // Note ldp_operand_mode_ok_p already rejected VL modes. + const bool fpsimd_op_p = bb_state->fpsimd_op_p (reg_op, mem_mode, load_p); + // Note pair_operand_mode_ok_p already rejected VL modes. const HOST_WIDE_INT mem_size = GET_MODE_SIZE (mem_mode).to_constant (); const lfs_fields lfs = { load_p, fpsimd_op_p, mem_size }; @@ -487,7 +627,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) poly_int64 mem_off; rtx addr = XEXP (mem, 0); const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC; - rtx base = ldp_strip_offset (mem, &mem_off); + rtx base = pair_mem_strip_offset (mem, &mem_off); if (!REG_P (base)) return; @@ -506,8 +646,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) // elimination offset pre-RA, we should postpone forming pairs on such // accesses until after RA. // - // As it stands, addresses with offsets in range for LDR but not - // in range for LDP/STP are currently reloaded inefficiently, + // As it stands, addresses in range for an individual load/store but not + // for a paired access are currently reloaded inefficiently, // ending up with a separate base register for each pair. // // In theory LRA should make use of @@ -519,8 +659,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) // that calls targetm.legitimize_address_displacement. // // So for now, it's better to punt when we can't be sure that the - // offset is in range for LDP/STP. Out-of-range cases can then be - // handled after RA by the out-of-range LDP/STP peepholes. Eventually, it + // offset is in range for paired access. Out-of-range cases can then be + // handled after RA by the out-of-range PAIR MEM peepholes. Eventually, it // would be nice to handle known out-of-range opportunities in the // pass itself (for stack accesses, this would be in the post-RA pass). if (!reload_completed @@ -573,8 +713,8 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem) gcc_unreachable (); // Base defs should be unique. } - // Punt on misaligned offsets. LDP/STP require offsets to be a multiple of - // the access size. + // Punt on misaligned offsets. 
Paired memory accesses require offsets + // to be a multiple of the access size. if (!multiple_p (mem_off, mem_size)) return; @@ -614,7 +754,7 @@ static bool no_ignore (insn_info *) { return false; } // making use of alias disambiguation. static insn_info * latest_hazard_before (insn_info *insn, rtx *ignore, - insn_info *ignore_insn = nullptr) + insn_info *ignore_insn = 0) { insn_info *result = nullptr; @@ -1150,7 +1290,7 @@ extract_writebacks (bool load_p, rtx pats[2], int changed) const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC; poly_int64 offset; - rtx this_base = ldp_strip_offset (mem, &offset); + rtx this_base = pair_mem_strip_offset (mem, &offset); gcc_assert (REG_P (this_base)); if (base_reg) gcc_assert (rtx_equal_p (base_reg, this_base)); @@ -1286,7 +1426,11 @@ find_trailing_add (insn_info *insns[2], off_hwi /= access_size; - if (off_hwi < LDP_MIN_IMM || off_hwi > LDP_MAX_IMM) + pair_fusion *pfuse; + aarch64_pair_fusion derived; + pfuse = &derived; + + if (pfuse->pair_mem_in_range_p (off_hwi)) return nullptr; auto dump_prefix = [&]() @@ -1328,7 +1472,7 @@ find_trailing_add (insn_info *insns[2], // We just emitted a tombstone with uid UID, track it in a bitmap for // this BB so we can easily identify it later when cleaning up tombstones. void -ldp_bb_info::track_tombstone (int uid) +pair_fusion_bb_info::track_tombstone (int uid) { if (!m_emitted_tombstone) { @@ -1528,7 +1672,7 @@ fixup_debug_uses (obstack_watermark &attempt, gcc_checking_assert (GET_RTX_CLASS (GET_CODE (XEXP (mem, 0))) == RTX_AUTOINC); - base = ldp_strip_offset (mem, &offset); + base = pair_mem_strip_offset (mem, &offset); gcc_checking_assert (REG_P (base) && REGNO (base) == base_regno); } fixup_debug_use (attempt, use, def, base, offset); @@ -1664,7 +1808,7 @@ fixup_debug_uses (obstack_watermark &attempt, // BASE gives the chosen base candidate for the pair and MOVE_RANGE is // a singleton range which says where to place the pair. bool -ldp_bb_info::fuse_pair (bool load_p, +pair_fusion_bb_info::fuse_pair (bool load_p, unsigned access_size, int writeback, insn_info *i1, insn_info *i2, @@ -1800,7 +1944,7 @@ ldp_bb_info::fuse_pair (bool load_p, { if (dump_file) fprintf (dump_file, - " ldp: i%d has wb but subsequent i%d has non-wb " + " load pair: i%d has wb but subsequent i%d has non-wb " "update of base (r%d), dropping wb\n", insns[0]->uid (), insns[1]->uid (), base_regno); gcc_assert (writeback_effect); @@ -1823,7 +1967,7 @@ ldp_bb_info::fuse_pair (bool load_p, } // If either of the original insns had writeback, but the resulting pair insn - // does not (can happen e.g. in the ldp edge case above, or if the writeback + // does not (can happen e.g. in the load pair edge case above, or if the writeback // effects cancel out), then drop the def(s) of the base register as // appropriate. // @@ -1842,7 +1986,7 @@ ldp_bb_info::fuse_pair (bool load_p, // update of the base register and try and fold it in to make this into a // writeback pair. insn_info *trailing_add = nullptr; - if (aarch64_ldp_writeback > 1 + if (bb_state->pair_trailing_writeback_p () && !writeback_effect && (!load_p || (!refers_to_regno_p (base_regno, base_regno + 1, XEXP (pats[0], 0), nullptr) @@ -1863,14 +2007,14 @@ ldp_bb_info::fuse_pair (bool load_p, } // Now that we know what base mem we're going to use, check if it's OK - // with the ldp/stp policy. + // with the pair mem policy. 
rtx first_mem = XEXP (pats[0], load_p); - if (!aarch64_mem_ok_with_ldpstp_policy_model (first_mem, - load_p, - GET_MODE (first_mem))) + if (!bb_state->pair_mem_ok_with_policy (first_mem, + load_p, + GET_MODE (first_mem))) { if (dump_file) - fprintf (dump_file, "punting on pair (%d,%d), ldp/stp policy says no\n", + fprintf (dump_file, "punting on pair (%d,%d), pair mem policy says no\n", i1->uid (), i2->uid ()); return false; } @@ -1878,21 +2022,10 @@ ldp_bb_info::fuse_pair (bool load_p, rtx reg_notes = combine_reg_notes (first, second, load_p); rtx pair_pat; - if (writeback_effect) - { - auto patvec = gen_rtvec (3, writeback_effect, pats[0], pats[1]); - pair_pat = gen_rtx_PARALLEL (VOIDmode, patvec); - } - else if (load_p) - pair_pat = aarch64_gen_load_pair (XEXP (pats[0], 0), - XEXP (pats[1], 0), - XEXP (pats[0], 1)); - else - pair_pat = aarch64_gen_store_pair (XEXP (pats[0], 0), - XEXP (pats[0], 1), - XEXP (pats[1], 1)); + pair_pat = bb_state->gen_mem_pair (pats, writeback_effect, load_p); insn_change *pair_change = nullptr; + auto set_pair_pat = [pair_pat,reg_notes](insn_change *change) { rtx_insn *rti = change->insn ()->rtl (); validate_unshare_change (rti, &PATTERN (rti), pair_pat, true); @@ -2133,15 +2266,6 @@ load_modified_by_store_p (insn_info *load, return false; } -// Virtual base class for load/store walkers used in alias analysis. -struct alias_walker -{ - virtual bool conflict_p (int &budget) const = 0; - virtual insn_info *insn () const = 0; - virtual bool valid () const = 0; - virtual void advance () = 0; -}; - // Implement some common functionality used by both store_walker // and load_walker. template @@ -2259,13 +2383,13 @@ public: // // We try to maintain the invariant that if a walker becomes invalid, we // set its pointer to null. -static void -do_alias_analysis (insn_info *alias_hazards[4], +void +pair_fusion_bb_info::do_alias_analysis (insn_info *alias_hazards[4], alias_walker *walkers[4], bool load_p) { const int n_walkers = 2 + (2 * !load_p); - int budget = aarch64_ldp_alias_check_limit; + int budget = bb_state->pair_mem_alias_check_limit (); auto next_walker = [walkers,n_walkers](int current) -> int { for (int j = 1; j <= n_walkers; j++) @@ -2365,7 +2489,7 @@ get_viable_bases (insn_info *insns[2], { const bool is_lower = (i == reversed); poly_int64 poly_off; - rtx base = ldp_strip_offset (cand_mems[i], &poly_off); + rtx base = pair_mem_strip_offset (cand_mems[i], &poly_off); if (GET_RTX_CLASS (GET_CODE (XEXP (cand_mems[i], 0))) == RTX_AUTOINC) writeback |= (1 << i); @@ -2373,7 +2497,7 @@ get_viable_bases (insn_info *insns[2], continue; // Punt on accesses relative to eliminable regs. See the comment in - // ldp_bb_info::track_access for a detailed explanation of this. + // pair_fusion_bb_info::track_access for a detailed explanation of this. if (!reload_completed && (REGNO (base) == FRAME_POINTER_REGNUM || REGNO (base) == ARG_POINTER_REGNUM)) @@ -2397,7 +2521,11 @@ get_viable_bases (insn_info *insns[2], if (!is_lower) base_off--; - if (base_off < LDP_MIN_IMM || base_off > LDP_MAX_IMM) + pair_fusion *pfuse; + aarch64_pair_fusion derived; + pfuse = &derived; + + if (pfuse->pair_mem_in_range_p (base_off)) continue; use_info *use = find_access (insns[i]->uses (), REGNO (base)); @@ -2454,12 +2582,12 @@ get_viable_bases (insn_info *insns[2], } // Given two adjacent memory accesses of the same size, I1 and I2, try -// and see if we can merge them into a ldp or stp. +// and see if we can merge them into a paired accesses load and store. 
// // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true // if the accesses are both loads, otherwise they are both stores. bool -ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size, +pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size, insn_info *i1, insn_info *i2) { if (dump_file) @@ -2494,7 +2622,7 @@ ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size, { if (dump_file) fprintf (dump_file, - "punting on ldp due to reg conflcits (%d,%d)\n", + "punting on pair mem load due to reg conflcits (%d,%d)\n", insns[0]->uid (), insns[1]->uid ()); return false; } @@ -2843,7 +2971,7 @@ debug (const insn_list_t &l) // we can't re-order them anyway, so provided earlier passes have cleaned up // redundant loads, we shouldn't miss opportunities by doing this. void -ldp_bb_info::merge_pairs (insn_list_t &left_list, +pair_fusion_bb_info::merge_pairs (insn_list_t &left_list, insn_list_t &right_list, bool load_p, unsigned access_size) @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t &left_list, // of accesses. If we find two sets of adjacent accesses, call // merge_pairs. void -ldp_bb_info::transform_for_base (int encoded_lfs, - access_group &group) +pair_fusion_bb_info::transform_for_base (int encoded_lfs, + access_group &group) { const auto lfs = decode_lfs (encoded_lfs); const unsigned access_size = lfs.size; @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs, access.cand_insns, lfs.load_p, access_size); - skip_next = access.cand_insns.empty (); + skip_next = bb_state->cand_insns_empty_p (access.cand_insns); } prev_access = &access; } @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int encoded_lfs, // and remove all the tombstone insns, being sure to reparent any uses // of mem to previous defs when we do this. void -ldp_bb_info::cleanup_tombstones () +pair_fusion_bb_info::cleanup_tombstones () { // No need to do anything if we didn't emit a tombstone insn for this BB. if (!m_emitted_tombstone) @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones () template void -ldp_bb_info::traverse_base_map (Map &map) +pair_fusion_bb_info::traverse_base_map (Map &map) { for (auto kv : map) { @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map) } void -ldp_bb_info::transform () +pair_fusion_bb_info::transform () { traverse_base_map (expr_map); traverse_base_map (def_map); @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn) // for load/store candidates. If running after RA, also try and promote // non-writeback pairs to use writeback addressing. Then try to fuse // candidates into pairs. 
-void ldp_fusion_bb (bb_info *bb) +void pair_fusion::ldp_fusion_bb (bb_info *bb) { - const bool track_loads - = aarch64_tune_params.ldp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; - const bool track_stores - = aarch64_tune_params.stp_policy_model != AARCH64_LDP_STP_POLICY_NEVER; + const bool track_loads = track_load_p (); + const bool track_stores = track_store_p (); - ldp_bb_info bb_state (bb); + aarch64_pair_fusion derived; + pair_fusion_bb_info bb_info (bb, &derived); for (auto insn : bb->nondebug_insns ()) { @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb) continue; rtx pat = PATTERN (rti); - if (reload_completed - && aarch64_ldp_writeback > 1 - && GET_CODE (pat) == PARALLEL - && XVECLEN (pat, 0) == 2) + if (pair_mem_promote_writeback_p (pat)) try_promote_writeback (insn); if (GET_CODE (pat) != SET) continue; if (track_stores && MEM_P (XEXP (pat, 0))) - bb_state.track_access (insn, false, XEXP (pat, 0)); + bb_info.track_access (insn, false, XEXP (pat, 0)); else if (track_loads && MEM_P (XEXP (pat, 1))) - bb_state.track_access (insn, true, XEXP (pat, 1)); + bb_info.track_access (insn, true, XEXP (pat, 1)); } - bb_state.transform (); - bb_state.cleanup_tombstones (); + bb_info.transform (); + bb_info.cleanup_tombstones (); } void ldp_fusion () { ldp_fusion_init (); + pair_fusion *pfuse; + aarch64_pair_fusion derived; + pfuse = &derived; for (auto bb : crtl->ssa->bbs ()) - ldp_fusion_bb (bb); + pfuse->ldp_fusion_bb (bb); ldp_fusion_destroy (); }
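As a reading aid (not part of the patch), the following is a minimal, self-contained C++ sketch of the structure this change introduces: a target independent driver that lives in an abstract base class and consults only pure virtual hooks, plus a target class that overrides those hooks, analogous to pair_fusion / aarch64_pair_fusion above. The names generic_pair_fusion and example_target_fusion, and the immediate-range bounds used here, are hypothetical illustrations, not code from aarch64-ldp-fusion.cc.

#include <cstdio>

// Hypothetical, simplified stand-in for the pair_fusion interface in the
// patch: the generic driver (run_bb here, ldp_fusion_bb in the patch) is a
// non-virtual member of the base class and talks to the target only through
// pure virtual hooks.
struct generic_pair_fusion
{
  virtual ~generic_pair_fusion () {}

  // Target hooks, loosely modelled on the hooks declared above.
  virtual bool track_load_p () = 0;
  virtual bool track_store_p () = 0;
  // Follows the convention of the patch as posted: returns true when OFF is
  // *out* of range for a paired access.
  virtual bool pair_mem_in_range_p (long off) = 0;

  // Target independent driver: no target-specific globals are read directly,
  // everything goes through the virtual hooks.
  void run_bb ()
  {
    if (track_load_p ())
      std::printf ("tracking loads\n");
    if (track_store_p ())
      std::printf ("tracking stores\n");
    if (!pair_mem_in_range_p (512))
      std::printf ("offset 512 is in range for a paired access\n");
  }
};

// Hypothetical target implementation, analogous to aarch64_pair_fusion.
struct example_target_fusion : generic_pair_fusion
{
  bool track_load_p () override { return true; }
  bool track_store_p () override { return true; }
  bool pair_mem_in_range_p (long off) override
  {
    const long min_imm = -1024, max_imm = 1008;  // made-up bounds
    return off < min_imm || off > max_imm;
  }
};

int main ()
{
  example_target_fusion fusion;
  fusion.run_bb ();  // the generic driver dispatches to the target hooks
  return 0;
}

The intent, as described in the cover text, is that other targets can then provide their own derived class without modifying the target independent driver.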