From patchwork Thu Jun 30 14:39:42 2011
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 102765
Subject: [PATCH] Address lowering [1/3] Main patch
From: "William J. Schmidt"
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com
Date: Thu, 30 Jun 2011 09:39:42 -0500
Message-ID: <1309444782.26980.52.camel@oc2474580526.ibm.com>

This is the first of three patches related to lowering addressing
expressions to MEM_REFs and TARGET_MEM_REFs in late gimple.  This patch
contains the new pass together with supporting changes in existing
modules.  The second patch contains an independent change to the RTL
forward propagator to keep it from undoing an optimization made in the
first patch.  The third patch contains new test cases and changes to
existing test cases.

Although I've broken it up into three patches to make the review easier,
it would be best to commit at least the first and third together to
avoid regressions.  The second can stand alone.

I've done regression tests on powerpc64 and x86_64, and have asked
Andreas Krebbel to test against the IBM z (390) platform.  I've done
performance regression testing on powerpc64.  The only performance
regression of note is the 2% degradation to 188.ammp due to loss of
field disambiguation information.  As discussed in another thread,
fixing this introduces more complexity than it's worth.

I look forward to your comments!

Thanks,
Bill

2011-06-30  Bill Schmidt

	PR rtl-optimization/46556
	* dbgcnt.def (lower_addr): New debug counter.
	* tree-ssa-lower-addr.c: New.
	* tree.c (may_be_unaligned_p): Moved from tree-ssa-loop-ivopts.c.
	* tree.h (may_be_unaligned_p): New declaration.
	(copy_ref_info): Likewise.
	* tree-pass.h (pass_tree_lower_addr): Likewise.
	* tree-ssa-loop-ivopts.c (may_be_unaligned_p): Moved to tree.c.
	(copy_ref_info): Moved to tree-ssa-address.c.
	* timevar.def (TV_TREE_LOWER_ADDR): New timevar.
	* tree-ssa-address.c (addr_to_parts): Provide for call on behalf
	of address lowering.
	(copy_ref_info): Moved from tree-ssa-loop-ivopts.c.
	* tree-affine.c (tree-flow.h): New include.
	(mem_tree_to_aff_combination_expand): New.
	* tree-affine.h (mem_tree_to_aff_combination_expand): New decl.
	* common.opt (lower-addr): New -f option.
	* Makefile.in (tree-ssa-lower-addr.o): New.
	(tree-affine.c): New dependency on tree-flow.h.
	* gimple.c (gimple_assign_conversion_p): New.
	* gimple.h (gimple_assign_conversion_p): New decl.
	* passes.c (init_optimization_passes): Add new pass.

Index: gcc/dbgcnt.def
===================================================================
--- gcc/dbgcnt.def	(revision 175664)
+++ gcc/dbgcnt.def	(working copy)
@@ -1,5 +1,5 @@
 /* This file contains the list of the debug counter for GCC.
-   Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2006, 2007, 2008, 2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -184,3 +184,4 @@ DEBUG_COUNTER (sms_sched_loop)
 DEBUG_COUNTER (store_motion)
 DEBUG_COUNTER (split_for_sched2)
 DEBUG_COUNTER (tail_call)
+DEBUG_COUNTER (lower_addr)
Index: gcc/tree-ssa-lower-addr.c
===================================================================
--- gcc/tree-ssa-lower-addr.c	(revision 0)
+++ gcc/tree-ssa-lower-addr.c	(revision 0)
@@ -0,0 +1,1430 @@
+/* Expose target addressing modes in late gimple representation.
+   Copyright (C) 2011
+   Free Software Foundation, Inc.
+   Contributed by Bill Schmidt
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This pass performs a scan over the gimple representation to expose
+   target machine addressing in reference-class expressions.  This is
+   necessary for some targets for which the current address
+   canonicalization is suboptimal.  Induction variable optimizations
+   already expose target addressing for many (but not all) expressions
+   in loops, so this has most benefit in non-loop code.  */
+
+/* Address lowering in gimple loses a small amount of information that
+   is useful in disambiguating memory references.  In alias.c, the
+   nonoverlapping_component_refs_p function examines the mem_exprs
+   associated with two component references and attempts to prove
+   they do not overlap.  This is done by walking the types of the
+   two mem_exprs looking for a common record type, and demonstrating
+   that the two references address non-overlapping fields of that
+   type.  This improves on TBAA by further disambiguating references
+   to fields that have the same type.
+
+   With addresses lowered in gimple, the mem_exprs no longer contain
+   field references, but instead contain MEM_REFs and TARGET_MEM_REFs
+   that cannot necessarily be proven to be field references.
In any + case, they no longer describe the full type hierarchy of the + underlying reference, so in general we lose some opportunity to + disambiguate field references. + + This has been observed to cause occasional small degradations in + generated code. For example, a 2% degradation was measured for + 188.ammp (SPEC cpu2000) on powerpc64 when the information loss led + to a suboptimal instruction schedule. However, such occurrences + are believed to be infrequent and have minimal effects compared + with the overall benefit of address lowering. + + To address this problem would require somehow carrying the lost + type information in the intermediate representation from address + lowering to the expand phase. The cost and complexity of maintaining + this information does not currently seem justified. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tree.h" +#include "basic-block.h" +#include "tree-flow.h" +#include "tree-dump.h" +#include "tree-pass.h" +#include "timevar.h" +#include "cfgloop.h" +#include "gimple.h" +#include "tree-affine.h" +#include "tree-pretty-print.h" +#include "dbgcnt.h" +#include "alloc-pool.h" + +/* Forward references. */ +static bool avail_addr_cse (gimple); + +/* Hash table for remembering expressions that contribute to address + calculations in the current block and its dominators. */ +static htab_t avail_addr_table; + +/* Expression entry. Expressions are allocated from an allocation pool. + They are addressed both out of the hash table and from vectors of + locally available expressions for the current block and its + dominators. Expressions are freed from the allocation pool when + processing for the block where they were first seen completes. */ +typedef struct avail_addr_expr_d +{ + /* The expression consists of an operation and 1 or 2 operands. */ + enum tree_code operation; + tree operand1; + tree operand2; + + /* Its representative SSA name. */ + tree repr; + + /* Cached hash code. */ + hashval_t hashcode; +} avail_addr_expr, *avail_addr_expr_t; + +typedef const struct avail_addr_expr_d *const_avail_addr_expr_t; + +/* Allocation pool for expression entries. */ +static alloc_pool avail_addr_pool; + +/* Define a vector of expression entries. */ +DEF_VEC_P (avail_addr_expr_t); +DEF_VEC_ALLOC_P (avail_addr_expr_t, heap); + +/* Auxiliary information for each basic block for this pass. */ +typedef struct tree_lower_addr_bb_aux_d +{ + /* Expressions made available by this block. */ + VEC (avail_addr_expr_t, heap) *avail_exprs; +} tree_lower_addr_bb_aux, *tree_lower_addr_bb_aux_t; + +#define AVAIL_EXPRS(BB) ((tree_lower_addr_bb_aux_t) ((BB)->aux))->avail_exprs + +/* Basic block currently being processed. */ +static basic_block curr_bb; + +/* Memory references in the current block that have zero offsets. */ +static VEC (gimple, heap) *zero_offset_refs; + +/* Flag indicating that the CFG needs to be cleaned up afterwards. */ +static bool clean_cfg; + + +/* Callback to produce a hash value for an expression. */ + +static hashval_t +avail_addr_hash (const void *p) +{ + const_avail_addr_expr_t const expr = (const_avail_addr_expr_t) p; + return expr->hashcode; +} + +/* Callback when an element is removed from the hash table. An available + expression is deleted from the hash table after its generating block + and its dominated blocks have been processed, resulting in a call to + avail_addr_free to remove it from the allocation pool. 
*/ + +static void +avail_addr_free (void *p) +{ + pool_free (avail_addr_pool, p); +} + +/* Return true if two leaf nodes are equal. */ + +static int +leaves_eq (tree opnd1, tree opnd2) +{ + /* Null nodes are vacuously equal. */ + if (!opnd1 && !opnd2) + return true; + + if (!opnd1 || !opnd2) + return false; + + if (TREE_CODE (opnd1) != TREE_CODE (opnd2)) + return false; + + /* Integer type nodes appear as the second RHS operand for a + CONVERT_EXPR in available expressions, indicating the type of + the LHS they will be assigned to. These must match exactly. */ + if (TYPE_P (opnd1)) + return (TYPE_UNSIGNED (opnd1) == TYPE_UNSIGNED (opnd2) + && TYPE_PRECISION (opnd1) == TYPE_PRECISION (opnd2)); + + return operand_equal_p (opnd1, opnd2, 0); +} + +/* Callback for hashtab support. Return true if two expressions are + equal. We are only interested in simple expressions involving SSA + names, addresses of vars or parms, integer constants, and mathematical + operations that can reasonably appear in addressing expressions. */ + +static int +avail_addr_eq (const void *p1, const void *p2) +{ + const_avail_addr_expr_t const entry1 = (const_avail_addr_expr_t) p1; + const_avail_addr_expr_t const entry2 = (const_avail_addr_expr_t) p2; + tree left_opnd1 = entry1->operand1; + tree left_opnd2 = entry2->operand1; + tree right_opnd1 = entry1->operand2; /* NULL_TREE for unary ops. */ + tree right_opnd2 = entry2->operand2; /* NULL_TREE for unary ops. */ + + if (entry1->operation != entry2->operation) + return false; + + if (TREE_CODE (left_opnd1) != TREE_CODE (left_opnd2)) + return false; + + if (entry1->operation == SSA_NAME || entry1->operation == NOP_EXPR) + return operand_equal_p (left_opnd1, left_opnd2, 0); + + if (right_opnd1 && right_opnd2 && + TREE_CODE (right_opnd1) != TREE_CODE (right_opnd2)) + return false; + + if (leaves_eq (left_opnd1, left_opnd2) && + leaves_eq (right_opnd1, right_opnd2)) + return true; + + /* Handle commutative operations. */ + if (!commutative_tree_code (entry1->operation)) + return false; + + return (leaves_eq (left_opnd1, right_opnd2) && + leaves_eq (right_opnd1, left_opnd2)); +} + +/* Provide a hash code for SSA_NAME, INTEGER_CST, type, and ADDR_EXPR of + var/parm operands. */ + +static hashval_t +hash_simple_op (tree operand) +{ + if (TREE_CODE (operand) == SSA_NAME) + return DECL_UID (SSA_NAME_VAR (operand)) * SSA_NAME_VERSION (operand); + else if (TREE_CODE (operand) == INTEGER_CST) + return (hashval_t)TREE_INT_CST_LOW (operand); + else if (TREE_CODE (operand) == ADDR_EXPR) + { + tree subopnd = TREE_OPERAND (operand, 0); + gcc_assert (TREE_CODE (subopnd) == VAR_DECL || + TREE_CODE (subopnd) == PARM_DECL); + return DECL_UID (subopnd); + } + else if (TYPE_P (operand)) + return TYPE_PRECISION (operand); + + gcc_unreachable (); + return 0; +} + +/* Determine whether hash_simple_op can be called on OPERAND. */ + +static bool +valid_simple_op (tree operand) +{ + if (TREE_CODE (operand) == SSA_NAME + || TREE_CODE (operand) == INTEGER_CST + || TYPE_P (operand)) + return true; + + if (TREE_CODE (operand) == ADDR_EXPR) + { + tree subopnd = TREE_OPERAND (operand, 0); + if (TREE_CODE (subopnd) == VAR_DECL || TREE_CODE (subopnd) == PARM_DECL) + return true; + } + + return false; +} + +/* Allocate a new avail_addr_expr from the pool and fill it in. 
*/ + +static avail_addr_expr_t +alloc_avail_addr_expr (enum tree_code operation_in, tree operand1_in, + tree operand2_in, tree repr_in, + hashval_t hashcode_in) +{ + avail_addr_expr_t entry = (avail_addr_expr_t)pool_alloc (avail_addr_pool); + + entry->operation = operation_in; + entry->operand1 = operand1_in; + entry->operand2 = operand2_in; + entry->repr = repr_in; + entry->hashcode = hashcode_in; + + return entry; +} + +/* Free all available expressions generated locally in BB. */ + +static void +free_avail_exprs (basic_block bb) +{ + unsigned i, length = VEC_length (avail_addr_expr_t, AVAIL_EXPRS (bb)); + avail_addr_expr_t entry; + void **slot; + + for (i = 0; i < length; i++) + { + entry = VEC_index (avail_addr_expr_t, AVAIL_EXPRS (bb), i); + slot = htab_find_slot_with_hash (avail_addr_table, entry, + entry->hashcode, NO_INSERT); + /* This removes the entry from the hash table, and indirectly + removes it from the allocation pool via the callback to + avail_addr_free. */ + htab_clear_slot (avail_addr_table, slot); + } + + /* Reclaim the available expressions vector for this block. */ + VEC_free (avail_addr_expr_t, heap, AVAIL_EXPRS (bb)); +} + +/* Look up NAME in the hash table and, if found, return the entry of its + representative. Otherwise make it its own representative and return + that entry. */ + +static avail_addr_expr_t +avail_addr_ssa_name_cse (tree name) +{ + avail_addr_expr_t new_entry = + alloc_avail_addr_expr (SSA_NAME, name, NULL_TREE, name, + hash_simple_op (name)); + + void **slot = htab_find_slot_with_hash (avail_addr_table, new_entry, + new_entry->hashcode, INSERT); + if (!*slot) + { + *slot = new_entry; + VEC_safe_push (avail_addr_expr_t, heap, + AVAIL_EXPRS (curr_bb), new_entry); + } + + return (avail_addr_expr_t)*slot; +} + +/* The given RHS was found on a code-free type conversion. Look it up + in the hash table; if it wasn't there previously, make LHS its + representative. Otherwise keep the existing representative. Return + a pointer to its hash table entry. */ + +static avail_addr_expr_t +avail_addr_nop_expr_cse (tree lhs, tree rhs) +{ + avail_addr_expr_t new_entry = + alloc_avail_addr_expr (NOP_EXPR, rhs, NULL_TREE, lhs, + hash_simple_op (rhs)); + void **slot = htab_find_slot_with_hash (avail_addr_table, new_entry, + new_entry->hashcode, INSERT); + if (!*slot) + { + *slot = new_entry; + VEC_safe_push (avail_addr_expr_t, heap, AVAIL_EXPRS (curr_bb), new_entry); + } + + return (avail_addr_expr_t)*slot; +} + +/* The given RHS was found on an integer type conversion that will generate + code. Look it up in the hash table using the type of the LHS; if it + wasn't there previously, make LHS its representative. Otherwise keep the + existing representative. Return a pointer to its hash table entry. */ + +static avail_addr_expr_t +avail_addr_convert_expr_cse (tree lhs, tree rhs) +{ + hashval_t hash = hash_simple_op (rhs) ^ hash_simple_op (TREE_TYPE (lhs)); + avail_addr_expr_t new_entry = + alloc_avail_addr_expr (CONVERT_EXPR, rhs, TREE_TYPE (lhs), lhs, hash); + void **slot = htab_find_slot_with_hash (avail_addr_table, new_entry, + hash, INSERT); + if (!*slot) + { + *slot = new_entry; + VEC_safe_push (avail_addr_expr_t, heap, AVAIL_EXPRS (curr_bb), new_entry); + } + + return (avail_addr_expr_t)*slot; +} + +/* Perform CSE on OPND, possibly replacing it with a representative + SSA name. Returns true iff the tree rooted at *OPND has been + modified. 
*/ + +static bool +avail_addr_expr_cse (tree *opnd) +{ + enum tree_code code; + avail_addr_expr_t entry_ptr; + unsigned int opnd_idx; + bool changed = false; + + if (!*opnd) + return false; + + code = TREE_CODE (*opnd); + + switch (code) + { + case SSA_NAME: + entry_ptr = avail_addr_ssa_name_cse (*opnd); + if (!operand_equal_p (*opnd, entry_ptr->repr, 0)) + { + *opnd = entry_ptr->repr; + return true; + } + return false; + + case INTEGER_CST: + return false; + + default: + for (opnd_idx = 0; opnd_idx < TREE_CODE_LENGTH (code); opnd_idx++) + changed = avail_addr_expr_cse (&TREE_OPERAND (*opnd, opnd_idx)) + || changed; + } + return changed; +} + +/* If STMT is some sort of copy operation, attempt CSE on it and return + TRUE; else return FALSE. Set *CHANGED to TRUE iff STMT is modified + by performing CSE. */ + +static bool +avail_addr_cse_copy (gimple stmt, tree lhs, bool *changed) +{ + tree rhs, *ops; + avail_addr_expr_t rhs_entry, new_entry; + void **slot; + + /* If copying one SSA name to another, first find the representative + of the RHS, and then make that the representative of the LHS. */ + if (gimple_assign_ssa_name_copy_p (stmt)) + { + rhs = gimple_assign_rhs1 (stmt); + + /* We're only interested in simple SSA names, such as might have + been generated during address expansion. Ignore complex + memory references. */ + if (TREE_CODE (rhs) == COMPONENT_REF || + TREE_CODE (rhs) == ARRAY_REF || + TREE_CODE (rhs) == MEM_REF || + TREE_CODE (rhs) == TARGET_MEM_REF) + { + *changed = false; + return true; + } + + rhs_entry = avail_addr_ssa_name_cse (rhs); + new_entry = alloc_avail_addr_expr (SSA_NAME, lhs, NULL_TREE, + rhs_entry->repr, + hash_simple_op (lhs)); + slot = htab_find_slot_with_hash (avail_addr_table, new_entry, + new_entry->hashcode, INSERT); + gcc_assert (!*slot); + *slot = new_entry; + VEC_safe_push (avail_addr_expr_t, heap, + AVAIL_EXPRS (curr_bb), new_entry); + *changed = false; + return true; + } + + /* Simple converts that generate no code are treated slightly + differently. Even if the RHS is a simple variable, the conversion + means we can't propagate it directly. Track these as NOP_EXPRs + in the hash table. The first occurrence of a NOP_EXPR pattern + will receive its LHS SSA_NAME as its representative. */ + if (gimple_assign_unary_nop_p (stmt)) + { + ops = gimple_ops (stmt); + rhs = ops[1]; + + if (TREE_CODE (lhs) != SSA_NAME || TREE_CODE (rhs) != SSA_NAME) + { + *changed = false; + return true; + } + + /* First do any replacement on the SSA_NAME of the RHS. */ + *changed = avail_addr_expr_cse (&rhs); + + /* Now CSE the conversion, tracking it as a NOP_EXPR. */ + rhs_entry = avail_addr_nop_expr_cse (lhs, rhs); + + if (rhs_entry->repr != lhs) + { + gimple_set_op (stmt, 1, rhs_entry->repr); + if (TREE_TYPE (lhs) == TREE_TYPE (rhs_entry->repr)) + gimple_assign_set_rhs_code (stmt, SSA_NAME); + (void)avail_addr_cse (stmt); + *changed = true; + } + return true; + } + + /* Conversions that require code generation are recorded in the hash + table with two operands: the SSA name of the RHS, and the tree + type of the LHS. The first occurrence of a CONVERT_EXPR pattern + will receive its LHS SSA_NAME as its representative. */ + if (gimple_assign_conversion_p (stmt)) + { + ops = gimple_ops (stmt); + rhs = ops[1]; + + if (TREE_CODE (lhs) != SSA_NAME + || TREE_CODE (rhs) != SSA_NAME + || TREE_CODE (TREE_TYPE (lhs)) != INTEGER_TYPE) + { + *changed = false; + return true; + } + + /* First do any replacement on the SSA_NAME of the RHS. 
*/ + *changed = avail_addr_expr_cse (&rhs); + + /* Now CSE the conversion, tracking it as a CONVERT_EXPR. */ + rhs_entry = avail_addr_convert_expr_cse (lhs, rhs); + + if (rhs_entry->repr != lhs) + { + gimple_set_op (stmt, 1, rhs_entry->repr); + gimple_assign_set_rhs_code (stmt, SSA_NAME); + (void)avail_addr_cse (stmt); + *changed = true; + } + return true; + } + + /* If it's some other form of copy to a simple SSA_NAME, make the + LHS its own representative. The RHS will be a constant or + something equally simple that doesn't need to be examined. + If the LHS is not simple, perform CSE substitution on its + suboperands. */ + if (gimple_assign_copy_p (stmt)) + { + if (TREE_CODE (lhs) == SSA_NAME) + { + (void)avail_addr_ssa_name_cse (lhs); + *changed = false; + } + else + *changed = avail_addr_expr_cse (&lhs); + return true; + } + + /* It's not a copy, so defer to avail_addr_cse_noncopy. */ + return false; +} + +/* Return TRUE iff CODE may reasonably appear on the RHS of a non-copy + statement used to construct an address. This includes codes that + may be used to calculate an array index. */ +static bool +is_relevant_rhs_code (enum tree_code code) +{ + switch (code) + { + case PLUS_EXPR: + case POINTER_PLUS_EXPR: + case MINUS_EXPR: + case MULT_EXPR: + case EXACT_DIV_EXPR: + case TRUNC_DIV_EXPR: + case CEIL_DIV_EXPR: + case FLOOR_DIV_EXPR: + case ROUND_DIV_EXPR: + case TRUNC_MOD_EXPR: + case CEIL_MOD_EXPR: + case FLOOR_MOD_EXPR: + case ROUND_MOD_EXPR: + case NEGATE_EXPR: + case MIN_EXPR: + case MAX_EXPR: + case ABS_EXPR: + case LSHIFT_EXPR: + case RSHIFT_EXPR: + case BIT_IOR_EXPR: + case BIT_XOR_EXPR: + case BIT_AND_EXPR: + case BIT_NOT_EXPR: + return true; + default: + break; + } + return false; +} + +/* Attempt CSE on STMT, which is not some form of copy. Return TRUE + iff STMT has been modified as a result. */ + +static bool +avail_addr_cse_noncopy (gimple stmt, tree lhs) +{ + tree *ops; + unsigned i, num_ops, initial_i = 0, imax; + bool changed = false; + enum tree_code subcode, code[2]; + hashval_t hash; + avail_addr_expr_t new_entry; + void **slot; + + /* Walk each tree on the RHS looking for operands that are eligible to + be in the CSE table, replacing them with their representatives. + If LHS is not an SSA_NAME, walk its operands, as well. */ + ops = gimple_ops (stmt); + num_ops = gimple_num_ops (stmt); + + if (TREE_CODE (lhs) == SSA_NAME) + initial_i = 1; + + for (i = initial_i; i < num_ops; i++) + changed = avail_addr_expr_cse (&ops[i]) || changed; + + /* Now the operands are canonicalized, so try to match the expression + in the hash table. */ + subcode = gimple_assign_rhs_code (stmt); + + if (is_relevant_rhs_code (subcode)) + { + if (TREE_CODE_CLASS (subcode) == tcc_unary) + imax = 1; + else + imax = 2; + + /* Take a quick exit if this is obviously not part of an address + expression. */ + for (i = 0; i < imax; i++) + { + code[i] = TREE_CODE (ops[i+1]); + if (code[i] == REAL_CST || code[i] == FIXED_CST || + code[i] == COMPLEX_CST || code[i] == VECTOR_CST) + return changed; + + /* Same for complex expressions that haven't been reduced. */ + if (code[i] == ADDR_EXPR) + { + enum tree_code subcode1 = TREE_CODE (TREE_OPERAND (ops[i+1], 0)); + if (subcode1 != VAR_DECL && subcode1 != PARM_DECL) + return changed; + } + } + + /* If all suboperands are simple, look up the expression in the + hash table. 
*/ + if (!valid_simple_op (ops[1])) + return changed; + + hash = subcode ^ hash_simple_op (ops[1]); + + if (TREE_CODE_CLASS (subcode) == tcc_unary) + new_entry = alloc_avail_addr_expr (subcode, ops[1], NULL_TREE, + lhs, hash); + else + { + if (!valid_simple_op (ops[2])) + return changed; + + hash ^= hash_simple_op (ops[2]); + new_entry = alloc_avail_addr_expr (subcode, ops[1], ops[2], + lhs, hash); + } + + slot = htab_find_slot_with_hash (avail_addr_table, new_entry, + new_entry->hashcode, INSERT); + + /* If we've seen it before, attempt to change this statement to an + SSA_NAME with the representative appearing as RHS1. If the + types of the LHS and the representative differ, attempt to use + a NOP_EXPR for some simple cases. Otherwise leave the RHS + alone. */ + if (*slot) + { + avail_addr_expr_t cached_entry_ptr = (avail_addr_expr_t)*slot; + tree lhs_type, repr_type; + enum tree_code new_code; + + lhs_type = TREE_TYPE (lhs); + repr_type = TREE_TYPE (cached_entry_ptr->repr); + + if (lhs_type == repr_type) + new_code = SSA_NAME; + + else if ((INTEGRAL_TYPE_P (lhs_type) || + POINTER_TYPE_P (lhs_type)) && + (INTEGRAL_TYPE_P (repr_type) || + POINTER_TYPE_P (repr_type)) && + (TYPE_SIZE (lhs_type) == TYPE_SIZE (repr_type))) + new_code = NOP_EXPR; + + else + return changed; + + gimple_assign_set_rhs_code (stmt, new_code); + gimple_set_num_ops (stmt, 2); + gimple_set_op (stmt, 1, cached_entry_ptr->repr); + + /* Reprocess this statement so the LHS picks up the + new RHS as its representative. */ + (void)avail_addr_cse (stmt); + changed = true; + } + else + { + *slot = new_entry; + VEC_safe_push (avail_addr_expr_t, heap, + AVAIL_EXPRS (curr_bb), new_entry); + } + } + + return changed; +} + +/* Remember expressions that may contribute to address calculation, + and replace repeated occurrences with a representative SSA name. + Return true iff STMT has been modified. */ + +static bool +avail_addr_cse (gimple stmt) +{ + tree lhs; + bool changed = false; + + /* We are only interested in assignments. */ + if (!is_gimple_assign (stmt)) + return false; + + lhs = gimple_assign_lhs (stmt); + + if (avail_addr_cse_copy (stmt, lhs, &changed)) + return changed; + + return avail_addr_cse_noncopy (stmt, lhs); + +} + +/* Return TRUE iff EXPR contains a direct reference to a field of a small, + nonvolatile structure (i.e., something we hope to keep in registers). */ +static bool +is_small_struct_reference (tree expr) +{ + bool ref_found = false; + + while (expr) + { + if (TREE_THIS_VOLATILE (expr)) + return false; + + switch (TREE_CODE (expr)) + { + case MEM_REF: + case TARGET_MEM_REF: + /* Indirection is present; not what we're looking for. */ + return false; + + case COMPONENT_REF: + if (TREE_CODE (TREE_OPERAND (expr, 1)) != FIELD_DECL) + return false; + if (TYPE_MODE (TREE_TYPE (TREE_OPERAND (expr, 0))) == SImode) + ref_found = true; + /* We found a small struct ref, but look down the base expression + for indirection before returning TRUE. */ + break; + + default: + break; + } + if (TREE_OPERAND_LENGTH (expr) > 0) + expr = TREE_OPERAND (expr, 0); + else + expr = NULL_TREE; + } + + return ref_found; +} + +/* Return TRUE iff any of the elements of AFF contain an ADDR_EXPR of + a VAR_DECL, PARM_DECL, or RESULT_DECL that is not already marked + as addressable. 
*/
+static bool
+aff_comb_causes_addr_taken (aff_tree aff)
+{
+  unsigned i;
+
+  for (i = 0; i < aff.n; i++)
+    {
+      tree expr = aff.elts[i].val;
+
+      while (expr)
+	{
+	  if (TREE_CODE (expr) == ADDR_EXPR)
+	    {
+	      tree base = TREE_OPERAND (expr, 0);
+	      enum tree_code code = TREE_CODE (base);
+	      if ((code == VAR_DECL || code == PARM_DECL || code == RESULT_DECL)
+		  && (! TREE_ADDRESSABLE (base)))
+		return true;
+	    }
+	  if (TREE_OPERAND_LENGTH (expr) > 0)
+	    expr = TREE_OPERAND (expr, 0);
+	  else
+	    expr = NULL_TREE;
+	}
+    }
+
+  return false;
+}
+
+/* Return TRUE iff the affine combination of trees AFF relies on an update
+   of an induction variable.  If this is the case, AFF's offset will be
+   nonzero, and one of its elements will reference a variable that is
+   directly or indirectly defined by a phi expression.  That phi expression
+   will occur in a loop header.  */
+static bool
+aff_comb_relies_on_ivar_update (aff_tree aff)
+{
+  unsigned i;
+
+  if (double_int_zero_p (aff.offset))
+    return false;
+
+  for (i = 0; i < aff.n; i++)
+    {
+      tree expr = aff.elts[i].val;
+
+      if (TREE_CODE (expr) != SSA_NAME)
+	continue;
+
+      while (expr)
+	{
+	  gimple def_stmt = SSA_NAME_DEF_STMT (expr);
+
+	  if (gimple_code (def_stmt) == GIMPLE_PHI)
+	    {
+	      basic_block bb = gimple_bb (def_stmt);
+	      return (bb == bb->loop_father->header);
+	    }
+
+	  if (gimple_assign_ssa_name_copy_p (def_stmt)
+	      || gimple_assign_unary_nop_p (def_stmt))
+	    {
+	      expr = gimple_assign_rhs1 (def_stmt);
+	      if (TREE_CODE (expr) != SSA_NAME)
+		expr = NULL_TREE;
+	    }
+	  else
+	    expr = NULL_TREE;
+	}
+    }
+
+  return false;
+}
+
+/* Examine AFF to pick a component to use as the base hint to
+   create_mem_ref.  */
+
+static tree
+choose_base_hint (aff_tree *aff)
+{
+  unsigned i;
+
+  /* Pick the first tree with pointer type.  If none exists,
+     pick the first tree that is either an SSA_NAME, NOP_EXPR, or
+     CONVERT_EXPR.  The latter two can exist because, when a
+     cast loses precision, the affine tree expansion stops.  */
+  for (i = 0; i < aff->n; i++)
+    if (POINTER_TYPE_P (TREE_TYPE (aff->elts[i].val)))
+      return aff->elts[i].val;
+
+  for (i = 0; i < aff->n; i++)
+    if (TREE_CODE (aff->elts[i].val) == SSA_NAME ||
+	CONVERT_EXPR_CODE_P (TREE_CODE (aff->elts[i].val)))
+      return aff->elts[i].val;
+
+  /* If all else fails, default to the first element.  */
+  return aff->elts[0].val;
+}
+
+/* Return TRUE iff EXPR might be a hidden global store as defined by
+   the dead code eliminator.  See tree-ssa-sink.c:is_hidden_global_store.  */
+
+static bool
+may_be_global_store (tree expr)
+{
+  if (REFERENCE_CLASS_P (expr))
+    expr = get_base_address (expr);
+
+  /* If the base address couldn't be obtained, don't perform the
+     replacement.  This shouldn't be possible if we've gotten this far.  */
+  if (expr == NULL_TREE)
+    return false;
+
+  if (DECL_P (expr) && !is_global_var (expr))
+    return false;
+
+  if (INDIRECT_REF_P (expr) ||
+      TREE_CODE (expr) == MEM_REF ||
+      TREE_CODE (expr) == TARGET_MEM_REF)
+    return ptr_deref_may_alias_global_p (TREE_OPERAND (expr, 0));
+
+  if (CONSTANT_CLASS_P (expr))
+    return true;
+
+  return false;
+}
+
+/* We are considering replacing ORIG with REPL.  Return TRUE iff
+   ORIG will be recognized as a hidden global store by the dead
+   code eliminator, but REPL will not.  */
+/* This should not be necessary to check, though it is harmless.
+   Without the check, the test in libstdc++-v3/testsuite/23_containers/
+   vector/ext_pointer_modifiers/insert.cc fails.
However, this is
+   because the affine expansion simplifies some complex addressing
+   sufficiently that the entire procedure uninitialized_fill_n_a
+   is proven to have no side effects.  Examining the gimple prior
+   to this phase shows that it indeed does not have side effects at
+   that point.  There may be a front end problem or an upstream
+   optimization problem here.  */
+
+static bool
+global_store_may_be_lost (tree orig, tree repl)
+{
+  return (may_be_global_store (orig) && !may_be_global_store (repl));
+}
+
+/* Expose target addressing modes in EXPR.  Logic extracted and
+   simplified from tree-ssa-loop-ivopts.c to handle non-ivar
+   opportunities.  IS_VDEF_STORE is true iff EXPR is the lhs of
+   a virtual definition.  */
+
+static bool
+tree_ssa_lower_addr_tree (tree *expr, gimple_stmt_iterator *gsi,
+			  bool speed, bool is_vdef_store)
+{
+  tree addr = *expr, mem_ref, base_hint;
+  aff_tree aff;
+  enum tree_code code = TREE_CODE (addr);
+  gimple_stmt_iterator save_gsi;
+  gimple save_stmt;
+
+  if (!REFERENCE_CLASS_P (addr))
+    return false;
+
+  /* TODO: Bitfields are not yet handled.  These will be lowered
+     separately to read-modify-write sequences in pass "memlower" in
+     gimple-low.c (work in progress by Richard Guenther).  It would
+     be preferable for that lowering to happen prior to this pass.
+     It is also possible that this pass should be merged with that one;
+     but the two passes may have different ideal positions in the phase
+     ordering.  */
+  if (code == BIT_FIELD_REF)
+    return false;
+  if (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (addr, 1)))
+    return false;
+
+  /* If the expression contains a direct field reference from a small
+     nonvolatile structure, don't expose the addressing.  We assume such
+     things will remain in registers where possible.  */
+  if (is_small_struct_reference (*expr))
+    return false;
+
+  /* Express the address of the expression as an affine combination of
+     trees, looking through SSA defs.  If the expression may not be
+     addressable or is otherwise not a candidate, detect that and exit.  */
+  if (!mem_tree_to_aff_combination_expand (&addr, &aff))
+    return false;
+
+  /* It is possible for aff to have no elements, in which case do
+     nothing.  Example: lhs = MEM[(struct foo *) 0B].bar does not
+     produce an affine combination.  */
+  if (!aff.n)
+    return false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Original expression = \n");
+      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\n\naff_comb =\n");
+      print_aff (dump_file, &aff);
+      fprintf (dump_file, "\n");
+    }
+
+  /* If the affine combination contains any ADDR_EXPRs for decls that
+     are not yet marked as addressable, do not replace.  We don't want
+     to interfere with downstream optimizations, particularly on autos.  */
+  if (aff_comb_causes_addr_taken (aff))
+    return false;
+
+  /* We must take care when inside a loop to avoid calculating induction
+     expressions twice.  For example, an induction expression i = i + 1
+     may appear prior to an addressing expression based on i.  Assume
+     that IVOPTS will have already done the best possible job on such
+     addressing expressions, and leave them alone.  */
+  if (aff_comb_relies_on_ivar_update (aff))
+    return false;
+
+  /* Choose the best SSA name in the affine combination of trees
+     to coerce into the base position on the MEM_REF.  If this isn't
+     done, a base expression of zero may occur, which confuses the
+     may-aliasing in subsequent dead store elimination.
Furthermore, + if we choose badly, we can end up with a base expression that + points below the beginning of the object, which produces suboptimal + code. */ + base_hint = choose_base_hint (&aff); + + /* Copy the current statement iterator, and memorize the position + one statement previous. */ + save_stmt = gsi_stmt (*gsi); + save_gsi = *gsi; + gsi_prev (&save_gsi); + + /* Replace the input expression with an equivalent memory reference + legal on the target machine. As a side effect, feeding expressions + may be generated prior to GSI. */ + mem_ref = create_mem_ref (gsi, TREE_TYPE (*expr), &aff, + reference_alias_ptr_type (*expr), + NULL_TREE, base_hint, speed); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "\nmem_ref = "); + print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS); + fprintf (dump_file, "\n\n"); + } + + /* Ensure the memory reference was not built with a base expression + of zero, which confuses the may-aliasing in subsequent dead + store elimination. This can happen with questionable coding + practices, such as treating an integer parameter as a pointer. + TODO: create_mem_ref should perhaps be modified to avoid + producing a zero base. */ + if (integer_zerop (TREE_OPERAND (mem_ref, 0))) + return false; + + /* If the original expression is a mem_ref that will be recognized + as a hidden global store (see tree-ssa-sink.c:is_hidden_global_store), + but the replacement will not be so recognized, the replacement + is unsafe. The statement may not be marked as always-necessary + in the dead code eliminator. */ + if (is_vdef_store && global_store_may_be_lost (*expr, mem_ref)) + return false; + + /* Chicken switch for debug. Use -fdbg-cnt=lower_addr:N to disable + after N replacements. */ + if (!dbg_cnt (lower_addr)) + return false; + + /* We're going to use the replacement. CSE any new gimple stmts + that were added, since we are likely exposing redundant + computations. */ + if (gsi_end_p (save_gsi)) + save_gsi = gsi_start (gsi->seq); + else + gsi_next (&save_gsi); + for (; gsi_stmt (save_gsi) != save_stmt; gsi_next (&save_gsi)) + (void)avail_addr_cse (gsi_stmt (save_gsi)); + + /* Copy reference data from the replaced expression. */ + copy_ref_info (mem_ref, *expr); + + /* Finish the replacement. */ + *expr = mem_ref; + + return true; +} + +/* Return TRUE iff EXPR is a memory reference with a zero offset, + a zero step, a non-zero index, and a zero index2. */ +static bool +is_zero_offset_ref (tree expr) +{ + tree offset, index, step, index2; + + /* A MEM_REF has only a base and an offset, so it doesn't apply. */ + if (TREE_CODE (expr) != TARGET_MEM_REF) + return false; + + offset = TREE_OPERAND (expr, 1); + + if (!offset + || TREE_CODE (offset) != INTEGER_CST + || !double_int_zero_p (TREE_INT_CST (offset))) + return false; + + index = TREE_OPERAND (expr, 2); + + if (!index || (TREE_CODE (index) == INTEGER_CST + && double_int_zero_p (TREE_INT_CST (index)))) + return false; + + step = TREE_OPERAND (expr, 3); + + if (step && (TREE_CODE (step) != INTEGER_CST + || !double_int_zero_p (TREE_INT_CST (step)))) + return false; + + index2 = TREE_OPERAND (expr, 4); + + if (index2 && (TREE_CODE (index2) != INTEGER_CST + || !double_int_zero_p (TREE_INT_CST (index2)))) + return false; + + return true; +} + +/* Expose target addressing modes in STMT, which must be an assignment. 
*/ + +static bool +tree_ssa_lower_addr_stmt (gimple stmt, gimple_stmt_iterator *gsi, + basic_block bb, bool speed) +{ + tree *lhs, *rhs1; + bool changed, rhs_changed; + + /* If this is the last statement of the block, it may have an associated + exception handling landing pad. The original statement may appear + to possibly throw an exception, but after we simplify it this may no + longer be true. */ + bool orig_may_throw = stmt_could_throw_p (stmt) + && lookup_stmt_eh_lp (stmt) > 0; + + lhs = gimple_assign_lhs_ptr (stmt); + changed = tree_ssa_lower_addr_tree (lhs, gsi, speed, + gimple_vdef (stmt) != NULL); + + /* If the LHS now contains a simple target_mem_ref with zero offset, + record this. */ + if (changed && is_zero_offset_ref (gimple_assign_lhs (stmt))) + VEC_safe_push (gimple, heap, zero_offset_refs, stmt); + + rhs1 = gimple_assign_rhs1_ptr (stmt); + rhs_changed = tree_ssa_lower_addr_tree (rhs1, gsi, speed, false); + + /* Record zero-offset mem_refs on the RHS. */ + if (rhs_changed && is_zero_offset_ref (gimple_assign_rhs1 (stmt))) + VEC_safe_push (gimple, heap, zero_offset_refs, stmt); + + changed = changed || rhs_changed; + + /* If the original statement was in the landing pad table, but the + revised one can't signal an exception, remove it from the table. + We may have to clean up the CFG afterwards. */ + if (changed + && orig_may_throw + && maybe_clean_eh_stmt (stmt) + && gimple_purge_dead_eh_edges (bb)) + clean_cfg = true; + + return changed; +} + +/* Given that EXPR in STMT is a TARGET_MEM_REF with zero offset, zero + step, non-zero index, and zero index2, determine whether the sum of + EXPR's base and index components is available in the CSE table. If + so, convert EXPR to use that sum as its base component and eliminate + the index component. If the sum is defined following STMT, move the + defining statement ahead of STMT. This is safe since the base and + index components must be available. */ + +static void +process_zero_offset_ref (tree expr, gimple stmt) +{ + tree base = TREE_OPERAND (expr, 0); + tree index = TREE_OPERAND (expr, 2); + avail_addr_expr entry; + hashval_t hash; + void **slot; + avail_addr_expr_t sum_entry; + gimple_stmt_iterator gsi, gsi2; + gimple repr_def_stmt; + bool use_seen, def_seen; + + /* Base and index need to take certain simple forms. If they do not, + skip this ref. */ + if (!valid_simple_op (base) || !valid_simple_op (index)) + return; + + /* Look up the sum in the hash table. Try first as a POINTER_PLUS_EXPR, + then as a PLUS_EXPR. */ + entry.operation = POINTER_PLUS_EXPR; + entry.operand1 = base; + entry.operand2 = index; + entry.repr = NULL_TREE; + hash = hash_simple_op (base) ^ hash_simple_op (index); + entry.hashcode = POINTER_PLUS_EXPR ^ hash; + + slot = htab_find_slot_with_hash (avail_addr_table, &entry, + entry.hashcode, NO_INSERT); + + if (!slot) + { + entry.operation = PLUS_EXPR; + entry.hashcode = PLUS_EXPR ^ hash; + slot = htab_find_slot_with_hash (avail_addr_table, &entry, + entry.hashcode, NO_INSERT); + } + + /* If the sum is not available, the index form is preferable. */ + if (!slot) + return; + + /* Otherwise, set base = sum and turn this into a MEM_REF. + This causes the index operand to no longer be visible. */ + sum_entry = (avail_addr_expr_t)*slot; + TREE_OPERAND (expr, 0) = sum_entry->repr; + TREE_SET_CODE (expr, MEM_REF); + + /* Find the definition of the sum. */ + repr_def_stmt = SSA_NAME_DEF_STMT (sum_entry->repr); + + /* If it's defined in a different block, the sum is available from + a dominator. 
*/
+  if (gimple_bb (repr_def_stmt) != curr_bb)
+    return;
+
+  /* Otherwise, determine whether the defining statement appears after
+     EXPR.  If so, move it.  */
+  /* ??? In the worst case, this requires a full block scan for each
+     MEM_REF in the block for which the optimization is appropriate.
+     This is unlikely to be too bad in practice, but something more
+     clever may be called for.  */
+  use_seen = def_seen = false;
+
+  for (gsi = gsi_start_bb (curr_bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple cand = gsi_stmt (gsi);
+
+      if (cand == stmt)
+	{
+	  if (def_seen)
+	    return;
+	  use_seen = true;
+	  gsi2 = gsi;
+	}
+      else if (cand == repr_def_stmt)
+	{
+	  if (use_seen)
+	    {
+	      gsi_move_before (&gsi, &gsi2);
+	      return;
+	    }
+	  else
+	    return;
+	}
+    }
+
+  /* If we don't see both the def and the use in this block, something
+     is badly wrong.  */
+  gcc_unreachable ();
+}
+
+/* Examine the current basic block for opportunities to profitably coerce
+   TARGET_MEM_REFs with zero offsets into non-indexed forms.  */
+
+static void
+process_zero_offset_refs (void)
+{
+  unsigned i, length = VEC_length (gimple, zero_offset_refs);
+  gimple stmt;
+  tree lhs, rhs;
+
+  for (i = 0; i < length; i++)
+    {
+      stmt = VEC_index (gimple, zero_offset_refs, i);
+      lhs = gimple_assign_lhs (stmt);
+
+      if (is_zero_offset_ref (lhs))
+	process_zero_offset_ref (lhs, stmt);
+      else
+	{
+	  rhs = gimple_assign_rhs1 (stmt);
+	  gcc_assert (is_zero_offset_ref (rhs));
+	  process_zero_offset_ref (rhs, stmt);
+	}
+    }
+}
+
+/* Process the statements in block BB.  */
+
+static void
+tree_ssa_lower_addr_block (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  basic_block son;
+  bool changed = false;
+  bool speed = optimize_bb_for_speed_p (bb);
+
+  curr_bb = bb;
+  AVAIL_EXPRS (bb) = VEC_alloc (avail_addr_expr_t, heap, 32);
+  VEC_truncate (gimple, zero_offset_refs, 0);
+
+  /* Walk the statements in the block, gathering CSE data and making
+     address replacements as opportunities arise.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+
+      if (gimple_vuse (stmt) && gimple_assign_single_p (stmt))
+	changed = tree_ssa_lower_addr_stmt (stmt, &gsi, bb, speed) || changed;
+
+      /* Whether changed or not, gather potential information for CSE.  */
+      changed = avail_addr_cse (stmt) || changed;
+    }
+
+  /* Following is a fairly common pattern that results from address
+     lowering with CSE:
+
+     <bb 2>:
+       D.2154_8 = n_2(D) * 4;
+       D.2107_3 = MEM[base: p_1(D), index: D.2154_8, offset: 0B];
+       D.2156_10 = p_1(D) + D.2154_8;
+       D.2108_4 = MEM[(struct x *)D.2156_10 + 128B];
+       D.2109_5 = MEM[(struct x *)D.2156_10 + 64B];
+       foo (D.2107_3, D.2108_4, D.2109_5);
+       return;
+
+     When the addressing expression consists of two tree addends and
+     a non-zero offset, the addends are pre-added and used as the base
+     expression, with the offset becoming the immediate operand in the
+     storage op.  When the offset is zero, the base and index are
+     added implicitly in an indexed-form storage op.
+
+     The bias to the indexed form with a zero offset is good with no
+     knowledge of the surrounding code; it saves an instruction
+     and a dependency with a possible small increase in register
+     pressure.  As can be seen here, though, we end up with a missed
+     commoning opportunity, since the second instruction could become:
+
+       D.2107_3 = MEM[(struct x *)D.2156_10 + 0B];
+
+     Since the add into D.2156_10 is going to occur anyway, the
+     advantage of the bias to indexed form vanishes.  Instead, it
+     leads to an increase in register pressure.
This is more noticeable + as expressions are CSEd across blocks. + + Detect zero_offset memory references that can profitably be + converted into non-indexed forms. */ + process_zero_offset_refs (); + + /* If anything changed in the block, the SSA immediate use lists may + be hosed up. Clean the whole block. */ + if (changed) + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + update_stmt (gsi_stmt (gsi)); + + if (dump_file && changed && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Changed bb %d:\n", bb->index); + dump_bb (bb, dump_file, 0); + fprintf (dump_file, "\n"); + } + + /* Process dominated blocks. */ + for (son = first_dom_son (CDI_DOMINATORS, bb); + son; + son = next_dom_son (CDI_DOMINATORS, son)) + tree_ssa_lower_addr_block (son); + + /* We're done with this block, so remove expressions local to this + block from the allocation pool. */ + free_avail_exprs (bb); +} + +/* Main entry point for exposing target addressing modes not caught + by tree-ssa-loop-ivopts.c. */ + +static unsigned int +tree_ssa_lower_addr (void) +{ + /* A hash table of subexpressions that contribute to addressing + computation is maintained for the current block and all of its + dominators. The entries addressed out of the hash table are + stored in an allocation pool. At any given time, the allocation + pool likewise contains entries for the current block and its + dominators. */ + avail_addr_table = htab_create (500, avail_addr_hash, + avail_addr_eq, avail_addr_free); + avail_addr_pool = create_alloc_pool ("avail_addr_pool", + sizeof (avail_addr_expr), + 500); + + /* Allocate auxiliary CFG information for this pass. */ + alloc_aux_for_blocks (sizeof (tree_lower_addr_bb_aux)); + + /* Allocate a vector to track memory references with zero offsets. */ + zero_offset_refs = VEC_alloc (gimple, heap, 32); + + /* Track whether we purged any exception handling edges. */ + clean_cfg = false; + + /* We need loop information for heuristics. IVOPTS has already lowered + some addressing expressions in loops, and we don't want to step on + what's already been done. */ + loop_optimizer_init (AVOID_CFG_MODIFICATIONS); + + /* Walk the CFG in dominator order. */ + calculate_dominance_info (CDI_DOMINATORS); + tree_ssa_lower_addr_block (ENTRY_BLOCK_PTR); + + /* Free resources. */ + loop_optimizer_finalize (); + htab_delete (avail_addr_table); + free_alloc_pool (avail_addr_pool); + free_aux_for_blocks (); + VEC_free (gimple, heap, zero_offset_refs); + + if (clean_cfg) + return TODO_cleanup_cfg; + + return 0; +} + +/* Gating function for this pass. Currently we disable the phase + if mudflap instrumentation is to be done. Exposing target addresses + before mudflap confuses mudflap about the base addresses of arrays + and so forth. 
*/ + +static bool +gate_lower_addr (void) +{ + return (optimize && flag_lower_addr == 1 && flag_mudflap == 0); +} + +struct gimple_opt_pass pass_tree_lower_addr = +{ + { + GIMPLE_PASS, + "loweraddr", /* name */ + gate_lower_addr, /* gate */ + tree_ssa_lower_addr, /* execute */ + NULL, /* sub */ + NULL, /* next */ + 0, /* static_pass_number */ + TV_TREE_LOWER_ADDR, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_dump_func | TODO_update_ssa /* todo_flags_finish */ + } +}; + Index: gcc/tree.c =================================================================== --- gcc/tree.c (revision 175664) +++ gcc/tree.c (working copy) @@ -5753,6 +5753,53 @@ build_aligned_type (tree type, unsigned int align) return t; } +/* Returns true if memory reference REF with step STEP may be unaligned. */ + +bool +may_be_unaligned_p (tree ref, tree step) +{ + tree base; + tree base_type; + HOST_WIDE_INT bitsize; + HOST_WIDE_INT bitpos; + tree toffset; + enum machine_mode mode; + int unsignedp, volatilep; + unsigned base_align; + + /* TARGET_MEM_REFs are translated directly to valid MEMs on the target, + thus they are not misaligned. */ + if (TREE_CODE (ref) == TARGET_MEM_REF) + return false; + + /* The test below is basically copy of what expr.c:normal_inner_ref + does to check whether the object must be loaded by parts when + STRICT_ALIGNMENT is true. */ + base = get_inner_reference (ref, &bitsize, &bitpos, &toffset, &mode, + &unsignedp, &volatilep, true); + base_type = TREE_TYPE (base); + base_align = TYPE_ALIGN (base_type); + + if (mode != BLKmode) + { + unsigned mode_align = GET_MODE_ALIGNMENT (mode); + + if (base_align < mode_align + || (bitpos % mode_align) != 0 + || (bitpos % BITS_PER_UNIT) != 0) + return true; + + if (toffset + && (highest_pow2_factor (toffset) * BITS_PER_UNIT) < mode_align) + return true; + + if ((highest_pow2_factor (step) * BITS_PER_UNIT) < mode_align) + return true; + } + + return false; +} + /* Create a new distinct copy of TYPE. The new type is made its own MAIN_VARIANT. If TYPE requires structural equality checks, the resulting type requires structural equality checks; otherwise, its Index: gcc/tree.h =================================================================== --- gcc/tree.h (revision 175664) +++ gcc/tree.h (working copy) @@ -5139,6 +5139,7 @@ extern tree tree_strip_sign_nop_conversions (tree) extern tree lhd_gcc_personality (void); extern void assign_assembler_name_if_neeeded (tree); extern void warn_deprecated_use (tree, tree); +extern bool may_be_unaligned_p (tree, tree); /* In cgraph.c */ @@ -5763,6 +5764,7 @@ tree target_for_debug_bind (tree); /* In tree-ssa-address.c. 
*/ extern tree tree_mem_ref_addr (tree, tree); extern void copy_mem_ref_info (tree, tree); +extern void copy_ref_info (tree, tree); /* In tree-vrp.c */ extern bool ssa_name_nonnegative_p (const_tree); Index: gcc/tree-pass.h =================================================================== --- gcc/tree-pass.h (revision 175664) +++ gcc/tree-pass.h (working copy) @@ -386,6 +386,7 @@ extern struct gimple_opt_pass pass_complete_unroll extern struct gimple_opt_pass pass_parallelize_loops; extern struct gimple_opt_pass pass_loop_prefetch; extern struct gimple_opt_pass pass_iv_optimize; +extern struct gimple_opt_pass pass_tree_lower_addr; extern struct gimple_opt_pass pass_tree_loop_done; extern struct gimple_opt_pass pass_ch; extern struct gimple_opt_pass pass_ccp; Index: gcc/tree-ssa-loop-ivopts.c =================================================================== --- gcc/tree-ssa-loop-ivopts.c (revision 175664) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -1610,53 +1610,6 @@ constant_multiple_of (tree top, tree bot, double_i } } -/* Returns true if memory reference REF with step STEP may be unaligned. */ - -static bool -may_be_unaligned_p (tree ref, tree step) -{ - tree base; - tree base_type; - HOST_WIDE_INT bitsize; - HOST_WIDE_INT bitpos; - tree toffset; - enum machine_mode mode; - int unsignedp, volatilep; - unsigned base_align; - - /* TARGET_MEM_REFs are translated directly to valid MEMs on the target, - thus they are not misaligned. */ - if (TREE_CODE (ref) == TARGET_MEM_REF) - return false; - - /* The test below is basically copy of what expr.c:normal_inner_ref - does to check whether the object must be loaded by parts when - STRICT_ALIGNMENT is true. */ - base = get_inner_reference (ref, &bitsize, &bitpos, &toffset, &mode, - &unsignedp, &volatilep, true); - base_type = TREE_TYPE (base); - base_align = TYPE_ALIGN (base_type); - - if (mode != BLKmode) - { - unsigned mode_align = GET_MODE_ALIGNMENT (mode); - - if (base_align < mode_align - || (bitpos % mode_align) != 0 - || (bitpos % BITS_PER_UNIT) != 0) - return true; - - if (toffset - && (highest_pow2_factor (toffset) * BITS_PER_UNIT) < mode_align) - return true; - - if ((highest_pow2_factor (step) * BITS_PER_UNIT) < mode_align) - return true; - } - - return false; -} - /* Return true if EXPR may be non-addressable. */ bool @@ -6031,64 +5984,6 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da } } -/* Copies the reference information from OLD_REF to NEW_REF. */ - -static void -copy_ref_info (tree new_ref, tree old_ref) -{ - tree new_ptr_base = NULL_TREE; - - TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref); - TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref); - - new_ptr_base = TREE_OPERAND (new_ref, 0); - - /* We can transfer points-to information from an old pointer - or decl base to the new one. */ - if (new_ptr_base - && TREE_CODE (new_ptr_base) == SSA_NAME - && !SSA_NAME_PTR_INFO (new_ptr_base)) - { - tree base = get_base_address (old_ref); - if (!base) - ; - else if ((TREE_CODE (base) == MEM_REF - || TREE_CODE (base) == TARGET_MEM_REF) - && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME - && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0))) - { - struct ptr_info_def *new_pi; - duplicate_ssa_name_ptr_info - (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0))); - new_pi = SSA_NAME_PTR_INFO (new_ptr_base); - /* We have to be careful about transfering alignment information. 
*/ - if (TREE_CODE (old_ref) == MEM_REF - && !(TREE_CODE (new_ref) == TARGET_MEM_REF - && (TMR_INDEX2 (new_ref) - || (TMR_STEP (new_ref) - && (TREE_INT_CST_LOW (TMR_STEP (new_ref)) - < new_pi->align))))) - { - new_pi->misalign += double_int_sub (mem_ref_offset (old_ref), - mem_ref_offset (new_ref)).low; - new_pi->misalign &= (new_pi->align - 1); - } - else - { - new_pi->align = 1; - new_pi->misalign = 0; - } - } - else if (TREE_CODE (base) == VAR_DECL - || TREE_CODE (base) == PARM_DECL - || TREE_CODE (base) == RESULT_DECL) - { - struct ptr_info_def *pi = get_ptr_info (new_ptr_base); - pt_solution_set_var (&pi->pt, base); - } - } -} - /* Performs a peephole optimization to reorder the iv update statement with a mem ref to enable instruction combining in later phases. The mem ref uses the iv value before the update, so the reordering transformation requires Index: gcc/timevar.def =================================================================== --- gcc/timevar.def (revision 175664) +++ gcc/timevar.def (working copy) @@ -1,7 +1,7 @@ /* This file contains the definitions for timing variables used to measure run-time performance of the compiler. Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, - 2009, 2010 + 2009, 2010, 2011 Free Software Foundation, Inc. Contributed by Alex Samuel @@ -250,6 +250,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE , "tree if-co DEFTIMEVAR (TV_TREE_UNINIT , "uninit var analysis") DEFTIMEVAR (TV_PLUGIN_INIT , "plugin initialization") DEFTIMEVAR (TV_PLUGIN_RUN , "plugin execution") +DEFTIMEVAR (TV_TREE_LOWER_ADDR , "expose addressing modes") /* Everything else in rest_of_compilation not included above. */ DEFTIMEVAR (TV_EARLY_LOCAL , "early local passes") Index: gcc/tree-ssa-address.c =================================================================== --- gcc/tree-ssa-address.c (revision 175664) +++ gcc/tree-ssa-address.c (working copy) @@ -612,7 +612,7 @@ most_expensive_mult_to_index (tree type, struct me static void addr_to_parts (tree type, aff_tree *addr, tree iv_cand, tree base_hint, struct mem_address *parts, - bool speed) + bool speed) { tree part; unsigned i; @@ -632,9 +632,12 @@ addr_to_parts (tree type, aff_tree *addr, tree iv_ /* No need to do address parts reassociation if the number of parts is <= 2 -- in that case, no loop invariant code motion can be - exposed. */ + exposed. - if (!base_hint && (addr->n > 2)) + Ensure iv_cand exists before attempting this, since this code + is also called from outside IVopts. */ + + if (!base_hint && (addr->n > 2) && iv_cand) move_variant_to_index (parts, addr, iv_cand); /* First move the most expensive feasible multiplication @@ -839,6 +842,64 @@ copy_mem_ref_info (tree to, tree from) TREE_THIS_VOLATILE (to) = TREE_THIS_VOLATILE (from); } +/* Copies the reference information from OLD_REF to NEW_REF. */ + +void +copy_ref_info (tree new_ref, tree old_ref) +{ + tree new_ptr_base = NULL_TREE; + + TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref); + TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref); + + new_ptr_base = TREE_OPERAND (new_ref, 0); + + /* We can transfer points-to information from an old pointer + or decl base to the new one. 
*/
+  if (new_ptr_base
+      && TREE_CODE (new_ptr_base) == SSA_NAME
+      && !SSA_NAME_PTR_INFO (new_ptr_base))
+    {
+      tree base = get_base_address (old_ref);
+      if (!base)
+	;
+      else if ((TREE_CODE (base) == MEM_REF
+		|| TREE_CODE (base) == TARGET_MEM_REF)
+	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
+	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
+	{
+	  struct ptr_info_def *new_pi;
+	  duplicate_ssa_name_ptr_info
+	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
+	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
+	  /* We have to be careful about transferring alignment information.  */
+	  if (TREE_CODE (old_ref) == MEM_REF
+	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
+		   && (TMR_INDEX2 (new_ref)
+		       || (TMR_STEP (new_ref)
+			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
+			       < new_pi->align)))))
+	    {
+	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
+						  mem_ref_offset (new_ref)).low;
+	      new_pi->misalign &= (new_pi->align - 1);
+	    }
+	  else
+	    {
+	      new_pi->align = 1;
+	      new_pi->misalign = 0;
+	    }
+	}
+      else if (TREE_CODE (base) == VAR_DECL
+	       || TREE_CODE (base) == PARM_DECL
+	       || TREE_CODE (base) == RESULT_DECL)
+	{
+	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
+	  pt_solution_set_var (&pi->pt, base);
+	}
+    }
+}
+
 /* Move constants in target_mem_ref REF to offset.
    Returns the new target mem ref if anything changes,
    NULL_TREE otherwise.  */
Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	(revision 175664)
+++ gcc/tree-affine.c	(working copy)
@@ -1,5 +1,5 @@
 /* Operations with affine combinations of trees.
-   Copyright (C) 2005, 2007, 2008, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2005, 2007, 2008, 2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-affine.h"
 #include "gimple.h"
 #include "flags.h"
+#include "tree-flow.h"
 
 /* Extends CST as appropriate for the affine combinations COMB.  */
 
@@ -712,6 +713,45 @@ tree_to_aff_combination_expand (tree expr, tree ty
   aff_combination_expand (comb, cache);
 }
 
+/* Expand a memory reference into an affine combination, returning
+   false if EXPR is possibly not addressable, or if EXPR might be
+   misaligned on a STRICT_ALIGNMENT target.  EXPR is modified into
+   an addressing expression.  */
+bool
+mem_tree_to_aff_combination_expand (tree *expr, aff_tree *comb)
+{
+  struct pointer_map_t *cache;
+
+  if (TREE_CODE (*expr) == TARGET_MEM_REF)
+    {
+      tree type = build_pointer_type (TREE_TYPE (*expr));
+      *expr = tree_mem_ref_addr (type, *expr);
+    }
+  else
+    {
+      /* Check that the expression is addressable, since the access
+	 may include a VIEW_CONVERT_EXPR or a bitfield reference.  */
+      if (may_be_nonaddressable_p (*expr))
+	return false;
+
+      /* On strict alignment platforms, check that the base expression
+	 is sufficiently aligned.  There is no additional "step"
+	 information required, so specify single-byte alignment for
+	 that parameter.  */
+      if (STRICT_ALIGNMENT && may_be_unaligned_p (*expr, size_one_node))
+	return false;
+
+      *expr = build_fold_addr_expr (*expr);
+    }
+
+  /* Expand to affine combination, looking through SSA defs.  */
+  cache = pointer_map_create ();
+  tree_to_aff_combination_expand (*expr, TREE_TYPE (*expr), comb, &cache);
+  pointer_map_destroy (cache);
+
+  return true;
+}
+
 /* Frees memory occupied by struct name_expansion in *VALUE.
    Callback for pointer_map_traverse.
Index: gcc/tree-affine.h
===================================================================
--- gcc/tree-affine.h	(revision 175664)
+++ gcc/tree-affine.h	(working copy)
@@ -1,5 +1,5 @@
 /* Operations with affine combinations of trees.
-   Copyright (C) 2005, 2007, 2008 Free Software Foundation, Inc.
+   Copyright (C) 2005, 2007, 2008, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -74,6 +74,7 @@ bool aff_combination_constant_multiple_p (aff_tree
 void aff_combination_expand (aff_tree *, struct pointer_map_t **);
 void tree_to_aff_combination_expand (tree, tree, aff_tree *,
				      struct pointer_map_t **);
+bool mem_tree_to_aff_combination_expand (tree *, aff_tree *);
 void get_inner_reference_aff (tree, aff_tree *, double_int *);
 void free_affine_expand_cache (struct pointer_map_t **);
 
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 175664)
+++ gcc/common.opt	(working copy)
@@ -1369,6 +1369,10 @@ floop-optimize
 Common Ignore
 Does nothing.  Preserved for backward compatibility.
 
+flower-addr
+Common Report Var(flag_lower_addr) Init(1) Optimization
+Expose target addressing modes in gimple
+
 flto
 Common
 Enable link-time optimization.
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 175664)
+++ gcc/Makefile.in	(working copy)
@@ -1436,6 +1436,7 @@ OBJS = \
	tree-sra.o \
	tree-switch-conversion.o \
	tree-ssa-address.o \
+	tree-ssa-lower-addr.o \
	tree-ssa-alias.o \
	tree-ssa-ccp.o \
	tree-ssa-coalesce.o \
@@ -2657,7 +2658,7 @@ tree-ssa-loop-ivopts.o : tree-ssa-loop-ivopts.c $(
 tree-affine.o : tree-affine.c tree-affine.h $(CONFIG_H) pointer-set.h \
    $(SYSTEM_H) $(TREE_H) $(GIMPLE_H) \
    output.h $(DIAGNOSTIC_H) coretypes.h $(TREE_DUMP_H) $(FLAGS_H) \
-   tree-pretty-print.h
+   tree-pretty-print.h $(TREE_FLOW_H)
 tree-ssa-loop-manip.o : tree-ssa-loop-manip.c $(TREE_FLOW_H) $(CONFIG_H) \
    $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
    $(BASIC_BLOCK_H) output.h $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \
@@ -2669,6 +2670,10 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLO
    $(TREE_DUMP_H) $(TREE_PASS_H) $(FLAGS_H) $(BASIC_BLOCK_H) \
    pointer-set.h tree-affine.h tree-ssa-propagate.h gimple-pretty-print.h \
    tree-pretty-print.h
+tree-ssa-lower-addr.o : tree-ssa-lower-addr.c $(CONFIG_H) $(SYSTEM_H) \
+   coretypes.h $(TREE_H) $(BASIC_BLOCK_H) $(TREE_FLOW_H) $(TREE_DUMP_H) \
+   $(TREE_PASS_H) $(TIMEVAR_H) $(CFGLOOP_H) $(GIMPLE_H) $(TREE_AFFINE_H) \
+   tree-pretty-print.h $(DBGCNT_H) alloc-pool.h gt-tree-ssa-lower-addr.h
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
@@ -3844,6 +3849,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(src
   $(srcdir)/tree-iterator.c $(srcdir)/gimplify.c \
   $(srcdir)/tree-chrec.h \
   $(srcdir)/tree-scalar-evolution.c \
+  $(srcdir)/tree-ssa-lower-addr.c \
   $(srcdir)/tree-ssa-operands.h \
   $(srcdir)/tree-profile.c $(srcdir)/tree-nested.c \
   $(srcdir)/varpool.c \
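A note on the new option: since it is registered with Init(1), address
lowering is on by default, and the pass can be toggled per compilation
for before/after comparisons.  For example (foo.c is just a
placeholder input):

  gcc -O2 -S foo.c                   # address lowering enabled (default)
  gcc -O2 -fno-lower-addr -S foo.c   # address lowering disabled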
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 175664)
+++ gcc/gimple.c	(working copy)
@@ -2018,6 +2018,19 @@ gimple_assign_unary_nop_p (gimple gs)
	   == TYPE_MODE (TREE_TYPE (gimple_assign_rhs1 (gs)))));
 }
 
+/* Return true if GS is an assignment that performs a true type
+   conversion, i.e., one that requires code.  */
+
+bool
+gimple_assign_conversion_p (gimple gs)
+{
+  return (is_gimple_assign (gs)
+	  && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (gs))
+	  && gimple_assign_rhs1 (gs) != error_mark_node
+	  && (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))
+	      != TYPE_MODE (TREE_TYPE (gimple_assign_rhs1 (gs)))));
+}
+
 /* Set BB to be the basic block holding G.  */
 
 void
Index: gcc/gimple.h
===================================================================
--- gcc/gimple.h	(revision 175664)
+++ gcc/gimple.h	(working copy)
@@ -884,6 +884,7 @@ void gimple_call_reset_alias_info (gimple);
 bool gimple_assign_copy_p (gimple);
 bool gimple_assign_ssa_name_copy_p (gimple);
 bool gimple_assign_unary_nop_p (gimple);
+bool gimple_assign_conversion_p (gimple);
 void gimple_set_bb (gimple, struct basic_block_def *);
 void gimple_assign_set_rhs_from_tree (gimple_stmt_iterator *, tree);
 void gimple_assign_set_rhs_with_ops_1 (gimple_stmt_iterator *, enum tree_code,
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 175664)
+++ gcc/passes.c	(working copy)
@@ -1,7 +1,7 @@
 /* Top level of GCC compilers (cc1, cc1plus, etc.)
    Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
-   Free Software Foundation, Inc.
+   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
+   2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -1375,6 +1375,7 @@ init_optimization_passes (void)
	 only examines PHIs to discover const/copy propagation
	 opportunities.  */
       NEXT_PASS (pass_phi_only_cprop);
+      NEXT_PASS (pass_tree_lower_addr);
       NEXT_PASS (pass_cd_dce);
       NEXT_PASS (pass_tracer);
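To make the distinction between gimple_assign_conversion_p and the
existing gimple_assign_unary_nop_p concrete, here is an illustrative
example (not from the patch), assuming an LP64 target where int is
SImode and long is DImode:

  /* Given the gimple statements

       l_1 = (long int) i_2;        -- SImode to DImode
       u_3 = (unsigned int) i_2;    -- SImode to SImode

     gimple_assign_conversion_p returns true for the first statement,
     whose conversion changes modes and therefore requires code, and
     false for the second, which is a no-op conversion and instead
     satisfies gimple_assign_unary_nop_p.  */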