From patchwork Wed Oct 5 16:13:32 2011
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 117887
Subject: [PATCH] Fix PR46556 (poor address generation)
From: "William J.
Schmidt"
To: gcc-patches@gcc.gnu.org
Cc: bergner@vnet.ibm.com, rguenth@gcc.gnu.org
Date: Wed, 05 Oct 2011 11:13:32 -0500
Message-ID: <1317831212.4199.45.camel@oc2474580526.ibm.com>

This patch addresses the poor code generation in PR46556 for the
following code:

struct x
{
  int a[16];
  int b[16];
  int c[16];
};

extern void foo (int, int, int);

void
f (struct x *p, unsigned int n)
{
  foo (p->a[n], p->c[n], p->b[n]);
}

Prior to the fix for PR32698, gcc calculated the offset for accessing
the array elements as: n*4; 64+n*4; 128+n*4.  Following that fix, the
offsets are calculated as: n*4; (n+16)*4; (n+32)*4.  This led to poor
code generation on powerpc64 targets, among others.

The poor code generation was observed not to occur in loops, as the
IVOPTS code does a good job of lowering these expressions to MEM_REFs.
It was previously suggested that perhaps a general pass to lower memory
accesses to MEM_REFs in GIMPLE would solve not only this, but other
similar problems.  I spent some time looking into various approaches to
this, and reviewing some previous attempts to do similar things.  In
the end, I've concluded that this is a bad idea in practice because of
the loss of useful aliasing information.  In particular, early lowering
of component references causes us to lose the ability to disambiguate
non-overlapping references in the same structure, and there is no
simple way to carry the necessary aliasing information along with the
replacement MEM_REFs to avoid this.  While some performance gains are
available with GIMPLE lowering of memory accesses, there are also
offsetting performance losses, and I suspect this would just be a
continuous source of bug reports into the future.
Therefore the current patch is a much simpler approach to solve the
specific problem noted in the PR.  There are two pieces to the patch:

* The offending addressing pattern is matched in GIMPLE and transformed
  into a restructured MEM_REF that distributes the multiply, so that
  (n+32)*4 becomes 4*n+128 as before.  This is done during the
  reassociation pass, for reasons described below.  The transformation
  only occurs in non-loop blocks, since IVOPTS does a good job on such
  things within loops.

* A tweak is added to the RTL forward-propagator to avoid propagating
  into memory references based on a single base register with no
  offset, under certain circumstances.  This improves sharing of base
  registers for accesses within the same structure and slightly lowers
  register pressure.

It would be possible to separate these into two patches if that's
preferred.  I chose to combine them because together they provide the
ideal code generation that the new test cases test for.

I initially implemented the pattern matcher during expand, but I found
that the expanded code for two accesses to the same structure was often
not being CSEd well.  So I moved it back into the GIMPLE phases prior
to DOM to take advantage of its CSE.  To avoid an additional complete
pass over the IL, I chose to piggyback on the reassociation pass.  This
transformation is not technically a reassociation, but it is related
enough not to be a complete wart.

One noob question about this: it would probably be preferable to have
this transformation only take place during the second reassociation
pass, so the ARRAY_REFs are seen by earlier optimization phases.  Is
there an easy way to detect that it's the second pass without having to
generate a separate pass entry point?

One other general question about the pattern-match transformation: is
this an appropriate transformation for all targets, or should it be
somehow gated on available addressing modes on the target processor?
Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
performance degradations on that target for SPEC CPU2000 and CPU2006.

I'm looking for eventual approval for trunk after any comments are
resolved.  Thanks!

Bill


2011-10-05  Bill Schmidt

gcc:

	PR rtl-optimization/46556
	* fwprop.c (fwprop_bb_aux_d): New struct.
	(MEM_PLUS_REGS): New macro.
	(record_mem_plus_reg): New function.
	(record_mem_plus_regs): Likewise.
	(single_def_use_enter_block): Record mem-plus-reg patterns.
	(build_single_def_use_links): Allocate aux storage.
	(locally_poor_mem_replacement): New function.
	(forward_propagate_and_simplify): Call
	locally_poor_mem_replacement.
	(fwprop_done): Free storage.
	* tree.h (copy_ref_info): Expose existing function.
	* tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
	(restructure_mem_ref): Likewise.
	(reassociate_bb): Look for opportunities to call
	restructure_mem_ref; clean up immediate use lists.

gcc/testsuite:

	PR rtl-optimization/46556
	* gcc.target/powerpc/ppc-pr46556-1.c: New testcase.
	* gcc.target/powerpc/ppc-pr46556-2.c: Likewise.
	* gcc.target/powerpc/ppc-pr46556-3.c: Likewise.
	* gcc.target/powerpc/ppc-pr46556-4.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-1.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-4.c: Likewise.

Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	(revision 179317)
+++ gcc/fwprop.c	(working copy)
@@ -131,6 +131,15 @@
 static VEC(df_ref,heap) *reg_defs_stack;
 static bitmap local_md;
 static bitmap local_lr;
 
+/* Auxiliary information for each block for this pass.  */
+typedef struct GTY(()) fwprop_bb_aux_d
+{
+  /* Registers appearing in (mem (plus (reg ... patterns in this block.
     */
+  bitmap mem_plus_regs;
+} fwprop_bb_aux, *fwprop_bb_aux_t;
+
+#define MEM_PLUS_REGS(BB) ((fwprop_bb_aux_t) ((BB)->aux))->mem_plus_regs
+
 /* Return the only def in USE's use-def chain, or NULL if there is
    more than one def in the chain.  */
@@ -212,7 +221,43 @@ process_uses (df_ref *use_rec, int top_flag)
 }
 
+/* Record whether EXPR in block BB is of the form (mem (plus (reg ...  */
 static void
+record_mem_plus_reg (basic_block bb, rtx expr)
+{
+  rtx addr, reg;
+
+  if (GET_CODE (expr) != MEM)
+    return;
+
+  addr = XEXP (expr, 0);
+  if (GET_CODE (addr) != PLUS)
+    return;
+
+  reg = XEXP (addr, 0);
+  if (!REG_P (reg))
+    return;
+
+  (void) bitmap_set_bit (MEM_PLUS_REGS (bb), REGNO (reg));
+}
+
+
+/* Record whether INSN in block BB contains any patterns of the form
+   (mem (plus (reg ...  */
+static void
+record_mem_plus_regs (basic_block bb, rtx insn)
+{
+  rtx insn_set = single_set (insn);
+
+  if (insn_set)
+    {
+      record_mem_plus_reg (bb, SET_DEST (insn_set));
+      record_mem_plus_reg (bb, SET_SRC (insn_set));
+    }
+}
+
+
+static void
 single_def_use_enter_block (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
 			    basic_block bb)
 {
@@ -230,6 +275,8 @@ single_def_use_enter_block (struct dom_walk_data *
   process_uses (df_get_artificial_uses (bb_index), DF_REF_AT_TOP);
   process_defs (df_get_artificial_defs (bb_index), DF_REF_AT_TOP);
 
+  MEM_PLUS_REGS (bb) = BITMAP_ALLOC (NULL);
+
   /* We don't call df_simulate_initialize_forwards, as it may overestimate
      the live registers if there are unused artificial defs.  We prefer
     liveness to be underestimated.
     */
@@ -242,8 +289,15 @@ single_def_use_enter_block (struct dom_walk_data *
       process_uses (DF_INSN_UID_EQ_USES (uid), 0);
       process_defs (DF_INSN_UID_DEFS (uid), 0);
       df_simulate_one_insn_forwards (bb, insn, local_lr);
+      record_mem_plus_regs (bb, insn);
     }
 
+  if (dump_file)
+    {
+      fprintf (dump_file, "mem_plus_regs (%d): ", bb->index);
+      dump_bitmap (dump_file, MEM_PLUS_REGS (bb));
+    }
+
   process_uses (df_get_artificial_uses (bb_index), 0);
   process_defs (df_get_artificial_defs (bb_index), 0);
 }
@@ -295,6 +349,8 @@ build_single_def_use_links (void)
   local_md = BITMAP_ALLOC (NULL);
   local_lr = BITMAP_ALLOC (NULL);
 
+  alloc_aux_for_blocks (sizeof (fwprop_bb_aux));
+
   /* Walk the dominator tree looking for single reaching definitions
      dominating the uses.  This is similar to how SSA form is built.  */
   walk_data.dom_direction = CDI_DOMINATORS;
@@ -1207,6 +1263,69 @@ forward_propagate_asm (df_ref use, rtx def_insn, r
   return true;
 }
 
+/* We are proposing to replace a USE of REG in USE_SET.  Determine
+   whether we have a situation where two storage uses of REG occur
+   in the same block as follows.  Each use can be either a memory
+   store or a memory load.
+
+     ... (mem (reg REG))
+
+     ... (mem (plus (reg REG)
+                    (...)))
+
+   The problem is that it will look profitable to propagate something
+   like
+
+     (set (reg REG)
+          (plus (reg X)
+                (reg Y)))
+
+   into the first use, but not into the second one.  This breaks a
+   CSE opportunity and raises register pressure by extending the
+   lifetimes of X and Y.  To avoid this, don't propagate into the
+   first use when this very specific situation arises.  */
+static bool
+locally_poor_mem_replacement (df_ref use, rtx use_set, rtx reg, rtx def_set)
+{
+  rtx rhs, addend, mem, base;
+  unsigned i;
+
+  /* We're only concerned if the RHS of def_set is the sum of two
+     registers.
     */
+  rhs = SET_SRC (def_set);
+
+  if (GET_CODE (rhs) != PLUS)
+    return false;
+
+  for (i = 0; i < 2; i++)
+    {
+      addend = XEXP (rhs, i);
+      if (!REG_P (addend))
+	return false;
+    }
+
+  /* The use must have a single SET and be used in addressing.  */
+  if (!use_set || !DF_REF_REG_MEM_P (use))
+    return false;
+
+  if (DF_REF_REG_MEM_STORE_P (use))
+    mem = SET_DEST (use_set);
+  else
+    mem = SET_SRC (use_set);
+
+  if (GET_CODE (mem) != MEM)
+    return false;
+
+  /* We are specifically interested in the base address.  */
+  base = XEXP (mem, 0);
+  if (reg != base)
+    return false;
+
+  /* We've found a use of the first form.  Check whether uses of the
+     second form exist in the same block.  If found, don't replace.  */
+  return bitmap_bit_p (MEM_PLUS_REGS (DF_REF_BB (use)), DF_REF_REGNO (use));
+}
+
 /* Try to replace USE with SRC (defined in DEF_INSN)
    and simplify the result.  */
@@ -1273,6 +1392,11 @@ forward_propagate_and_simplify (df_ref use, rtx de
       return false;
     }
 
+  /* Check for a specific unprofitable propagation into a memory
+     reference, where the propagation will break a CSE opportunity.  */
+  if (locally_poor_mem_replacement (use, use_set, reg, def_set))
+    return false;
+
   if (asm_use >= 0)
     return forward_propagate_asm (use, def_insn, def_set, reg);
@@ -1403,6 +1527,12 @@ fwprop_init (void)
 static void
 fwprop_done (void)
 {
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    BITMAP_FREE (MEM_PLUS_REGS (bb));
+  free_aux_for_blocks ();
+
   loop_optimizer_finalize ();
 
   VEC_free (df_ref, heap, use_def_ref);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 179317)
+++ gcc/tree.h	(working copy)
@@ -5775,6 +5775,9 @@ tree target_for_debug_bind (tree);
 extern tree tree_mem_ref_addr (tree, tree);
 extern void copy_mem_ref_info (tree, tree);
 
+/* In tree-ssa-loop-ivopts.c.
   */
+extern void copy_ref_info (tree, tree);
+
 /* In tree-vrp.c */
 extern bool ssa_name_nonnegative_p (const_tree);
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-assembler-times "lw.*3," 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*4,128\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*5,64\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-assembler-times "lw.*4,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*3,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct
x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-assembler-times "lw.*3,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*4,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* {
dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-fwprop1" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem.\*:.\* \\(reg.\*p_1\\(D\\)->a" 1 "fwprop1" } } */
+/* { dg-final { cleanup-rtl-dump "fwprop1" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 179317)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -6280,7 +6280,7 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da
 
 /* Copies the reference information from OLD_REF to
   NEW_REF.  */
-static void
+void
 copy_ref_info (tree new_ref, tree old_ref)
 {
   tree new_ptr_base = NULL_TREE;
Index: gcc/tree-ssa-reassoc.c
===================================================================
--- gcc/tree-ssa-reassoc.c	(revision 179317)
+++ gcc/tree-ssa-reassoc.c	(working copy)
@@ -2361,6 +2361,116 @@ break_up_subtract_bb (basic_block bb)
     break_up_subtract_bb (son);
 }
 
+/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
+   determine whether there is a profitable opportunity to restructure
+   address arithmetic within BASE and OFFSET.  If so, return the
+   restructured MEM_REF; otherwise return NULL_TREE.  */
+
+static tree
+restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,
+			     tree base, tree offset, HOST_WIDE_INT bitpos)
+{
+  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
+  HOST_WIDE_INT c, c1, c2, c3, c4;
+
+  /* Look for the following pattern:
+
+       base   = MEM_REF (T1, C1);
+       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+       bitpos = C4 * BITS_PER_UNIT
+
+     If found, convert it to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + (C2 * C3) + C4)
+  */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST_LOW (mult_op1);
+  c4 = bitpos / BITS_PER_UNIT;
+  c = c1 + c2 * c3 + c4;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
+			   build_int_cst (sizetype, c3));
+  mult_expr =
force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
+				       true, GSI_SAME_STMT);
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
+				       true, GSI_SAME_STMT);
+  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
+			 build_int_cst (offset_type, c));
+  return mem_ref;
+}
+
+/* If *EXPR contains a memory reference, determine whether there is an
+   opportunity to restructure the base and offset expressions for
+   improved performance.  */
+
+static bool
+restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
+{
+  enum tree_code code = TREE_CODE (*expr);
+  tree base, offset, mem_ref;
+  HOST_WIDE_INT bitsize, bitpos;
+  enum machine_mode mode;
+  int unsignedp, volatilep;
+
+  /* Only expressions that reference memory are of interest.  Bitfield
+     references should eventually be lowered prior to this point and
+     are not dealt with.  */
+  if (! handled_component_p (*expr)
+      || code == BIT_FIELD_REF
+      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1))))
+    return false;
+
+  /* Determine whether the expression can be represented with base and
+     offset components.  */
+  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &volatilep, false);
+  if (!base || !offset)
+    return false;
+
+  /* Look for a restructuring opportunity.  */
+  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
+					      offset, bitpos)) == NULL_TREE)
+    return false;
+
+  /* Found one.  Record the replacement in the dump file and complete
+     the replacement.
     */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nOriginal_expr = ");
+      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\nmem_ref = ");
+      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\n\n");
+    }
+
+  copy_ref_info (mem_ref, *expr);
+  *expr = mem_ref;
+
+  return true;
+}
+
 /* Reassociate expressions in basic block BB and its post-dominator as
    children.  */
 
@@ -2369,14 +2479,30 @@ reassociate_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   basic_block son;
+  bool chgd_mem_ref = false;
+  bool in_loop = loop_depth (bb->loop_father) != 0;
 
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       gimple stmt = gsi_stmt (gsi);
 
-      if (is_gimple_assign (stmt)
-	  && !stmt_could_throw_p (stmt))
+      /* Look for restructuring opportunities within an expression
+	 that references memory.  We only do this for blocks not
+	 contained in loops, since the ivopts machinery does a
+	 good job on loop expressions, and we don't want to interfere
+	 with other loop optimizations.  */
+      if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
 	{
+	  tree *lhs, *rhs;
+	  lhs = gimple_assign_lhs_ptr (stmt);
+	  chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
+	  rhs = gimple_assign_rhs1_ptr (stmt);
+	  chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;
+	}
+
+      else if (is_gimple_assign (stmt)
+	       && !stmt_could_throw_p (stmt))
+	{
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
@@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
 	    }
 	}
     }
+  /* If memory references have been restructured, immediate uses need
+     to be cleaned up.  */
+  if (chgd_mem_ref)
+    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      update_stmt (gsi_stmt (gsi));
+
   for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
        son;
       son = next_dom_son (CDI_POST_DOMINATORS, son))