Message ID: 1317831212.4199.45.camel@oc2474580526.ibm.com
State: New
On 10/05/2011 06:13 PM, William J. Schmidt wrote:
> One other general question about the pattern-match transformation: Is
> this an appropriate transformation for all targets, or should it be
> somehow gated on available addressing modes on the target processor?
>
> Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified
> no performance degradations on that target for SPEC CPU2000 and
> CPU2006.
>
> I'm looking for eventual approval for trunk after any comments are
> resolved.  Thanks!

What do the costs look like for the two transforms you mention in the
head comment of locally_poor_mem_replacement?

Paolo
On Wed, Oct 5, 2011 at 6:13 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> * tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
Rather than this, why not move the function to common code somewhere?
Ciao!
Steven
On Wed, 2011-10-05 at 18:29 +0200, Steven Bosscher wrote:
> On Wed, Oct 5, 2011 at 6:13 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > * tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
>
> Rather than this, why not move the function to common code somewhere?
>
> Ciao!
> Steven

An alternative would be to move it into tree-ssa-address.c, where there
is already a simpler version called copy_mem_ref_info.  I'm open to
that if it's preferable.

Bill
On Wed, 2011-10-05 at 18:21 +0200, Paolo Bonzini wrote:
> On 10/05/2011 06:13 PM, William J. Schmidt wrote:
> > One other general question about the pattern-match transformation:
> > Is this an appropriate transformation for all targets, or should it
> > be somehow gated on available addressing modes on the target
> > processor?
> >
> > Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified
> > no performance degradations on that target for SPEC CPU2000 and
> > CPU2006.
> >
> > I'm looking for eventual approval for trunk after any comments are
> > resolved.  Thanks!
>
> What do the costs look like for the two transforms you mention in the
> head comment of locally_poor_mem_replacement?
>
> Paolo

I don't know off the top of my head -- I'll have to gather that
information.  The issue is that the profitability is really
context-sensitive, so just the isolated costs of insns aren't enough.
The forward propagation of the add into (mem (reg REG)) looks like a
slam dunk in the absence of other information, but if there are other
nearby references using nonzero offsets from REG, this just extends the
lifetimes of X and Y without eliminating the need for REG.
On 10/05/2011 07:22 PM, William J. Schmidt wrote:
> I don't know off the top of my head -- I'll have to gather that
> information.  The issue is that the profitability is really
> context-sensitive, so just the isolated costs of insns aren't enough.
> The forward propagation of the add into (mem (reg REG)) looks like a
> slam dunk in the absence of other information, but if there are other
> nearby references using nonzero offsets from REG, this just extends
> the lifetimes of X and Y without eliminating the need for REG.

True, however there are other passes that do this kind of un-CSE and
lifetime reduction.

Paolo
On Wed, 2011-10-05 at 21:01 +0200, Paolo Bonzini wrote:
> On 10/05/2011 07:22 PM, William J. Schmidt wrote:
> > I don't know off the top of my head -- I'll have to gather that
> > information.  The issue is that the profitability is really
> > context-sensitive, so just the isolated costs of insns aren't
> > enough.  The forward propagation of the add into (mem (reg REG))
> > looks like a slam dunk in the absence of other information, but if
> > there are other nearby references using nonzero offsets from REG,
> > this just extends the lifetimes of X and Y without eliminating the
> > need for REG.
>
> True, however there are other passes that do this kind of un-CSE and
> lifetime reduction.
>
> Paolo

OK, I see.  If there's a better place downstream to make a swizzle, I'm
certainly fine with that.

I disabled locally_poor_mem_replacement and added some dump information
in should_replace_address to show the costs for the replacement I'm
trying to avoid:

  In should_replace_address:
    old_rtx = (reg/f:DI 125 [ D.2036 ])
    new_rtx = (plus:DI (reg/v/f:DI 126 [ p ]) (reg:DI 128))
    address_cost (old_rtx) = 0
    address_cost (new_rtx) = 0
    set_src_cost (old_rtx) = 0
    set_src_cost (new_rtx) = 4
  In insn 11, replacing
    (mem/s:SI (reg/f:DI 125 [ D.2036 ]) [2 p_1(D)->a S4 A32])
  with
    (mem/s:SI (plus:DI (reg/v/f:DI 126 [ p ]) (reg:DI 128))
              [2 p_1(D)->a S4 A32])
  Changed insn 11
  deferring rescan insn with uid = 11.
  deferring rescan insn with uid = 11.

Hope this helps,
Bill
On Wed, 5 Oct 2011, William J. Schmidt wrote:

> This patch addresses the poor code generation in PR46556 for the
> following code:
>
> struct x
> {
>   int a[16];
>   int b[16];
>   int c[16];
> };
>
> extern void foo (int, int, int);
>
> void
> f (struct x *p, unsigned int n)
> {
>   foo (p->a[n], p->c[n], p->b[n]);
> }
>
> Prior to the fix for PR32698, gcc calculated the offset for accessing
> the array elements as: n*4; 64+n*4; 128+n*4.
>
> Following that fix, the offsets are calculated as: n*4; (n+16)*4;
> (n+32)*4.  This led to poor code generation on powerpc64 targets,
> among others.
>
> The poor code generation was observed to not occur in loops, as the
> IVOPTS code does a good job of lowering these expressions to
> MEM_REFs.  It was previously suggested that perhaps a general pass to
> lower memory accesses to MEM_REFs in GIMPLE would solve not only
> this, but other similar problems.  I spent some time looking into
> various approaches to this, and reviewing some previous attempts to
> do similar things.  In the end, I've concluded that this is a bad
> idea in practice because of the loss of useful aliasing information.
> In particular, early lowering of component references causes us to
> lose the ability to disambiguate non-overlapping references in the
> same structure, and there is no simple way to carry the necessary
> aliasing information along with the replacement MEM_REFs to avoid
> this.  While some performance gains are available with GIMPLE
> lowering of memory accesses, there are also offsetting performance
> losses, and I suspect this would just be a continuous source of bug
> reports into the future.
>
> Therefore the current patch is a much simpler approach to solve the
> specific problem noted in the PR.  There are two pieces to the patch:
>
> * The offending addressing pattern is matched in GIMPLE and
>   transformed into a restructured MEM_REF that distributes the
>   multiply, so that (n+32)*4 becomes 4*n+128 as before.  This is done
>   during the reassociation pass, for reasons described below.  The
>   transformation only occurs in non-loop blocks, since IVOPTS does a
>   good job on such things within loops.
>
> * A tweak is added to the RTL forward-propagator to avoid propagating
>   into memory references based on a single base register with no
>   offset, under certain circumstances.  This improves sharing of base
>   registers for accesses within the same structure and slightly
>   lowers register pressure.
>
> It would be possible to separate these into two patches if that's
> preferred.  I chose to combine them because together they provide the
> ideal code generation that the new test cases test for.
>
> I initially implemented the pattern matcher during expand, but I
> found that the expanded code for two accesses to the same structure
> was often not being CSEd well.  So I moved it back into the GIMPLE
> phases prior to DOM to take advantage of its CSE.  To avoid an
> additional complete pass over the IL, I chose to piggyback on the
> reassociation pass.  This transformation is not technically a
> reassociation, but it is related enough to not be a complete wart.
>
> One noob question about this: It would probably be preferable to have
> this transformation only take place during the second reassociation
> pass, so the ARRAY_REFs are seen by earlier optimization phases.  Is
> there an easy way to detect that it's the second pass without having
> to generate a separate pass entry point?
>
> One other general question about the pattern-match transformation: Is
> this an appropriate transformation for all targets, or should it be
> somehow gated on available addressing modes on the target processor?
>
> Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified
> no performance degradations on that target for SPEC CPU2000 and
> CPU2006.
>
> I'm looking for eventual approval for trunk after any comments are
> resolved.  Thanks!

People have already commented on pieces, so I'm looking only at the
tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
instead?).  The idea is to expose additional CSE opportunities, right?
So it's sort-of a strength-reduction optimization on scalar code
(classically, strength reduction in loops transforms
for (i) { z = i*x; } to z = 0; for (i) { z += x; }).  That might be
worth doing in general, even for non-address cases.  So - if you rename
that thing to tree-ssa-strength-reduce.c you can get away without
piggy-backing on anything ;)  If you structure it to detect a strength
reduction opportunity (thus, you'd need to match two/multiple of the
patterns at the same time) that would be a bonus ... generalizing it a
little bit would be another.

Now some comments on the patch ...

> Bill
>
>
> 2011-10-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> gcc:
>
> 	PR rtl-optimization/46556
> 	* fwprop.c (fwprop_bb_aux_d): New struct.
> 	(MEM_PLUS_REGS): New macro.
> 	(record_mem_plus_reg): New function.
> 	(record_mem_plus_regs): Likewise.
> 	(single_def_use_enter_block): Record mem-plus-reg patterns.
> 	(build_single_def_use_links): Allocate aux storage.
> 	(locally_poor_mem_replacement): New function.
> 	(forward_propagate_and_simplify): Call
> 	locally_poor_mem_replacement.
> 	(fwprop_init): Free storage.
> 	* tree.h (copy_ref_info): Expose existing function.
> 	* tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
> 	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
> 	(restructure_mem_ref): Likewise.
> 	(reassociate_bb): Look for opportunities to call
> 	restructure_mem_ref; clean up immediate use lists.
>
> gcc/testsuite:
>
> 	PR rtl-optimization/46556
> 	* gcc.target/powerpc/ppc-pr46556-1.c: New testcase.
> 	* gcc.target/powerpc/ppc-pr46556-2.c: Likewise.
> 	* gcc.target/powerpc/ppc-pr46556-3.c: Likewise.
> 	* gcc.target/powerpc/ppc-pr46556-4.c: Likewise.
> 	* gcc.dg/tree-ssa/pr46556-1.c: Likewise.
> 	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.
>
> Index: gcc/tree-ssa-reassoc.c
> ===================================================================
> --- gcc/tree-ssa-reassoc.c	(revision 179317)
> +++ gcc/tree-ssa-reassoc.c	(working copy)
> @@ -2361,6 +2361,116 @@ break_up_subtract_bb (basic_block bb)
>        break_up_subtract_bb (son);
>  }
>
> +/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
> +   determine whether there is a profitable opportunity to restructure
> +   address arithmetic within BASE and OFFSET.  If so, produce such
> +   a restructuring in *MEM_REF and return true; else return false.  */
> +
> +static tree
> +restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,
> +			     tree base, tree offset, HOST_WIDE_INT bitpos)
> +{
> +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> +  HOST_WIDE_INT c, c1, c2, c3, c4;
> +
> +  /* Look for the following pattern:
> +
> +	base   = MEM_REF (T1, C1);
> +	offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +	bitpos = C4 * BITS_PER_UNIT
> +
> +     If found, convert it to:
> +
> +	MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +		 C1 + (C2 * C3) + C4)
> +  */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> +  c3 = TREE_INT_CST_LOW (mult_op1);

Before accessing TREE_INT_CST_LOW you need to make sure the constants
fit into a HWI using host_integerp () (which conveniently includes the
check for INTEGER_CST).

Note that you need to sign-extend the MEM_REF offset, thus use
mem_ref_offset (base).low instead of
TREE_INT_CST_LOW (TREE_OPERAND (base, 1)).  Might be worth to add a
testcase with negative offset ;)

> +  c4 = bitpos / BITS_PER_UNIT;
> +  c = c1 + c2 * c3 + c4;

And you don't know whether this operation overflows.  Thus it's
probably easiest to use double_ints instead of HOST_WIDE_INTs in all
of the code.

> +
> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +
> +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
> +			   build_int_cst (sizetype, c3));
> +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> +					true, GSI_SAME_STMT);
> +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> +				       true, GSI_SAME_STMT);
> +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> +			 build_int_cst (offset_type, c));
> +  return mem_ref;
> +}
> +
> +/* If *EXPR contains a memory reference, determine whether there is an
> +   opportunity to restructure the base and offset expressions for
> +   improved performance.  */
> +
> +static bool
> +restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
> +{
> +  enum tree_code code = TREE_CODE (*expr);
> +  tree base, offset, mem_ref;
> +  HOST_WIDE_INT bitsize, bitpos;
> +  enum machine_mode mode;
> +  int unsignedp, volatilep;
> +
> +  /* Only expressions that reference memory are of interest.  Bitfield
> +     references should eventually be lowered prior to this point and
> +     are not dealt with.  */
> +  if (! handled_component_p (*expr)
> +      || code == BIT_FIELD_REF
> +      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1))))
> +    return false;

I'd say you want to handle references with a non-BLKmode only here,
thus

  || TYPE_MODE (TREE_TYPE (*expr)) == BLKmode

> +  /* Determine whether the expression can be represented with base and
> +     offset components.  */
> +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> +			      &unsignedp, &volatilep, false);
> +  if (!base || !offset)
> +    return false;
> +
> +  /* Look for a restructuring opportunity.  */
> +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> +					      offset, bitpos)) == NULL_TREE)
> +    return false;

What I'm missing is a check whether the old address computation stmts
will be dead after the transform.

> +  /* Found one.  Record the replacement in the dump file and complete
> +     the replacement.  */
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "\nOriginal_expr = ");
> +      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\nmem_ref = ");
> +      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\n\n");
> +    }
> +
> +  copy_ref_info (mem_ref, *expr);
> +  *expr = mem_ref;
> +
> +  return true;
> +}
> +
>  /* Reassociate expressions in basic block BB and its post-dominator as
>     children.  */
>
> @@ -2369,14 +2479,30 @@ reassociate_bb (basic_block bb)
>  {
>    gimple_stmt_iterator gsi;
>    basic_block son;
> +  bool chgd_mem_ref = false;
> +  bool in_loop = loop_depth (bb->loop_father) != 0;
>
>    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
>      {
>        gimple stmt = gsi_stmt (gsi);
>
> -      if (is_gimple_assign (stmt)
> -	  && !stmt_could_throw_p (stmt))
> +      /* Look for restructuring opportunities within an expression
> +	 that references memory.  We only do this for blocks not
> +	 contained in loops, since the ivopts machinery does a
> +	 good job on loop expressions, and we don't want to interfere
> +	 with other loop optimizations.  */
> +      if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
>  	{
> +	  tree *lhs, *rhs;
> +	  lhs = gimple_assign_lhs_ptr (stmt);
> +	  chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
> +	  rhs = gimple_assign_rhs1_ptr (stmt);
> +	  chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;

It will either be a store or a load, but never both (unless it's an
aggregate copy which I think we should not handle).  So ...

  if (gimple_vdef (stmt))
    ... lhs
  else if (gimple_vuse (stmt))
    ... rhs

> +	}
> +
> +      else if (is_gimple_assign (stmt)
> +	       && !stmt_could_throw_p (stmt))
> +	{
>  	  tree lhs, rhs1, rhs2;
>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>
> @@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
>  	    }
>  	}
>      }
> +  /* If memory references have been restructured, immediate uses need
> +     to be cleaned up.  */
> +  if (chgd_mem_ref)
> +    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +      update_stmt (gsi_stmt (gsi));

ICK.  Definitely a no ;)

Why does a update_stmt () after the restructure_mem_ref call not work?

Richard.
On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote:
> People have already commented on pieces, so I'm looking only at the
> tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
> instead?  The idea is to expose additional CSE opportunities, right?
> So it's sort-of a strength-reduction optimization on scalar code
> (classically strength reduction in loops transforms
> for (i) { z = i*x; } to z = 0; for (i) { z += x }).  That might be
> worth in general, even for non-address cases.  So - if you rename
> that thing to tree-ssa-strength-reduce.c you can get away without
> piggy-backing on anything ;)  If you structure it to detect a
> strength reduction opportunity (thus, you'd need to match
> two/multiple of the patterns at the same time) that would be a bonus
> ... generalizing it a little bit would be another.

These are all good ideas.  I will think about casting this as a more
general strength reduction over extended basic blocks outside of loops.
First I'll put together some simple tests to see whether we're
currently missing some non-address opportunities.

<snip>

> > +  mult_op0 = TREE_OPERAND (offset, 0);
> > +  mult_op1 = TREE_OPERAND (offset, 1);
> > +
> > +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> > +      || TREE_CODE (mult_op1) != INTEGER_CST
> > +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> > +    return NULL_TREE;
> > +
> > +  t1 = TREE_OPERAND (base, 0);
> > +  t2 = TREE_OPERAND (mult_op0, 0);
> > +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> > +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> > +  c3 = TREE_INT_CST_LOW (mult_op1);
>
> Before accessing TREE_INT_CST_LOW you need to make sure the constants
> fit into a HWI using host_integerp () (which conveniently includes
> the check for INTEGER_CST).
>
> Note that you need to sign-extend the MEM_REF offset, thus use
> mem_ref_offset (base).low instead of
> TREE_INT_CST_LOW (TREE_OPERAND (base, 1)).  Might be worth to add a
> testcase with negative offset ;)

D'oh!  >.<

> > +  c4 = bitpos / BITS_PER_UNIT;
> > +  c = c1 + c2 * c3 + c4;
>
> And you don't know whether this operation overflows.  Thus it's
> probably easiest to use double_ints instead of HOST_WIDE_INTs in all
> of the code.

OK, thanks, will do.

<snip>

> > +  /* Determine whether the expression can be represented with base
> > +     and offset components.  */
> > +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> > +			      &unsignedp, &volatilep, false);
> > +  if (!base || !offset)
> > +    return false;
> > +
> > +  /* Look for a restructuring opportunity.  */
> > +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> > +					      offset, bitpos)) == NULL_TREE)
> > +    return false;
>
> What I'm missing is a check whether the old address computation stmts
> will be dead after the transform.

Hm, not quite sure what to do here.  Prior to the transformation I'll
have an assignment with something like:

  ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td)

on LHS or RHS.  Ta and Td will be part of the replacement.  What should
I be checking for?

<snip>

> > -      if (is_gimple_assign (stmt)
> > -	  && !stmt_could_throw_p (stmt))
> > +      /* Look for restructuring opportunities within an expression
> > +	 that references memory.  We only do this for blocks not
> > +	 contained in loops, since the ivopts machinery does a
> > +	 good job on loop expressions, and we don't want to interfere
> > +	 with other loop optimizations.  */
> > +      if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
> >  	{
> > +	  tree *lhs, *rhs;
> > +	  lhs = gimple_assign_lhs_ptr (stmt);
> > +	  chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
> > +	  rhs = gimple_assign_rhs1_ptr (stmt);
> > +	  chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;
>
> It will either be a store or a load, but never both (unless it's an
> aggregate copy which I think we should not handle).  So ...
>
>   if (gimple_vdef (stmt))
>     ... lhs
>   else if (gimple_vuse (stmt))
>     ... rhs

OK, with your suggested gating on non-BLKmode I agree.

> > +	}
> > +
> > +      else if (is_gimple_assign (stmt)
> > +	       && !stmt_could_throw_p (stmt))
> > +	{
> >  	  tree lhs, rhs1, rhs2;
> >  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
> >
> > @@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
> >  	    }
> >  	}
> >      }
> > +  /* If memory references have been restructured, immediate uses
> > +     need to be cleaned up.  */
> > +  if (chgd_mem_ref)
> > +    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +      update_stmt (gsi_stmt (gsi));
>
> ICK.  Definitely a no ;)
>
> Why does a update_stmt () after the restructure_mem_ref call not work?

Ah, yeah, I meant to check again on that before submitting.  >.<

IIRC, at some point the update_stmt () following restructure_mem_ref
was still giving me verify errors.  I thought perhaps the statements
created by force_gimple_operand_gsi might be giving me trouble -- threw
this in to work around it and missed it on my pre-submission review.
I'll re-check this.  If for some reason it is the inserted statements,
I can more carefully update just those -- but I just need to go back
and sort it out.  Sorry for letting that slip through.

Thanks for the review!

Bill

> Richard.
On Thu, 6 Oct 2011, William J. Schmidt wrote:
> On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote:

<snip>

> > What I'm missing is a check whether the old address computation
> > stmts will be dead after the transform.
>
> Hm, not quite sure what to do here.  Prior to the transformation I'll
> have an assignment with something like:
>
>   ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td)
>
> on LHS or RHS.  Ta and Td will be part of the replacement.  What
> should I be checking for?

Doh, I thought you were matching gimple stmts that do the address
computation.  But now I see you are matching the tree returned from
get_inner_reference.  So no need to check anything for that case.

But that keeps me wondering what you'll do if the accesses were all
pointer arithmetic, not arrays.  Thus,

extern void foo (int, int, int);

void
f (int *p, unsigned int n)
{
  foo (p[n], p[n+64], p[n+128]);
}

wouldn't that have the same issue and you wouldn't handle it?

Richard.
On Thu, 2011-10-06 at 16:16 +0200, Richard Guenther wrote:

<snip>

> Doh, I thought you were matching gimple stmts that do the address
> computation.  But now I see you are matching the tree returned from
> get_inner_reference.  So no need to check anything for that case.
>
> But that keeps me wondering what you'll do if the accesses were all
> pointer arithmetic, not arrays.  Thus,
>
> extern void foo (int, int, int);
>
> void
> f (int *p, unsigned int n)
> {
>   foo (p[n], p[n+64], p[n+128]);
> }
>
> wouldn't that have the same issue and you wouldn't handle it?
>
> Richard.

Good point.  This indeed gets missed here, and that's more fuel for
doing a generalized strength reduction along with the special cases
like p->a[n] that are only exposed with get_inner_reference.

(The pointer arithmetic cases were picked up in my earlier "big-hammer"
approach using the aff-comb machinery, but that had too many problems
in the end, as you know.)

So for the long term I will look into a full strength reducer for
non-loop code.  For the short term, what do you think about keeping
this single transformation in reassoc to make sure it gets into 4.7?
I would plan to strip it back out and fold it into the strength reducer
thereafter, which might or might not make 4.7 depending on my other
responsibilities and how the 4.7 schedule goes.  I haven't seen
anything official, but I'm guessing we're getting towards the end of
4.7 stage 1?
On 10/06/11 04:13, Richard Guenther wrote:
> People have already commented on pieces, so I'm looking only at the
> tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
> instead?  The idea is to expose additional CSE opportunities, right?
> So it's sort-of a strength-reduction optimization on scalar code
> (classically strength reduction in loops transforms
> for (i) { z = i*x; } to z = 0; for (i) { z += x }).  That might be
> worth in general, even for non-address cases.  So - if you rename
> that thing to tree-ssa-strength-reduce.c you can get away without
> piggy-backing on anything ;)  If you structure it to detect a
> strength reduction opportunity (thus, you'd need to match
> two/multiple of the patterns at the same time) that would be a bonus
> ... generalizing it a little bit would be another.

There's a variety of literature that uses PRE to detect and optimize
straightline code strength reduction.  I poked at it at one time (RTL
gcse framework) and it looked reasonably promising.  Never pushed it
all the way through.

jeff
On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote:
> On 10/06/11 04:13, Richard Guenther wrote:

<snip>

> There's a variety of literature that uses PRE to detect and optimize
> straightline code strength reduction.  I poked at it at one time (RTL
> gcse framework) and it looked reasonably promising.  Never pushed it
> all the way through.
>
> jeff

I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this
question has come up before.  The former also suggested a PRE-based
approach.
On 10/06/11 12:02, William J. Schmidt wrote:
> On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote:

<snip>

> I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this
> question has come up before.  The former also suggested a PRE-based
> approach.

Yea.  We've kicked it around several times over the last 15 or so
years.  When I briefly looked at it, I was doing so more in the context
of eliminating all the optimize_related_values crap, purely as a
cleanup, and ultimately couldn't justify the time.  IIRC Morgan &
Muchnick both had write-ups on the basic concepts.  There's probably
other literature as well.
jeff
On Thu, 6 Oct 2011, William J. Schmidt wrote:

> On Thu, 2011-10-06 at 16:16 +0200, Richard Guenther wrote:
>
> <snip>
>
> > Doh, I thought you were matching gimple stmts that do the address
> > computation.  But now I see you are matching the tree returned from
> > get_inner_reference.  So no need to check anything for that case.
> >
> > But that keeps me wondering what you'll do if the accesses were
> > all pointer arithmetic, not arrays.  Thus,
> >
> >   extern void foo (int, int, int);
> >
> >   void
> >   f (int *p, unsigned int n)
> >   {
> >     foo (p[n], p[n+64], p[n+128]);
> >   }
> >
> > wouldn't that have the same issue and you wouldn't handle it?
> >
> > Richard.
>
> Good point.  This indeed gets missed here, and that's more fuel for
> doing a generalized strength reduction along with the special cases like
> p->a[n] that are only exposed with get_inner_reference.
>
> (The pointer arithmetic cases were picked up in my earlier "big-hammer"
> approach using the aff-comb machinery, but that had too many problems in
> the end, as you know.)

Yeah, I know.  It's a very tricky area ;)

> So for the long term I will look into a full strength reducer for
> non-loop code.  For the short term, what do you think about keeping this
> single transformation in reassoc to make sure it gets into 4.7?  I would
> plan to strip it back out and fold it into the strength reducer
> thereafter, which might or might not make 4.7 depending on my other
> responsibilities and how the 4.7 schedule goes.  I haven't seen anything
> official, but I'm guessing we're getting towards the end of 4.7 stage 1?

It's a reasonable plan - you'd have to introduce a late reassoc
pass though.  Can you separate out the RTL fwprop changes?  So
we can iterate over the tree parts separately.

Thanks,
Richard.
On Thu, 6 Oct 2011, Jeff Law wrote:

> On 10/06/11 04:13, Richard Guenther wrote:
> >
> > People have already commented on pieces, so I'm looking only at the
> > tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
> > instead?  The idea is to expose additional CSE opportunities,
> > right?  So it's sort-of a strength-reduction optimization on scalar
> > code (classically strength reduction in loops transforms for (i) {
> > z = i*x; } to z = 0; for (i) { z += x }).  That might be worth in
> > general, even for non-address cases.  So - if you rename that thing
> > to tree-ssa-strength-reduce.c you can get away without
> > piggy-backing on anything ;)  If you structure it to detect a
> > strength reduction opportunity (thus, you'd need to match
> > two/multiple of the patterns at the same time) that would be a
> > bonus ... generalizing it a little bit would be another.
> There's a variety of literature that uses PRE to detect and optimize
> straightline code strength reduction.  I poked at it at one time (RTL
> gcse framework) and it looked reasonably promising.  Never pushed it
> all the way through.

I poked at it in the tree PRE framework at some point but never pushed
it through either.  At least it didn't feel like it "fits" PRE (and
PRE should also be able to handle the loop case).

Richard.
On 10/07/2011 10:00 AM, Richard Guenther wrote:
> It's a reasonable plan - you'd have to introduce a late reassoc
> pass though.  Can you separate out the RTL fwprop changes?  So
> we can iterate over the tree parts separately.

That's indeed better, also because they became CSE changes.

Paolo
On Fri, 2011-10-07 at 11:17 +0200, Paolo Bonzini wrote:
> On 10/07/2011 10:00 AM, Richard Guenther wrote:
> > It's a reasonable plan - you'd have to introduce a late reassoc
> > pass though.  Can you separate out the RTL fwprop changes?  So
> > we can iterate over the tree parts separately.
>
> That's indeed better, also because they became CSE changes.
>
> Paolo

Yes, I'll plan to work these two separately.

Paolo, I should be able to shepherd your suggested patch through.
Normally company policy frowns on committing other people's patches,
but in a case like this where you're no longer actively doing GCC
development, I should be able to overcome that.

BTW, Richard, whatever issues I formerly had with update_stmt are
gone, so I'll be able to clean up that wart.

Thanks,
Bill
Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	(revision 179317)
+++ gcc/fwprop.c	(working copy)
@@ -131,6 +131,15 @@ static VEC(df_ref,heap) *reg_defs_stack;
 static bitmap local_md;
 static bitmap local_lr;
 
+/* Auxiliary information for each block for this pass.  */
+typedef struct GTY(()) fwprop_bb_aux_d
+{
+  /* Registers appearing in (mem (plus (reg ... patterns in this block.  */
+  bitmap mem_plus_regs;
+} fwprop_bb_aux, *fwprop_bb_aux_t;
+
+#define MEM_PLUS_REGS(BB) ((fwprop_bb_aux_t) ((BB)->aux))->mem_plus_regs
+
 /* Return the only def in USE's use-def chain, or NULL if there is
    more than one def in the chain.  */
 
@@ -212,7 +221,43 @@ process_uses (df_ref *use_rec, int top_flag)
 }
 
 
+/* Record whether EXPR in block BB is of the form (mem (plus (reg ...  */
 static void
+record_mem_plus_reg (basic_block bb, rtx expr)
+{
+  rtx addr, reg;
+
+  if (GET_CODE (expr) != MEM)
+    return;
+
+  addr = XEXP (expr, 0);
+  if (GET_CODE (addr) != PLUS)
+    return;
+
+  reg = XEXP (addr, 0);
+  if (!REG_P (reg))
+    return;
+
+  (void) bitmap_set_bit (MEM_PLUS_REGS (bb), REGNO (reg));
+}
+
+
+/* Record whether INSN in block BB contains any patterns of the form
+   (mem (plus (reg ...  */
+static void
+record_mem_plus_regs (basic_block bb, rtx insn)
+{
+  rtx insn_set = single_set (insn);
+
+  if (insn_set)
+    {
+      record_mem_plus_reg (bb, SET_DEST (insn_set));
+      record_mem_plus_reg (bb, SET_SRC (insn_set));
+    }
+}
+
+
+static void
 single_def_use_enter_block (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
 			    basic_block bb)
 {
@@ -230,6 +275,8 @@ single_def_use_enter_block (struct dom_walk_data *
   process_uses (df_get_artificial_uses (bb_index), DF_REF_AT_TOP);
   process_defs (df_get_artificial_defs (bb_index), DF_REF_AT_TOP);
 
+  MEM_PLUS_REGS (bb) = BITMAP_ALLOC (NULL);
+
   /* We don't call df_simulate_initialize_forwards, as it may overestimate
      the live registers if there are unused artificial defs.  We prefer
      liveness to be underestimated.  */
@@ -242,8 +289,15 @@ single_def_use_enter_block (struct dom_walk_data *
 	process_uses (DF_INSN_UID_EQ_USES (uid), 0);
 	process_defs (DF_INSN_UID_DEFS (uid), 0);
 	df_simulate_one_insn_forwards (bb, insn, local_lr);
+	record_mem_plus_regs (bb, insn);
       }
 
+  if (dump_file)
+    {
+      fprintf (dump_file, "mem_plus_regs (%d): ", bb->index);
+      dump_bitmap (dump_file, MEM_PLUS_REGS (bb));
+    }
+
   process_uses (df_get_artificial_uses (bb_index), 0);
   process_defs (df_get_artificial_defs (bb_index), 0);
 }
@@ -295,6 +349,8 @@ build_single_def_use_links (void)
   local_md = BITMAP_ALLOC (NULL);
   local_lr = BITMAP_ALLOC (NULL);
 
+  alloc_aux_for_blocks (sizeof (fwprop_bb_aux));
+
   /* Walk the dominator tree looking for single reaching definitions
      dominating the uses.  This is similar to how SSA form is built.  */
   walk_data.dom_direction = CDI_DOMINATORS;
@@ -1207,6 +1263,69 @@ forward_propagate_asm (df_ref use, rtx def_insn, r
   return true;
 }
 
+/* We are proposing to replace a USE of REG in USE_SET.  Determine
+   whether we have a situation where two storage uses of REG occur
+   in the same block as follows.  Each use can be either a memory
+   store or a memory load.
+
+     ... (mem (reg REG))
+
+     ... (mem (plus (reg REG)
+                    (...)))
+
+   The problem is that it will look profitable to propagate something
+   like
+
+     (set (reg REG)
+          (plus (reg X)
+                (reg Y)))
+
+   into the first use, but not into the second one.  This breaks a
+   CSE opportunity and raises register pressure by extending the
+   lifetimes of X and Y.  To avoid this, don't propagate into the
+   first use when this very specific situation arises.  */
+static bool
+locally_poor_mem_replacement (df_ref use, rtx use_set, rtx reg, rtx def_set)
+{
+  rtx rhs, addend, mem, base;
+  unsigned i;
+
+  /* We're only concerned if the RHS of def_set is the sum of two
+     registers.  */
+  rhs = SET_SRC (def_set);
+
+  if (GET_CODE (rhs) != PLUS)
+    return false;
+
+  for (i = 0; i < 2; i++)
+    {
+      addend = XEXP (rhs, i);
+      if (!REG_P (addend))
+	return false;
+    }
+
+  /* The use must have a single SET and be used in addressing.  */
+  if (!use_set || !DF_REF_REG_MEM_P (use))
+    return false;
+
+  if (DF_REF_REG_MEM_STORE_P (use))
+    mem = SET_DEST (use_set);
+  else
+    mem = SET_SRC (use_set);
+
+  if (GET_CODE (mem) != MEM)
+    return false;
+
+  /* We are specifically interested in the base address.  */
+  base = XEXP (mem, 0);
+  if (reg != base)
+    return false;
+
+  /* We've found a use of the first form.  Check whether uses of the
+     second form exist in the same block.  If found, don't replace.  */
+  return bitmap_bit_p (MEM_PLUS_REGS (DF_REF_BB (use)), DF_REF_REGNO (use));
+}
+
 /* Try to replace USE with SRC (defined in DEF_INSN)
    and simplify the result.  */
 
@@ -1273,6 +1392,11 @@ forward_propagate_and_simplify (df_ref use, rtx de
       return false;
     }
 
+  /* Check for a specific unprofitable propagation into a memory
+     reference, where the propagation will break a CSE opportunity.  */
+  if (locally_poor_mem_replacement (use, use_set, reg, def_set))
+    return false;
+
   if (asm_use >= 0)
     return forward_propagate_asm (use, def_insn, def_set, reg);
 
@@ -1403,6 +1527,12 @@ fwprop_init (void)
 static void
 fwprop_done (void)
 {
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    BITMAP_FREE (MEM_PLUS_REGS (bb));
+  free_aux_for_blocks ();
+
   loop_optimizer_finalize ();
 
   VEC_free (df_ref, heap, use_def_ref);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 179317)
+++ gcc/tree.h	(working copy)
@@ -5775,6 +5775,9 @@ tree target_for_debug_bind (tree);
 extern tree tree_mem_ref_addr (tree, tree);
 extern void copy_mem_ref_info (tree, tree);
 
+/* In tree-ssa-loop-ivopts.c.  */
+extern void copy_ref_info (tree, tree);
+
 /* In tree-vrp.c */
 extern bool ssa_name_nonnegative_p (const_tree);
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-assembler-times "lw.*3," 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*4,128\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*5,64\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-assembler-times "lw.*4,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*3,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-assembler-times "lw.*3,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*4,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-fwprop1" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem.\*:.\* \\(reg.\*p_1\\(D\\)->a" 1 "fwprop1" } } */
+/* { dg-final { cleanup-rtl-dump "fwprop1" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 179317)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -6280,7 +6280,7 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da
 
 /* Copies the reference information from OLD_REF to NEW_REF.  */
 
-static void
+void
 copy_ref_info (tree new_ref, tree old_ref)
 {
   tree new_ptr_base = NULL_TREE;
Index: gcc/tree-ssa-reassoc.c
===================================================================
--- gcc/tree-ssa-reassoc.c	(revision 179317)
+++ gcc/tree-ssa-reassoc.c	(working copy)
@@ -2361,6 +2361,116 @@ break_up_subtract_bb (basic_block bb)
       break_up_subtract_bb (son);
 }
 
+/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
+   determine whether there is a profitable opportunity to restructure
+   address arithmetic within BASE and OFFSET.  If so, return the
+   restructured MEM_REF; else return NULL_TREE.  */
+
+static tree
+restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,
+			     tree base, tree offset, HOST_WIDE_INT bitpos)
+{
+  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
+  HOST_WIDE_INT c, c1, c2, c3, c4;
+
+  /* Look for the following pattern:
+
+       base   = MEM_REF (T1, C1);
+       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+       bitpos = C4 * BITS_PER_UNIT
+
+     If found, convert it to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + (C2 * C3) + C4)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST_LOW (mult_op1);
+  c4 = bitpos / BITS_PER_UNIT;
+  c = c1 + c2 * c3 + c4;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
+			   build_int_cst (sizetype, c3));
+  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
+					true, GSI_SAME_STMT);
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
+				       true, GSI_SAME_STMT);
+  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
+			 build_int_cst (offset_type, c));
+  return mem_ref;
+}
+
+/* If *EXPR contains a memory reference, determine whether there is an
+   opportunity to restructure the base and offset expressions for
+   improved performance.  */
+
+static bool
+restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
+{
+  enum tree_code code = TREE_CODE (*expr);
+  tree base, offset, mem_ref;
+  HOST_WIDE_INT bitsize, bitpos;
+  enum machine_mode mode;
+  int unsignedp, volatilep;
+
+  /* Only expressions that reference memory are of interest.  Bitfield
+     references should eventually be lowered prior to this point and
+     are not dealt with.  */
+  if (! handled_component_p (*expr)
+      || code == BIT_FIELD_REF
+      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1))))
+    return false;
+
+  /* Determine whether the expression can be represented with base and
+     offset components.  */
+  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &volatilep, false);
+  if (!base || !offset)
+    return false;
+
+  /* Look for a restructuring opportunity.  */
+  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
+					      offset, bitpos)) == NULL_TREE)
+    return false;
+
+  /* Found one.  Record the replacement in the dump file and complete
+     the replacement.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nOriginal_expr = ");
+      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\nmem_ref = ");
+      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\n\n");
+    }
+
+  copy_ref_info (mem_ref, *expr);
+  *expr = mem_ref;
+
+  return true;
+}
+
 /* Reassociate expressions in basic block BB and its post-dominator as
    children.  */
 
@@ -2369,14 +2479,30 @@ reassociate_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   basic_block son;
+  bool chgd_mem_ref = false;
+  bool in_loop = loop_depth (bb->loop_father) != 0;
 
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       gimple stmt = gsi_stmt (gsi);
 
-      if (is_gimple_assign (stmt)
-	  && !stmt_could_throw_p (stmt))
+      /* Look for restructuring opportunities within an expression
+	 that references memory.  We only do this for blocks not
+	 contained in loops, since the ivopts machinery does a
+	 good job on loop expressions, and we don't want to interfere
+	 with other loop optimizations.  */
+      if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
 	{
+	  tree *lhs, *rhs;
+	  lhs = gimple_assign_lhs_ptr (stmt);
+	  chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
+	  rhs = gimple_assign_rhs1_ptr (stmt);
+	  chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;
+	}
+
+      else if (is_gimple_assign (stmt)
+	       && !stmt_could_throw_p (stmt))
+	{
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 
@@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
 	    }
 	}
     }
+  /* If memory references have been restructured, immediate uses need
+     to be cleaned up.  */
+  if (chgd_mem_ref)
+    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      update_stmt (gsi_stmt (gsi));
+
   for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
        son;
       son = next_dom_son (CDI_POST_DOMINATORS, son))