Message ID: 201209131511.46623.ebotcazou@adacore.com
State: New
On Thu, Sep 13, 2012 at 3:11 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> the attached testcase triggers a memory exhaustion at -O2 during the cunrolli
> pass on the mainline and 4.7 branch.  The problem is that the size estimates
> disregard induction variable computations on the ground that they will be
> folded later.  But they aren't folded between the iterations of the loop so
> they can add up and exhaust the memory during SSA updating if stars are
> properly aligned.
>
> The patch is a somewhat simple-minded fix...  Bootstrapped/regtested on
> x86_64-suse-linux.  OK for mainline and 4.7 branch?

Indeed somewhat simple-minded - when originally fixing a similar testcase
(heh ...) I improved things by improving CFG cleanup to fold some more
conditions by looking at SSA defs, that improved things a lot.  I also
thought the real fix should involve some scalar optimization on a
sub-range of the CFG.  That should be easiest when performing the copy in
the first place - after all we keep copy tables and such for the purpose
of update-SSA so we might as well create a lattice from PHI nodes we
disassemble for use by copy_bb ...

On the patch itself - can you call the simple CCP before we call
cleanup_tree_cfg () please?  We might get rid of that weirdo SSA lookup
there again then:

static bool
cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
{
...
      /* For conditions try harder and lookup single-argument
	 PHI nodes.  Only do so from the same basic-block though
	 as other basic-blocks may be dead already.  */
      if (TREE_CODE (lhs) == SSA_NAME
	  && !name_registered_for_update_p (lhs))
...

+	      FOR_EACH_IMM_USE_ON_STMT (use, iter)
+		propagate_value (use, gimple_assign_rhs1 (stmt));
+
+	      fold_stmt_inplace (&use_stmt_gsi);
+	      update_stmt (use_stmt);

Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.

  if (fold_stmt_inplace (&use_stmt_gsi))
    update_stmt (use_stmt);

Thanks,
Richard.
>
> 2012-09-13  Eric Botcazou  <ebotcazou@adacore.com>
>
> 	* tree-ssa-loop-ivcanon.c (propagate_constants_for_unrolling): New.
> 	(tree_unroll_loops_completely): Starting from the second iteration,
> 	propagate constants within the innermost loops.
>
>
> 2012-09-13  Eric Botcazou  <ebotcazou@adacore.com>
>
> 	* gnat.dg/loop_optimization12.ad[sb]: New test.
>
>
> --
> Eric Botcazou
> Indeed somewhat simple-minded - when originally fixing a similar testcase
> (heh ...) I improved things by improving CFG cleanup to fold some more
> conditions by looking at SSA defs, that improved things a lot.  I also
> thought the real fix should involve some scalar optimization on a
> sub-range of the CFG.  That should be easiest when performing the copy in
> the first place - after all we keep copy tables and such for the purpose
> of update-SSA so we might as well create a lattice from PHI nodes we
> disassemble for use by copy_bb ...

I also thought of similar approaches, but couldn't really come up with
something satisfactory.

> On the patch itself - can you call the simple CCP before we call
> cleanup_tree_cfg () please?

That was the original implementation (in fact, in the original implementation
the simple CCP was conditionalized on canonicalize_loop_induction_variables,
but you broke it a few days ago by moving the call to update_ssa :-)

The problem is that, if it is moved to before cleanup_tree_cfg, then:
 1) you have more basic blocks (15 vs 2 for the testcase),
 2) you also need to do simple CCP for degenerate PHI nodes.

That being said, I can certainly save in a bitmap the loop fathers for which
canonicalize_loop_induction_variables unrolled a child and do the simple CCP
(augmented for degenerate PHI nodes) on them between the calls to update_ssa
and cleanup_tree_cfg.

> We might get rid of that weirdo SSA lookup
> there again then:
>
> static bool
> cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
> {
> ...
>       /* For conditions try harder and lookup single-argument
>	   PHI nodes.  Only do so from the same basic-block though
>	   as other basic-blocks may be dead already.  */
>       if (TREE_CODE (lhs) == SSA_NAME
>	    && !name_registered_for_update_p (lhs))

I'll investigate.

> +	      FOR_EACH_IMM_USE_ON_STMT (use, iter)
> +		propagate_value (use, gimple_assign_rhs1 (stmt));
> +
> +	      fold_stmt_inplace (&use_stmt_gsi);
> +	      update_stmt (use_stmt);
>
> Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.
>
>   if (fold_stmt_inplace (&use_stmt_gsi))
>     update_stmt (use_stmt);

OK, will adjust, thanks.
On Thu, Sep 13, 2012 at 4:00 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Indeed somewhat simple-minded - when originally fixing a similar testcase
>> (heh ...) I improved things by improving CFG cleanup to fold some more
>> conditions by looking at SSA defs, that improved things a lot.  I also
>> thought the real fix should involve some scalar optimization on a
>> sub-range of the CFG.  That should be easiest when performing the copy in
>> the first place - after all we keep copy tables and such for the purpose
>> of update-SSA so we might as well create a lattice from PHI nodes we
>> disassemble for use by copy_bb ...
>
> I also thought of similar approaches, but couldn't really come up with
> something satisfactory.
>
>> On the patch itself - can you call the simple CCP before we call
>> cleanup_tree_cfg () please?
>
> That was the original implementation (in fact, in the original implementation
> the simple CCP was conditionalized on canonicalize_loop_induction_variables,
> but you broke it a few days ago by moving the call to update_ssa :-)
>
> The problem is that, if it is moved to before cleanup_tree_cfg, then:
>  1) you have more basic blocks (15 vs 2 for the testcase),
>  2) you also need to do simple CCP for degenerate PHI nodes.

Yes - now cfg_cleanup does that (and it really shouldn't be its job).  That
was the improvement I talked about - reducing the number of BBs a lot.

> That being said, I can certainly save in a bitmap the loop fathers for which
> canonicalize_loop_induction_variables unrolled a child and do the simple CCP
> (augmented for degenerate PHI nodes) on them between the calls to update_ssa
> and cleanup_tree_cfg.

Ah, indeed ;)  Or just push struct loop of changed loops onto a stack.

>> We might get rid of that weirdo SSA lookup
>> there again then:
>>
>> static bool
>> cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
>> {
>> ...
>>       /* For conditions try harder and lookup single-argument
>>	    PHI nodes.  Only do so from the same basic-block though
>>	    as other basic-blocks may be dead already.  */
>>       if (TREE_CODE (lhs) == SSA_NAME
>>	     && !name_registered_for_update_p (lhs))
>
> I'll investigate.
>
>> +	      FOR_EACH_IMM_USE_ON_STMT (use, iter)
>> +		propagate_value (use, gimple_assign_rhs1 (stmt));
>> +
>> +	      fold_stmt_inplace (&use_stmt_gsi);
>> +	      update_stmt (use_stmt);
>>
>> Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.
>>
>>   if (fold_stmt_inplace (&use_stmt_gsi))
>>     update_stmt (use_stmt);
>
> OK, will adjust, thanks.

Thanks,
Richard.

> --
> Eric Botcazou
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 191198)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -503,6 +503,48 @@ canonicalize_induction_variables (void)
   return 0;
 }
 
+/* Propagate constant SSA_NAMEs defined in basic block BB.  */
+
+static void
+propagate_constants_for_unrolling (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
+    {
+      gimple stmt = gsi_stmt (gsi);
+      tree lhs;
+
+      /* Look for assignments to SSA names with constant RHS.  */
+      if (is_gimple_assign (stmt)
+	  && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
+	  && gimple_assign_rhs_code (stmt) == INTEGER_CST)
+	{
+	  imm_use_iterator iter;
+	  gimple use_stmt;
+
+	  /* Propagate the RHS into all the uses of the SSA name.  */
+	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
+	    {
+	      gimple_stmt_iterator use_stmt_gsi = gsi_for_stmt (use_stmt);
+	      use_operand_p use;
+
+	      FOR_EACH_IMM_USE_ON_STMT (use, iter)
+		propagate_value (use, gimple_assign_rhs1 (stmt));
+
+	      fold_stmt_inplace (&use_stmt_gsi);
+	      update_stmt (use_stmt);
+	    }
+
+	  /* And get rid of the now unused SSA name.  */
+	  gsi_remove (&gsi, true);
+	  release_ssa_name (lhs);
+	}
+      else
+	gsi_next (&gsi);
+    }
+}
+
 /* Unroll LOOPS completely if they iterate just few times.  Unless
    MAY_INCREASE_SIZE is true, perform the unrolling only if the
    size of the code does not increase.  */
@@ -522,6 +564,19 @@ tree_unroll_loops_completely (bool may_i
       FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
 	{
+	  /* If we have already unrolled, we need to propagate constants
+	     within the new basic blocks to fold away induction variable
+	     computations; otherwise, the size might blow up before the
+	     iteration is complete and the IR eventually cleaned up.  */
+	  if (iteration > 0)
+	    {
+	      unsigned i;
+	      basic_block *body = get_loop_body_in_dom_order (loop);
+	      for (i = 0; i < loop->num_nodes; i++)
+		propagate_constants_for_unrolling (body[i]);
+	      free (body);
+	    }
+
 	  if (may_increase_size && optimize_loop_for_speed_p (loop)
 	      /* Unroll outermost loops only if asked to do so or they do
 		 not cause code growth.  */
@@ -530,6 +585,7 @@ tree_unroll_loops_completely (bool may_i
 	    ul = UL_ALL;
 	  else
 	    ul = UL_NO_GROWTH;
+
 	  changed |= canonicalize_loop_induction_variables
 		       (loop, false, ul, !flag_tree_loop_ivcanon);
 	}