
Fix memory exhaustion during cunrolli

Message ID 201209131511.46623.ebotcazou@adacore.com
State New

Commit Message

Eric Botcazou Sept. 13, 2012, 1:11 p.m. UTC
Hi,

the attached testcase triggers memory exhaustion at -O2 during the cunrolli 
pass on the mainline and 4.7 branch.  The problem is that the size estimates 
disregard induction variable computations on the grounds that they will be 
folded later.  But they aren't folded between the iterations of the loop, so 
they can add up and exhaust the memory during SSA updating if the stars are 
properly aligned.

The patch is a somewhat simple-minded fix...  Bootstrapped/regtested on x86_64-
suse-linux.  OK for mainline and 4.7 branch?


2012-09-13  Eric Botcazou  <ebotcazou@adacore.com>

	* tree-ssa-loop-ivcanon.c (propagate_constants_for_unrolling): New.
	(tree_unroll_loops_completely): Starting from the second iteration,
	propagate constants within the innermost loops.


2012-09-13  Eric Botcazou  <ebotcazou@adacore.com>

	* gnat.dg/loop_optimization12.ad[sb]: New test.

Comments

Richard Biener Sept. 13, 2012, 1:34 p.m. UTC | #1
On Thu, Sep 13, 2012 at 3:11 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> the attached testcase triggers memory exhaustion at -O2 during the cunrolli
> pass on the mainline and 4.7 branch.  The problem is that the size estimates
> disregard induction variable computations on the grounds that they will be
> folded later.  But they aren't folded between the iterations of the loop, so
> they can add up and exhaust the memory during SSA updating if the stars are
> properly aligned.
>
> The patch is a somewhat simple-minded fix...  Bootstrapped/regtested on x86_64-
> suse-linux.  OK for mainline and 4.7 branch?

Indeed somewhat simple-minded - when originally fixing a similar testcase
(heh ...) I improved CFG cleanup to fold more conditions by looking at
SSA defs, which improved things a lot.  I also thought the real fix should
involve some scalar optimization on a sub-range of the CFG.  That should be
easiest when performing the copy in the first place - after all, we keep
copy tables and such for the purpose of update-SSA, so we might as
well create a lattice from the PHI nodes we disassemble for use by copy_bb ...

On the patch itself - can you call the simple CCP before we call
cleanup_tree_cfg () please?  We might get rid of that weirdo SSA lookup
there again then:

static bool
cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
{
...
            /* For conditions try harder and lookup single-argument
               PHI nodes.  Only do so from the same basic-block though
               as other basic-blocks may be dead already.  */
            if (TREE_CODE (lhs) == SSA_NAME
                && !name_registered_for_update_p (lhs))
...

+             FOR_EACH_IMM_USE_ON_STMT (use, iter)
+               propagate_value (use, gimple_assign_rhs1 (stmt));
+
+             fold_stmt_inplace (&use_stmt_gsi);
+             update_stmt (use_stmt);

Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.

  if (fold_stmt_inplace (&use_stmt_gsi))
    update_stmt (use_stmt);

Thanks,
Richard.

>
> 2012-09-13  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * tree-ssa-loop-ivcanon.c (propagate_constants_for_unrolling): New.
>         (tree_unroll_loops_completely): Starting from the second iteration,
>         propagate constants within the innermost loops.
>
>
> 2012-09-13  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gnat.dg/loop_optimization12.ad[sb]: New test.
>
>
> --
> Eric Botcazou
Eric Botcazou Sept. 13, 2012, 2 p.m. UTC | #2
> Indeed somewhat simple-minded - when originally fixing a similar testcase
> (heh ...) I improved CFG cleanup to fold more conditions by looking at
> SSA defs, which improved things a lot.  I also thought the real fix
> should involve some scalar optimization on a sub-range of the CFG.  That
> should be easiest when performing the copy in the first place - after
> all, we keep copy tables and such for the purpose of update-SSA, so we
> might as well create a lattice from the PHI nodes we disassemble for use
> by copy_bb ...

I also thought of similar approaches, but couldn't really come up with 
something satisfactory.

> On the patch itself - can you call the simple CCP before we call
> cleanup_tree_cfg () please?

That was the original implementation (in fact, in the original implementation 
the simple CCP was conditionalized on canonicalize_loop_induction_variables, 
but you broke it a few days ago by moving the call to update_ssa :-)

The problem is that, if it is moved to before cleanup_tree_cfg, then:
 1) you have more basic blocks (15 vs 2 for the testcase),
 2) you also need to do simple CCP for degenerate PHI nodes.

That being said, I can certainly save in a bitmap the loop fathers for which
canonicalize_loop_induction_variables unrolled a child and do the simple CCP 
(augmented for degenerate PHI nodes) on them between the calls to update_ssa 
and cleanup_tree_cfg.

> We might get rid of that weirdo SSA lookup
> there again then:
> 
> static bool
> cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
> {
> ...
>             /* For conditions try harder and lookup single-argument
>                PHI nodes.  Only do so from the same basic-block though
>                as other basic-blocks may be dead already.  */
>             if (TREE_CODE (lhs) == SSA_NAME
>                 && !name_registered_for_update_p (lhs))

I'll investigate.

> +             FOR_EACH_IMM_USE_ON_STMT (use, iter)
> +               propagate_value (use, gimple_assign_rhs1 (stmt));
> +
> +             fold_stmt_inplace (&use_stmt_gsi);
> +             update_stmt (use_stmt);
> 
> Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.
> 
>   if (fold_stmt_inplace (&use_stmt_gsi))
>     update_stmt (use_stmt);

OK, will adjust, thanks.
Richard Biener Sept. 13, 2012, 2:13 p.m. UTC | #3
On Thu, Sep 13, 2012 at 4:00 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Indeed somewhat simple-minded - when originally fixing a similar testcase
>> (heh ...) I improved CFG cleanup to fold more conditions by looking at
>> SSA defs, which improved things a lot.  I also thought the real fix
>> should involve some scalar optimization on a sub-range of the CFG.  That
>> should be easiest when performing the copy in the first place - after
>> all, we keep copy tables and such for the purpose of update-SSA, so we
>> might as well create a lattice from the PHI nodes we disassemble for use
>> by copy_bb ...
>
> I also thought of similar approaches, but couldn't really come up with
> something satisfactory.
>
>> On the patch itself - can you call the simple CCP before we call
>> cleanup_tree_cfg () please?
>
> That was the original implementation (in fact, in the original implementation
> the simple CCP was conditionalized on canonicalize_loop_induction_variables,
> but you broke it a few days ago by moving the call to update_ssa :-)
>
> The problem is that, if it is moved to before cleanup_tree_cfg, then:
>  1) you have more basic blocks (15 vs 2 for the testcase),
>  2) you also need to do simple CCP for degenerate PHI nodes.

Yes - now cfg_cleanup does that (and it really shouldn't be its job).  That
was the improvement I talked about - reducing the number of BBs a lot.

> That being said, I can certainly save in a bitmap the loop fathers for which
> canonicalize_loop_induction_variables unrolled a child and do the simple CCP
> (augmented for degenerate PHI nodes) on them between the calls to update_ssa
> and cleanup_tree_cfg.

Ah, indeed ;)  Or just push struct loop of changed loops onto a stack.

>> We might get rid of that weirdo SSA lookup
>> there again then:
>>
>> static bool
>> cleanup_control_expr_graph (basic_block bb, gimple_stmt_iterator gsi)
>> {
>> ...
>>             /* For conditions try harder and lookup single-argument
>>                PHI nodes.  Only do so from the same basic-block though
>>                as other basic-blocks may be dead already.  */
>>             if (TREE_CODE (lhs) == SSA_NAME
>>                 && !name_registered_for_update_p (lhs))
>
> I'll investigate.
>
>> +             FOR_EACH_IMM_USE_ON_STMT (use, iter)
>> +               propagate_value (use, gimple_assign_rhs1 (stmt));
>> +
>> +             fold_stmt_inplace (&use_stmt_gsi);
>> +             update_stmt (use_stmt);
>>
>> Use SET_USE (use, rhs1) and cache gimple_assign_rhs1 somewhere.
>>
>>   if (fold_stmt_inplace (&use_stmt_gsi))
>>     update_stmt (use_stmt);
>
> OK, will adjust, thanks.

Thanks,
Richard.

> --
> Eric Botcazou

Patch

Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 191198)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -503,6 +503,48 @@  canonicalize_induction_variables (void)
   return 0;
 }
 
+/* Propagate constant SSA_NAMEs defined in basic block BB.  */
+
+static void
+propagate_constants_for_unrolling (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
+    {
+      gimple stmt = gsi_stmt (gsi);
+      tree lhs;
+
+      /* Look for assignments to SSA names with constant RHS.  */
+      if (is_gimple_assign (stmt)
+	  && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
+	  && gimple_assign_rhs_code (stmt) == INTEGER_CST)
+	{
+	  imm_use_iterator iter;
+	  gimple use_stmt;
+
+	  /* Propagate the RHS into all the uses of the SSA name.  */
+	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
+	    {
+	      gimple_stmt_iterator use_stmt_gsi = gsi_for_stmt (use_stmt);
+	      use_operand_p use;
+
+	      FOR_EACH_IMM_USE_ON_STMT (use, iter)
+	        propagate_value (use, gimple_assign_rhs1 (stmt));
+
+	      fold_stmt_inplace (&use_stmt_gsi);
+	      update_stmt (use_stmt);
+	    }
+
+	  /* And get rid of the now unused SSA name.  */
+	  gsi_remove (&gsi, true);
+	  release_ssa_name (lhs);
+	}
+      else
+	gsi_next (&gsi);
+    }
+}
+
 /* Unroll LOOPS completely if they iterate just few times.  Unless
    MAY_INCREASE_SIZE is true, perform the unrolling only if the
    size of the code does not increase.  */
@@ -522,6 +564,19 @@  tree_unroll_loops_completely (bool may_i
 
       FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
 	{
+	  /* If we have already unrolled, we need to propagate constants
+	     within the new basic blocks to fold away induction variable
+	     computations; otherwise, the size might blow up before the
+	     iteration is complete and the IR eventually cleaned up.  */
+	  if (iteration > 0)
+	    {
+	      unsigned i;
+	      basic_block *body = get_loop_body_in_dom_order (loop);
+	      for (i = 0; i < loop->num_nodes; i++)
+		propagate_constants_for_unrolling (body[i]);
+	      free (body);
+	    }
+
 	  if (may_increase_size && optimize_loop_for_speed_p (loop)
 	      /* Unroll outermost loops only if asked to do so or they do
 		 not cause code growth.  */
@@ -530,6 +585,7 @@  tree_unroll_loops_completely (bool may_i
 	    ul = UL_ALL;
 	  else
 	    ul = UL_NO_GROWTH;
+
 	  changed |= canonicalize_loop_induction_variables
 		       (loop, false, ul, !flag_tree_loop_ivcanon);
 	}