Patchwork Make try_unroll_loop_completely to use loop bounds recorded

login
register
mail settings
Submitter Jan Hubicka
Date Oct. 11, 2012, 4:21 p.m.
Message ID <20121011162109.GC32179@kam.mff.cuni.cz>
Download mbox | patch
Permalink /patch/190954/
State New
Headers show

Comments

Jan Hubicka - Oct. 11, 2012, 4:21 p.m.
Hi,
while looking into RTL loop peeling micopmilation I found that we now do a lot of
RTL loop peeling for loops of the form

int a[2];
test(int c)
{
  int i;
  for (i=0;i<c;i++)
    a[i]=5;
}
this is because tree-ssa-loop-niter is able to prove that the loop iterates at
most twice.  This is better to be done on gimple level because destroying the
loop enables more propagation.

This patch started as simple attempt to make try_unroll_loop_completely
to use max_loop_iterations_int.  I had to track several issues

1) try_unroll_loop_completely handles only inner loops.  I am fine with not
   peeling loops with subloops, but if the loop is known to iterate 0 times,
   we should turn it into non-loop.
2) try_unroll_loop_completely now handles removal of the loop by making
   exit edge conditional to be true and relying on cleanup_cfg to do
   the job.  This does not work for all the cases we can handle
   by max_loop_iterations_int.  When loop has multiple possible exits
   we don't really know what exit will be taken (we know it will be one
   of them).

   I added logic handling simple case where loop has single exit and
   in generic case I unloop it by adding __builtin_unreachable on the
   loopback path.  While this seems bit involved it works well.
3) The last iteration has parts that are provably unreachable and should
   be optimized out later.  VRP however tends to report array bound warnings
   that breaks -O3 bootstrap.
   I sent separate fix for that.  Still I need to add 3 initializations
   to avoid maybe used uninitialized warning to get -O3 bootstrap working
   with this patch.

The patch got quite a lot of snowballing effect, so I decided to not include:

1) the cost model is skewed for last iteration.  It is often just duplicated
   loop header - i.e. all code dominated by the optmized out exit test should
   not be accounted. This would make cunroll-4.c testcase bellow to be unlooped
   during cunrolli rather than during complette unroling.
2) profile updating is broken for any loop containing non-trivial control flow.
   We need to teach loop duplication code to update sanely the profile when
   wont_exit is true and we need to update profile
3) we probably want to do less unrolling when result is not a single basic
   block. Current limit of 400 isnsns seems bit too large; making one basic
   block is important win. Making sequence of basic blocks with exits is
   less so.

	* gcc.dg/tree-ssa/cunroll-1.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-2.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-3.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-4.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-5.c: New testcase.

	* cfgloopmanip.c (unloop): Export.
	* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Estimate
	also with unknown exit conditoinal.
	(try_unroll_loop_completely): Use max_loop_iterations_int to unroll
	also loops with low upper bound; handle unlooping of the last loop
	even when exit conditional is not known; unloop loop that do not loop
	even if they are not innermost.
	(canonicalize_loop_induction_variables): Record niter bounds known;
	try unrolling even if number of iterations is not known;
	(canonicalize_induction_variables): Be ready for loops disappearing.
	(tree_unroll_loops_completely): Likewise.
	* cfgloop.h (unloop): Declare.

	* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
Richard Guenther - Oct. 12, 2012, 8:05 a.m.
On Thu, 11 Oct 2012, Jan Hubicka wrote:

> Hi,
> while looking into RTL loop peeling micopmilation I found that we now do a lot of
> RTL loop peeling for loops of the form
> 
> int a[2];
> test(int c)
> {
>   int i;
>   for (i=0;i<c;i++)
>     a[i]=5;
> }
> this is because tree-ssa-loop-niter is able to prove that the loop iterates at
> most twice.  This is better to be done on gimple level because destroying the
> loop enables more propagation.
> 
> This patch started as simple attempt to make try_unroll_loop_completely
> to use max_loop_iterations_int.  I had to track several issues
> 
> 1) try_unroll_loop_completely handles only inner loops.  I am fine with not
>    peeling loops with subloops, but if the loop is known to iterate 0 times,
>    we should turn it into non-loop.
> 2) try_unroll_loop_completely now handles removal of the loop by making
>    exit edge conditional to be true and relying on cleanup_cfg to do
>    the job.  This does not work for all the cases we can handle
>    by max_loop_iterations_int.  When loop has multiple possible exits
>    we don't really know what exit will be taken (we know it will be one
>    of them).
> 
>    I added logic handling simple case where loop has single exit and
>    in generic case I unloop it by adding __builtin_unreachable on the
>    loopback path.  While this seems bit involved it works well.
> 3) The last iteration has parts that are provably unreachable and should
>    be optimized out later.  VRP however tends to report array bound warnings
>    that breaks -O3 bootstrap.
>    I sent separate fix for that.  Still I need to add 3 initializations
>    to avoid maybe used uninitialized warning to get -O3 bootstrap working
>    with this patch.
> 
> The patch got quite a lot of snowballing effect, so I decided to not include:
> 
> 1) the cost model is skewed for last iteration.  It is often just duplicated
>    loop header - i.e. all code dominated by the optmized out exit test should
>    not be accounted. This would make cunroll-4.c testcase bellow to be unlooped
>    during cunrolli rather than during complette unroling.
> 2) profile updating is broken for any loop containing non-trivial control flow.
>    We need to teach loop duplication code to update sanely the profile when
>    wont_exit is true and we need to update profile
> 3) we probably want to do less unrolling when result is not a single basic
>    block. Current limit of 400 isnsns seems bit too large; making one basic
>    block is important win. Making sequence of basic blocks with exits is
>    less so.

Nice - this was on my TODO list as well, btw ;)

> 	* gcc.dg/tree-ssa/cunroll-1.c: New testcase.
> 	* gcc.dg/tree-ssa/cunroll-2.c: New testcase.
> 	* gcc.dg/tree-ssa/cunroll-3.c: New testcase.
> 	* gcc.dg/tree-ssa/cunroll-4.c: New testcase.
> 	* gcc.dg/tree-ssa/cunroll-5.c: New testcase.
> 
> 	* cfgloopmanip.c (unloop): Export.
> 	* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Estimate
> 	also with unknown exit conditoinal.
> 	(try_unroll_loop_completely): Use max_loop_iterations_int to unroll
> 	also loops with low upper bound; handle unlooping of the last loop
> 	even when exit conditional is not known; unloop loop that do not loop
> 	even if they are not innermost.
> 	(canonicalize_loop_induction_variables): Record niter bounds known;
> 	try unrolling even if number of iterations is not known;
> 	(canonicalize_induction_variables): Be ready for loops disappearing.
> 	(tree_unroll_loops_completely): Likewise.
> 	* cfgloop.h (unloop): Declare.
> 
> 	* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.

I wonder if other languages need similar adjustment?

+  /* Now destroy the loop.  First try to do so by cancelling the
+     patch from exit conditional if we identified good candidate.
+

you mean 'path' here, right?  Similar in other places.

+      /* Unloop destroys the latch edge.  */
+      unloop (loop, &irred_invalidated);
+      if (irred_invalidated)
+       mark_irreducible_loops ();

this makes the whole thing quadratic in the number of loops.
Please pass down a flag and handle this in the place we
update SSA form (thus, once per function / unroll iteration).

Or provide a more optimial implementation that only considers
parent loops of 'loop' (even that is excessive though, quadratic
in the loop depth).

-  FOR_EACH_LOOP (li, loop, 0)
+  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)

FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)

but I'm sure that IV canonicalization should happen for all loops.
So, why do you need this change?  Esp. we should get rid of
single-iteration outer loops here, too.

-      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
+      for (fel_init (&li, &next, 0); next;)

see above - FOR_EACH_LOOP (li, loop, 0)

Otherwise looks ok.  Please update and re-post.

Thanks,
Richard.


> Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[2];
> +test(int c)
> +{ 
> +  int i;
> +  for (i=0;i<c;i++)
> +    a[i]=5;
> +}
> +/* Array bounds says the loop will not roll much.  */
> +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times). Last iteration exit edge was proved true." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[2];
> +test(int c)
> +{ 
> +  int i;
> +  for (i=0;i<c;i++)
> +    {
> +      a[i]=5;
> +      if (test2())
> +	return;
> +    }
> +}
> +/* We are not able to get rid of the final conditional because the loop has two exits.  */
> +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 2 times)." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> +int a[1];
> +test(int c)
> +{ 
> +  int i;
> +  for (i=0;i<c;i++)
> +    {
> +      a[i]=5;
> +    }
> +}
> +/* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
> +
> +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times)." "cunrolli"} } */
> +/* { dg-final { cleanup-tree-dump "cunrolli" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int a[1];
> +test(int c)
> +{ 
> +  int i=0,j;
> +  for (i=0;i<c;i++)
> +    {
> +      for (j=0;j<c;j++)
> +	{
> +          a[i]=5;
> +	  test2();
> +	}
> +    }
> +}
> +
> +/* We should do this as part of cunrolli, but our cost model do not take into account early exit
> +   from the last iteration.  */
> +/* { dg-final { scan-tree-dump-not "Turned loop 1 to non-loop; it never loops. Last iteration exit edge was proved true." "cunrolli"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +int *a;
> +test(int c)
> +{ 
> +  int i;
> +  for (i=0;i<6;i++)
> +    a[i]=5;
> +}
> +/* Basic testcase for complette unrolling.  */
> +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 5 times). Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true." "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> Index: cfgloopmanip.c
> ===================================================================
> --- cfgloopmanip.c	(revision 192360)
> +++ cfgloopmanip.c	(working copy)
> @@ -37,7 +37,6 @@ static int find_path (edge, basic_block 
>  static void fix_loop_placements (struct loop *, bool *);
>  static bool fix_bb_placement (basic_block);
>  static void fix_bb_placements (basic_block, bool *);
> -static void unloop (struct loop *, bool *);
>  
>  /* Checks whether basic block BB is dominated by DATA.  */
>  static bool
> @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed
>     If this may cause the information about irreducible regions to become
>     invalid, IRRED_INVALIDATED is set to true.  */
>  
> -static void
> +void
>  unloop (struct loop *loop, bool *irred_invalidated)
>  {
>    basic_block *body;
> Index: tree-ssa-loop-ivcanon.c
> ===================================================================
> --- tree-ssa-loop-ivcanon.c	(revision 192360)
> +++ tree-ssa-loop-ivcanon.c	(working copy)
> @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo
>  	  /* Look for reasons why we might optimize this stmt away. */
>  
>  	  /* Exit conditional.  */
> -	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
> +	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
>  	    {
>  	      if (dump_file && (dump_flags & TDF_DETAILS))
>  	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
> @@ -326,13 +326,43 @@ try_unroll_loop_completely (struct loop 
>    unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
>    gimple cond;
>    struct loop_size size;
> +  bool n_unroll_found = false;
> +  HOST_WIDE_INT maxiter;
> +  basic_block latch;
> +  edge latch_edge;
> +  location_t locus;
> +  int flags;
> +  bool irred_invalidated = false;
> +  gimple stmt;
> +  gimple_stmt_iterator gsi;
> +  edge other_edge = NULL, last_exit;
> +  int num = loop->num;
> +  bool last_iteration_updated = false;
> +
> +  /* See if we proved number of iterations to be low constant.  */
> +  if (host_integerp (niter, 1))
> +    {
> +      n_unroll = tree_low_cst (niter, 1);
> +      n_unroll_found = true;
> +    }
>  
> -  if (loop->inner)
> +  /* See if we can improve our estimate by using recorded loop bounds.  */
> +  maxiter = max_loop_iterations_int (loop);
> +  if (maxiter >= 0
> +      && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
> +    {
> +      n_unroll = maxiter;
> +      n_unroll_found = true;
> +    }
> +
> +  if (!n_unroll_found)
>      return false;
>  
> -  if (!host_integerp (niter, 1))
> +  /* We unroll only inner loops, because we do not consider it profitable
> +     otheriwse.  We still can cancel loopback edge of not rolling loop;
> +     this is always a good idea.  */
> +  if (loop->inner && n_unroll)
>      return false;
> -  n_unroll = tree_low_cst (niter, 1);
>  
>    max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
>    if (n_unroll > max_unroll)
> @@ -340,6 +370,10 @@ try_unroll_loop_completely (struct loop 
>  
>    if (n_unroll)
>      {
> +      sbitmap wont_exit;
> +      edge e;
> +      unsigned i;
> +      VEC (edge, heap) *to_remove = NULL;
>        if (ul == UL_SINGLE_ITER)
>  	return false;
>  
> @@ -372,14 +408,6 @@ try_unroll_loop_completely (struct loop 
>  	    fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
>  	  return false;
>  	}
> -    }
> -
> -  if (n_unroll)
> -    {
> -      sbitmap wont_exit;
> -      edge e;
> -      unsigned i;
> -      VEC (edge, heap) *to_remove = NULL;
>  
>        initialize_original_copy_tables ();
>        wont_exit = sbitmap_alloc (n_unroll + 1);
> @@ -408,24 +436,143 @@ try_unroll_loop_completely (struct loop 
>        free_original_copy_tables ();
>      }
>  
> -  cond = last_stmt (exit->src);
> -  if (exit->flags & EDGE_TRUE_VALUE)
> -    gimple_cond_make_true (cond);
> -  else
> -    gimple_cond_make_false (cond);
> -  update_stmt (cond);
> +  /* After complette unrolling we still may get rid of the conditional
> +     on exit in the last copy even if we have no idea what it does.
> +     This is quite common case for loops of form
> +
> +     int a[5];
> +     for (i=0;i<b;i++)
> +       a[i]=0;
> +
> +     Here we prove the loop to iterate 5 times but we do not know
> +     it from induction variable.
> +
> +     We want to make the conditional of exit true, so we can only 
> +     consider exits that are not part of the inner (irreducible) loops
> +     so we know that the conditional is tested at most once per iteration.
> +
> +     Situation is more complicated withloops with multiple exists:
> +
> +     int a[5];
> +     for (i=0;i<b;i++)
> +       {
> +	 if (blah)
> +	   break;
> +	 a[i]=0;
> +       }
> +
> +     Here we could cancel the conditional of if "(blah)". And:
> +
> +     int a[5];
> +     for (i=0;i<b;i++)
> +       {
> +	 a[i]=0;
> +	 if (blah)
> +	   break;
> +       }
> + 
> +     Here we can cancel the last i<b test.
> +
> +     To handle these we need to track all statements containing code that
> +     can not execute on last iteration (in tree-ssa-loop-niter).
> +     Then we can use control dependnecy (not computed here) to cancel all
> +     the paths leading to them unless they are part of the inner loops.
> +     This however seems bit like an overkill so we handle only the
> +     simple case of single exit until interesting testcases appears.  */
> +  last_exit = exit;
> +  if (!last_exit)
> +    {
> +      last_exit = single_exit (loop);
> +      if (!last_exit || last_exit->src->loop_father != loop
> +	  || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
> +	last_exit = NULL;
> +      else 
> +	{
> +	  if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP)
> +	    last_exit = NULL;
> +	}
> +    }
> +
> +  /* If loop has only one exit edge, we can remove the conditional from
> +     the last copy of the loop.  
> +     TODO: We should account this update into cost model.  */
> +  if (last_exit)
> +    {
> +      cond = last_stmt (last_exit->src);
> +      if (last_exit->flags & EDGE_TRUE_VALUE)
> +	gimple_cond_make_true (cond);
> +      else
> +	gimple_cond_make_false (cond);
> +      update_stmt (cond);
> +      other_edge = EDGE_SUCC (last_exit->src, 0);
> +      if (other_edge == last_exit)
> +	other_edge = EDGE_SUCC (last_exit->src, 1);
> +      last_iteration_updated = true;
> +    }
> +
> +  /* Now destroy the loop.  First try to do so by cancelling the
> +     patch from exit conditional if we identified good candidate.  
> +
> +     TODO: We should update the loop profile for the fact that
> +     the last iteration no longer executes.  */
> +  if (!other_edge || !remove_path (other_edge))
> +    {
> +      /* We did not manage to cancel the loop by removing the patch.
> +         The loop latch remains reachable even if it will never be reached
> +         at runtime.  We must redirect it to somewhere, so create basic
> +         block containg __builtin_unreachable call for this reason.  */
> +      latch = loop->latch;
> +      latch_edge = loop_latch_edge (loop);
> +      flags = latch_edge->flags;
> +      locus = latch_edge->goto_locus;
> +
> +      /* Unloop destroys the latch edge.  */
> +      unloop (loop, &irred_invalidated);
> +      if (irred_invalidated)
> +	mark_irreducible_loops ();
> +
> +      /* Create new basic block for the latch edge destination and wire
> +	 it in.  */
> +      stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
> +      latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
> +      gsi = gsi_start_bb (latch_edge->dest);
> +      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> +      latch_edge->dest->loop_father = current_loops->tree_root;
> +      set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
> +      latch_edge->probability = 0;
> +      latch_edge->count = 0;
> +      latch_edge->flags |= flags;
> +      latch_edge->goto_locus = locus;
> +    }
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
> -    fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
> +    {
> +      if (!n_unroll)
> +        fprintf (dump_file, "Turned loop %d to non-loop; it never loops.%s\n", num,
> +		 last_iteration_updated
> +		 ? " Last iteration exit edge was proved true." : "");
> +      else
> +	{
> +          fprintf (dump_file, "Unrolled loop %d completely "
> +		   "(duplicated %i times).%s%s\n", num, (int)n_unroll,
> +		   exit
> +		   ? " Exit condition of peeled iterations was eliminated." : "",
> +		   last_iteration_updated
> +		   ? " Last iteration exit edge was proved true." : "");
> +	}
> +    }
>  
> -  return true;
> +  return true;
>  }
>  
>  /* Adds a canonical induction variable to LOOP if suitable.
>     CREATE_IV is true if we may create a new iv.  UL determines
>     which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
>     to determine the number of iterations of a loop by direct evaluation.
> -   Returns true if cfg is changed.  */
> +   Returns true if cfg is changed.  
> +
> +   IRREDUCIBLE_LOOPS_FOUND is used to bookkeep if we discovered irreducible
> +   loops.  This is used in a special case of the exit condition analysis.  */
>  
>  static bool
>  canonicalize_loop_induction_variables (struct loop *loop,
> @@ -455,22 +602,34 @@ canonicalize_loop_induction_variables (s
>  	      || TREE_CODE (niter) != INTEGER_CST))
>  	niter = find_loop_niter_by_eval (loop, &exit);
>  
> -      if (chrec_contains_undetermined (niter)
> -	  || TREE_CODE (niter) != INTEGER_CST)
> -	return false;
> +      if (TREE_CODE (niter) != INTEGER_CST)
> +	exit = NULL;
>      }
>  
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> +  /* We work exceptionally hard here to estimate the bound
> +     by find_loop_niter_by_eval.  Be sure to keep it for future.  */
> +  if (niter && TREE_CODE (niter) == INTEGER_CST)
> +    record_niter_bound (loop, tree_to_double_int (niter), false, true);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS)
> +      && TREE_CODE (niter) == INTEGER_CST)
>      {
>        fprintf (dump_file, "Loop %d iterates ", loop->num);
>        print_generic_expr (dump_file, niter, TDF_SLIM);
>        fprintf (dump_file, " times.\n");
>      }
> +  if (dump_file && (dump_flags & TDF_DETAILS)
> +      && max_loop_iterations_int (loop) >= 0)
> +    {
> +      fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
> +	       (int)max_loop_iterations_int (loop));
> +    }
>  
>    if (try_unroll_loop_completely (loop, exit, niter, ul))
>      return true;
>  
> -  if (create_iv)
> +  if (create_iv
> +      && niter && !chrec_contains_undetermined (niter))
>      create_canonical_iv (loop, exit, niter);
>  
>    return false;
> @@ -483,11 +642,14 @@ unsigned int
>  canonicalize_induction_variables (void)
>  {
>    loop_iterator li;
> -  struct loop *loop;
> +  struct loop *loop, *next;
>    bool changed = false;
>  
> -  FOR_EACH_LOOP (li, loop, 0)
> +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
>      {
> +      loop = next;
> +      fel_next (&li, &next);
> +      
>        changed |= canonicalize_loop_induction_variables (loop,
>  							true, UL_SINGLE_ITER,
>  							true);
> @@ -591,14 +753,18 @@ tree_unroll_loops_completely (bool may_i
>    bool changed;
>    enum unroll_level ul;
>    int iteration = 0;
> +  struct loop *next;
>  
>    do
>      {
>        changed = false;
>  
> -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> +      for (fel_init (&li, &next, 0); next;)
>  	{
> -	  struct loop *loop_father = loop_outer (loop);
> +	  struct loop *loop_father = loop_outer (next);
> +
> +	  loop = next;
> +	  fel_next (&li, &next);
>  
>  	  if (may_increase_size && optimize_loop_for_speed_p (loop)
>  	      /* Unroll outermost loops only if asked to do so or they do
> Index: fortran/f95-lang.c
> ===================================================================
> --- fortran/f95-lang.c	(revision 192360)
> +++ fortran/f95-lang.c	(working copy)
> @@ -908,6 +908,11 @@ gfc_init_builtin_functions (void)
>    gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
>  		      "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
>  
> +  ftype = build_function_type_list (void_type_node, NULL_TREE);
> +
> +  gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_EXPECT,
> +		      "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST);
> +
>    ftype = build_function_type_list (void_type_node,
>                                      pvoid_type_node, NULL_TREE);
>    gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
> Index: cfgloop.h
> ===================================================================
> --- cfgloop.h	(revision 192360)
> +++ cfgloop.h	(working copy)
> @@ -319,7 +319,8 @@ extern struct loop *loopify (edge, edge,
>  struct loop * loop_version (struct loop *, void *,
>  			    basic_block *, unsigned, unsigned, unsigned, bool);
>  extern bool remove_path (edge);
> -void scale_loop_frequencies (struct loop *, int, int);
> +extern void scale_loop_frequencies (struct loop *, int, int);
> +extern void unloop (struct loop *, bool *);
>  
>  /* Induction variable analysis.  */
>  
> 
>
Jan Hubicka - Oct. 12, 2012, 1:24 p.m.
> > 	* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
> 
> I wonder if other languages need similar adjustment?

I also wondered ;) Only Fortran triggered, I will take a look.
> 
> +  /* Now destroy the loop.  First try to do so by cancelling the
> +     patch from exit conditional if we identified good candidate.
> +
> 
> you mean 'path' here, right?  Similar in other places.

Yeah.
> 
> +      /* Unloop destroys the latch edge.  */
> +      unloop (loop, &irred_invalidated);
> +      if (irred_invalidated)
> +       mark_irreducible_loops ();
> 
> this makes the whole thing quadratic in the number of loops.
> Please pass down a flag and handle this in the place we
> update SSA form (thus, once per function / unroll iteration).

This was in my first version of the patch. Then I noticed that remove_path
also relies on irreducible loops to be computed and does the exactly same
logic as I do.
So we would need to extend the &irred_invalidated to remove_path as well.

> 
> Or provide a more optimial implementation that only considers
> parent loops of 'loop' (even that is excessive though, quadratic
> in the loop depth).
> 
> -  FOR_EACH_LOOP (li, loop, 0)
> +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> 
> FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> 
> but I'm sure that IV canonicalization should happen for all loops.
> So, why do you need this change?  Esp. we should get rid of
> single-iteration outer loops here, too.

FOR_EACH_LOOP seems to assume that loops are not cancelled under its hand.  I
was running into case where we cancelled inner loop but did not update SSA and
tried to estimate number of iterations in the outer loop (now also innermost)
that went into infinite loop.

Honza
> 
> -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> +      for (fel_init (&li, &next, 0); next;)
> 
> see above - FOR_EACH_LOOP (li, loop, 0)
> 
> Otherwise looks ok.  Please update and re-post.
> 
> Thanks,
> Richard.
> 
> 
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > +int a[2];
> > +test(int c)
> > +{ 
> > +  int i;
> > +  for (i=0;i<c;i++)
> > +    a[i]=5;
> > +}
> > +/* Array bounds says the loop will not roll much.  */
> > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times). Last iteration exit edge was proved true." "cunroll"} } */
> > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > +int a[2];
> > +test(int c)
> > +{ 
> > +  int i;
> > +  for (i=0;i<c;i++)
> > +    {
> > +      a[i]=5;
> > +      if (test2())
> > +	return;
> > +    }
> > +}
> > +/* We are not able to get rid of the final conditional because the loop has two exits.  */
> > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 2 times)." "cunroll"} } */
> > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> > +int a[1];
> > +test(int c)
> > +{ 
> > +  int i;
> > +  for (i=0;i<c;i++)
> > +    {
> > +      a[i]=5;
> > +    }
> > +}
> > +/* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
> > +
> > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times)." "cunrolli"} } */
> > +/* { dg-final { cleanup-tree-dump "cunrolli" } } */
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > +int a[1];
> > +test(int c)
> > +{ 
> > +  int i=0,j;
> > +  for (i=0;i<c;i++)
> > +    {
> > +      for (j=0;j<c;j++)
> > +	{
> > +          a[i]=5;
> > +	  test2();
> > +	}
> > +    }
> > +}
> > +
> > +/* We should do this as part of cunrolli, but our cost model do not take into account early exit
> > +   from the last iteration.  */
> > +/* { dg-final { scan-tree-dump-not "Turned loop 1 to non-loop; it never loops. Last iteration exit edge was proved true." "cunrolli"} } */
> > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
> > ===================================================================
> > --- testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> > +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > +int *a;
> > +test(int c)
> > +{ 
> > +  int i;
> > +  for (i=0;i<6;i++)
> > +    a[i]=5;
> > +}
> > +/* Basic testcase for complette unrolling.  */
> > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 5 times). Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true." "cunroll"} } */
> > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > Index: cfgloopmanip.c
> > ===================================================================
> > --- cfgloopmanip.c	(revision 192360)
> > +++ cfgloopmanip.c	(working copy)
> > @@ -37,7 +37,6 @@ static int find_path (edge, basic_block 
> >  static void fix_loop_placements (struct loop *, bool *);
> >  static bool fix_bb_placement (basic_block);
> >  static void fix_bb_placements (basic_block, bool *);
> > -static void unloop (struct loop *, bool *);
> >  
> >  /* Checks whether basic block BB is dominated by DATA.  */
> >  static bool
> > @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed
> >     If this may cause the information about irreducible regions to become
> >     invalid, IRRED_INVALIDATED is set to true.  */
> >  
> > -static void
> > +void
> >  unloop (struct loop *loop, bool *irred_invalidated)
> >  {
> >    basic_block *body;
> > Index: tree-ssa-loop-ivcanon.c
> > ===================================================================
> > --- tree-ssa-loop-ivcanon.c	(revision 192360)
> > +++ tree-ssa-loop-ivcanon.c	(working copy)
> > @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo
> >  	  /* Look for reasons why we might optimize this stmt away. */
> >  
> >  	  /* Exit conditional.  */
> > -	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
> > +	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
> >  	    {
> >  	      if (dump_file && (dump_flags & TDF_DETAILS))
> >  	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
> > @@ -326,13 +326,43 @@ try_unroll_loop_completely (struct loop 
> >    unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
> >    gimple cond;
> >    struct loop_size size;
> > +  bool n_unroll_found = false;
> > +  HOST_WIDE_INT maxiter;
> > +  basic_block latch;
> > +  edge latch_edge;
> > +  location_t locus;
> > +  int flags;
> > +  bool irred_invalidated = false;
> > +  gimple stmt;
> > +  gimple_stmt_iterator gsi;
> > +  edge other_edge = NULL, last_exit;
> > +  int num = loop->num;
> > +  bool last_iteration_updated = false;
> > +
> > +  /* See if we proved number of iterations to be low constant.  */
> > +  if (host_integerp (niter, 1))
> > +    {
> > +      n_unroll = tree_low_cst (niter, 1);
> > +      n_unroll_found = true;
> > +    }
> >  
> > -  if (loop->inner)
> > +  /* See if we can improve our estimate by using recorded loop bounds.  */
> > +  maxiter = max_loop_iterations_int (loop);
> > +  if (maxiter >= 0
> > +      && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
> > +    {
> > +      n_unroll = maxiter;
> > +      n_unroll_found = true;
> > +    }
> > +
> > +  if (!n_unroll_found)
> >      return false;
> >  
> > -  if (!host_integerp (niter, 1))
> > +  /* We unroll only inner loops, because we do not consider it profitable
> > +     otheriwse.  We still can cancel loopback edge of not rolling loop;
> > +     this is always a good idea.  */
> > +  if (loop->inner && n_unroll)
> >      return false;
> > -  n_unroll = tree_low_cst (niter, 1);
> >  
> >    max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
> >    if (n_unroll > max_unroll)
> > @@ -340,6 +370,10 @@ try_unroll_loop_completely (struct loop 
> >  
> >    if (n_unroll)
> >      {
> > +      sbitmap wont_exit;
> > +      edge e;
> > +      unsigned i;
> > +      VEC (edge, heap) *to_remove = NULL;
> >        if (ul == UL_SINGLE_ITER)
> >  	return false;
> >  
> > @@ -372,14 +408,6 @@ try_unroll_loop_completely (struct loop 
> >  	    fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
> >  	  return false;
> >  	}
> > -    }
> > -
> > -  if (n_unroll)
> > -    {
> > -      sbitmap wont_exit;
> > -      edge e;
> > -      unsigned i;
> > -      VEC (edge, heap) *to_remove = NULL;
> >  
> >        initialize_original_copy_tables ();
> >        wont_exit = sbitmap_alloc (n_unroll + 1);
> > @@ -408,24 +436,143 @@ try_unroll_loop_completely (struct loop 
> >        free_original_copy_tables ();
> >      }
> >  
> > -  cond = last_stmt (exit->src);
> > -  if (exit->flags & EDGE_TRUE_VALUE)
> > -    gimple_cond_make_true (cond);
> > -  else
> > -    gimple_cond_make_false (cond);
> > -  update_stmt (cond);
> > +  /* After complette unrolling we still may get rid of the conditional
> > +     on exit in the last copy even if we have no idea what it does.
> > +     This is quite common case for loops of form
> > +
> > +     int a[5];
> > +     for (i=0;i<b;i++)
> > +       a[i]=0;
> > +
> > +     Here we prove the loop to iterate 5 times but we do not know
> > +     it from induction variable.
> > +
> > +     We want to make the conditional of exit true, so we can only 
> > +     consider exits that are not part of the inner (irreducible) loops
> > +     so we know that the conditional is tested at most once per iteration.
> > +
> > +     Situation is more complicated withloops with multiple exists:
> > +
> > +     int a[5];
> > +     for (i=0;i<b;i++)
> > +       {
> > +	 if (blah)
> > +	   break;
> > +	 a[i]=0;
> > +       }
> > +
> > +     Here we could cancel the conditional of if "(blah)". And:
> > +
> > +     int a[5];
> > +     for (i=0;i<b;i++)
> > +       {
> > +	 a[i]=0;
> > +	 if (blah)
> > +	   break;
> > +       }
> > + 
> > +     Here we can cancel the last i<b test.
> > +
> > +     To handle these we need to track all statements containing code that
> > +     can not execute on last iteration (in tree-ssa-loop-niter).
> > +     Then we can use control dependnecy (not computed here) to cancel all
> > +     the paths leading to them unless they are part of the inner loops.
> > +     This however seems bit like an overkill so we handle only the
> > +     simple case of single exit until interesting testcases appears.  */
> > +  last_exit = exit;
> > +  if (!last_exit)
> > +    {
> > +      last_exit = single_exit (loop);
> > +      if (!last_exit || last_exit->src->loop_father != loop
> > +	  || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
> > +	last_exit = NULL;
> > +      else 
> > +	{
> > +	  if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP)
> > +	    last_exit = NULL;
> > +	}
> > +    }
> > +
> > +  /* If loop has only one exit edge, we can remove the conditional from
> > +     the last copy of the loop.  
> > +     TODO: We should account this update into cost model.  */
> > +  if (last_exit)
> > +    {
> > +      cond = last_stmt (last_exit->src);
> > +      if (last_exit->flags & EDGE_TRUE_VALUE)
> > +	gimple_cond_make_true (cond);
> > +      else
> > +	gimple_cond_make_false (cond);
> > +      update_stmt (cond);
> > +      other_edge = EDGE_SUCC (last_exit->src, 0);
> > +      if (other_edge == last_exit)
> > +	other_edge = EDGE_SUCC (last_exit->src, 1);
> > +      last_iteration_updated = true;
> > +    }
> > +
> > +  /* Now destroy the loop.  First try to do so by cancelling the
> > +     patch from exit conditional if we identified good candidate.  
> > +
> > +     TODO: We should update the loop profile for the fact that
> > +     the last iteration no longer executes.  */
> > +  if (!other_edge || !remove_path (other_edge))
> > +    {
> > +      /* We did not manage to cancel the loop by removing the patch.
> > +         The loop latch remains reachable even if it will never be reached
> > +         at runtime.  We must redirect it to somewhere, so create basic
> > +         block containg __builtin_unreachable call for this reason.  */
> > +      latch = loop->latch;
> > +      latch_edge = loop_latch_edge (loop);
> > +      flags = latch_edge->flags;
> > +      locus = latch_edge->goto_locus;
> > +
> > +      /* Unloop destroys the latch edge.  */
> > +      unloop (loop, &irred_invalidated);
> > +      if (irred_invalidated)
> > +	mark_irreducible_loops ();
> > +
> > +      /* Create new basic block for the latch edge destination and wire
> > +	 it in.  */
> > +      stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
> > +      latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
> > +      gsi = gsi_start_bb (latch_edge->dest);
> > +      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> > +      latch_edge->dest->loop_father = current_loops->tree_root;
> > +      set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
> > +      latch_edge->probability = 0;
> > +      latch_edge->count = 0;
> > +      latch_edge->flags |= flags;
> > +      latch_edge->goto_locus = locus;
> > +    }
> >  
> >    if (dump_file && (dump_flags & TDF_DETAILS))
> > -    fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
> > +    {
> > +      if (!n_unroll)
> > +        fprintf (dump_file, "Turned loop %d to non-loop; it never loops.%s\n", num,
> > +		 last_iteration_updated
> > +		 ? " Last iteration exit edge was proved true." : "");
> > +      else
> > +	{
> > +          fprintf (dump_file, "Unrolled loop %d completely "
> > +		   "(duplicated %i times).%s%s\n", num, (int)n_unroll,
> > +		   exit
> > +		   ? " Exit condition of peeled iterations was eliminated." : "",
> > +		   last_iteration_updated
> > +		   ? " Last iteration exit edge was proved true." : "");
> > +	}
> > +    }
> >  
> > -  return true;
> > +  return true;
> >  }
> >  
> >  /* Adds a canonical induction variable to LOOP if suitable.
> >     CREATE_IV is true if we may create a new iv.  UL determines
> >     which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
> >     to determine the number of iterations of a loop by direct evaluation.
> > -   Returns true if cfg is changed.  */
> > +   Returns true if cfg is changed.  
> > +
> > +   IRREDUCIBLE_LOOPS_FOUND is used to bookkeep if we discovered irreducible
> > +   loops.  This is used in a special case of the exit condition analysis.  */
> >  
> >  static bool
> >  canonicalize_loop_induction_variables (struct loop *loop,
> > @@ -455,22 +602,34 @@ canonicalize_loop_induction_variables (s
> >  	      || TREE_CODE (niter) != INTEGER_CST))
> >  	niter = find_loop_niter_by_eval (loop, &exit);
> >  
> > -      if (chrec_contains_undetermined (niter)
> > -	  || TREE_CODE (niter) != INTEGER_CST)
> > -	return false;
> > +      if (TREE_CODE (niter) != INTEGER_CST)
> > +	exit = NULL;
> >      }
> >  
> > -  if (dump_file && (dump_flags & TDF_DETAILS))
> > +  /* We work exceptionally hard here to estimate the bound
> > +     by find_loop_niter_by_eval.  Be sure to keep it for future.  */
> > +  if (niter && TREE_CODE (niter) == INTEGER_CST)
> > +    record_niter_bound (loop, tree_to_double_int (niter), false, true);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS)
> > +      && TREE_CODE (niter) == INTEGER_CST)
> >      {
> >        fprintf (dump_file, "Loop %d iterates ", loop->num);
> >        print_generic_expr (dump_file, niter, TDF_SLIM);
> >        fprintf (dump_file, " times.\n");
> >      }
> > +  if (dump_file && (dump_flags & TDF_DETAILS)
> > +      && max_loop_iterations_int (loop) >= 0)
> > +    {
> > +      fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
> > +	       (int)max_loop_iterations_int (loop));
> > +    }
> >  
> >    if (try_unroll_loop_completely (loop, exit, niter, ul))
> >      return true;
> >  
> > -  if (create_iv)
> > +  if (create_iv
> > +      && niter && !chrec_contains_undetermined (niter))
> >      create_canonical_iv (loop, exit, niter);
> >  
> >    return false;
> > @@ -483,11 +642,14 @@ unsigned int
> >  canonicalize_induction_variables (void)
> >  {
> >    loop_iterator li;
> > -  struct loop *loop;
> > +  struct loop *loop, *next;
> >    bool changed = false;
> >  
> > -  FOR_EACH_LOOP (li, loop, 0)
> > +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> >      {
> > +      loop = next;
> > +      fel_next (&li, &next);
> > +      
> >        changed |= canonicalize_loop_induction_variables (loop,
> >  							true, UL_SINGLE_ITER,
> >  							true);
> > @@ -591,14 +753,18 @@ tree_unroll_loops_completely (bool may_i
> >    bool changed;
> >    enum unroll_level ul;
> >    int iteration = 0;
> > +  struct loop *next;
> >  
> >    do
> >      {
> >        changed = false;
> >  
> > -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> > +      for (fel_init (&li, &next, 0); next;)
> >  	{
> > -	  struct loop *loop_father = loop_outer (loop);
> > +	  struct loop *loop_father = loop_outer (next);
> > +
> > +	  loop = next;
> > +	  fel_next (&li, &next);
> >  
> >  	  if (may_increase_size && optimize_loop_for_speed_p (loop)
> >  	      /* Unroll outermost loops only if asked to do so or they do
> > Index: fortran/f95-lang.c
> > ===================================================================
> > --- fortran/f95-lang.c	(revision 192360)
> > +++ fortran/f95-lang.c	(working copy)
> > @@ -908,6 +908,11 @@ gfc_init_builtin_functions (void)
> >    gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
> >  		      "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
> >  
> > +  ftype = build_function_type_list (void_type_node, NULL_TREE);
> > +
> > +  gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_EXPECT,
> > +		      "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST);
> > +
> >    ftype = build_function_type_list (void_type_node,
> >                                      pvoid_type_node, NULL_TREE);
> >    gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
> > Index: cfgloop.h
> > ===================================================================
> > --- cfgloop.h	(revision 192360)
> > +++ cfgloop.h	(working copy)
> > @@ -319,7 +319,8 @@ extern struct loop *loopify (edge, edge,
> >  struct loop * loop_version (struct loop *, void *,
> >  			    basic_block *, unsigned, unsigned, unsigned, bool);
> >  extern bool remove_path (edge);
> > -void scale_loop_frequencies (struct loop *, int, int);
> > +extern void scale_loop_frequencies (struct loop *, int, int);
> > +extern void unloop (struct loop *, bool *);
> >  
> >  /* Induction variable analysis.  */
> >  
> > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend
Richard Guenther - Oct. 12, 2012, 1:25 p.m.
On Fri, 12 Oct 2012, Jan Hubicka wrote:

> > > 	* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
> > 
> > I wonder if other languages need similar adjustment?
> 
> I also wondered ;) Only Fortran triggered, I will take a look.
> > 
> > +  /* Now destroy the loop.  First try to do so by cancelling the
> > +     patch from exit conditional if we identified good candidate.
> > +
> > 
> > you mean 'path' here, right?  Similar in other places.
> 
> Yeah.
> > 
> > +      /* Unloop destroys the latch edge.  */
> > +      unloop (loop, &irred_invalidated);
> > +      if (irred_invalidated)
> > +       mark_irreducible_loops ();
> > 
> > this makes the whole thing quadratic in the number of loops.
> > Please pass down a flag and handle this in the place we
> > update SSA form (thus, once per function / unroll iteration).
> 
> This was in my first version of the patch. Then I noticed that remove_path
> also relies on irreducible loops to be computed and does the exactly same
> logic as I do.
> So we would need to extend the &irred_invalidated to remove_path as well.

Well, then do that.  Or at least constrain the number of loops
that are updated in mark_irreducible_loops (though that would
still be quadratic in the loop depth then?).  We solved scalability
issues of cunroll for 4.8, so adding new ones isn't going to fly.

> > 
> > Or provide a more optimial implementation that only considers
> > parent loops of 'loop' (even that is excessive though, quadratic
> > in the loop depth).
> > 
> > -  FOR_EACH_LOOP (li, loop, 0)
> > +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> > 
> > FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> > 
> > but I'm sure that IV canonicalization should happen for all loops.
> > So, why do you need this change?  Esp. we should get rid of
> > single-iteration outer loops here, too.
> 
> FOR_EACH_LOOP seems to assume that loops are not cancelled under its hand.  I
> was running into case where we cancelled inner loop but did not update SSA and
> tried to estimate number of iterations in the outer loop (now also innermost)
> that went into infinite loop.

Well, that's not true (it uses fel_* stuff), FOR_EACH_LOOP constructs
a vector of loops in the desired order it walks over.  As for estimation
you probably need to invalidate the scev cache?  The current logic
explicitely walks all innermost loops and then updates, you cannot
walk an outer loop of an inner loop that changed without updating SSA
form for example.

Richard.

> Honza
> > 
> > -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> > +      for (fel_init (&li, &next, 0); next;)
> > 
> > see above - FOR_EACH_LOOP (li, loop, 0)
> > 
> > Otherwise looks ok.  Please update and re-post.
> > 
> > Thanks,
> > Richard.
> > 
> > 
> > > Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
> > > ===================================================================
> > > --- testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> > > +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > > +int a[2];
> > > +test(int c)
> > > +{ 
> > > +  int i;
> > > +  for (i=0;i<c;i++)
> > > +    a[i]=5;
> > > +}
> > > +/* Array bounds says the loop will not roll much.  */
> > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times). Last iteration exit edge was proved true." "cunroll"} } */
> > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > > Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
> > > ===================================================================
> > > --- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> > > +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> > > @@ -0,0 +1,16 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > > +int a[2];
> > > +test(int c)
> > > +{ 
> > > +  int i;
> > > +  for (i=0;i<c;i++)
> > > +    {
> > > +      a[i]=5;
> > > +      if (test2())
> > > +	return;
> > > +    }
> > > +}
> > > +/* We are not able to get rid of the final conditional because the loop has two exits.  */
> > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 2 times)." "cunroll"} } */
> > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > > Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
> > > ===================================================================
> > > --- testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> > > +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> > > @@ -0,0 +1,15 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> > > +int a[1];
> > > +test(int c)
> > > +{ 
> > > +  int i;
> > > +  for (i=0;i<c;i++)
> > > +    {
> > > +      a[i]=5;
> > > +    }
> > > +}
> > > +/* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
> > > +
> > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times)." "cunrolli"} } */
> > > +/* { dg-final { cleanup-tree-dump "cunrolli" } } */
> > > Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
> > > ===================================================================
> > > --- testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> > > +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > > +int a[1];
> > > +test(int c)
> > > +{ 
> > > +  int i=0,j;
> > > +  for (i=0;i<c;i++)
> > > +    {
> > > +      for (j=0;j<c;j++)
> > > +	{
> > > +          a[i]=5;
> > > +	  test2();
> > > +	}
> > > +    }
> > > +}
> > > +
> > > +/* We should do this as part of cunrolli, but our cost model do not take into account early exit
> > > +   from the last iteration.  */
> > > +/* { dg-final { scan-tree-dump-not "Turned loop 1 to non-loop; it never loops. Last iteration exit edge was proved true." "cunrolli"} } */
> > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > > Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
> > > ===================================================================
> > > --- testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> > > +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > > +int *a;
> > > +test(int c)
> > > +{ 
> > > +  int i;
> > > +  for (i=0;i<6;i++)
> > > +    a[i]=5;
> > > +}
> > > +/* Basic testcase for complette unrolling.  */
> > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 5 times). Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true." "cunroll"} } */
> > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > > Index: cfgloopmanip.c
> > > ===================================================================
> > > --- cfgloopmanip.c	(revision 192360)
> > > +++ cfgloopmanip.c	(working copy)
> > > @@ -37,7 +37,6 @@ static int find_path (edge, basic_block 
> > >  static void fix_loop_placements (struct loop *, bool *);
> > >  static bool fix_bb_placement (basic_block);
> > >  static void fix_bb_placements (basic_block, bool *);
> > > -static void unloop (struct loop *, bool *);
> > >  
> > >  /* Checks whether basic block BB is dominated by DATA.  */
> > >  static bool
> > > @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed
> > >     If this may cause the information about irreducible regions to become
> > >     invalid, IRRED_INVALIDATED is set to true.  */
> > >  
> > > -static void
> > > +void
> > >  unloop (struct loop *loop, bool *irred_invalidated)
> > >  {
> > >    basic_block *body;
> > > Index: tree-ssa-loop-ivcanon.c
> > > ===================================================================
> > > --- tree-ssa-loop-ivcanon.c	(revision 192360)
> > > +++ tree-ssa-loop-ivcanon.c	(working copy)
> > > @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo
> > >  	  /* Look for reasons why we might optimize this stmt away. */
> > >  
> > >  	  /* Exit conditional.  */
> > > -	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
> > > +	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
> > >  	    {
> > >  	      if (dump_file && (dump_flags & TDF_DETAILS))
> > >  	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
> > > @@ -326,13 +326,43 @@ try_unroll_loop_completely (struct loop 
> > >    unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
> > >    gimple cond;
> > >    struct loop_size size;
> > > +  bool n_unroll_found = false;
> > > +  HOST_WIDE_INT maxiter;
> > > +  basic_block latch;
> > > +  edge latch_edge;
> > > +  location_t locus;
> > > +  int flags;
> > > +  bool irred_invalidated = false;
> > > +  gimple stmt;
> > > +  gimple_stmt_iterator gsi;
> > > +  edge other_edge = NULL, last_exit;
> > > +  int num = loop->num;
> > > +  bool last_iteration_updated = false;
> > > +
> > > +  /* See if we proved number of iterations to be low constant.  */
> > > +  if (host_integerp (niter, 1))
> > > +    {
> > > +      n_unroll = tree_low_cst (niter, 1);
> > > +      n_unroll_found = true;
> > > +    }
> > >  
> > > -  if (loop->inner)
> > > +  /* See if we can improve our estimate by using recorded loop bounds.  */
> > > +  maxiter = max_loop_iterations_int (loop);
> > > +  if (maxiter >= 0
> > > +      && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
> > > +    {
> > > +      n_unroll = maxiter;
> > > +      n_unroll_found = true;
> > > +    }
> > > +
> > > +  if (!n_unroll_found)
> > >      return false;
> > >  
> > > -  if (!host_integerp (niter, 1))
> > > +  /* We unroll only inner loops, because we do not consider it profitable
> > > +     otheriwse.  We still can cancel loopback edge of not rolling loop;
> > > +     this is always a good idea.  */
> > > +  if (loop->inner && n_unroll)
> > >      return false;
> > > -  n_unroll = tree_low_cst (niter, 1);
> > >  
> > >    max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
> > >    if (n_unroll > max_unroll)
> > > @@ -340,6 +370,10 @@ try_unroll_loop_completely (struct loop 
> > >  
> > >    if (n_unroll)
> > >      {
> > > +      sbitmap wont_exit;
> > > +      edge e;
> > > +      unsigned i;
> > > +      VEC (edge, heap) *to_remove = NULL;
> > >        if (ul == UL_SINGLE_ITER)
> > >  	return false;
> > >  
> > > @@ -372,14 +408,6 @@ try_unroll_loop_completely (struct loop 
> > >  	    fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
> > >  	  return false;
> > >  	}
> > > -    }
> > > -
> > > -  if (n_unroll)
> > > -    {
> > > -      sbitmap wont_exit;
> > > -      edge e;
> > > -      unsigned i;
> > > -      VEC (edge, heap) *to_remove = NULL;
> > >  
> > >        initialize_original_copy_tables ();
> > >        wont_exit = sbitmap_alloc (n_unroll + 1);
> > > @@ -408,24 +436,143 @@ try_unroll_loop_completely (struct loop 
> > >        free_original_copy_tables ();
> > >      }
> > >  
> > > -  cond = last_stmt (exit->src);
> > > -  if (exit->flags & EDGE_TRUE_VALUE)
> > > -    gimple_cond_make_true (cond);
> > > -  else
> > > -    gimple_cond_make_false (cond);
> > > -  update_stmt (cond);
> > > +  /* After complette unrolling we still may get rid of the conditional
> > > +     on exit in the last copy even if we have no idea what it does.
> > > +     This is quite common case for loops of form
> > > +
> > > +     int a[5];
> > > +     for (i=0;i<b;i++)
> > > +       a[i]=0;
> > > +
> > > +     Here we prove the loop to iterate 5 times but we do not know
> > > +     it from induction variable.
> > > +
> > > +     We want to make the conditional of exit true, so we can only 
> > > +     consider exits that are not part of the inner (irreducible) loops
> > > +     so we know that the conditional is tested at most once per iteration.
> > > +
> > > +     Situation is more complicated withloops with multiple exists:
> > > +
> > > +     int a[5];
> > > +     for (i=0;i<b;i++)
> > > +       {
> > > +	 if (blah)
> > > +	   break;
> > > +	 a[i]=0;
> > > +       }
> > > +
> > > +     Here we could cancel the conditional of if "(blah)". And:
> > > +
> > > +     int a[5];
> > > +     for (i=0;i<b;i++)
> > > +       {
> > > +	 a[i]=0;
> > > +	 if (blah)
> > > +	   break;
> > > +       }
> > > + 
> > > +     Here we can cancel the last i<b test.
> > > +
> > > +     To handle these we need to track all statements containing code that
> > > +     can not execute on last iteration (in tree-ssa-loop-niter).
> > > +     Then we can use control dependnecy (not computed here) to cancel all
> > > +     the paths leading to them unless they are part of the inner loops.
> > > +     This however seems bit like an overkill so we handle only the
> > > +     simple case of single exit until interesting testcases appears.  */
> > > +  last_exit = exit;
> > > +  if (!last_exit)
> > > +    {
> > > +      last_exit = single_exit (loop);
> > > +      if (!last_exit || last_exit->src->loop_father != loop
> > > +	  || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
> > > +	last_exit = NULL;
> > > +      else 
> > > +	{
> > > +	  if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP)
> > > +	    last_exit = NULL;
> > > +	}
> > > +    }
> > > +
> > > +  /* If loop has only one exit edge, we can remove the conditional from
> > > +     the last copy of the loop.  
> > > +     TODO: We should account this update into cost model.  */
> > > +  if (last_exit)
> > > +    {
> > > +      cond = last_stmt (last_exit->src);
> > > +      if (last_exit->flags & EDGE_TRUE_VALUE)
> > > +	gimple_cond_make_true (cond);
> > > +      else
> > > +	gimple_cond_make_false (cond);
> > > +      update_stmt (cond);
> > > +      other_edge = EDGE_SUCC (last_exit->src, 0);
> > > +      if (other_edge == last_exit)
> > > +	other_edge = EDGE_SUCC (last_exit->src, 1);
> > > +      last_iteration_updated = true;
> > > +    }
> > > +
> > > +  /* Now destroy the loop.  First try to do so by cancelling the
> > > +     patch from exit conditional if we identified good candidate.  
> > > +
> > > +     TODO: We should update the loop profile for the fact that
> > > +     the last iteration no longer executes.  */
> > > +  if (!other_edge || !remove_path (other_edge))
> > > +    {
> > > +      /* We did not manage to cancel the loop by removing the patch.
> > > +         The loop latch remains reachable even if it will never be reached
> > > +         at runtime.  We must redirect it to somewhere, so create basic
> > > +         block containg __builtin_unreachable call for this reason.  */
> > > +      latch = loop->latch;
> > > +      latch_edge = loop_latch_edge (loop);
> > > +      flags = latch_edge->flags;
> > > +      locus = latch_edge->goto_locus;
> > > +
> > > +      /* Unloop destroys the latch edge.  */
> > > +      unloop (loop, &irred_invalidated);
> > > +      if (irred_invalidated)
> > > +	mark_irreducible_loops ();
> > > +
> > > +      /* Create new basic block for the latch edge destination and wire
> > > +	 it in.  */
> > > +      stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
> > > +      latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
> > > +      gsi = gsi_start_bb (latch_edge->dest);
> > > +      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> > > +      latch_edge->dest->loop_father = current_loops->tree_root;
> > > +      set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
> > > +      latch_edge->probability = 0;
> > > +      latch_edge->count = 0;
> > > +      latch_edge->flags |= flags;
> > > +      latch_edge->goto_locus = locus;
> > > +    }
> > >  
> > >    if (dump_file && (dump_flags & TDF_DETAILS))
> > > -    fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
> > > +    {
> > > +      if (!n_unroll)
> > > +        fprintf (dump_file, "Turned loop %d to non-loop; it never loops.%s\n", num,
> > > +		 last_iteration_updated
> > > +		 ? " Last iteration exit edge was proved true." : "");
> > > +      else
> > > +	{
> > > +          fprintf (dump_file, "Unrolled loop %d completely "
> > > +		   "(duplicated %i times).%s%s\n", num, (int)n_unroll,
> > > +		   exit
> > > +		   ? " Exit condition of peeled iterations was eliminated." : "",
> > > +		   last_iteration_updated
> > > +		   ? " Last iteration exit edge was proved true." : "");
> > > +	}
> > > +    }
> > >  
> > > -  return true;
> > > +  return true;
> > >  }
> > >  
> > >  /* Adds a canonical induction variable to LOOP if suitable.
> > >     CREATE_IV is true if we may create a new iv.  UL determines
> > >     which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
> > >     to determine the number of iterations of a loop by direct evaluation.
> > > -   Returns true if cfg is changed.  */
> > > +   Returns true if cfg is changed.  
> > > +
> > > +   IRREDUCIBLE_LOOPS_FOUND is used to bookkeep if we discovered irreducible
> > > +   loops.  This is used in a special case of the exit condition analysis.  */
> > >  
> > >  static bool
> > >  canonicalize_loop_induction_variables (struct loop *loop,
> > > @@ -455,22 +602,34 @@ canonicalize_loop_induction_variables (s
> > >  	      || TREE_CODE (niter) != INTEGER_CST))
> > >  	niter = find_loop_niter_by_eval (loop, &exit);
> > >  
> > > -      if (chrec_contains_undetermined (niter)
> > > -	  || TREE_CODE (niter) != INTEGER_CST)
> > > -	return false;
> > > +      if (TREE_CODE (niter) != INTEGER_CST)
> > > +	exit = NULL;
> > >      }
> > >  
> > > -  if (dump_file && (dump_flags & TDF_DETAILS))
> > > +  /* We work exceptionally hard here to estimate the bound
> > > +     by find_loop_niter_by_eval.  Be sure to keep it for future.  */
> > > +  if (niter && TREE_CODE (niter) == INTEGER_CST)
> > > +    record_niter_bound (loop, tree_to_double_int (niter), false, true);
> > > +
> > > +  if (dump_file && (dump_flags & TDF_DETAILS)
> > > +      && TREE_CODE (niter) == INTEGER_CST)
> > >      {
> > >        fprintf (dump_file, "Loop %d iterates ", loop->num);
> > >        print_generic_expr (dump_file, niter, TDF_SLIM);
> > >        fprintf (dump_file, " times.\n");
> > >      }
> > > +  if (dump_file && (dump_flags & TDF_DETAILS)
> > > +      && max_loop_iterations_int (loop) >= 0)
> > > +    {
> > > +      fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
> > > +	       (int)max_loop_iterations_int (loop));
> > > +    }
> > >  
> > >    if (try_unroll_loop_completely (loop, exit, niter, ul))
> > >      return true;
> > >  
> > > -  if (create_iv)
> > > +  if (create_iv
> > > +      && niter && !chrec_contains_undetermined (niter))
> > >      create_canonical_iv (loop, exit, niter);
> > >  
> > >    return false;
> > > @@ -483,11 +642,14 @@ unsigned int
> > >  canonicalize_induction_variables (void)
> > >  {
> > >    loop_iterator li;
> > > -  struct loop *loop;
> > > +  struct loop *loop, *next;
> > >    bool changed = false;
> > >  
> > > -  FOR_EACH_LOOP (li, loop, 0)
> > > +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> > >      {
> > > +      loop = next;
> > > +      fel_next (&li, &next);
> > > +      
> > >        changed |= canonicalize_loop_induction_variables (loop,
> > >  							true, UL_SINGLE_ITER,
> > >  							true);
> > > @@ -591,14 +753,18 @@ tree_unroll_loops_completely (bool may_i
> > >    bool changed;
> > >    enum unroll_level ul;
> > >    int iteration = 0;
> > > +  struct loop *next;
> > >  
> > >    do
> > >      {
> > >        changed = false;
> > >  
> > > -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> > > +      for (fel_init (&li, &next, 0); next;)
> > >  	{
> > > -	  struct loop *loop_father = loop_outer (loop);
> > > +	  struct loop *loop_father = loop_outer (next);
> > > +
> > > +	  loop = next;
> > > +	  fel_next (&li, &next);
> > >  
> > >  	  if (may_increase_size && optimize_loop_for_speed_p (loop)
> > >  	      /* Unroll outermost loops only if asked to do so or they do
> > > Index: fortran/f95-lang.c
> > > ===================================================================
> > > --- fortran/f95-lang.c	(revision 192360)
> > > +++ fortran/f95-lang.c	(working copy)
> > > @@ -908,6 +908,11 @@ gfc_init_builtin_functions (void)
> > >    gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
> > >  		      "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
> > >  
> > > +  ftype = build_function_type_list (void_type_node, NULL_TREE);
> > > +
> > > +  gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_EXPECT,
> > > +		      "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST);
> > > +
> > >    ftype = build_function_type_list (void_type_node,
> > >                                      pvoid_type_node, NULL_TREE);
> > >    gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
> > > Index: cfgloop.h
> > > ===================================================================
> > > --- cfgloop.h	(revision 192360)
> > > +++ cfgloop.h	(working copy)
> > > @@ -319,7 +319,8 @@ extern struct loop *loopify (edge, edge,
> > >  struct loop * loop_version (struct loop *, void *,
> > >  			    basic_block *, unsigned, unsigned, unsigned, bool);
> > >  extern bool remove_path (edge);
> > > -void scale_loop_frequencies (struct loop *, int, int);
> > > +extern void scale_loop_frequencies (struct loop *, int, int);
> > > +extern void unloop (struct loop *, bool *);
> > >  
> > >  /* Induction variable analysis.  */
> > >  
> > > 
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE / SUSE Labs
> > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> > GF: Jeff Hawn, Jennifer Guild, Felix Imend
> 
>
Jan Hubicka - Oct. 12, 2012, 2:21 p.m.
> > > 	* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
> > 
> > I wonder if other languages need similar adjustment?
> 
> I also wondered ;) Only Fortran triggered, I will take a look.
> > 
> > +  /* Now destroy the loop.  First try to do so by cancelling the
> > +     patch from exit conditional if we identified good candidate.
> > +
> > 
> > you mean 'path' here, right?  Similar in other places.
> 
> Yeah.
> > 
> > +      /* Unloop destroys the latch edge.  */
> > +      unloop (loop, &irred_invalidated);
> > +      if (irred_invalidated)
> > +       mark_irreducible_loops ();
> > 
> > this makes the whole thing quadratic in the number of loops.
> > Please pass down a flag and handle this in the place we
> > update SSA form (thus, once per function / unroll iteration).
> 
> This was in my first version of the patch. Then I noticed that remove_path
> also relies on irreducible loops to be computed and does the exactly same
> logic as I do.
> So we would need to extend the &irred_invalidated to remove_path as well.

Actually thinking about it, I perhaps can just split the loopback edge and
output there the unreachable call letting cfgcleanup to do all the hard work.
In a way I like it better when the complette unroller cancels the loops
and has chance to cascade, but the cascading needs SSA update and cleanup of
unneeded PHI nodes anyway.

Honza
Jan Hubicka - Oct. 13, 2012, 8:59 p.m.
> On Fri, 12 Oct 2012, Jan Hubicka wrote:
> 
> > > > 	* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
> > > 
> > > I wonder if other languages need similar adjustment?
> > 
> > I also wondered ;) Only Fortran triggered, I will take a look.
> > > 
> > > +  /* Now destroy the loop.  First try to do so by cancelling the
> > > +     patch from exit conditional if we identified good candidate.
> > > +
> > > 
> > > you mean 'path' here, right?  Similar in other places.
> > 
> > Yeah.
> > > 
> > > +      /* Unloop destroys the latch edge.  */
> > > +      unloop (loop, &irred_invalidated);
> > > +      if (irred_invalidated)
> > > +       mark_irreducible_loops ();
> > > 
> > > this makes the whole thing quadratic in the number of loops.
> > > Please pass down a flag and handle this in the place we
> > > update SSA form (thus, once per function / unroll iteration).
> > 
> > This was in my first version of the patch. Then I noticed that remove_path
> > also relies on irreducible loops to be computed and does the exactly same
> > logic as I do.
> > So we would need to extend the &irred_invalidated to remove_path as well.
> 
> Well, then do that.  Or at least constrain the number of loops
> that are updated in mark_irreducible_loops (though that would
> still be quadratic in the loop depth then?).  We solved scalability
> issues of cunroll for 4.8, so adding new ones isn't going to fly.

Well, limiting number of mark_irreducible_loops updates is probably quite easy
to do.  unloop is returning true just in quite corner cases and actually
we do have this quadratic complexity issue in the RTL unroller forever.
> 
> > > 
> > > Or provide a more optimial implementation that only considers
> > > parent loops of 'loop' (even that is excessive though, quadratic
> > > in the loop depth).
> > > 
> > > -  FOR_EACH_LOOP (li, loop, 0)
> > > +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> > > 
> > > FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> > > 
> > > but I'm sure that IV canonicalization should happen for all loops.
> > > So, why do you need this change?  Esp. we should get rid of
> > > single-iteration outer loops here, too.
> > 
> > FOR_EACH_LOOP seems to assume that loops are not cancelled under its hand.  I
> > was running into case where we cancelled inner loop but did not update SSA and
> > tried to estimate number of iterations in the outer loop (now also innermost)
> > that went into infinite loop.
> 
> Well, that's not true (it uses fel_* stuff), FOR_EACH_LOOP constructs
> a vector of loops in the desired order it walks over.  As for estimation

I go for next loop before possibly cancelling it, while FOR_EACH_LOOP will
go from possibly cancelled loop to next.

> you probably need to invalidate the scev cache?  The current logic

I do not think the _by_eval function uses much of scev cache.  Here it was
simply walking PHI from infinite loop that was artefact of earlier update.

> explicitely walks all innermost loops and then updates, you cannot
> walk an outer loop of an inner loop that changed without updating SSA
> form for example.

I think the fel terators will do it when you are cancelling the loops on the
go.  I will dig out into this more.

Honza
> 
> Richard.
> 
> > Honza
> > > 
> > > -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> > > +      for (fel_init (&li, &next, 0); next;)
> > > 
> > > see above - FOR_EACH_LOOP (li, loop, 0)
> > > 
> > > Otherwise looks ok.  Please update and re-post.
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > 
> > > > Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
> > > > ===================================================================
> > > > --- testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> > > > +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > > > +int a[2];
> > > > +test(int c)
> > > > +{ 
> > > > +  int i;
> > > > +  for (i=0;i<c;i++)
> > > > +    a[i]=5;
> > > > +}
> > > > +/* Array bounds says the loop will not roll much.  */
> > > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times). Last iteration exit edge was proved true." "cunroll"} } */
> > > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > > > Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
> > > > ===================================================================
> > > > --- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> > > > +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
> > > > @@ -0,0 +1,16 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > > > +int a[2];
> > > > +test(int c)
> > > > +{ 
> > > > +  int i;
> > > > +  for (i=0;i<c;i++)
> > > > +    {
> > > > +      a[i]=5;
> > > > +      if (test2())
> > > > +	return;
> > > > +    }
> > > > +}
> > > > +/* We are not able to get rid of the final conditional because the loop has two exits.  */
> > > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 2 times)." "cunroll"} } */
> > > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > > > Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
> > > > ===================================================================
> > > > --- testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> > > > +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
> > > > @@ -0,0 +1,15 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> > > > +int a[1];
> > > > +test(int c)
> > > > +{ 
> > > > +  int i;
> > > > +  for (i=0;i<c;i++)
> > > > +    {
> > > > +      a[i]=5;
> > > > +    }
> > > > +}
> > > > +/* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
> > > > +
> > > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times)." "cunrolli"} } */
> > > > +/* { dg-final { cleanup-tree-dump "cunrolli" } } */
> > > > Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
> > > > ===================================================================
> > > > --- testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> > > > +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
> > > > @@ -0,0 +1,20 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > > > +int a[1];
> > > > +test(int c)
> > > > +{ 
> > > > +  int i=0,j;
> > > > +  for (i=0;i<c;i++)
> > > > +    {
> > > > +      for (j=0;j<c;j++)
> > > > +	{
> > > > +          a[i]=5;
> > > > +	  test2();
> > > > +	}
> > > > +    }
> > > > +}
> > > > +
> > > > +/* We should do this as part of cunrolli, but our cost model do not take into account early exit
> > > > +   from the last iteration.  */
> > > > +/* { dg-final { scan-tree-dump-not "Turned loop 1 to non-loop; it never loops. Last iteration exit edge was proved true." "cunrolli"} } */
> > > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > > > Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
> > > > ===================================================================
> > > > --- testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> > > > +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
> > > > @@ -0,0 +1,12 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> > > > +int *a;
> > > > +test(int c)
> > > > +{ 
> > > > +  int i;
> > > > +  for (i=0;i<6;i++)
> > > > +    a[i]=5;
> > > > +}
> > > > +/* Basic testcase for complette unrolling.  */
> > > > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 5 times). Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true." "cunroll"} } */
> > > > +/* { dg-final { cleanup-tree-dump "cunroll" } } */
> > > > Index: cfgloopmanip.c
> > > > ===================================================================
> > > > --- cfgloopmanip.c	(revision 192360)
> > > > +++ cfgloopmanip.c	(working copy)
> > > > @@ -37,7 +37,6 @@ static int find_path (edge, basic_block 
> > > >  static void fix_loop_placements (struct loop *, bool *);
> > > >  static bool fix_bb_placement (basic_block);
> > > >  static void fix_bb_placements (basic_block, bool *);
> > > > -static void unloop (struct loop *, bool *);
> > > >  
> > > >  /* Checks whether basic block BB is dominated by DATA.  */
> > > >  static bool
> > > > @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed
> > > >     If this may cause the information about irreducible regions to become
> > > >     invalid, IRRED_INVALIDATED is set to true.  */
> > > >  
> > > > -static void
> > > > +void
> > > >  unloop (struct loop *loop, bool *irred_invalidated)
> > > >  {
> > > >    basic_block *body;
> > > > Index: tree-ssa-loop-ivcanon.c
> > > > ===================================================================
> > > > --- tree-ssa-loop-ivcanon.c	(revision 192360)
> > > > +++ tree-ssa-loop-ivcanon.c	(working copy)
> > > > @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo
> > > >  	  /* Look for reasons why we might optimize this stmt away. */
> > > >  
> > > >  	  /* Exit conditional.  */
> > > > -	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
> > > > +	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
> > > >  	    {
> > > >  	      if (dump_file && (dump_flags & TDF_DETAILS))
> > > >  	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
> > > > @@ -326,13 +326,43 @@ try_unroll_loop_completely (struct loop 
> > > >    unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
> > > >    gimple cond;
> > > >    struct loop_size size;
> > > > +  bool n_unroll_found = false;
> > > > +  HOST_WIDE_INT maxiter;
> > > > +  basic_block latch;
> > > > +  edge latch_edge;
> > > > +  location_t locus;
> > > > +  int flags;
> > > > +  bool irred_invalidated = false;
> > > > +  gimple stmt;
> > > > +  gimple_stmt_iterator gsi;
> > > > +  edge other_edge = NULL, last_exit;
> > > > +  int num = loop->num;
> > > > +  bool last_iteration_updated = false;
> > > > +
> > > > +  /* See if we proved number of iterations to be low constant.  */
> > > > +  if (host_integerp (niter, 1))
> > > > +    {
> > > > +      n_unroll = tree_low_cst (niter, 1);
> > > > +      n_unroll_found = true;
> > > > +    }
> > > >  
> > > > -  if (loop->inner)
> > > > +  /* See if we can improve our estimate by using recorded loop bounds.  */
> > > > +  maxiter = max_loop_iterations_int (loop);
> > > > +  if (maxiter >= 0
> > > > +      && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
> > > > +    {
> > > > +      n_unroll = maxiter;
> > > > +      n_unroll_found = true;
> > > > +    }
> > > > +
> > > > +  if (!n_unroll_found)
> > > >      return false;
> > > >  
> > > > -  if (!host_integerp (niter, 1))
> > > > +  /* We unroll only inner loops, because we do not consider it profitable
> > > > +     otheriwse.  We still can cancel loopback edge of not rolling loop;
> > > > +     this is always a good idea.  */
> > > > +  if (loop->inner && n_unroll)
> > > >      return false;
> > > > -  n_unroll = tree_low_cst (niter, 1);
> > > >  
> > > >    max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
> > > >    if (n_unroll > max_unroll)
> > > > @@ -340,6 +370,10 @@ try_unroll_loop_completely (struct loop 
> > > >  
> > > >    if (n_unroll)
> > > >      {
> > > > +      sbitmap wont_exit;
> > > > +      edge e;
> > > > +      unsigned i;
> > > > +      VEC (edge, heap) *to_remove = NULL;
> > > >        if (ul == UL_SINGLE_ITER)
> > > >  	return false;
> > > >  
> > > > @@ -372,14 +408,6 @@ try_unroll_loop_completely (struct loop 
> > > >  	    fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
> > > >  	  return false;
> > > >  	}
> > > > -    }
> > > > -
> > > > -  if (n_unroll)
> > > > -    {
> > > > -      sbitmap wont_exit;
> > > > -      edge e;
> > > > -      unsigned i;
> > > > -      VEC (edge, heap) *to_remove = NULL;
> > > >  
> > > >        initialize_original_copy_tables ();
> > > >        wont_exit = sbitmap_alloc (n_unroll + 1);
> > > > @@ -408,24 +436,143 @@ try_unroll_loop_completely (struct loop 
> > > >        free_original_copy_tables ();
> > > >      }
> > > >  
> > > > -  cond = last_stmt (exit->src);
> > > > -  if (exit->flags & EDGE_TRUE_VALUE)
> > > > -    gimple_cond_make_true (cond);
> > > > -  else
> > > > -    gimple_cond_make_false (cond);
> > > > -  update_stmt (cond);
> > > > +  /* After complette unrolling we still may get rid of the conditional
> > > > +     on exit in the last copy even if we have no idea what it does.
> > > > +     This is quite common case for loops of form
> > > > +
> > > > +     int a[5];
> > > > +     for (i=0;i<b;i++)
> > > > +       a[i]=0;
> > > > +
> > > > +     Here we prove the loop to iterate 5 times but we do not know
> > > > +     it from induction variable.
> > > > +
> > > > +     We want to make the conditional of exit true, so we can only 
> > > > +     consider exits that are not part of the inner (irreducible) loops
> > > > +     so we know that the conditional is tested at most once per iteration.
> > > > +
> > > > +     Situation is more complicated withloops with multiple exists:
> > > > +
> > > > +     int a[5];
> > > > +     for (i=0;i<b;i++)
> > > > +       {
> > > > +	 if (blah)
> > > > +	   break;
> > > > +	 a[i]=0;
> > > > +       }
> > > > +
> > > > +     Here we could cancel the conditional of if "(blah)". And:
> > > > +
> > > > +     int a[5];
> > > > +     for (i=0;i<b;i++)
> > > > +       {
> > > > +	 a[i]=0;
> > > > +	 if (blah)
> > > > +	   break;
> > > > +       }
> > > > + 
> > > > +     Here we can cancel the last i<b test.
> > > > +
> > > > +     To handle these we need to track all statements containing code that
> > > > +     can not execute on last iteration (in tree-ssa-loop-niter).
> > > > +     Then we can use control dependnecy (not computed here) to cancel all
> > > > +     the paths leading to them unless they are part of the inner loops.
> > > > +     This however seems bit like an overkill so we handle only the
> > > > +     simple case of single exit until interesting testcases appears.  */
> > > > +  last_exit = exit;
> > > > +  if (!last_exit)
> > > > +    {
> > > > +      last_exit = single_exit (loop);
> > > > +      if (!last_exit || last_exit->src->loop_father != loop
> > > > +	  || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
> > > > +	last_exit = NULL;
> > > > +      else 
> > > > +	{
> > > > +	  if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP)
> > > > +	    last_exit = NULL;
> > > > +	}
> > > > +    }
> > > > +
> > > > +  /* If loop has only one exit edge, we can remove the conditional from
> > > > +     the last copy of the loop.  
> > > > +     TODO: We should account this update into cost model.  */
> > > > +  if (last_exit)
> > > > +    {
> > > > +      cond = last_stmt (last_exit->src);
> > > > +      if (last_exit->flags & EDGE_TRUE_VALUE)
> > > > +	gimple_cond_make_true (cond);
> > > > +      else
> > > > +	gimple_cond_make_false (cond);
> > > > +      update_stmt (cond);
> > > > +      other_edge = EDGE_SUCC (last_exit->src, 0);
> > > > +      if (other_edge == last_exit)
> > > > +	other_edge = EDGE_SUCC (last_exit->src, 1);
> > > > +      last_iteration_updated = true;
> > > > +    }
> > > > +
> > > > +  /* Now destroy the loop.  First try to do so by cancelling the
> > > > +     patch from exit conditional if we identified good candidate.  
> > > > +
> > > > +     TODO: We should update the loop profile for the fact that
> > > > +     the last iteration no longer executes.  */
> > > > +  if (!other_edge || !remove_path (other_edge))
> > > > +    {
> > > > +      /* We did not manage to cancel the loop by removing the patch.
> > > > +         The loop latch remains reachable even if it will never be reached
> > > > +         at runtime.  We must redirect it to somewhere, so create basic
> > > > +         block containg __builtin_unreachable call for this reason.  */
> > > > +      latch = loop->latch;
> > > > +      latch_edge = loop_latch_edge (loop);
> > > > +      flags = latch_edge->flags;
> > > > +      locus = latch_edge->goto_locus;
> > > > +
> > > > +      /* Unloop destroys the latch edge.  */
> > > > +      unloop (loop, &irred_invalidated);
> > > > +      if (irred_invalidated)
> > > > +	mark_irreducible_loops ();
> > > > +
> > > > +      /* Create new basic block for the latch edge destination and wire
> > > > +	 it in.  */
> > > > +      stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
> > > > +      latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
> > > > +      gsi = gsi_start_bb (latch_edge->dest);
> > > > +      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> > > > +      latch_edge->dest->loop_father = current_loops->tree_root;
> > > > +      set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
> > > > +      latch_edge->probability = 0;
> > > > +      latch_edge->count = 0;
> > > > +      latch_edge->flags |= flags;
> > > > +      latch_edge->goto_locus = locus;
> > > > +    }
> > > >  
> > > >    if (dump_file && (dump_flags & TDF_DETAILS))
> > > > -    fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
> > > > +    {
> > > > +      if (!n_unroll)
> > > > +        fprintf (dump_file, "Turned loop %d to non-loop; it never loops.%s\n", num,
> > > > +		 last_iteration_updated
> > > > +		 ? " Last iteration exit edge was proved true." : "");
> > > > +      else
> > > > +	{
> > > > +          fprintf (dump_file, "Unrolled loop %d completely "
> > > > +		   "(duplicated %i times).%s%s\n", num, (int)n_unroll,
> > > > +		   exit
> > > > +		   ? " Exit condition of peeled iterations was eliminated." : "",
> > > > +		   last_iteration_updated
> > > > +		   ? " Last iteration exit edge was proved true." : "");
> > > > +	}
> > > > +    }
> > > >  
> > > > -  return true;
> > > > +  return true;
> > > >  }
> > > >  
> > > >  /* Adds a canonical induction variable to LOOP if suitable.
> > > >     CREATE_IV is true if we may create a new iv.  UL determines
> > > >     which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
> > > >     to determine the number of iterations of a loop by direct evaluation.
> > > > -   Returns true if cfg is changed.  */
> > > > +   Returns true if cfg is changed.  
> > > > +
> > > > +   IRREDUCIBLE_LOOPS_FOUND is used to bookkeep if we discovered irreducible
> > > > +   loops.  This is used in a special case of the exit condition analysis.  */
> > > >  
> > > >  static bool
> > > >  canonicalize_loop_induction_variables (struct loop *loop,
> > > > @@ -455,22 +602,34 @@ canonicalize_loop_induction_variables (s
> > > >  	      || TREE_CODE (niter) != INTEGER_CST))
> > > >  	niter = find_loop_niter_by_eval (loop, &exit);
> > > >  
> > > > -      if (chrec_contains_undetermined (niter)
> > > > -	  || TREE_CODE (niter) != INTEGER_CST)
> > > > -	return false;
> > > > +      if (TREE_CODE (niter) != INTEGER_CST)
> > > > +	exit = NULL;
> > > >      }
> > > >  
> > > > -  if (dump_file && (dump_flags & TDF_DETAILS))
> > > > +  /* We work exceptionally hard here to estimate the bound
> > > > +     by find_loop_niter_by_eval.  Be sure to keep it for future.  */
> > > > +  if (niter && TREE_CODE (niter) == INTEGER_CST)
> > > > +    record_niter_bound (loop, tree_to_double_int (niter), false, true);
> > > > +
> > > > +  if (dump_file && (dump_flags & TDF_DETAILS)
> > > > +      && TREE_CODE (niter) == INTEGER_CST)
> > > >      {
> > > >        fprintf (dump_file, "Loop %d iterates ", loop->num);
> > > >        print_generic_expr (dump_file, niter, TDF_SLIM);
> > > >        fprintf (dump_file, " times.\n");
> > > >      }
> > > > +  if (dump_file && (dump_flags & TDF_DETAILS)
> > > > +      && max_loop_iterations_int (loop) >= 0)
> > > > +    {
> > > > +      fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
> > > > +	       (int)max_loop_iterations_int (loop));
> > > > +    }
> > > >  
> > > >    if (try_unroll_loop_completely (loop, exit, niter, ul))
> > > >      return true;
> > > >  
> > > > -  if (create_iv)
> > > > +  if (create_iv
> > > > +      && niter && !chrec_contains_undetermined (niter))
> > > >      create_canonical_iv (loop, exit, niter);
> > > >  
> > > >    return false;
> > > > @@ -483,11 +642,14 @@ unsigned int
> > > >  canonicalize_induction_variables (void)
> > > >  {
> > > >    loop_iterator li;
> > > > -  struct loop *loop;
> > > > +  struct loop *loop, *next;
> > > >    bool changed = false;
> > > >  
> > > > -  FOR_EACH_LOOP (li, loop, 0)
> > > > +  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
> > > >      {
> > > > +      loop = next;
> > > > +      fel_next (&li, &next);
> > > > +      
> > > >        changed |= canonicalize_loop_induction_variables (loop,
> > > >  							true, UL_SINGLE_ITER,
> > > >  							true);
> > > > @@ -591,14 +753,18 @@ tree_unroll_loops_completely (bool may_i
> > > >    bool changed;
> > > >    enum unroll_level ul;
> > > >    int iteration = 0;
> > > > +  struct loop *next;
> > > >  
> > > >    do
> > > >      {
> > > >        changed = false;
> > > >  
> > > > -      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
> > > > +      for (fel_init (&li, &next, 0); next;)
> > > >  	{
> > > > -	  struct loop *loop_father = loop_outer (loop);
> > > > +	  struct loop *loop_father = loop_outer (next);
> > > > +
> > > > +	  loop = next;
> > > > +	  fel_next (&li, &next);
> > > >  
> > > >  	  if (may_increase_size && optimize_loop_for_speed_p (loop)
> > > >  	      /* Unroll outermost loops only if asked to do so or they do
> > > > Index: fortran/f95-lang.c
> > > > ===================================================================
> > > > --- fortran/f95-lang.c	(revision 192360)
> > > > +++ fortran/f95-lang.c	(working copy)
> > > > @@ -908,6 +908,11 @@ gfc_init_builtin_functions (void)
> > > >    gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
> > > >  		      "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
> > > >  
> > > > +  ftype = build_function_type_list (void_type_node, NULL_TREE);
> > > > +
> > > > +  gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_EXPECT,
> > > > +		      "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST);
> > > > +
> > > >    ftype = build_function_type_list (void_type_node,
> > > >                                      pvoid_type_node, NULL_TREE);
> > > >    gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
> > > > Index: cfgloop.h
> > > > ===================================================================
> > > > --- cfgloop.h	(revision 192360)
> > > > +++ cfgloop.h	(working copy)
> > > > @@ -319,7 +319,8 @@ extern struct loop *loopify (edge, edge,
> > > >  struct loop * loop_version (struct loop *, void *,
> > > >  			    basic_block *, unsigned, unsigned, unsigned, bool);
> > > >  extern bool remove_path (edge);
> > > > -void scale_loop_frequencies (struct loop *, int, int);
> > > > +extern void scale_loop_frequencies (struct loop *, int, int);
> > > > +extern void unloop (struct loop *, bool *);
> > > >  
> > > >  /* Induction variable analysis.  */
> > > >  
> > > > 
> > > > 
> > > 
> > > -- 
> > > Richard Biener <rguenther@suse.de>
> > > SUSE / SUSE Labs
> > > SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> > > GF: Jeff Hawn, Jennifer Guild, Felix Imend
> > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend
Jan Hubicka - Oct. 14, 2012, 7:43 p.m.
Hi,
here is an updated patch.  The idea of splitting loopback edge did not fly.  We
then remove the edge in cfgcleanup prior demolyshing the loop and we loose
track on what basic blocks needs updating because we no longer can get the loop
body.

As a good news however I do not need the changed loop depth walking.  The infinite
recursion I was running into before disappeared. I guess it was another bug I fixed
later properly.

I also looked into what unroll and loop_depth is doing and it is using IRREDUCIBLE
flags only to set the irred_invalidated flag.  Also the use of IRREDUCIBLE flag
within the unrolling itself (to locate the last exit of the loop) is safe WRT updates
we do, so we only need to recompute it when done after all the changes.  This solve
the quadratic time issue.

The pass also works when canonicalization is done on all loops, not just innermost
but I would also like to enable this separately of this change.

I also updated Java and Fortran for the builtin_unreachable macro.  Those are
the only constructing builtin_expect that is also used internaly.  I also
noticed that the builtin is missing CONST flag (it is looping const that is
possible to decare by combination of const and noreturn) but I will do that
incrementally.
I am honestly not sure what Ada and Go does here to get around to duplicate
all this mess, but they don't seem to handle other similar cases either.

The patch now adds a regression on Fortran testcase that simplifies into:
! { dg-do run }
! Program to check corner cases for DO statements.
program do_1
  implicit none
  integer i, j

  ! limit=HUGE(i), step 1
  j = 0
  do i = HUGE(i) - 10, HUGE(i), 1
    j = j + 1
  end do
  if (j .ne. 11) call abort

end program

here loop iterates into INT_MAX and compiles as:
  <bb 3>:
  # i_8 = PHI <2147483637(2), i_9(3)>
  # j_6 = PHI <0(2), j_7(3)>
  # DEBUG j => j_6
  # DEBUG i => i_8
  j_7 = j_6 + 1;
  # DEBUG j => j_7
  i_9 = i_8 + 1;
  # DEBUG i => i_9
  if (i_8 == 2147483647)
    goto <bb 4>;
  else
    goto <bb 3>;

Now we try to estimate number of iterations as:
Statement i_9 = i_8 + 1;
 is executed at most 9 (bounded by 9) + 1 times in loop 1.
Loop 1 iterates at most 9 times.

This is one iteration fewer than it ought to be.  The problem is that result of
i_9=i_8+1 is undefined on the last iteration but program is still valid because
the value is not used (it is used only by the PHI on i_8).  So this seems like
another semi-latent bug in tree-ssa-niter.  Any ideas what to do here?  I think
we need to prove that the value is used in something that matters: i.e. loop
exit test or memory access and only bound number of executions of statements
using them.

The patch will also need upating in 
 gcc.target/i386/l_fma_* 
testcases.  The reason is that we peel the vectorized prologues/epilogues that
was in fact motivation for this whole patch.  The testcases counts number of
instructions appearing in them and needs compensation for different cost
models of the patch, so I plan to do it for the final version only.

Bootstrapped/regtested x86_64-linux (modulo the regressions above) and also
tested with -O3 bootstrap that passes with -Wno-error.

Honza

	* gcc.dg/tree-ssa/cunroll-1.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-2.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-3.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-4.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-5.c: New testcase.

	* cfgloopmanip.c (unloop): Export.
	* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Estimate
	also with unknown exit conditional.
	(try_unroll_loop_completely): Use max_loop_iterations_int to unroll
	also loops with low upper bound; handle unlooping of the last loop
	even when exit conditional is not known; unloop loop that do not loop
	even if they are not innermost.
	(canonicalize_loop_induction_variables): Record niter bounds known;
	try unrolling even if number of iterations is not known;
	(canonicalize_induction_variables): Handle updating of irreducible loops
	(tree_unroll_loops_completely): Likewise.
	* cfgloop.h (unloop): Declare.

	* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
Index: java/builtins.c
===================================================================
*** java/builtins.c	(revision 192432)
--- java/builtins.c	(working copy)
*************** VMSupportsCS8_builtin (tree method_retur
*** 453,458 ****
--- 453,460 ----
  
  #define BUILTIN_NOTHROW 1
  #define BUILTIN_CONST 2
+ #define BUILTIN_NORETURN 4
+ #define BUILTIN_LEAF 4
  /* Define a single builtin.  */
  static void
  define_builtin (enum built_in_function val,
*************** define_builtin (enum built_in_function v
*** 475,480 ****
--- 477,487 ----
      TREE_NOTHROW (decl) = 1;
    if (flags & BUILTIN_CONST)
      TREE_READONLY (decl) = 1;
+   if (flags & BUILTIN_NORETURN)
+     TREE_THIS_VOLATILE (decl) = 1;
+   if (flags & BUILTIN_LEAF)
+     DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("leaf"),
+ 					NULL, DECL_ATTRIBUTES (decl));
  
    set_builtin_decl (val, decl, true);
  }
*************** initialize_builtins (void)
*** 488,493 ****
--- 495,501 ----
    tree double_ftype_double, double_ftype_double_double;
    tree float_ftype_float_float;
    tree boolean_ftype_boolean_boolean;
+   tree void_ftype;
    int i;
  
    for (i = 0; java_builtins[i].builtin_code != END_BUILTINS; ++i)
*************** initialize_builtins (void)
*** 564,569 ****
--- 572,583 ----
  		  boolean_ftype_boolean_boolean,
  		  "__builtin_expect",
  		  BUILTIN_CONST | BUILTIN_NOTHROW);
+   void_ftype
+     = build_function_type_list (void_type_node, NULL_TREE);
+   define_builtin (BUILT_IN_UNREACHABLE, "__builtin_unreachable", 
+ 		  void_ftype,
+ 		  "__builtin_unreachable",
+ 		  BUILTIN_NOTHROW | BUILTIN_NORETURN | BUILTIN_LEAF);
    define_builtin (BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4, 
  		  "__sync_bool_compare_and_swap_4",
  		  build_function_type_list (boolean_type_node,
Index: cfgloopmanip.c
===================================================================
*** cfgloopmanip.c	(revision 192432)
--- cfgloopmanip.c	(working copy)
*************** static int find_path (edge, basic_block 
*** 37,43 ****
  static void fix_loop_placements (struct loop *, bool *);
  static bool fix_bb_placement (basic_block);
  static void fix_bb_placements (basic_block, bool *);
- static void unloop (struct loop *, bool *);
  
  /* Checks whether basic block BB is dominated by DATA.  */
  static bool
--- 37,42 ----
*************** loopify (edge latch_edge, edge header_ed
*** 895,901 ****
     If this may cause the information about irreducible regions to become
     invalid, IRRED_INVALIDATED is set to true.  */
  
! static void
  unloop (struct loop *loop, bool *irred_invalidated)
  {
    basic_block *body;
--- 894,900 ----
     If this may cause the information about irreducible regions to become
     invalid, IRRED_INVALIDATED is set to true.  */
  
! void
  unloop (struct loop *loop, bool *irred_invalidated)
  {
    basic_block *body;
Index: tree-ssa-loop-ivcanon.c
===================================================================
*** tree-ssa-loop-ivcanon.c	(revision 192432)
--- tree-ssa-loop-ivcanon.c	(working copy)
*************** tree_estimate_loop_size (struct loop *lo
*** 231,237 ****
  	  /* Look for reasons why we might optimize this stmt away. */
  
  	  /* Exit conditional.  */
! 	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
  	    {
  	      if (dump_file && (dump_flags & TDF_DETAILS))
  	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
--- 231,237 ----
  	  /* Look for reasons why we might optimize this stmt away. */
  
  	  /* Exit conditional.  */
! 	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
  	    {
  	      if (dump_file && (dump_flags & TDF_DETAILS))
  	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
*************** estimated_unrolled_size (struct loop_siz
*** 316,338 ****
  
  /* Tries to unroll LOOP completely, i.e. NITER times.
     UL determines which loops we are allowed to unroll.
!    EXIT is the exit of the loop that should be eliminated.  */
  
  static bool
  try_unroll_loop_completely (struct loop *loop,
  			    edge exit, tree niter,
! 			    enum unroll_level ul)
  {
    unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
    gimple cond;
    struct loop_size size;
  
!   if (loop->inner)
      return false;
  
!   if (!host_integerp (niter, 1))
      return false;
-   n_unroll = tree_low_cst (niter, 1);
  
    max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
    if (n_unroll > max_unroll)
--- 316,375 ----
  
  /* Tries to unroll LOOP completely, i.e. NITER times.
     UL determines which loops we are allowed to unroll.
!    EXIT is the exit of the loop that should be eliminated.  
!    IRRED_INVALIDATED is used to bookkeep if information about
!    irreducible regions may become invalid as a result
!    of the transformation.  */
  
  static bool
  try_unroll_loop_completely (struct loop *loop,
  			    edge exit, tree niter,
! 			    enum unroll_level ul,
! 			    bool *irred_invalidated)
  {
    unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
    gimple cond;
    struct loop_size size;
+   bool n_unroll_found = false;
+   HOST_WIDE_INT maxiter;
+   basic_block latch;
+   edge latch_edge;
+   location_t locus;
+   int flags;
+   gimple stmt;
+   gimple_stmt_iterator gsi;
+   edge other_edge = NULL, last_exit = exit;
+   int num = loop->num;
+   bool last_iteration_updated = false;
+ 
+   /* See if we proved number of iterations to be low constant.  */
+   if (host_integerp (niter, 1))
+     {
+       n_unroll = tree_low_cst (niter, 1);
+       n_unroll_found = true;
+       last_exit = exit;
+     }
+   else
+     exit = NULL;
+ 
+   /* See if we can improve our estimate by using recorded loop bounds.  */
+   maxiter = max_loop_iterations_int (loop);
+   if (maxiter >= 0
+       && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
+     {
+       n_unroll = maxiter;
+       n_unroll_found = true;
+       last_exit = NULL;
+     }
  
!   if (!n_unroll_found)
      return false;
  
!   /* We unroll only inner loops, because we do not consider it profitable
!      otheriwse.  We still can cancel loopback edge of not rolling loop;
!      this is always a good idea.  */
!   if (loop->inner && n_unroll)
      return false;
  
    max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
    if (n_unroll > max_unroll)
*************** try_unroll_loop_completely (struct loop 
*** 340,345 ****
--- 377,386 ----
  
    if (n_unroll)
      {
+       sbitmap wont_exit;
+       edge e;
+       unsigned i;
+       VEC (edge, heap) *to_remove = NULL;
        if (ul == UL_SINGLE_ITER)
  	return false;
  
*************** try_unroll_loop_completely (struct loop 
*** 372,385 ****
  	    fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
  	  return false;
  	}
-     }
- 
-   if (n_unroll)
-     {
-       sbitmap wont_exit;
-       edge e;
-       unsigned i;
-       VEC (edge, heap) *to_remove = NULL;
  
        initialize_original_copy_tables ();
        wont_exit = sbitmap_alloc (n_unroll + 1);
--- 413,418 ----
*************** try_unroll_loop_completely (struct loop 
*** 408,422 ****
        free_original_copy_tables ();
      }
  
!   cond = last_stmt (exit->src);
!   if (exit->flags & EDGE_TRUE_VALUE)
!     gimple_cond_make_true (cond);
!   else
!     gimple_cond_make_false (cond);
!   update_stmt (cond);
  
    if (dump_file && (dump_flags & TDF_DETAILS))
!     fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
  
    return true;
  }
--- 441,583 ----
        free_original_copy_tables ();
      }
  
!   /* EXIT is an edge that will be removed in all but last iteration of
!      the exit sequence.
!      LAST_EXIT is an exit edge that will be made unconditional
!      in the last iteration of the unrolled sequence.
! 
!      In the usual case where the number of iterations is given by
!      conditional on the induction vairable EXIT == LAST_EXIT.
!      The bounds may be however given by other reasons, such as array
!      sizes or because we are handling vectorizer prologue/epilogue loop.
! 
!      After complette unrolling we still may get rid of the conditional
!      on the exit in the last copy even if we have no idea what it does.
!      This is quite common case for loops of form
! 
!      int a[5];
!      for (i=0;i<b;i++)
!        a[i]=0;
! 
!      Here we prove the loop to iterate 5 times but we do not know
!      it from induction variable.
! 
!      We want to make the conditional of exit true, so we can only 
!      consider exits that are not part of the inner (irreducible) loops
!      so we know that the conditional is tested at most once per iteration.
! 
!      Situation is more complicated withloops with multiple exists:
! 
!      int a[5];
!      for (i=0;i<b;i++)
!        {
! 	 if (blah)
! 	   break;
! 	 a[i]=0;
!        }
! 
!      Here we could cancel the conditional of if "(blah)". And:
! 
!      int a[5];
!      for (i=0;i<b;i++)
!        {
! 	 a[i]=0;
! 	 if (blah)
! 	   break;
!        }
!  
!      Here we can cancel the last i<b test.
! 
!      To handle these we need to track all statements containing code that
!      can not execute on last iteration (in tree-ssa-loop-niter).
!      Then we can use control dependnecy (not computed here) to cancel all
!      the paths leading to them unless they are part of the inner loops.
!      This however seems bit like an overkill so we handle only the
!      simple case of single exit until interesting testcases appears.  */
!   if (!last_exit)
!     {
!       last_exit = single_exit (loop);
!       if (!last_exit || last_exit->src->loop_father != loop
! 	  || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
! 	last_exit = NULL;
!       else 
! 	{
! 	  if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP)
! 	    last_exit = NULL;
! 	}
!     }
! 
!   /* If loop has only one exit edge, we can remove the conditional from
!      the last copy of the loop.  
!      TODO: We should account this update into cost model.  */
!   if (last_exit)
!     {
!       cond = last_stmt (last_exit->src);
!       if (last_exit->flags & EDGE_TRUE_VALUE)
! 	gimple_cond_make_true (cond);
!       else
! 	gimple_cond_make_false (cond);
!       update_stmt (cond);
!       other_edge = EDGE_SUCC (last_exit->src, 0);
!       if (other_edge == last_exit)
! 	other_edge = EDGE_SUCC (last_exit->src, 1);
!       last_iteration_updated = true;
!     }
! 
!   /* Now destroy the loop.  First try to do so by cancelling the
!      path from exit conditional if we identified good candidate.  
! 
!      TODO: We should update the loop profile for the fact that
!      the last iteration no longer executes.  */
!   if (!other_edge || !remove_path (other_edge))
!     {
!       /* We did not manage to cancel the loop by removing the path.
!          The loop latch remains reachable even if it will never be reached
!          at runtime.  We must redirect it to somewhere, so create basic
!          block containg __builtin_unreachable call for this reason.  */
!       latch = loop->latch;
!       latch_edge = loop_latch_edge (loop);
!       flags = latch_edge->flags;
!       locus = latch_edge->goto_locus;
! 
!       /* Unloop destroys the latch edge.  */
!       unloop (loop, irred_invalidated);
! 
!       /* Create new basic block for the latch edge destination and wire
! 	 it in.  */
!       stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
!       latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
!       latch_edge->probability = 0;
!       latch_edge->count = 0;
!       latch_edge->flags |= flags;
!       latch_edge->goto_locus = locus;
! 
!       latch_edge->dest->loop_father = current_loops->tree_root;
!       latch_edge->dest->count = 0;
!       latch_edge->dest->frequency = 0;
!       set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
! 
!       gsi = gsi_start_bb (latch_edge->dest);
!       gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
!     }
  
    if (dump_file && (dump_flags & TDF_DETAILS))
!     {
!       if (!n_unroll)
!         fprintf (dump_file, "Turned loop %d to non-loop; it never loops.\n",
! 		 num);
!       else
!         fprintf (dump_file, "Unrolled loop %d completely "
! 		 "(duplicated %i times).\n", num, (int)n_unroll);
!       if (exit)
!         fprintf (dump_file, "Exit condition of peeled iterations was "
! 		 "eliminated.\n");
!       if (last_iteration_updated)
!         fprintf (dump_file, "Last iteration exit edge was proved true.\n");
!       else
!         fprintf (dump_file, "Latch of last iteration was marked by "
! 		 "__builtin_unreachable ().\n");
!     }
  
    return true;
  }
*************** try_unroll_loop_completely (struct loop 
*** 425,436 ****
     CREATE_IV is true if we may create a new iv.  UL determines
     which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
     to determine the number of iterations of a loop by direct evaluation.
!    Returns true if cfg is changed.  */
  
  static bool
  canonicalize_loop_induction_variables (struct loop *loop,
  				       bool create_iv, enum unroll_level ul,
! 				       bool try_eval)
  {
    edge exit = NULL;
    tree niter;
--- 586,600 ----
     CREATE_IV is true if we may create a new iv.  UL determines
     which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
     to determine the number of iterations of a loop by direct evaluation.
!    Returns true if cfg is changed.  
! 
!    IRRED_INVALIDATED is used to keep if irreducible reginos needs to be recomputed.  */
  
  static bool
  canonicalize_loop_induction_variables (struct loop *loop,
  				       bool create_iv, enum unroll_level ul,
! 				       bool try_eval,
! 				       bool *irred_invalidated)
  {
    edge exit = NULL;
    tree niter;
*************** canonicalize_loop_induction_variables (s
*** 455,476 ****
  	      || TREE_CODE (niter) != INTEGER_CST))
  	niter = find_loop_niter_by_eval (loop, &exit);
  
!       if (chrec_contains_undetermined (niter)
! 	  || TREE_CODE (niter) != INTEGER_CST)
! 	return false;
      }
  
!   if (dump_file && (dump_flags & TDF_DETAILS))
      {
        fprintf (dump_file, "Loop %d iterates ", loop->num);
        print_generic_expr (dump_file, niter, TDF_SLIM);
        fprintf (dump_file, " times.\n");
      }
  
!   if (try_unroll_loop_completely (loop, exit, niter, ul))
      return true;
  
!   if (create_iv)
      create_canonical_iv (loop, exit, niter);
  
    return false;
--- 619,652 ----
  	      || TREE_CODE (niter) != INTEGER_CST))
  	niter = find_loop_niter_by_eval (loop, &exit);
  
!       if (TREE_CODE (niter) != INTEGER_CST)
! 	exit = NULL;
      }
  
!   /* We work exceptionally hard here to estimate the bound
!      by find_loop_niter_by_eval.  Be sure to keep it for future.  */
!   if (niter && TREE_CODE (niter) == INTEGER_CST)
!     record_niter_bound (loop, tree_to_double_int (niter), false, true);
! 
!   if (dump_file && (dump_flags & TDF_DETAILS)
!       && TREE_CODE (niter) == INTEGER_CST)
      {
        fprintf (dump_file, "Loop %d iterates ", loop->num);
        print_generic_expr (dump_file, niter, TDF_SLIM);
        fprintf (dump_file, " times.\n");
      }
+   if (dump_file && (dump_flags & TDF_DETAILS)
+       && max_loop_iterations_int (loop) >= 0)
+     {
+       fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
+ 	       (int)max_loop_iterations_int (loop));
+     }
  
!   if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated))
      return true;
  
!   if (create_iv
!       && niter && !chrec_contains_undetermined (niter))
      create_canonical_iv (loop, exit, niter);
  
    return false;
*************** canonicalize_induction_variables (void)
*** 485,499 ****
    loop_iterator li;
    struct loop *loop;
    bool changed = false;
  
    FOR_EACH_LOOP (li, loop, 0)
      {
        changed |= canonicalize_loop_induction_variables (loop,
  							true, UL_SINGLE_ITER,
! 							true);
      }
    gcc_assert (!need_ssa_update_p (cfun));
  
    /* Clean up the information about numbers of iterations, since brute force
       evaluation could reveal new information.  */
    scev_reset ();
--- 661,683 ----
    loop_iterator li;
    struct loop *loop;
    bool changed = false;
+   bool irred_invalidated = false;
+ 
+   /* Needed for loop exit conditional testing.  */
+   gcc_assert (loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS));
  
    FOR_EACH_LOOP (li, loop, 0)
      {
        changed |= canonicalize_loop_induction_variables (loop,
  							true, UL_SINGLE_ITER,
! 							true,
! 							&irred_invalidated);
      }
    gcc_assert (!need_ssa_update_p (cfun));
  
+   if (irred_invalidated)
+     mark_irreducible_loops ();
+ 
    /* Clean up the information about numbers of iterations, since brute force
       evaluation could reveal new information.  */
    scev_reset ();
*************** tree_unroll_loops_completely (bool may_i
*** 592,602 ****
    enum unroll_level ul;
    int iteration = 0;
  
    do
      {
        changed = false;
  
!       FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
  	{
  	  struct loop *loop_father = loop_outer (loop);
  
--- 776,790 ----
    enum unroll_level ul;
    int iteration = 0;
  
+   /* Needed for loop exit conditional testing.  */
+   gcc_assert (loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS));
+ 
    do
      {
+       bool irred_invalidated = false;
        changed = false;
  
!       FOR_EACH_LOOP (li, loop, 0)
  	{
  	  struct loop *loop_father = loop_outer (loop);
  
*************** tree_unroll_loops_completely (bool may_i
*** 609,615 ****
  	    ul = UL_NO_GROWTH;
  
  	  if (canonicalize_loop_induction_variables (loop, false, ul,
! 						     !flag_tree_loop_ivcanon))
  	    {
  	      changed = true;
  	      /* If we'll continue unrolling, we need to propagate constants
--- 797,804 ----
  	    ul = UL_NO_GROWTH;
  
  	  if (canonicalize_loop_induction_variables (loop, false, ul,
! 						     !flag_tree_loop_ivcanon,
! 						     &irred_invalidated))
  	    {
  	      changed = true;
  	      /* If we'll continue unrolling, we need to propagate constants
*************** tree_unroll_loops_completely (bool may_i
*** 629,634 ****
--- 818,826 ----
  	  struct loop **iter;
  	  unsigned i;
  
+ 	  if (irred_invalidated)
+ 	    mark_irreducible_loops ();
+ 
  	  update_ssa (TODO_update_ssa);
  
  	  /* Propagate the constants within the new basic blocks.  */
Index: fortran/f95-lang.c
===================================================================
*** fortran/f95-lang.c	(revision 192432)
--- fortran/f95-lang.c	(working copy)
*************** gfc_builtin_function (tree decl)
*** 538,543 ****
--- 538,544 ----
  #define ATTR_CONST_NOTHROW_LEAF_LIST	(ECF_NOTHROW | ECF_LEAF | ECF_CONST)
  #define ATTR_NOTHROW_LIST		(ECF_NOTHROW)
  #define ATTR_CONST_NOTHROW_LIST		(ECF_NOTHROW | ECF_CONST)
+ #define ATTR_NORETURN_NOTHROW_LEAF_LIST (ECF_NOTHROW | ECF_LEAF | ECF_NORETURN)
  
  static void
  gfc_define_builtin (const char *name, tree type, enum built_in_function code,
*************** gfc_init_builtin_functions (void)
*** 908,913 ****
--- 909,919 ----
    gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
  		      "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
  
+   ftype = build_function_type_list (void_type_node, NULL_TREE);
+ 
+   gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_UNREACHABLE,
+ 		      "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST);
+ 
    ftype = build_function_type_list (void_type_node,
                                      pvoid_type_node, NULL_TREE);
    gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
Index: cfgloop.h
===================================================================
*** cfgloop.h	(revision 192434)
--- cfgloop.h	(working copy)
*************** extern struct loop *loopify (edge, edge,
*** 320,326 ****
  struct loop * loop_version (struct loop *, void *,
  			    basic_block *, unsigned, unsigned, unsigned, bool);
  extern bool remove_path (edge);
! void scale_loop_frequencies (struct loop *, int, int);
  
  /* Induction variable analysis.  */
  
--- 320,327 ----
  struct loop * loop_version (struct loop *, void *,
  			    basic_block *, unsigned, unsigned, unsigned, bool);
  extern bool remove_path (edge);
! extern void unloop (struct loop *, bool *);
! extern void scale_loop_frequencies (struct loop *, int, int);
  
  /* Induction variable analysis.  */
  
Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
===================================================================
*** testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
--- testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
***************
*** 0 ****
--- 1,13 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+ int a[2];
+ test(int c)
+ { 
+   int i;
+   for (i=0;i<c;i++)
+     a[i]=5;
+ }
+ /* Array bounds says the loop will not roll much.  */
+ /* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 times.." "cunroll"} } */
+ /* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunroll"} } */
+ /* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
===================================================================
*** testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
--- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+ int a[2];
+ test(int c)
+ { 
+   int i;
+   for (i=0;i<c;i++)
+     {
+       a[i]=5;
+       if (test2())
+ 	return;
+     }
+ }
+ /* We are not able to get rid of the final conditional because the loop has two exits.  */
+ /* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 2 times.." "cunroll"} } */
+ /* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
===================================================================
*** testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
--- testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
***************
*** 0 ****
--- 1,15 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+ int a[1];
+ test(int c)
+ { 
+   int i;
+   for (i=0;i<c;i++)
+     {
+       a[i]=5;
+     }
+ }
+ /* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
+ 
+ /* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 times.." "cunrolli"} } */
+ /* { dg-final { cleanup-tree-dump "cunrolli" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
===================================================================
*** testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
--- testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
***************
*** 0 ****
--- 1,20 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+ int a[1];
+ test(int c)
+ { 
+   int i=0,j;
+   for (i=0;i<c;i++)
+     {
+       for (j=0;j<c;j++)
+ 	{
+           a[i]=5;
+ 	  test2();
+ 	}
+     }
+ }
+ 
+ /* We should do this as part of cunrolli, but our cost model do not take into account early exit
+    from the last iteration.  */
+ /* { dg-final { scan-tree-dump "Turned loop 1 to non-loop; it never loops. Last iteration exit edge was proved true." "cunrolli"} } */
+ /* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
===================================================================
*** testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
--- testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
***************
*** 0 ****
--- 1,12 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+ int *a;
+ test(int c)
+ { 
+   int i;
+   for (i=0;i<6;i++)
+     a[i]=5;
+ }
+ /* Basic testcase for complette unrolling.  */
+ /* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 5 times.. Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true." "cunroll"} } */
+ /* { dg-final { cleanup-tree-dump "cunroll" } } */

Patch

Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-1.c	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+int a[2];
+test(int c)
+{ 
+  int i;
+  for (i=0;i<c;i++)
+    a[i]=5;
+}
+/* Array bounds says the loop will not roll much.  */
+/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times). Last iteration exit edge was proved true." "cunroll"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 0)
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+int a[2];
+test(int c)
+{ 
+  int i;
+  for (i=0;i<c;i++)
+    {
+      a[i]=5;
+      if (test2())
+	return;
+    }
+}
+/* We are not able to get rid of the final conditional because the loop has two exits.  */
+/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 2 times)." "cunroll"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-3.c	(revision 0)
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+int a[1];
+test(int c)
+{ 
+  int i;
+  for (i=0;i<c;i++)
+    {
+      a[i]=5;
+    }
+}
+/* If we start duplicating headers prior curoll, this loop will have 0 iterations.  */
+
+/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 times)." "cunrolli"} } */
+/* { dg-final { cleanup-tree-dump "cunrolli" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-4.c	(revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+int a[1];
+test(int c)
+{ 
+  int i=0,j;
+  for (i=0;i<c;i++)
+    {
+      for (j=0;j<c;j++)
+	{
+          a[i]=5;
+	  test2();
+	}
+    }
+}
+
+/* We should do this as part of cunrolli, but our cost model do not take into account early exit
+   from the last iteration.  */
+/* { dg-final { scan-tree-dump-not "Turned loop 1 to non-loop; it never loops. Last iteration exit edge was proved true." "cunrolli"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-5.c	(revision 0)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+int *a;
+test(int c)
+{ 
+  int i;
+  for (i=0;i<6;i++)
+    a[i]=5;
+}
+/* Basic testcase for complette unrolling.  */
+/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 5 times). Exit condition of peeled iterations was eliminated. Last iteration exit edge was proved true." "cunroll"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: cfgloopmanip.c
===================================================================
--- cfgloopmanip.c	(revision 192360)
+++ cfgloopmanip.c	(working copy)
@@ -37,7 +37,6 @@  static int find_path (edge, basic_block 
 static void fix_loop_placements (struct loop *, bool *);
 static bool fix_bb_placement (basic_block);
 static void fix_bb_placements (basic_block, bool *);
-static void unloop (struct loop *, bool *);
 
 /* Checks whether basic block BB is dominated by DATA.  */
 static bool
@@ -895,7 +894,7 @@  loopify (edge latch_edge, edge header_ed
    If this may cause the information about irreducible regions to become
    invalid, IRRED_INVALIDATED is set to true.  */
 
-static void
+void
 unloop (struct loop *loop, bool *irred_invalidated)
 {
   basic_block *body;
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c	(revision 192360)
+++ tree-ssa-loop-ivcanon.c	(working copy)
@@ -231,7 +231,7 @@  tree_estimate_loop_size (struct loop *lo
 	  /* Look for reasons why we might optimize this stmt away. */
 
 	  /* Exit conditional.  */
-	  if (body[i] == exit->src && stmt == last_stmt (exit->src))
+	  if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
 	    {
 	      if (dump_file && (dump_flags & TDF_DETAILS))
 	        fprintf (dump_file, "   Exit condition will be eliminated.\n");
@@ -326,13 +326,43 @@  try_unroll_loop_completely (struct loop 
   unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
   gimple cond;
   struct loop_size size;
+  bool n_unroll_found = false;
+  HOST_WIDE_INT maxiter;
+  basic_block latch;
+  edge latch_edge;
+  location_t locus;
+  int flags;
+  bool irred_invalidated = false;
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+  edge other_edge = NULL, last_exit;
+  int num = loop->num;
+  bool last_iteration_updated = false;
+
+  /* See if we proved number of iterations to be low constant.  */
+  if (host_integerp (niter, 1))
+    {
+      n_unroll = tree_low_cst (niter, 1);
+      n_unroll_found = true;
+    }
 
-  if (loop->inner)
+  /* See if we can improve our estimate by using recorded loop bounds.  */
+  maxiter = max_loop_iterations_int (loop);
+  if (maxiter >= 0
+      && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
+    {
+      n_unroll = maxiter;
+      n_unroll_found = true;
+    }
+
+  if (!n_unroll_found)
     return false;
 
-  if (!host_integerp (niter, 1))
+  /* We unroll only inner loops, because we do not consider it profitable
+     otheriwse.  We still can cancel loopback edge of not rolling loop;
+     this is always a good idea.  */
+  if (loop->inner && n_unroll)
     return false;
-  n_unroll = tree_low_cst (niter, 1);
 
   max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
   if (n_unroll > max_unroll)
@@ -340,6 +370,10 @@  try_unroll_loop_completely (struct loop 
 
   if (n_unroll)
     {
+      sbitmap wont_exit;
+      edge e;
+      unsigned i;
+      VEC (edge, heap) *to_remove = NULL;
       if (ul == UL_SINGLE_ITER)
 	return false;
 
@@ -372,14 +408,6 @@  try_unroll_loop_completely (struct loop 
 	    fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
 	  return false;
 	}
-    }
-
-  if (n_unroll)
-    {
-      sbitmap wont_exit;
-      edge e;
-      unsigned i;
-      VEC (edge, heap) *to_remove = NULL;
 
       initialize_original_copy_tables ();
       wont_exit = sbitmap_alloc (n_unroll + 1);
@@ -408,24 +436,143 @@  try_unroll_loop_completely (struct loop 
       free_original_copy_tables ();
     }
 
-  cond = last_stmt (exit->src);
-  if (exit->flags & EDGE_TRUE_VALUE)
-    gimple_cond_make_true (cond);
-  else
-    gimple_cond_make_false (cond);
-  update_stmt (cond);
+  /* After complette unrolling we still may get rid of the conditional
+     on exit in the last copy even if we have no idea what it does.
+     This is quite common case for loops of form
+
+     int a[5];
+     for (i=0;i<b;i++)
+       a[i]=0;
+
+     Here we prove the loop to iterate 5 times but we do not know
+     it from induction variable.
+
+     We want to make the conditional of exit true, so we can only 
+     consider exits that are not part of the inner (irreducible) loops
+     so we know that the conditional is tested at most once per iteration.
+
+     Situation is more complicated withloops with multiple exists:
+
+     int a[5];
+     for (i=0;i<b;i++)
+       {
+	 if (blah)
+	   break;
+	 a[i]=0;
+       }
+
+     Here we could cancel the conditional of if "(blah)". And:
+
+     int a[5];
+     for (i=0;i<b;i++)
+       {
+	 a[i]=0;
+	 if (blah)
+	   break;
+       }
+ 
+     Here we can cancel the last i<b test.
+
+     To handle these we need to track all statements containing code that
+     can not execute on last iteration (in tree-ssa-loop-niter).
+     Then we can use control dependnecy (not computed here) to cancel all
+     the paths leading to them unless they are part of the inner loops.
+     This however seems bit like an overkill so we handle only the
+     simple case of single exit until interesting testcases appears.  */
+  last_exit = exit;
+  if (!last_exit)
+    {
+      last_exit = single_exit (loop);
+      if (!last_exit || last_exit->src->loop_father != loop
+	  || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)))
+	last_exit = NULL;
+      else 
+	{
+	  if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP)
+	    last_exit = NULL;
+	}
+    }
+
+  /* If loop has only one exit edge, we can remove the conditional from
+     the last copy of the loop.  
+     TODO: We should account this update into cost model.  */
+  if (last_exit)
+    {
+      cond = last_stmt (last_exit->src);
+      if (last_exit->flags & EDGE_TRUE_VALUE)
+	gimple_cond_make_true (cond);
+      else
+	gimple_cond_make_false (cond);
+      update_stmt (cond);
+      other_edge = EDGE_SUCC (last_exit->src, 0);
+      if (other_edge == last_exit)
+	other_edge = EDGE_SUCC (last_exit->src, 1);
+      last_iteration_updated = true;
+    }
+
+  /* Now destroy the loop.  First try to do so by cancelling the
+     patch from exit conditional if we identified good candidate.  
+
+     TODO: We should update the loop profile for the fact that
+     the last iteration no longer executes.  */
+  if (!other_edge || !remove_path (other_edge))
+    {
+      /* We did not manage to cancel the loop by removing the patch.
+         The loop latch remains reachable even if it will never be reached
+         at runtime.  We must redirect it to somewhere, so create basic
+         block containg __builtin_unreachable call for this reason.  */
+      latch = loop->latch;
+      latch_edge = loop_latch_edge (loop);
+      flags = latch_edge->flags;
+      locus = latch_edge->goto_locus;
+
+      /* Unloop destroys the latch edge.  */
+      unloop (loop, &irred_invalidated);
+      if (irred_invalidated)
+	mark_irreducible_loops ();
+
+      /* Create new basic block for the latch edge destination and wire
+	 it in.  */
+      stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0);
+      latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags);
+      gsi = gsi_start_bb (latch_edge->dest);
+      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+      latch_edge->dest->loop_father = current_loops->tree_root;
+      set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src);
+      latch_edge->probability = 0;
+      latch_edge->count = 0;
+      latch_edge->flags |= flags;
+      latch_edge->goto_locus = locus;
+    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
+    {
+      if (!n_unroll)
+        fprintf (dump_file, "Turned loop %d to non-loop; it never loops.%s\n", num,
+		 last_iteration_updated
+		 ? " Last iteration exit edge was proved true." : "");
+      else
+	{
+          fprintf (dump_file, "Unrolled loop %d completely "
+		   "(duplicated %i times).%s%s\n", num, (int)n_unroll,
+		   exit
+		   ? " Exit condition of peeled iterations was eliminated." : "",
+		   last_iteration_updated
+		   ? " Last iteration exit edge was proved true." : "");
+	}
+    }
 
-  return true;
+  return true;
 }
 
 /* Adds a canonical induction variable to LOOP if suitable.
    CREATE_IV is true if we may create a new iv.  UL determines
    which loops we are allowed to completely unroll.  If TRY_EVAL is true, we try
    to determine the number of iterations of a loop by direct evaluation.
-   Returns true if cfg is changed.  */
+   Returns true if cfg is changed.  
+
+   IRREDUCIBLE_LOOPS_FOUND is used to bookkeep if we discovered irreducible
+   loops.  This is used in a special case of the exit condition analysis.  */
 
 static bool
 canonicalize_loop_induction_variables (struct loop *loop,
@@ -455,22 +602,34 @@  canonicalize_loop_induction_variables (s
 	      || TREE_CODE (niter) != INTEGER_CST))
 	niter = find_loop_niter_by_eval (loop, &exit);
 
-      if (chrec_contains_undetermined (niter)
-	  || TREE_CODE (niter) != INTEGER_CST)
-	return false;
+      if (TREE_CODE (niter) != INTEGER_CST)
+	exit = NULL;
     }
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
+  /* We work exceptionally hard here to estimate the bound
+     by find_loop_niter_by_eval.  Be sure to keep it for future.  */
+  if (niter && TREE_CODE (niter) == INTEGER_CST)
+    record_niter_bound (loop, tree_to_double_int (niter), false, true);
+
+  if (dump_file && (dump_flags & TDF_DETAILS)
+      && TREE_CODE (niter) == INTEGER_CST)
     {
       fprintf (dump_file, "Loop %d iterates ", loop->num);
       print_generic_expr (dump_file, niter, TDF_SLIM);
       fprintf (dump_file, " times.\n");
     }
+  if (dump_file && (dump_flags & TDF_DETAILS)
+      && max_loop_iterations_int (loop) >= 0)
+    {
+      fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
+	       (int)max_loop_iterations_int (loop));
+    }
 
   if (try_unroll_loop_completely (loop, exit, niter, ul))
     return true;
 
-  if (create_iv)
+  if (create_iv
+      && niter && !chrec_contains_undetermined (niter))
     create_canonical_iv (loop, exit, niter);
 
   return false;
@@ -483,11 +642,14 @@  unsigned int
 canonicalize_induction_variables (void)
 {
   loop_iterator li;
-  struct loop *loop;
+  struct loop *loop, *next;
   bool changed = false;
 
-  FOR_EACH_LOOP (li, loop, 0)
+  for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;)
     {
+      loop = next;
+      fel_next (&li, &next);
+      
       changed |= canonicalize_loop_induction_variables (loop,
 							true, UL_SINGLE_ITER,
 							true);
@@ -591,14 +753,18 @@  tree_unroll_loops_completely (bool may_i
   bool changed;
   enum unroll_level ul;
   int iteration = 0;
+  struct loop *next;
 
   do
     {
       changed = false;
 
-      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
+      for (fel_init (&li, &next, 0); next;)
 	{
-	  struct loop *loop_father = loop_outer (loop);
+	  struct loop *loop_father = loop_outer (next);
+
+	  loop = next;
+	  fel_next (&li, &next);
 
 	  if (may_increase_size && optimize_loop_for_speed_p (loop)
 	      /* Unroll outermost loops only if asked to do so or they do
Index: fortran/f95-lang.c
===================================================================
--- fortran/f95-lang.c	(revision 192360)
+++ fortran/f95-lang.c	(working copy)
@@ -908,6 +908,11 @@  gfc_init_builtin_functions (void)
   gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT,
 		      "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST);
 
+  ftype = build_function_type_list (void_type_node, NULL_TREE);
+
+  gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_EXPECT,
+		      "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST);
+
   ftype = build_function_type_list (void_type_node,
                                     pvoid_type_node, NULL_TREE);
   gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE,
Index: cfgloop.h
===================================================================
--- cfgloop.h	(revision 192360)
+++ cfgloop.h	(working copy)
@@ -319,7 +319,8 @@  extern struct loop *loopify (edge, edge,
 struct loop * loop_version (struct loop *, void *,
 			    basic_block *, unsigned, unsigned, unsigned, bool);
 extern bool remove_path (edge);
-void scale_loop_frequencies (struct loop *, int, int);
+extern void scale_loop_frequencies (struct loop *, int, int);
+extern void unloop (struct loop *, bool *);
 
 /* Induction variable analysis.  */