Patchwork Fix array bound niter estimate (PR middle-end/54937)

login
register
mail settings
Submitter Jan Hubicka
Date Oct. 19, 2012, 9:33 a.m.
Message ID <20121019093317.GA30010@kam.mff.cuni.cz>
Download mbox | patch
Permalink /patch/192624/
State New
Headers show

Comments

Jan Hubicka - Oct. 19, 2012, 9:33 a.m.
Hi,
this patch fixes off-by-one error in the testcase attached.  The problem is that
dominance based test used by record_estimate to check whether the given statement
must be executed at last iteration of the loop is wrong ignoring the side effect
of other statements that may terminate the program.
It also does not work for mulitple exits as excercised by cunroll-2.c testcase.

This patch makes simple approach of computing set of all statements that must
by executed last iteration first time record_estimate is executed this way.
The set is computed conservatively walking header BB and its signle successors
(possibly diving into nested loop) stopping on first BB with multiple exits.

Better result can be computed by
1) estimating what loops are known to be finite
2) inserting fake edges for all infinite loop and all statements with side effect
   that may terminate the execution
3) using the post dominance info.
But that seems too expensive for something that is executed several times per
function compilation. In fact I am thinking about making recrod-estimate to happen
at ivcanon time only and then making the loop_max_iterations and loop_estimated_iterations
to simply return the bound instead of trying to recompute it all the time.

Still I do not see us to care about very precise bound for loops having complex
control flow since we do not really want to completely unroll them and elsewhere
the off-by-one error in conservative direction for iteration estimate is not big
deal.

Bootstrapped/regtested x86_64-linux, seems sane?

Honza

	PR middle-end/54937
	* tree-ssa-loop-niter.c (compute_stmts_executed_last_iteration): New
	function.
	(record_estimate): Use it; remove confused dominance test.
	(estimate_numbers_of_iterations_loop): Free stmts_executed_last_iteration.
	* gcc.c-torture/execute/pr54937.c: Ne wtestcase.
	* testsuite/gcc.dg/tree-ssa/cunroll-2.c: Tighten.
Index: tree-ssa-loop-niter.c
===================================================================
*** tree-ssa-loop-niter.c	(revision 192537)
--- tree-ssa-loop-niter.c	(working copy)
*************** record_niter_bound (struct loop *loop, d
*** 2523,2528 ****
--- 2523,2580 ----
      loop->nb_iterations_estimate = loop->nb_iterations_upper_bound;
  }
  
+ /* Compute pointer set of statements in currently analyzed loop that are known
+    to be executed at last iteration and store it into AUX.  
+    We do very simple job of recording statements in the superblock
+    starsting in loop header until reaching first statement with side effect.
+ 
+    We can compute post-dominance after inserting fake edges to CFG
+    for all statements possibly terminating CFG and for all possibly
+    infinite subloops, but we really care about precise estimates
+    of simple loops with little control flow in it.  */
+ static void
+ compute_stmts_executed_last_iteration (struct loop *loop)
+ {
+   basic_block bb = loop->header;
+   pointer_set_t *visited = NULL;
+   gimple_stmt_iterator gsi;
+   pointer_set_t *stmts_executed_last_iteration;
+ 
+   loop->aux = stmts_executed_last_iteration = pointer_set_create ();
+   while (true)
+     {
+       for (gsi = gsi_start_bb (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
+ 	{
+ 	  if (gimple_has_side_effects (gsi_stmt (gsi)))
+ 	    {
+ 	      if (visited)
+ 	        pointer_set_destroy (visited);
+ 	      return;
+ 	    }
+ 	  pointer_set_insert (stmts_executed_last_iteration, gsi_stmt (gsi));
+ 	}
+       if (single_succ_p (bb))
+ 	{
+ 	  /* Deal with the rare case that there is an infintie loop inside the
+ 	     loop examined.  */
+ 	  if (!visited)
+ 	    {
+               visited = pointer_set_create ();
+               pointer_set_insert (visited, bb);
+ 	    }
+ 	  bb = single_succ_edge (bb)->dest;
+           if (pointer_set_insert (visited, bb))
+ 	    break;
+ 	}
+       else
+ 	break;
+     }
+   if (visited)
+     pointer_set_destroy (visited);
+   return;
+ }
+ 
+ 
  /* Records that AT_STMT is executed at most BOUND + 1 times in LOOP.  IS_EXIT
     is true if the loop is exited immediately after STMT, and this exit
     is taken at last when the STMT is executed BOUND + 1 times.
*************** record_estimate (struct loop *loop, tree
*** 2535,2541 ****
  		 gimple at_stmt, bool is_exit, bool realistic, bool upper)
  {
    double_int delta;
-   edge exit;
  
    if (dump_file && (dump_flags & TDF_DETAILS))
      {
--- 2587,2592 ----
*************** record_estimate (struct loop *loop, tree
*** 2573,2586 ****
       If at_stmt is an exit or dominates the single exit from the loop,
       then the loop latch is executed at most BOUND times, otherwise
       it can be executed BOUND + 1 times.  */
!   exit = single_exit (loop);
!   if (is_exit
!       || (exit != NULL
! 	  && dominated_by_p (CDI_DOMINATORS,
! 			     exit->src, gimple_bb (at_stmt))))
      delta = double_int_zero;
    else
!     delta = double_int_one;
    i_bound += delta;
  
    /* If an overflow occurred, ignore the result.  */
--- 2624,2641 ----
       If at_stmt is an exit or dominates the single exit from the loop,
       then the loop latch is executed at most BOUND times, otherwise
       it can be executed BOUND + 1 times.  */
!   if (is_exit)
      delta = double_int_zero;
    else
!     {
!       if (!loop->aux)
! 	compute_stmts_executed_last_iteration (loop);
!       if (pointer_set_contains ((pointer_set_t *)loop->aux,
! 				at_stmt))
!         delta = double_int_zero;
!       else
!         delta = double_int_one;
!     }
    i_bound += delta;
  
    /* If an overflow occurred, ignore the result.  */
*************** estimate_numbers_of_iterations_loop (str
*** 2971,2976 ****
--- 3026,3033 ----
    if (loop->estimate_state != EST_NOT_COMPUTED)
      return;
  
+   gcc_assert (!loop->aux);
+ 
    loop->estimate_state = EST_AVAILABLE;
    /* Force estimate compuation but leave any existing upper bound in place.  */
    loop->any_estimate = false;
*************** estimate_numbers_of_iterations_loop (str
*** 3004,3009 ****
--- 3061,3071 ----
        bound = gcov_type_to_double_int (nit);
        record_niter_bound (loop, bound, true, false);
      }
+   if (loop->aux)
+     {
+       pointer_set_destroy ((pointer_set_t *)loop->aux);
+       loop->aux = NULL;
+     }
  }
  
  /* Sets NIT to the estimated number of executions of the latch of the
Richard Guenther - Oct. 19, 2012, 10:05 a.m.
On Fri, 19 Oct 2012, Jan Hubicka wrote:

> Hi,
> this patch fixes off-by-one error in the testcase attached.  The problem is that
> dominance based test used by record_estimate to check whether the given statement
> must be executed at last iteration of the loop is wrong ignoring the side effect
> of other statements that may terminate the program.
> It also does not work for mulitple exits as excercised by cunroll-2.c testcase.
> 
> This patch makes simple approach of computing set of all statements that must
> by executed last iteration first time record_estimate is executed this way.
> The set is computed conservatively walking header BB and its signle successors
> (possibly diving into nested loop) stopping on first BB with multiple exits.
> 
> Better result can be computed by
> 1) estimating what loops are known to be finite
> 2) inserting fake edges for all infinite loop and all statements with side effect
>    that may terminate the execution
> 3) using the post dominance info.

would using post-dom info even work?  That only says that _if_ the
dominated stmt executed then it came through the dominator.  It
doesn't deal with functions that may not return.

> But that seems too expensive for something that is executed several times per
> function compilation. In fact I am thinking about making recrod-estimate to happen
> at ivcanon time only and then making the loop_max_iterations and loop_estimated_iterations
> to simply return the bound instead of trying to recompute it all the time.
> 
> Still I do not see us to care about very precise bound for loops having complex
> control flow since we do not really want to completely unroll them and elsewhere
> the off-by-one error in conservative direction for iteration estimate is not big
> deal.
> 
> Bootstrapped/regtested x86_64-linux, seems sane?
> 
> Honza
> 
> 	PR middle-end/54937
> 	* tree-ssa-loop-niter.c (compute_stmts_executed_last_iteration): New
> 	function.
> 	(record_estimate): Use it; remove confused dominance test.
> 	(estimate_numbers_of_iterations_loop): Free stmts_executed_last_iteration.
> 	* gcc.c-torture/execute/pr54937.c: Ne wtestcase.
> 	* testsuite/gcc.dg/tree-ssa/cunroll-2.c: Tighten.
> Index: tree-ssa-loop-niter.c
> ===================================================================
> *** tree-ssa-loop-niter.c	(revision 192537)
> --- tree-ssa-loop-niter.c	(working copy)
> *************** record_niter_bound (struct loop *loop, d
> *** 2523,2528 ****
> --- 2523,2580 ----
>       loop->nb_iterations_estimate = loop->nb_iterations_upper_bound;
>   }
>   
> + /* Compute pointer set of statements in currently analyzed loop that are known
> +    to be executed at last iteration and store it into AUX.  
> +    We do very simple job of recording statements in the superblock
> +    starsting in loop header until reaching first statement with side effect.
> + 
> +    We can compute post-dominance after inserting fake edges to CFG
> +    for all statements possibly terminating CFG and for all possibly
> +    infinite subloops, but we really care about precise estimates
> +    of simple loops with little control flow in it.  */
> + static void
> + compute_stmts_executed_last_iteration (struct loop *loop)
> + {
> +   basic_block bb = loop->header;
> +   pointer_set_t *visited = NULL;
> +   gimple_stmt_iterator gsi;
> +   pointer_set_t *stmts_executed_last_iteration;
> + 
> +   loop->aux = stmts_executed_last_iteration = pointer_set_create ();
> +   while (true)
> +     {
> +       for (gsi = gsi_start_bb (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
> + 	{
> + 	  if (gimple_has_side_effects (gsi_stmt (gsi)))
> + 	    {
> + 	      if (visited)
> + 	        pointer_set_destroy (visited);
> + 	      return;
> + 	    }
> + 	  pointer_set_insert (stmts_executed_last_iteration, gsi_stmt (gsi));
> + 	}
> +       if (single_succ_p (bb))
> + 	{
> + 	  /* Deal with the rare case that there is an infintie loop inside the
> + 	     loop examined.  */
> + 	  if (!visited)
> + 	    {
> +               visited = pointer_set_create ();
> +               pointer_set_insert (visited, bb);
> + 	    }
> + 	  bb = single_succ_edge (bb)->dest;
> +           if (pointer_set_insert (visited, bb))
> + 	    break;
> + 	}
> +       else
> + 	break;
> +     }
> +   if (visited)
> +     pointer_set_destroy (visited);
> +   return;
> + }
> + 
> + 
>   /* Records that AT_STMT is executed at most BOUND + 1 times in LOOP.  IS_EXIT
>      is true if the loop is exited immediately after STMT, and this exit
>      is taken at last when the STMT is executed BOUND + 1 times.
> *************** record_estimate (struct loop *loop, tree
> *** 2535,2541 ****
>   		 gimple at_stmt, bool is_exit, bool realistic, bool upper)
>   {
>     double_int delta;
> -   edge exit;
>   
>     if (dump_file && (dump_flags & TDF_DETAILS))
>       {
> --- 2587,2592 ----
> *************** record_estimate (struct loop *loop, tree
> *** 2573,2586 ****
>        If at_stmt is an exit or dominates the single exit from the loop,
>        then the loop latch is executed at most BOUND times, otherwise
>        it can be executed BOUND + 1 times.  */
> !   exit = single_exit (loop);
> !   if (is_exit
> !       || (exit != NULL
> ! 	  && dominated_by_p (CDI_DOMINATORS,
> ! 			     exit->src, gimple_bb (at_stmt))))
>       delta = double_int_zero;
>     else
> !     delta = double_int_one;
>     i_bound += delta;
>   
>     /* If an overflow occurred, ignore the result.  */
> --- 2624,2641 ----
>        If at_stmt is an exit or dominates the single exit from the loop,
>        then the loop latch is executed at most BOUND times, otherwise
>        it can be executed BOUND + 1 times.  */
> !   if (is_exit)
>       delta = double_int_zero;
>     else
> !     {
> !       if (!loop->aux)
> ! 	compute_stmts_executed_last_iteration (loop);
> !       if (pointer_set_contains ((pointer_set_t *)loop->aux,
> ! 				at_stmt))
> !         delta = double_int_zero;
> !       else
> !         delta = double_int_one;
> !     }

What about the conservative variant of simply

      else
        delta = double_int_one;

?  I don't like all the code you add, nor the use of ->aux.

>     i_bound += delta;

Another alternative would be to not use i_bound for the
strong upper bound but only the estimate (thus conservatively
use i_bound + 1 for the upper bound if !is_exit).

Richard.

>     /* If an overflow occurred, ignore the result.  */
> *************** estimate_numbers_of_iterations_loop (str
> *** 2971,2976 ****
> --- 3026,3033 ----
>     if (loop->estimate_state != EST_NOT_COMPUTED)
>       return;
>   
> +   gcc_assert (!loop->aux);
> + 
>     loop->estimate_state = EST_AVAILABLE;
>     /* Force estimate compuation but leave any existing upper bound in place.  */
>     loop->any_estimate = false;
> *************** estimate_numbers_of_iterations_loop (str
> *** 3004,3009 ****
> --- 3061,3071 ----
>         bound = gcov_type_to_double_int (nit);
>         record_niter_bound (loop, bound, true, false);
>       }
> +   if (loop->aux)
> +     {
> +       pointer_set_destroy ((pointer_set_t *)loop->aux);
> +       loop->aux = NULL;
> +     }
>   }
>   
>   /* Sets NIT to the estimated number of executions of the latch of the
> Index: testsuite/gcc.c-torture/execute/pr54937.c
> ===================================================================
> --- testsuite/gcc.c-torture/execute/pr54937.c	(revision 0)
> +++ testsuite/gcc.c-torture/execute/pr54937.c	(revision 0)
> @@ -0,0 +1,22 @@
> +
> +void exit (int);
> +void abort (void);
> +int a[1];
> +void (*terminate_me)(int);
> +
> +__attribute__((noinline,noclone))
> +t(int c)
> +{ int i;
> +  for (i=0;i<c;i++)
> +    {
> +      if (i)
> +       terminate_me(0);
> +      a[i]=0;
> +    }
> +}
> +main()
> +{
> +  terminate_me = exit;
> +  t(100);
> +  abort();
> +}
> Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 192608)
> +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(working copy)
> @@ -12,5 +12,5 @@ test(int c)
>      }
>  }
>  /* We are not able to get rid of the final conditional because the loop has two exits.  */
> -/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 2 times.." "cunroll"} } */
> +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 times.." "cunroll"} } */
>  /* { dg-final { cleanup-tree-dump "cunroll" } } */
> 
>
Jan Hubicka - Oct. 19, 2012, 2:30 p.m.
> On Fri, 19 Oct 2012, Jan Hubicka wrote:
> 
> > Hi,
> > this patch fixes off-by-one error in the testcase attached.  The problem is that
> > dominance based test used by record_estimate to check whether the given statement
> > must be executed at last iteration of the loop is wrong ignoring the side effect
> > of other statements that may terminate the program.
> > It also does not work for mulitple exits as excercised by cunroll-2.c testcase.
> > 
> > This patch makes simple approach of computing set of all statements that must
> > by executed last iteration first time record_estimate is executed this way.
> > The set is computed conservatively walking header BB and its signle successors
> > (possibly diving into nested loop) stopping on first BB with multiple exits.
> > 
> > Better result can be computed by
> > 1) estimating what loops are known to be finite
> > 2) inserting fake edges for all infinite loop and all statements with side effect
> >    that may terminate the execution
> > 3) using the post dominance info.
> 
> would using post-dom info even work?  That only says that _if_ the
> dominated stmt executed then it came through the dominator.  It
> doesn't deal with functions that may not return.

With fake edges inserted it will. We do have code for that used in profiling that
also needs this stronger definition of CFG.
> 
> What about the conservative variant of simply
> 
>       else
>         delta = double_int_one;

I think it would be bad idea: it makes us to completely unroll one interation
too many that bloats code for no benefit. No optimization cancels the path in
CFG because of undefined effect and thus the code will be output (unless someone
smarter, like VRP, cleans up later, but it is more an exception than rule.)
> 
> ?  I don't like all the code you add, nor the use of ->aux.

Neither I really do, but what are the alternatives?

My first implementation simply checked that stmt is in the loop header and
walked up to the beggining of basic blocks looking for side effects.  Then I
become worried about possibility of gigantic basic blocks with many array
stores within the loop, so I decided to record the reachable statements instead
of repeating the walk.
Loop count estimation is recursive (i.e. it dives into inner loops), thus I
ended up with using AUX.  I can for sure put this separately or add extra
reference argument passed over the whole call stack, but there are quite many
functions that can leads to record_estimate. (I have nothing against that
alternative however if AUX looks ugly)
> 
> >     i_bound += delta;
> 
> Another alternative would be to not use i_bound for the
> strong upper bound but only the estimate (thus conservatively
> use i_bound + 1 for the upper bound if !is_exit).

We can not derrive realistic estimate based on this: the loop may exit much earlier.
We can only lower the estimate if it is already there and greater than this bound.
This can probably happen with profile feedback and I can implement it later,
I do not think it is terribly important though.

Honza
Jan Hubicka - Oct. 19, 2012, 3:03 p.m.
> > What about the conservative variant of simply
> > 
> >       else
> >         delta = double_int_one;
> 
> I think it would be bad idea: it makes us to completely unroll one interation
> too many that bloats code for no benefit. No optimization cancels the path in
> CFG because of undefined effect and thus the code will be output (unless someone
> smarter, like VRP, cleans up later, but it is more an exception than rule.)

OK, on deper tought I guess I can add double_int_one always at that spot and
once we are done with everything I can walk nb_iter_bound for all statements
known to not be executed on last iteration and record them to pointer set.

Finally I can walk from header in DFS stopping on loop exits, side effects and
those stateemnts.  If I visit no loop exit or side effect I know I can lower
iteration count by 1 (in estimate_numbers_of_iterations_loop).

This will give accurate answer and requires just little extra bookkeeping.

I will give this a try.
Honza
Richard Guenther - Oct. 22, 2012, 8:04 a.m.
On Fri, 19 Oct 2012, Jan Hubicka wrote:

> > On Fri, 19 Oct 2012, Jan Hubicka wrote:
> > 
> > > Hi,
> > > this patch fixes off-by-one error in the testcase attached.  The problem is that
> > > dominance based test used by record_estimate to check whether the given statement
> > > must be executed at last iteration of the loop is wrong ignoring the side effect
> > > of other statements that may terminate the program.
> > > It also does not work for mulitple exits as excercised by cunroll-2.c testcase.
> > > 
> > > This patch makes simple approach of computing set of all statements that must
> > > by executed last iteration first time record_estimate is executed this way.
> > > The set is computed conservatively walking header BB and its signle successors
> > > (possibly diving into nested loop) stopping on first BB with multiple exits.
> > > 
> > > Better result can be computed by
> > > 1) estimating what loops are known to be finite
> > > 2) inserting fake edges for all infinite loop and all statements with side effect
> > >    that may terminate the execution
> > > 3) using the post dominance info.
> > 
> > would using post-dom info even work?  That only says that _if_ the
> > dominated stmt executed then it came through the dominator.  It
> > doesn't deal with functions that may not return.
> 
> With fake edges inserted it will. We do have code for that used in profiling that
> also needs this stronger definition of CFG.

Huh, but then we will need to split blocks.  I don't think that's viable.

> > 
> > What about the conservative variant of simply
> > 
> >       else
> >         delta = double_int_one;
> 
> I think it would be bad idea: it makes us to completely unroll one interation
> too many that bloats code for no benefit. No optimization cancels the path in
> CFG because of undefined effect and thus the code will be output (unless someone
> smarter, like VRP, cleans up later, but it is more an exception than rule.)
> > 
> > ?  I don't like all the code you add, nor the use of ->aux.
> 
> Neither I really do, but what are the alternatives?

See above ;)

> My first implementation simply checked that stmt is in the loop header and
> walked up to the beggining of basic blocks looking for side effects.  Then I
> become worried about possibility of gigantic basic blocks with many array
> stores within the loop, so I decided to record the reachable statements instead
> of repeating the walk.
> Loop count estimation is recursive (i.e. it dives into inner loops), thus I
> ended up with using AUX.  I can for sure put this separately or add extra
> reference argument passed over the whole call stack, but there are quite many
> functions that can leads to record_estimate. (I have nothing against that
> alternative however if AUX looks ugly)

I am worried about passes trying to use AUX.  We should at least document
that it is for internal use only.

> > 
> > >     i_bound += delta;
> > 
> > Another alternative would be to not use i_bound for the
> > strong upper bound but only the estimate (thus conservatively
> > use i_bound + 1 for the upper bound if !is_exit).
> 
> We can not derrive realistic estimate based on this: the loop may exit much earlier.
> We can only lower the estimate if it is already there and greater than this bound.
> This can probably happen with profile feedback and I can implement it later,
> I do not think it is terribly important though.
> 
> Honza
> 
>

Patch

Index: testsuite/gcc.c-torture/execute/pr54937.c
===================================================================
--- testsuite/gcc.c-torture/execute/pr54937.c	(revision 0)
+++ testsuite/gcc.c-torture/execute/pr54937.c	(revision 0)
@@ -0,0 +1,22 @@ 
+
+void exit (int);
+void abort (void);
+int a[1];
+void (*terminate_me)(int);
+
+__attribute__((noinline,noclone))
+t(int c)
+{ int i;
+  for (i=0;i<c;i++)
+    {
+      if (i)
+       terminate_me(0);
+      a[i]=0;
+    }
+}
+main()
+{
+  terminate_me = exit;
+  t(100);
+  abort();
+}
Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-2.c	(revision 192608)
+++ testsuite/gcc.dg/tree-ssa/cunroll-2.c	(working copy)
@@ -12,5 +12,5 @@  test(int c)
     }
 }
 /* We are not able to get rid of the final conditional because the loop has two exits.  */
-/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 2 times.." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 times.." "cunroll"} } */
 /* { dg-final { cleanup-tree-dump "cunroll" } } */