diff mbox

Do not give realistic estimates for loop with array accesses

Message ID 20160330100018.GA54780@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka March 30, 2016, 10 a.m. UTC
Hi,
while looking into sudoku solving benchark, I noticed that we incorrectly
estimate loop to iterate 10 times just because the array it traverses is of
dimension 10. This of course is just upper bound and not realistic bound.
Realistically those loops iterates once most of time.

It turns out this bug was introduced by me in
https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
While I do not recall doing that patch, it seems like a thinko about reliable
(name of the variable) and realistic (name of the parameter it is passed to).

Fixing this caused one testsuite fallout in predictive commoning testcase
because loop unswitching previously disabled itself having an estimated number
of iterations 1 (I am not sure if that is not supposed to be 0, with 1
iteration unswithcing may pay back, little bit, I suppose it wants to test
number of stmt executions of the condtional to be at least 2 which depends on
whether the conditional is before or after the loop exits). This made me notice
that some loop passes that want to give up on low trip count uses combination
of estimated number of iterations and max number of iterations while other
don't which is fixed here. The code combining both realistic and max counts
is same as i.e. in unroller and other passes already.

I also wonder if predictive comming is a win for loops with very low
trip count, but that is for separate patch, too, anyway.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
	estimates here.
	* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
	max_loop_iterations_int.
	* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.

Comments

Richard Biener March 30, 2016, 11:30 a.m. UTC | #1
On Wed, 30 Mar 2016, Jan Hubicka wrote:

> Hi,
> while looking into sudoku solving benchark, I noticed that we incorrectly
> estimate loop to iterate 10 times just because the array it traverses is of
> dimension 10. This of course is just upper bound and not realistic bound.
> Realistically those loops iterates once most of time.
> 
> It turns out this bug was introduced by me in
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
> While I do not recall doing that patch, it seems like a thinko about reliable
> (name of the variable) and realistic (name of the parameter it is passed to).
> 
> Fixing this caused one testsuite fallout in predictive commoning testcase
> because loop unswitching previously disabled itself having an estimated number
> of iterations 1 (I am not sure if that is not supposed to be 0, with 1
> iteration unswithcing may pay back, little bit, I suppose it wants to test
> number of stmt executions of the condtional to be at least 2 which depends on
> whether the conditional is before or after the loop exits). This made me notice
> that some loop passes that want to give up on low trip count uses combination
> of estimated number of iterations and max number of iterations while other
> don't which is fixed here. The code combining both realistic and max counts
> is same as i.e. in unroller and other passes already.
> 
> I also wonder if predictive comming is a win for loops with very low
> trip count, but that is for separate patch, too, anyway.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> Honza
> 
> 	* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
> 	estimates here.
> 	* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
> 	max_loop_iterations_int.
> 	* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
> 	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> Index: tree-ssa-loop-niter.c
> ===================================================================
> --- tree-ssa-loop-niter.c	(revision 234516)
> +++ tree-ssa-loop-niter.c	(working copy)
> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>    tree low, high, type, next;
>    bool sign, upper = true, at_end = false;
>    struct loop *loop = data->loop;
> -  bool reliable = true;
>  
>    if (TREE_CODE (base) != ARRAY_REF)
>      return true;
> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>        && tree_int_cst_compare (next, high) <= 0)
>      return true;
>  
> -  /* If access is not executed on every iteration, we must ensure that overlow may
> -     not make the access valid later.  */
> +  /* If access is not executed on every iteration, we must ensure that overlow
> +     may not make the access valid later.  */
>    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
>  				step, data->stmt, loop, true))
> -    reliable = false;
> +    upper = false;
>  
> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
>  }
>  
> Index: tree-ssa-loop-unswitch.c
> ===================================================================
> --- tree-ssa-loop-unswitch.c	(revision 234516)
> +++ tree-ssa-loop-unswitch.c	(working copy)
> @@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
>        /* If the loop is not expected to iterate, there is no need
>  	 for unswitching.  */
>        iterations = estimated_loop_iterations_int (loop);
> +      if (iterations < 0)
> +        iterations = max_loop_iterations_int (loop);

You are only changing one place in this file.

>        if (iterations >= 0 && iterations <= 1)
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
> Index: tree-ssa-loop-ivopts.c
> ===================================================================
> --- tree-ssa-loop-ivopts.c	(revision 234516)
> +++ tree-ssa-loop-ivopts.c	(working copy)
> @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>  {
>    HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>    if (niter == -1)
> -    return AVG_LOOP_NITER (loop);
> +    {
> +      niter = max_stmt_executions_int (loop);
> +      if (niter == -1 || niter > AVG_LOOP_NITER (loop))
> +        return AVG_LOOP_NITER (loop);
> +    }
>  
>    return niter;
>  }
> Index: tree-vect-loop.c
> ===================================================================
> --- tree-vect-loop.c	(revision 234516)
> +++ tree-vect-loop.c	(working copy)
> @@ -2063,6 +2063,9 @@ start_over:
>  
>    estimated_niter
>      = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> +  if (estimated_niter != -1)
> +    estimated_niter
> +      = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>    if (estimated_niter != -1
>        && ((unsigned HOST_WIDE_INT) estimated_niter
>            <= MAX (th, (unsigned)min_profitable_estimate)))

The vectorizer already checks this (albeit indirectly):

  HOST_WIDE_INT max_niter
    = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
  if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
      || (max_niter != -1
          && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "not vectorized: iteration count smaller than "
                         "vectorization factor.\n");
      return false;
    }

Richard.

>
Bin.Cheng March 30, 2016, 1:24 p.m. UTC | #2
On Wed, Mar 30, 2016 at 11:00 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> while looking into sudoku solving benchark, I noticed that we incorrectly
> estimate loop to iterate 10 times just because the array it traverses is of
> dimension 10. This of course is just upper bound and not realistic bound.
> Realistically those loops iterates once most of time.
>
> It turns out this bug was introduced by me in
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
> While I do not recall doing that patch, it seems like a thinko about reliable
> (name of the variable) and realistic (name of the parameter it is passed to).
>
> Fixing this caused one testsuite fallout in predictive commoning testcase
> because loop unswitching previously disabled itself having an estimated number
> of iterations 1 (I am not sure if that is not supposed to be 0, with 1
> iteration unswithcing may pay back, little bit, I suppose it wants to test
> number of stmt executions of the condtional to be at least 2 which depends on
> whether the conditional is before or after the loop exits). This made me notice
> that some loop passes that want to give up on low trip count uses combination
> of estimated number of iterations and max number of iterations while other
> don't which is fixed here. The code combining both realistic and max counts
> is same as i.e. in unroller and other passes already.
>
> I also wonder if predictive comming is a win for loops with very low
> trip count, but that is for separate patch, too, anyway.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
>         * tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
>         estimates here.
>         * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
>         max_loop_iterations_int.
>         * tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
>         * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> Index: tree-ssa-loop-niter.c
> ===================================================================
> --- tree-ssa-loop-niter.c       (revision 234516)
> +++ tree-ssa-loop-niter.c       (working copy)
> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>    tree low, high, type, next;
>    bool sign, upper = true, at_end = false;
>    struct loop *loop = data->loop;
> -  bool reliable = true;
>
>    if (TREE_CODE (base) != ARRAY_REF)
>      return true;
> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>        && tree_int_cst_compare (next, high) <= 0)
>      return true;
>
> -  /* If access is not executed on every iteration, we must ensure that overlow may
> -     not make the access valid later.  */
> +  /* If access is not executed on every iteration, we must ensure that overlow
> +     may not make the access valid later.  */
>    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
>                                 step, data->stmt, loop, true))
> -    reliable = false;
> +    upper = false;
>
> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
>  }
Hi,
I have a concern that GCC records bound information from basic blocks
even that don't dominate loop latch.  Given below example:

extern int b[];
void bar (int *);
int foo (int x, int n)
{
  int i;
  int arr[128] = {0};

  for (i = 0; i < n; i++)
    {
      if (x > i)
        {
          a[i] = i;
          b[i] = i;
        }
    }
  bar (arr);
  return 0;
}
The upper bound inferred from a[i] is 127.  This information is
recorded along with the stmt itself in loop->bounds.  Afterwards, this
information is also used in a flow-sensitive way.  In this example, we
are sure that &b[i] won't overflow (thus a SCEV) because it's in the
same basic block as a[i].  GCC currently relies on such information in
overflow detection for scev, i.e., loop_exits_before_overflow.

But with this change, we won't record upper bound information in
record_estimate because the parameter is set to false?

Thanks,
bin
Jan Hubicka March 30, 2016, 2:22 p.m. UTC | #3
> > -  /* If access is not executed on every iteration, we must ensure that overlow may
> > -     not make the access valid later.  */
> > +  /* If access is not executed on every iteration, we must ensure that overlow
> > +     may not make the access valid later.  */
> >    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> >        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
> >                                 step, data->stmt, loop, true))
> > -    reliable = false;
> > +    upper = false;
> >
> > -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
> > +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
> >    return true;
> >  }
> Hi,
> I have a concern that GCC records bound information from basic blocks
> even that don't dominate loop latch.  Given below example:
> 
> extern int b[];
> void bar (int *);
> int foo (int x, int n)
> {
>   int i;
>   int arr[128] = {0};
> 
>   for (i = 0; i < n; i++)
>     {
>       if (x > i)
>         {
>           a[i] = i;
>           b[i] = i;
>         }
>     }
>   bar (arr);
>   return 0;
> }

This testcase is not affected, becase scev_probably_wraps_p returns false in this case.
In the wrapping case, we can't derive upper bound - this is indeed a correctness issue.

I am experiemtning with enabling loop peeling by default. For that I extended the code
to record likely upper bounds, which can be used to test if the loop most probably has
low trip count. This is also useful to trottle other transformations.

In this case we can probably assume that no sane person would count
on wrapping and record this as likely bound.

Honza
Bin.Cheng March 30, 2016, 2:41 p.m. UTC | #4
On Wed, Mar 30, 2016 at 3:22 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > -  /* If access is not executed on every iteration, we must ensure that overlow may
>> > -     not make the access valid later.  */
>> > +  /* If access is not executed on every iteration, we must ensure that overlow
>> > +     may not make the access valid later.  */
>> >    if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>> >        && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
>> >                                 step, data->stmt, loop, true))
>> > -    reliable = false;
>> > +    upper = false;
>> >
>> > -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
>> > +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>> >    return true;
>> >  }
>> Hi,
>> I have a concern that GCC records bound information from basic blocks
>> even that don't dominate loop latch.  Given below example:
>>
>> extern int b[];
>> void bar (int *);
>> int foo (int x, int n)
>> {
>>   int i;
>>   int arr[128] = {0};
>>
>>   for (i = 0; i < n; i++)
>>     {
>>       if (x > i)
>>         {
>>           a[i] = i;
>>           b[i] = i;
>>         }
>>     }
>>   bar (arr);
>>   return 0;
>> }
>
> This testcase is not affected, becase scev_probably_wraps_p returns false in this case.
> In the wrapping case, we can't derive upper bound - this is indeed a correctness issue.
In the wrapping case, we still can derive upper bound if the index's
wrapping range is larger than array bound.  But I agree it looks like
very corner case and not likely be useful in practice.

Thanks,
bin
>
> I am experiemtning with enabling loop peeling by default. For that I extended the code
> to record likely upper bounds, which can be used to test if the loop most probably has
> low trip count. This is also useful to trottle other transformations.
>
> In this case we can probably assume that no sane person would count
> on wrapping and record this as likely bound.
>
> Honza
diff mbox

Patch

Index: tree-ssa-loop-niter.c
===================================================================
--- tree-ssa-loop-niter.c	(revision 234516)
+++ tree-ssa-loop-niter.c	(working copy)
@@ -3115,7 +3115,6 @@  idx_infer_loop_bounds (tree base, tree *
   tree low, high, type, next;
   bool sign, upper = true, at_end = false;
   struct loop *loop = data->loop;
-  bool reliable = true;
 
   if (TREE_CODE (base) != ARRAY_REF)
     return true;
@@ -3187,14 +3186,14 @@  idx_infer_loop_bounds (tree base, tree *
       && tree_int_cst_compare (next, high) <= 0)
     return true;
 
-  /* If access is not executed on every iteration, we must ensure that overlow may
-     not make the access valid later.  */
+  /* If access is not executed on every iteration, we must ensure that overlow
+     may not make the access valid later.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
       && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
 				step, data->stmt, loop, true))
-    reliable = false;
+    upper = false;
 
-  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
+  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
   return true;
 }
 
Index: tree-ssa-loop-unswitch.c
===================================================================
--- tree-ssa-loop-unswitch.c	(revision 234516)
+++ tree-ssa-loop-unswitch.c	(working copy)
@@ -223,6 +223,8 @@  tree_unswitch_single_loop (struct loop *
       /* If the loop is not expected to iterate, there is no need
 	 for unswitching.  */
       iterations = estimated_loop_iterations_int (loop);
+      if (iterations < 0)
+        iterations = max_loop_iterations_int (loop);
       if (iterations >= 0 && iterations <= 1)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c	(revision 234516)
+++ tree-ssa-loop-ivopts.c	(working copy)
@@ -121,7 +121,11 @@  avg_loop_niter (struct loop *loop)
 {
   HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
   if (niter == -1)
-    return AVG_LOOP_NITER (loop);
+    {
+      niter = max_stmt_executions_int (loop);
+      if (niter == -1 || niter > AVG_LOOP_NITER (loop))
+        return AVG_LOOP_NITER (loop);
+    }
 
   return niter;
 }
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 234516)
+++ tree-vect-loop.c	(working copy)
@@ -2063,6 +2063,9 @@  start_over:
 
   estimated_niter
     = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+  if (estimated_niter != -1)
+    estimated_niter
+      = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
   if (estimated_niter != -1
       && ((unsigned HOST_WIDE_INT) estimated_niter
           <= MAX (th, (unsigned)min_profitable_estimate)))