diff mbox

Enable elimination of IV use with unsigned type candidate

Message ID CAHFci2-ZJdXrEbV4yQG6UjqswiPp-B8trwrgAn2NAzm4f1fdRw@mail.gmail.com
State New
Headers show

Commit Message

Bin.Cheng July 1, 2014, 9:32 a.m. UTC
Sorry for this late reply, I spent some time in understanding the problem.

On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.cheng@arm.com> wrote:

>> expressions.  It's possible to have iv_elimination_compare_lt to do some
>> undo transformation on may_be_zero, but I found it's difficult for cases
>> involving signed/unsigned conversion like case loop-41.c.  Since I think
>> there is no obvious benefit to fold may_be_zero here (somehow because the
>> two operands are already in folded forms), this patch just calls build2_loc
>> instead.
>
> But it may fold to true/false, no?
You are right, it can be folded to false in many cases.  Thus I
managed to check specific folded forms of may_be_zero, as in attached
patch.  So far it works for tests I added, but there are some other
folded forms not handled.

>

>>
>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in
>> iv_elimination_compare_lt are like below with i_1 of signed type:
>> B:             i_1 + 1
>> A:             0
>> niter->niter:  (unsigned int)i_1
>>
>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1".
>> Without this patch, it is considered equal to each other.
>
> just looking at this part.  Do you have a testcase that exhibits a difference
> when just applying patch A?  So I can have a look here?
>
> From the code in iv_elimination_compare_lt I can't figure why we'd
> end up with i_1 instead of (unsigned int)i_1 as we convert to a common
> type.
>
> I suppose the issue may be that at tree_to_aff_combination time
> we strip all nops with STRIP_NOPS but when operating on ->rest
> via convert/scale or add we do not strip them again.
>
> But then 'nit' should be i_1, not (unsigned int)i_1.  So the analysis above
> really doesn't look correct.
>
> Just to make sure we don't paper over an issue in tree-affine.c.
>
> Thus - testcase?  On x86 we don't run into this place in
> iv_elimination_compare_lt (on an unpatched tree).
>
> CCing Zdenek for the meat of patch B.

I am more and more convinced that check of overflow in function
iv_elimination_compare_lt is not sufficient.  Considering example
given by comments just before the function:
/* Tries to replace loop exit by one formulated in terms of a LT_EXPR
   comparison with CAND.  NITER describes the number of iterations of
   the loops.  If successful, the comparison in COMP_P is altered accordingly.

   We aim to handle the following situation:

   sometype *base, *p;
   int a, b, i;

   i = a;
   p = p_0 = base + a;

   do
     {
       bla (*p);
       p++;
       i++;
     }
   while (i < b);

   Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1.
   We aim to optimize this to

   p = p_0 = base + a;
   do
     {
       bla (*p);
       p++;
     }
   while (p < p_0 - a + b);

   This preserves the correctness, since the pointer arithmetics does not
   overflow.  More precisely:

   1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
      overflow in computing it or the values of p.
   2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
      overflow.  To prove this, we use the fact that p_0 = base + a.  */

The example seems wrong to me, because the statement only stands for
unsigned case like ivopts-lt.c:


#include "stdint.h"

void
f1 (char *p, uintptr_t i, uintptr_t n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}

If i/n are of signed type, the 2nd point for "a+1>b" scenario in the
comment isn't right, because of below facts:
a) niter of loop are calculated in unsigned type as in function
number_of_iterations_lt.  So loop information for this case is like:

Analyzing # of iterations of loop 1
  exit condition [i_4(D) + 1, + , 1](no_overflow) < n_12(D)
  bounds on difference of bases: -4294967295 ... 4294967294
  result:
    zero if i_4(D) >= n_12(D)
    # of iterations ((unsigned int) n_12(D) - (unsigned int) i_4(D)) -
1, bounded by 4294967294
  number of iterations ((unsigned int) n_12(D) - (unsigned int)
i_4(D)) - 1; zero if i_4(D) >= n_12(D)

b) The (unsigned int)n_12(D) could overflow, the check in
iv_elimination_compare_lt doesn't take this fact into consideration.

Tricky thing is GCC now doesn't eliminate "i < n" with point P for
signed version of case.  The reason behind is again the may_be_zero
information.  The original form of may_be_zero is like "i_4(D) + 1>
n_12(D)", which is folded into "i_4(D) >= n_12(D)" because both
i_4/n_12 are of signed type and don't wrap.  In the end, function
iv_elimination_compare_lt only handles the original form of
may_be_zero.
So below case is the closest one to illustrate the problem:

#include "stdint.h"

void
f1 (char *p, int i, int n)
{
  p += i;
  do
    {
      *p = '\0';
      p += 1;
      i++;
    }
  while (i < n);
}

char arr[100] = "hello\n";

int main(void)
{
  f1 (arr, 1, -1);

  return 0;
}

If we build the original form of may_be_zero with change:
bnds->up, false),
                                     TYPE_SIGN (niter_type));

The case will be mis-compiled.

One the other handle, iv_elimination_compare_lt for loops with
may_be_zero can be further improved to handle cases other than pointer
candidate.  My original patch is intended to handle unsigned type
candidates, but there are other cases.

The problem here is a little complicated, especially if we want to
improve iv elimination, I may mis-read the code somehow.  So any idea
about this?

Thanks,
bin
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-41.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-41.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-41.c	(revision 0)
@@ -0,0 +1,38 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+typedef long unsigned int size_t;
+extern float a[100], b[100];
+
+int foo (int M, int l)
+{
+  unsigned ivtmp = 0, niters, _37, _38, bnd;
+  size_t _67, _1;
+  float *vectp_a, *vectp_b, *vectp_a2;
+  float vect__6, vect__7, vect__8;
+
+  _38 = (unsigned int)l;
+  bnd = _38 + 1;
+
+  _1 = (size_t) M;
+  _67 = _1 * 4;
+  vectp_a = a; vectp_b = b; vectp_a2 = a + _67;
+
+  do
+    {
+      vect__6 = *vectp_a;
+      vect__7 = *vectp_b;
+      vect__8 = vect__6 + vect__7;
+      *vectp_a = vect__8;
+      vectp_a = vectp_a + 4;
+      vectp_b = vectp_b + 4;
+      vectp_a2 = vectp_a2 + 4;
+      ivtmp += 1;
+    }
+  while (ivtmp < bnd);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(bnd.*\\)" 1 "ivopts" } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-40.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-40.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-40.c	(revision 0)
@@ -0,0 +1,38 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+typedef long unsigned int size_t;
+extern float a[100], b[100];
+
+int foo (int M, unsigned int l)
+{
+  unsigned ivtmp = 0, niters, _37, _38, bnd;
+  size_t _67, _1;
+  float *vectp_a, *vectp_b, *vectp_a2;
+  float vect__6, vect__7, vect__8;
+
+  _38 = (unsigned int)l;
+  bnd = _38 + 1;
+
+  _1 = (size_t) M;
+  _67 = _1 * 4;
+  vectp_a = a; vectp_b = b; vectp_a2 = a + _67;
+
+  do
+    {
+      vect__6 = *vectp_a;
+      vect__7 = *vectp_b;
+      vect__8 = vect__6 + vect__7;
+      *vectp_a = vect__8;
+      vectp_a = vectp_a + 4;
+      vectp_b = vectp_b + 4;
+      vectp_a2 = vectp_a2 + 4;
+      ivtmp += 1;
+    }
+  while (ivtmp < bnd);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(bnd.*\\)" 1 "ivopts" } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 211966)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4432,6 +4432,44 @@ iv_period (struct iv *iv)
   return period;
 }
 
+/* Returns no wrapping period of induction variable IV.  For now
+   only unsigned type IV is handled, we could extend it in case
+   of non-overflow for signed ones.  Return zero if it can't be
+   decided.  */
+
+static tree
+iv_nowrap_period (struct iv *iv)
+{
+  bool overflow;
+  tree type;
+  tree base = iv->base, step = iv->step;
+  widest_int base_val, step_val, max_val, span, period;
+
+  gcc_assert (step && TREE_CODE (step) == INTEGER_CST);
+
+  type = TREE_TYPE (base);
+  if (!TYPE_UNSIGNED (type) || TREE_CODE (base) != INTEGER_CST)
+    return integer_zero_node;
+
+  base_val = wi::to_widest (base);
+  step_val = wi::to_widest (step);
+  if (!POINTER_TYPE_P (type) && TYPE_MAX_VALUE (type)
+      && TREE_CODE (TYPE_MAX_VALUE (type)) == INTEGER_CST)
+    max_val = wi::to_widest (TYPE_MAX_VALUE (type));
+  else
+    {
+      wide_int max_wi = wi::max_value (TYPE_PRECISION (type), UNSIGNED);
+      max_val = wi::to_widest (wide_int_to_tree (type, max_wi));
+    }
+
+  span = max_val - base_val + step_val - 1;
+  period = wi::div_trunc (span, step_val, UNSIGNED, &overflow);
+  if (overflow)
+    return integer_zero_node;
+
+  return wide_int_to_tree (type, period);
+}
+
 /* Returns the comparison operator used when eliminating the iv USE.  */
 
 static enum tree_code
@@ -4560,7 +4598,84 @@ difference_cannot_overflow_p (tree base, tree offs
     }
 }
 
-/* Tries to replace loop exit by one formulated in terms of a LT_EXPR
+/* Check whether PERIOD of CAND is greater than the number of iterations
+   described by DESC for which the exit condition is true.  The exit
+   condition is comparison against USE.  */
+
+static bool
+period_greater_niter_exit (struct ivopts_data *data,
+			   struct iv_use *use, struct iv_cand *cand,
+			   tree period, struct tree_niter_desc *desc)
+{
+  struct loop *loop = data->current_loop;
+
+  /* If the number of iterations is constant, compare against it directly.  */
+  if (TREE_CODE (desc->niter) == INTEGER_CST)
+    {
+      /* See cand_value_at.  */
+      if (stmt_after_increment (loop, cand, use->stmt))
+        {
+          if (!tree_int_cst_lt (desc->niter, period))
+            return false;
+        }
+      else
+        {
+          if (tree_int_cst_lt (period, desc->niter))
+            return false;
+        }
+    }
+
+  /* If not, and if this is the only possible exit of the loop, see whether
+     we can get a conservative estimate on the number of iterations of the
+     entire loop and compare against that instead.  */
+  else
+    {
+      widest_int period_value, max_niter;
+
+      max_niter = desc->max;
+      if (stmt_after_increment (loop, cand, use->stmt))
+        max_niter += 1;
+      period_value = wi::to_widest (period);
+      if (wi::gtu_p (max_niter, period_value))
+        {
+          /* See if we can take advantage of inferred loop bound information.  */
+          if (data->loop_single_exit_p)
+            {
+              if (!max_loop_iterations (loop, &max_niter))
+                return false;
+              /* The loop bound is already adjusted by adding 1.  */
+              if (wi::gtu_p (max_niter, period_value))
+                return false;
+            }
+          else
+            return false;
+        }
+    }
+
+  return true;
+}
+
+/* Determine whether no wrapping period of CAND is greater than
+   the number of iterations described by DESC for which exit
+   conditionis true.  The exit condition is comparison against USE.  */
+
+static bool
+nowrap_cand_for_loop_niter_p (struct ivopts_data *data,
+			      struct iv_use *use,
+			      struct iv_cand *cand,
+			      struct tree_niter_desc *desc)
+{
+  tree period;
+  
+  period = iv_nowrap_period (cand->iv);
+  if (period != integer_zero_node
+      && period_greater_niter_exit (data, use, cand, period, desc))
+    return true;
+
+  return false;
+}
+
+/* Tries to replace loop exit in USE by one formulated in terms of a LT_EXPR
    comparison with CAND.  NITER describes the number of iterations of
    the loops.  If successful, the comparison in COMP_P is altered accordingly.
 
@@ -4597,11 +4712,17 @@ difference_cannot_overflow_p (tree base, tree offs
    1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
       overflow in computing it or the values of p.
    2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
-      overflow.  To prove this, we use the fact that p_0 = base + a.  */
+      overflow.  To prove this, we use the fact that p_0 = base + a.
 
+
+   Moreover, with more complex overflow analysis for unsigned type, it is
+   possible to eliminate I with P if P is an induction candidate of unsigned
+   integer type other than pointer if we can prove that P doesn't  overflow
+   during all iterations of current loop.  */
+
 static bool
-iv_elimination_compare_lt (struct ivopts_data *data,
-                           struct iv_cand *cand, enum tree_code *comp_p,
+iv_elimination_compare_lt (struct ivopts_data *data, struct iv_use *use,
+			   struct iv_cand *cand, enum tree_code *comp_p,
 			   struct tree_niter_desc *niter)
 {
   tree cand_type, a, b, mbz, nit_type = TREE_TYPE (niter->niter), offset;
@@ -4610,11 +4731,13 @@ static bool
   HOST_WIDE_INT step;
 
   /* We need to know that the candidate induction variable does not overflow.
-     While more complex analysis may be used to prove this, for now just
-     check that the variable appears in the original program and that it
-     is computed in a type that guarantees no overflows.  */
+     Apart from checking that the variable appears in the original program
+     and that is is computed in a type that guarantees no overflows, we also
+     check no wrapping periord of the variable covers the niter if it has
+     unsigned type.  More complex analysis may be used to prove this.  */
   cand_type = TREE_TYPE (cand->iv->base);
-  if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type))
+  if ((cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type))
+      && !nowrap_cand_for_loop_niter_p (data, use, cand, niter))
     return false;
 
   /* Make sure that the loop iterates till the loop bound is hit, as otherwise
@@ -4657,11 +4780,54 @@ static bool
       else
 	return false;
     }
+  else if (TREE_CODE (mbz) == EQ_EXPR)
+    {
+      /* Check whether may_be_zero is in below two forms:
+	   1) b' == TYPE_MAX_VALUE (TREE_TYPE (b'))
+	      where b' is of type nit_type.  The expression could be
+	      folded from "b' + 1 < 1".  In this case, A is "0" and B
+	      is "b' + 1".
+	   2) b' == -1
+	      where b' is of signed type which has nit_type as its
+	      unsigned counterpart.  The expression could be folded
+	      from "(nit_type)b' + 1 < 1".  In this case, A is "0"
+	      and "(nit_type)b' + 1".  */
+      tree type;
+      tree op0 = TREE_OPERAND (mbz, 0);
+      tree op1 = TREE_OPERAND (mbz, 1);
+
+      if (TREE_CODE (op1) != INTEGER_CST)
+	{
+	  tree tmp = op0;
+	  op0 = op1;
+	  op1 = tmp;
+	}
+      type = TREE_TYPE (op0);
+      if (TREE_CODE (op1) != INTEGER_CST)
+	return false;
+
+      if (!TYPE_UNSIGNED (type)
+	  && types_compatible_p (unsigned_type_for (type), nit_type))
+	{
+	  type = nit_type;
+	  op0 = fold_convert_loc (UNKNOWN_LOCATION, nit_type, op0);
+	  op1 = fold_convert_loc (UNKNOWN_LOCATION, nit_type, op1);
+	}
+
+      if (!TYPE_UNSIGNED (type)
+	  || !TYPE_MAX_VALUE (type)
+	  || TREE_CODE (op1) != INTEGER_CST
+	  || !operand_equal_p (op1, TYPE_MAX_VALUE (type), 0))
+	return false;
+
+      a = integer_zero_node;
+      b = fold_build2 (PLUS_EXPR, type, op0, integer_one_node);
+    }
   else
     return false;
 
   /* Expected number of iterations is B - A - 1.  Check that it matches
-     the actual number, i.e., that B - A - NITER = 1.  */
+     the actual number, i.e., that B - A = NITER + 1.  */
   tree_to_aff_combination (niter->niter, nit_type, &nit);
   tree_to_aff_combination (fold_convert (nit_type, a), nit_type, &tmpa);
   tree_to_aff_combination (fold_convert (nit_type, b), nit_type, &tmpb);
@@ -4673,12 +4839,15 @@ static bool
     return false;
 
   /* Finally, check that CAND->IV->BASE - CAND->IV->STEP * A does not
-     overflow.  */
-  offset = fold_build2 (MULT_EXPR, TREE_TYPE (cand->iv->step),
-			cand->iv->step,
-			fold_convert (TREE_TYPE (cand->iv->step), a));
-  if (!difference_cannot_overflow_p (cand->iv->base, offset))
-    return false;
+     overflow.  The check is uncessary when A equals to ZERO.  */
+  if (a != integer_zero_node)
+    {
+      offset = fold_build2 (MULT_EXPR, TREE_TYPE (cand->iv->step),
+			    cand->iv->step,
+			    fold_convert (TREE_TYPE (cand->iv->step), a));
+      if (!difference_cannot_overflow_p (cand->iv->base, offset))
+	return false;
+    }
 
   /* Determine the new comparison operator.  */
   comp = step < 0 ? GT_EXPR : LT_EXPR;
@@ -4733,50 +4902,9 @@ may_eliminate_iv (struct ivopts_data *data,
      This is the case iff the period of the induction variable is greater
      than the number of iterations for which the exit condition is true.  */
   period = iv_period (cand->iv);
+  if (!period_greater_niter_exit (data, use, cand, period, desc))
+    return false;
 
-  /* If the number of iterations is constant, compare against it directly.  */
-  if (TREE_CODE (desc->niter) == INTEGER_CST)
-    {
-      /* See cand_value_at.  */
-      if (stmt_after_increment (loop, cand, use->stmt))
-        {
-          if (!tree_int_cst_lt (desc->niter, period))
-            return false;
-        }
-      else
-        {
-          if (tree_int_cst_lt (period, desc->niter))
-            return false;
-        }
-    }
-
-  /* If not, and if this is the only possible exit of the loop, see whether
-     we can get a conservative estimate on the number of iterations of the
-     entire loop and compare against that instead.  */
-  else
-    {
-      widest_int period_value, max_niter;
-
-      max_niter = desc->max;
-      if (stmt_after_increment (loop, cand, use->stmt))
-        max_niter += 1;
-      period_value = wi::to_widest (period);
-      if (wi::gtu_p (max_niter, period_value))
-        {
-          /* See if we can take advantage of inferred loop bound information.  */
-          if (data->loop_single_exit_p)
-            {
-              if (!max_loop_iterations (loop, &max_niter))
-                return false;
-              /* The loop bound is already adjusted by adding 1.  */
-              if (wi::gtu_p (max_niter, period_value))
-                return false;
-            }
-          else
-            return false;
-        }
-    }
-
   cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
 
   *bound = fold_convert (TREE_TYPE (cand->iv->base),
@@ -4796,7 +4924,7 @@ may_eliminate_iv (struct ivopts_data *data,
 	   base the exit condition on it.  However, that is often too
 	   expensive.  */
   if (!integer_zerop (desc->may_be_zero))
-    return iv_elimination_compare_lt (data, cand, comp, desc);
+    return iv_elimination_compare_lt (data, use, cand, comp, desc);
 
   return true;
 }

Comments

Bin.Cheng July 1, 2014, 10:53 a.m. UTC | #1
On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> Sorry for this late reply, I spent some time in understanding the problem.
>
> On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>
>>> expressions.  It's possible to have iv_elimination_compare_lt to do some
>>> undo transformation on may_be_zero, but I found it's difficult for cases
>>> involving signed/unsigned conversion like case loop-41.c.  Since I think
>>> there is no obvious benefit to fold may_be_zero here (somehow because the
>>> two operands are already in folded forms), this patch just calls build2_loc
>>> instead.
>>
>> But it may fold to true/false, no?
> You are right, it can be folded to false in many cases.  Thus I
> managed to check specific folded forms of may_be_zero, as in attached
> patch.  So far it works for tests I added, but there are some other
> folded forms not handled.
>
>>
>
>>>
>>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in
>>> iv_elimination_compare_lt are like below with i_1 of signed type:
>>> B:             i_1 + 1
>>> A:             0
>>> niter->niter:  (unsigned int)i_1
>>>
>>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1".
>>> Without this patch, it is considered equal to each other.
>>
>> just looking at this part.  Do you have a testcase that exhibits a difference
>> when just applying patch A?  So I can have a look here?
>>
>> From the code in iv_elimination_compare_lt I can't figure why we'd
>> end up with i_1 instead of (unsigned int)i_1 as we convert to a common
>> type.
>>
>> I suppose the issue may be that at tree_to_aff_combination time
>> we strip all nops with STRIP_NOPS but when operating on ->rest
>> via convert/scale or add we do not strip them again.
>>
>> But then 'nit' should be i_1, not (unsigned int)i_1.  So the analysis above
>> really doesn't look correct.
>>
>> Just to make sure we don't paper over an issue in tree-affine.c.
>>
>> Thus - testcase?  On x86 we don't run into this place in
>> iv_elimination_compare_lt (on an unpatched tree).
>>
>> CCing Zdenek for the meat of patch B.
>
> I am more and more convinced that check of overflow in function
> iv_elimination_compare_lt is not sufficient.  Considering example
> given by comments just before the function:
> /* Tries to replace loop exit by one formulated in terms of a LT_EXPR
>    comparison with CAND.  NITER describes the number of iterations of
>    the loops.  If successful, the comparison in COMP_P is altered accordingly.
>
>    We aim to handle the following situation:
>
>    sometype *base, *p;
>    int a, b, i;
>
>    i = a;
>    p = p_0 = base + a;
>
>    do
>      {
>        bla (*p);
>        p++;
>        i++;
>      }
>    while (i < b);
>
>    Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1.
>    We aim to optimize this to
>
>    p = p_0 = base + a;
>    do
>      {
>        bla (*p);
>        p++;
>      }
>    while (p < p_0 - a + b);
>
>    This preserves the correctness, since the pointer arithmetics does not
>    overflow.  More precisely:
>
>    1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
>       overflow in computing it or the values of p.
>    2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
>       overflow.  To prove this, we use the fact that p_0 = base + a.  */
>
> The example seems wrong to me, because the statement only stands for
> unsigned case like ivopts-lt.c:
>
>
> #include "stdint.h"
>
> void
> f1 (char *p, uintptr_t i, uintptr_t n)
> {
>   p += i;
>   do
>     {
>       *p = '\0';
>       p += 1;
>       i++;
>     }
>   while (i < n);
> }
>
> If i/n are of signed type, the 2nd point for "a+1>b" scenario in the
> comment isn't right, because of below facts:
> a) niter of loop are calculated in unsigned type as in function
> number_of_iterations_lt.  So loop information for this case is like:
>
> Analyzing # of iterations of loop 1
>   exit condition [i_4(D) + 1, + , 1](no_overflow) < n_12(D)
>   bounds on difference of bases: -4294967295 ... 4294967294
>   result:
>     zero if i_4(D) >= n_12(D)
>     # of iterations ((unsigned int) n_12(D) - (unsigned int) i_4(D)) -
> 1, bounded by 4294967294
>   number of iterations ((unsigned int) n_12(D) - (unsigned int)
> i_4(D)) - 1; zero if i_4(D) >= n_12(D)
>
> b) The (unsigned int)n_12(D) could overflow, the check in
> iv_elimination_compare_lt doesn't take this fact into consideration.
>
> Tricky thing is GCC now doesn't eliminate "i < n" with point P for
> signed version of case.  The reason behind is again the may_be_zero
> information.  The original form of may_be_zero is like "i_4(D) + 1>
> n_12(D)", which is folded into "i_4(D) >= n_12(D)" because both
> i_4/n_12 are of signed type and don't wrap.  In the end, function
> iv_elimination_compare_lt only handles the original form of
> may_be_zero.
> So below case is the closest one to illustrate the problem:
>
> #include "stdint.h"
>
> void
> f1 (char *p, int i, int n)
> {
>   p += i;
>   do
>     {
>       *p = '\0';
>       p += 1;
>       i++;
>     }
>   while (i < n);
> }
>
> char arr[100] = "hello\n";
>
> int main(void)
> {
>   f1 (arr, 1, -1);
>
>   return 0;
> }
>
> If we build the original form of may_be_zero with change:
> Index: gcc/tree-ssa-loop-niter.c
> ===================================================================
> --- gcc/tree-ssa-loop-niter.c   (revision 211966)
> +++ gcc/tree-ssa-loop-niter.c   (working copy)
> @@ -1153,8 +1153,13 @@
>          condition.  */
>
>        if (mpz_sgn (bnds->below) < 0)
> +#if 0
>         niter->may_be_zero = fold_build2 (LT_EXPR, boolean_type_node,
>                                           iv1->base, iv0->base);
> +#else
> +       niter->may_be_zero = build2_loc (UNKNOWN_LOCATION, LT_EXPR,
> boolean_type_node,
> +                                         iv1->base, iv0->base);
> +#endif
>        niter->niter = delta;
>        niter->max = widest_int::from (wi::from_mpz (niter_type,
> bnds->up, false),
>                                      TYPE_SIGN (niter_type));
>
> The case will be mis-compiled.
>
> One the other handle, iv_elimination_compare_lt for loops with
> may_be_zero can be further improved to handle cases other than pointer
> candidate.  My original patch is intended to handle unsigned type
> candidates, but there are other cases.
>
> The problem here is a little complicated, especially if we want to
> improve iv elimination, I may mis-read the code somehow.  So any idea
> about this?
>

BTW, for below gimple dump before ivopt:
foo (int l)
{
  long long int sum;
  long long int j;
  int i;
  int _8;

  <bb 2>:

  <bb 3>:
  # i_1 = PHI <0(2), i_4(4)>
  # j_2 = PHI <0(2), j_5(4)>
  # sum_3 = PHI <0(2), sum_6(4)>
  i_4 = i_1 + 1;
  j_5 = j_2 + 1;
  sum_6 = sum_3 + j_5;
  if (i_4 < l_7(D))
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

  <bb 5>:
  # sum_12 = PHI <sum_6(3)>
  _8 = (int) sum_12;
  return _8;

}

The loop information is like:

;;
;; Loop 1
;;  header 3, latch 4
;;  depth 1, outer 0
;;  nodes: 3 4
Processing loop 1
  single exit 3 -> 5, exit condition if (i_4 < l_7(D))


Analyzing # of iterations of loop 1
  exit condition [1, + , 1](no_overflow) < l_7(D)
  bounds on difference of bases: -2147483649 ... 2147483646
  result:
    zero if l_7(D) < 1
    # of iterations (unsigned int) l_7(D) + 4294967295, bounded by 2147483646
  number of iterations (unsigned int) l_7(D) + 4294967295; zero if l_7(D) < 1

Question is, is it safe to eliminate "i_4 < l_7(D)" as below?
Replacing exit test: if (i_4 < l_7(D))
foo (int l)
{
  long long int sum;
  long long int j;
  int i;
  int _8;
  long long int _10;
  unsigned int _11;
  unsigned long _13;
  unsigned long _14;
  unsigned int _15;

  <bb 2>:
  _11 = (unsigned int) l_7(D);
  _15 = _11 + 4294967295;
  _14 = (unsigned long) _15;
  _13 = _14 + 1;
  _10 = (long long int) _13;

  <bb 3>:
  # j_2 = PHI <0(2), j_5(4)>
  # sum_3 = PHI <0(2), sum_6(4)>
  j_5 = j_2 + 1;
  sum_6 = sum_3 + j_5;
  if (j_5 < _10)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

  <bb 5>:
  # sum_12 = PHI <sum_6(3)>
  _8 = (int) sum_12;
  return _8;

}

Thanks,
bin
Bin.Cheng July 3, 2014, 1:58 p.m. UTC | #2
On Tue, Jul 1, 2014 at 10:32 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> Sorry for this late reply, I spent some time in understanding the problem.
>
> On Tue, Jun 24, 2014 at 12:36 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jun 23, 2014 at 11:49 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>
>>> expressions.  It's possible to have iv_elimination_compare_lt to do some
>>> undo transformation on may_be_zero, but I found it's difficult for cases
>>> involving signed/unsigned conversion like case loop-41.c.  Since I think
>>> there is no obvious benefit to fold may_be_zero here (somehow because the
>>> two operands are already in folded forms), this patch just calls build2_loc
>>> instead.
>>
>> But it may fold to true/false, no?
> You are right, it can be folded to false in many cases.  Thus I
> managed to check specific folded forms of may_be_zero, as in attached
> patch.  So far it works for tests I added, but there are some other
> folded forms not handled.
>
>>
>
>>>
>>> When GCC trying to eliminate use 0 with cand 0, the miscellaneous trees in
>>> iv_elimination_compare_lt are like below with i_1 of signed type:
>>> B:             i_1 + 1
>>> A:             0
>>> niter->niter:  (unsigned int)i_1
>>>
>>> Apparently, (B-A-1) is "i_1", which doesn't equal to "(unsigned int)i_1".
>>> Without this patch, it is considered equal to each other.
>>
>> just looking at this part.  Do you have a testcase that exhibits a difference
>> when just applying patch A?  So I can have a look here?
>>
>> From the code in iv_elimination_compare_lt I can't figure why we'd
>> end up with i_1 instead of (unsigned int)i_1 as we convert to a common
>> type.
>>
>> I suppose the issue may be that at tree_to_aff_combination time
>> we strip all nops with STRIP_NOPS but when operating on ->rest
>> via convert/scale or add we do not strip them again.
>>
>> But then 'nit' should be i_1, not (unsigned int)i_1.  So the analysis above
>> really doesn't look correct.
>>
>> Just to make sure we don't paper over an issue in tree-affine.c.
>>
>> Thus - testcase?  On x86 we don't run into this place in
>> iv_elimination_compare_lt (on an unpatched tree).
>>
>> CCing Zdenek for the meat of patch B.
>

For the record, here is my understanding about the problem.
Considering below simple loop, and we don't restrict P to any specific
type as GCC now does:

   some_type p, p_0;
   int a, b, i;

   i = a;
   p = p_0

   do
     {
       use(p);
       p += step;
       i++;
     }
   while (i < b);

We want to optimization it into below form, given there is no
overflow/wrap behavior introduced, if NITER is the number of the loop.

   p = p_0;
   do
     {
       use(p);
       p += step;
     }
   while (p < p_0 + (NITER + 1) * step);

For convenient, we assume positive step for now.

I think it's safe (in other words, no new overflow or wrap), if below
two conditions are satisfied:
1) When the original latch executes for N(>0) times, the new latch
executes for same times.
2) When the original latch executes ZERO time, the new latch executes
for ZERO time.

For 1), expression "p_0 + (NITER + 1) * step" should not overflow/wrap upward.
This is checked now by code snippet like:
  /* We need to know that the candidate induction variable does not overflow.
     While more complex analysis may be used to prove this, for now just
     check that the variable appears in the original program and that it
     is computed in a type that guarantees no overflows.  */
  cand_type = TREE_TYPE (cand->iv->base);
  if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type))
    return false;

GCC only checks if the variable appears originally and is of type not
overflow/wrap.  Just as the comment states, it can be improved
(something like introduced in my patch).

For 2), expression "p_0 + (NITER + 1) * step" should not overflow/wrap
downward, in other words, "p >= p_0 + (NITER + 1) * step" should hold
when may_be_zero (i.e, "a + 1 > b") holds.  Since NITER is computed as
(unsigned_type)B - (unsigned_type)A - 1, the new bound is effective in
form of "p_0 + (unsigned_type)B * step - (unsigned_type)A * step".
Due to the folded form of may_be_zero, GCC only handles cases in which
B/A are of unsigned type, we only need to make sure that "p_0 -
(unsigned_type)A * step" holds, given "(unsigned_type)B <=
(unsigned_type)A" already holds.

When comes to cases in which B is of signed type, we need to make sure
that "(unsigned_type)B" doesn't overflow/wrap.

So the conclusions are:
1) With the original patch B, patch A is needed because we relax GCC
for cases in which B is of signed type.
2) With the updated patch B, patch A won't be needed.


Thanks,
bin
diff mbox

Patch

Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c   (revision 211966)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -1153,8 +1153,13 @@ 
         condition.  */

       if (mpz_sgn (bnds->below) < 0)
+#if 0
        niter->may_be_zero = fold_build2 (LT_EXPR, boolean_type_node,
                                          iv1->base, iv0->base);
+#else
+       niter->may_be_zero = build2_loc (UNKNOWN_LOCATION, LT_EXPR,
boolean_type_node,
+                                         iv1->base, iv0->base);
+#endif
       niter->niter = delta;
       niter->max = widest_int::from (wi::from_mpz (niter_type,