diff mbox

Improve how we handle overflow for type conversion in scev/ivopts, part I

Message ID 000101d092e1$2da77330$88f65990$@arm.com
State New
Headers show

Commit Message

Bin Cheng May 20, 2015, 9:41 a.m. UTC
Hi,
As we know, GCC is too conservative when checking overflow behavior in SCEV
and loop related optimizers.  Result is some variable can't be recognized as
scalar evolution and thus optimizations are missed.  To be specific,
optimizers like ivopts and vectorizer are affected.
This issue is more severe on 64 bit platforms, for example, PR62173 is
failed on aarch64; scev-3.c and scev-4.c were marked as XFAIL on lp64
platforms.

As the first part to improve overflow checking in GCC, this patch does below
improvements:
  1) Ideally, chrec_convert should be responsible to convert scev like
"(type){base, step}" to scev like "{(type)base, (type)step} when the result
scev doesn't overflow; chrec_convert_aggressive should do the conversion if
the result scev could overflow/wrap.  Unfortunately, current implementation
may use chrec_convert_aggressive to return a scev that won't overflow.  This
is because of a) the static parameter "fold_conversions" for
instantiate_scev_convert can only tracks whether chrec_convert_aggressive
may be called, rather than if it does some overflow conversion or not;  b)
the implementation of instantiate_scev_convert sometimes shortcuts the call
to chrec_convert and misses conversion opportunities.  This patch improves
this.
  2) iv->no_overflow computed in simple_iv is too conservative.  With 1)
fixed, iv->no_overflow should reflects whether chrec_convert_aggressive does
return an overflow scev.  This patch improves this.
  3) chrec_convert should be able to prove the resulting scev won't overflow
with loop niter information.  This patch doesn't finish this, but it
factored a new interface out of scev_probably_wraps_p for future
improvement.  And that will be the part II patch.

With the improvements in SCEV, this patch also improves optimizer(IVOPT)
that uses scev information like below:
  For array reference in the form of arr[IV], GCC tries to derive new
address iv {arr+iv.base, iv.step*elem_size} from IV.  If IV overflow wrto a
type that is narrower than address space, this derivation is not true
because &arr[IV] isn't a scev.  Root cause why scev-*.c are failed now is
the overflow information of IV is too conservative.  IVOPT has to be
conservative to reject &arr[IV] as a scev.  With more accurate overflow
information, IVOPT can be improved too.  So this patch fixes the mentioned
long standing issues.

Bootstrap and test on x86_64, x86 and aarch64.
BTW, test gcc.target/i386/pr49781-1.c failed on x86_64, but I can confirmed
it's not this patch's fault.

So what's your opinion on this?.

Thanks,
bin

2015-05-20  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/62173
	* tree-ssa-loop-ivopts.c (struct iv): New field.  Reorder fields.
	(alloc_iv, set_iv): New parameter.
	(determine_biv_step): Delete.
	(find_bivs): Inline original determine_biv_step.  Pass new
	argument to set_iv.
	(idx_find_step): Use no_overflow information for conversion.
	* tree-scalar-evolution.c (analyze_scalar_evolution_in_loop): Let
	resolve_mixers handle folded_casts.
	(instantiate_scev_name): Change bool parameter to bool pointer.
	(instantiate_scev_poly, instantiate_scev_binary): Ditto.
	(instantiate_array_ref, instantiate_scev_not): Ditto.
	(instantiate_scev_3, instantiate_scev_2): Ditto.
	(instantiate_scev_1, instantiate_scev_r): Ditto.
	(instantiate_scev_convert, ): Change parameter.  Pass argument
	to chrec_convert_aggressive.
	(instantiate_scev): Change argument.
	(resolve_mixers): New parameter and set it.
	(scev_const_prop): New argument.
	* tree-scalar-evolution.h (resolve_mixers): New parameter.
	* tree-chrec.c (convert_affine_scev): Call chrec_convert instead
	of chrec_conert_1.
	(chrec_convert): New parameter.  Move definition below.
	(chrec_convert_aggressive): New parameter and set it.  Call
	convert_affine_scev.
	* tree-chrec.h (chrec_convert): New parameter.
	(chrec_convert_aggressive): Ditto.
	* tree-ssa-loop-niter.c (loop_exits_before_overflow): New function.
	(scev_probably_wraps_p): Factor loop niter related code into
	loop_exits_before_overflow.

gcc/testsuite/ChangeLog
2015-05-20  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/62173
	* gcc.dg/tree-ssa/scev-3.c: Remove xfail.
	* gcc.dg/tree-ssa/scev-4.c: Ditto.
	* gcc.dg/tree-ssa/scev-8.c: New.

Comments

Richard Biener May 22, 2015, 11:45 a.m. UTC | #1
On Wed, May 20, 2015 at 11:41 AM, Bin Cheng <bin.cheng@arm.com> wrote:
> Hi,
> As we know, GCC is too conservative when checking overflow behavior in SCEV
> and loop related optimizers.  Result is some variable can't be recognized as
> scalar evolution and thus optimizations are missed.  To be specific,
> optimizers like ivopts and vectorizer are affected.
> This issue is more severe on 64 bit platforms, for example, PR62173 is
> failed on aarch64; scev-3.c and scev-4.c were marked as XFAIL on lp64
> platforms.
>
> As the first part to improve overflow checking in GCC, this patch does below
> improvements:
>   1) Ideally, chrec_convert should be responsible to convert scev like
> "(type){base, step}" to scev like "{(type)base, (type)step} when the result
> scev doesn't overflow; chrec_convert_aggressive should do the conversion if
> the result scev could overflow/wrap.  Unfortunately, current implementation
> may use chrec_convert_aggressive to return a scev that won't overflow.  This
> is because of a) the static parameter "fold_conversions" for
> instantiate_scev_convert can only tracks whether chrec_convert_aggressive
> may be called, rather than if it does some overflow conversion or not;  b)
> the implementation of instantiate_scev_convert sometimes shortcuts the call
> to chrec_convert and misses conversion opportunities.  This patch improves
> this.
>   2) iv->no_overflow computed in simple_iv is too conservative.  With 1)
> fixed, iv->no_overflow should reflects whether chrec_convert_aggressive does
> return an overflow scev.  This patch improves this.
>   3) chrec_convert should be able to prove the resulting scev won't overflow
> with loop niter information.  This patch doesn't finish this, but it
> factored a new interface out of scev_probably_wraps_p for future
> improvement.  And that will be the part II patch.
>
> With the improvements in SCEV, this patch also improves optimizer(IVOPT)
> that uses scev information like below:
>   For array reference in the form of arr[IV], GCC tries to derive new
> address iv {arr+iv.base, iv.step*elem_size} from IV.  If IV overflow wrto a
> type that is narrower than address space, this derivation is not true
> because &arr[IV] isn't a scev.  Root cause why scev-*.c are failed now is
> the overflow information of IV is too conservative.  IVOPT has to be
> conservative to reject &arr[IV] as a scev.  With more accurate overflow
> information, IVOPT can be improved too.  So this patch fixes the mentioned
> long standing issues.
>
> Bootstrap and test on x86_64, x86 and aarch64.
> BTW, test gcc.target/i386/pr49781-1.c failed on x86_64, but I can confirmed
> it's not this patch's fault.
>
> So what's your opinion on this?.

I maybe mixing things up but does

+chrec_convert_aggressive (tree type, tree chrec, bool *fold_conversions)
 {
...
+  if (evolution_function_is_affine_p (chrec))
+    {
+      tree base, step;
+      struct loop *loop;
+
+      loop = get_chrec_loop (chrec);
+      base = CHREC_LEFT (chrec);
+      step = CHREC_RIGHT (chrec);
+      if (convert_affine_scev (loop, type, &base, &step, NULL, true))
+       return build_polynomial_chrec (loop->num, base, step);

^^^ not forget to set *fold_conversions to true?  Or we need to use
convert_affine_scev (..., false)?

+    }

(bah, and the diff somehow messes up -p context :/  which is why I like
context diffs more)

Other from the above the patch looks good to me.

Thanks,
Richard.

> Thanks,
> bin
>
> 2015-05-20  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/62173
>         * tree-ssa-loop-ivopts.c (struct iv): New field.  Reorder fields.
>         (alloc_iv, set_iv): New parameter.
>         (determine_biv_step): Delete.
>         (find_bivs): Inline original determine_biv_step.  Pass new
>         argument to set_iv.
>         (idx_find_step): Use no_overflow information for conversion.
>         * tree-scalar-evolution.c (analyze_scalar_evolution_in_loop): Let
>         resolve_mixers handle folded_casts.
>         (instantiate_scev_name): Change bool parameter to bool pointer.
>         (instantiate_scev_poly, instantiate_scev_binary): Ditto.
>         (instantiate_array_ref, instantiate_scev_not): Ditto.
>         (instantiate_scev_3, instantiate_scev_2): Ditto.
>         (instantiate_scev_1, instantiate_scev_r): Ditto.
>         (instantiate_scev_convert, ): Change parameter.  Pass argument
>         to chrec_convert_aggressive.
>         (instantiate_scev): Change argument.
>         (resolve_mixers): New parameter and set it.
>         (scev_const_prop): New argument.
>         * tree-scalar-evolution.h (resolve_mixers): New parameter.
>         * tree-chrec.c (convert_affine_scev): Call chrec_convert instead
>         of chrec_conert_1.
>         (chrec_convert): New parameter.  Move definition below.
>         (chrec_convert_aggressive): New parameter and set it.  Call
>         convert_affine_scev.
>         * tree-chrec.h (chrec_convert): New parameter.
>         (chrec_convert_aggressive): Ditto.
>         * tree-ssa-loop-niter.c (loop_exits_before_overflow): New function.
>         (scev_probably_wraps_p): Factor loop niter related code into
>         loop_exits_before_overflow.
>
> gcc/testsuite/ChangeLog
> 2015-05-20  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/62173
>         * gcc.dg/tree-ssa/scev-3.c: Remove xfail.
>         * gcc.dg/tree-ssa/scev-4.c: Ditto.
>         * gcc.dg/tree-ssa/scev-8.c: New.
diff mbox

Patch

Index: gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/scev-4.c	(revision 222758)
+++ gcc/testsuite/gcc.dg/tree-ssa/scev-4.c	(working copy)
@@ -20,5 +20,5 @@  f(int k)
         }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail { lp64 || llp64 } } } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/scev-8.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/scev-8.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/scev-8.c	(revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int *a;
+
+int
+foo (long long s, long long l)
+{
+  long long i;
+
+  for (i = s; i < l; i++)
+    {
+      a[(short)i] = 0;
+    }
+  return 0;
+}
+
+/* Address of array reference is not scev.  */
+/* { dg-final { scan-tree-dump-not "use \[0-9\]\n  address" "ivopts" } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/scev-3.c	(revision 222758)
+++ gcc/testsuite/gcc.dg/tree-ssa/scev-3.c	(working copy)
@@ -15,5 +15,5 @@  f(int k)
         }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail { lp64 || llp64 } } } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 222758)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -171,9 +171,10 @@  struct iv
   tree base_object;	/* A memory object to that the induction variable points.  */
   tree step;		/* Step of the iv (constant only).  */
   tree ssa_name;	/* The ssa name with the value.  */
+  unsigned use_id;	/* The identifier in the use if it is the case.  */
   bool biv_p;		/* Is it a biv?  */
   bool have_use_for;	/* Do we already have a use for it?  */
-  unsigned use_id;	/* The identifier in the use if it is the case.  */
+  bool no_overflow;	/* True if the iv doesn't overflow.  */
 };
 
 /* Per-ssa version information (induction variable descriptions, etc.).  */
@@ -991,10 +992,10 @@  contain_complex_addr_expr (tree expr)
 }
 
 /* Allocates an induction variable with given initial value BASE and step STEP
-   for loop LOOP.  */
+   for loop LOOP.  NO_OVERFLOW implies the iv doesn't overflow.  */
 
 static struct iv *
-alloc_iv (tree base, tree step)
+alloc_iv (tree base, tree step, bool no_overflow = false)
 {
   tree expr = base;
   struct iv *iv = XCNEW (struct iv);
@@ -1021,21 +1022,24 @@  static struct iv *
   iv->have_use_for = false;
   iv->use_id = 0;
   iv->ssa_name = NULL_TREE;
+  iv->no_overflow = no_overflow;
 
   return iv;
 }
 
-/* Sets STEP and BASE for induction variable IV.  */
+/* Sets STEP and BASE for induction variable IV.  NO_OVERFLOW implies the IV
+   doesn't overflow.  */
 
 static void
-set_iv (struct ivopts_data *data, tree iv, tree base, tree step)
+set_iv (struct ivopts_data *data, tree iv, tree base, tree step,
+	bool no_overflow)
 {
   struct version_info *info = name_info (data, iv);
 
   gcc_assert (!info->iv);
 
   bitmap_set_bit (data->relevant, SSA_NAME_VERSION (iv));
-  info->iv = alloc_iv (base, step);
+  info->iv = alloc_iv (base, step, no_overflow);
   info->iv->ssa_name = iv;
 }
 
@@ -1057,31 +1061,12 @@  get_iv (struct ivopts_data *data, tree var)
 
       if (!bb
 	  || !flow_bb_inside_loop_p (data->current_loop, bb))
-	set_iv (data, var, var, build_int_cst (type, 0));
+	set_iv (data, var, var, build_int_cst (type, 0), true);
     }
 
   return name_info (data, var)->iv;
 }
 
-/* Determines the step of a biv defined in PHI.  Returns NULL if PHI does
-   not define a simple affine biv with nonzero step.  */
-
-static tree
-determine_biv_step (gphi *phi)
-{
-  struct loop *loop = gimple_bb (phi)->loop_father;
-  tree name = PHI_RESULT (phi);
-  affine_iv iv;
-
-  if (virtual_operand_p (name))
-    return NULL_TREE;
-
-  if (!simple_iv (loop, loop, name, &iv, true))
-    return NULL_TREE;
-
-  return integer_zerop (iv.step) ? NULL_TREE : iv.step;
-}
-
 /* Return the first non-invariant ssa var found in EXPR.  */
 
 static tree
@@ -1115,6 +1100,7 @@  static bool
 find_bivs (struct ivopts_data *data)
 {
   gphi *phi;
+  affine_iv iv;
   tree step, type, base, stop;
   bool found = false;
   struct loop *loop = data->current_loop;
@@ -1127,10 +1113,16 @@  find_bivs (struct ivopts_data *data)
       if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (PHI_RESULT (phi)))
 	continue;
 
-      step = determine_biv_step (phi);
-      if (!step)
+      if (virtual_operand_p (PHI_RESULT (phi)))
 	continue;
 
+      if (!simple_iv (loop, loop, PHI_RESULT (phi), &iv, true))
+	continue;
+
+      if (integer_zerop (iv.step))
+	continue;
+
+      step = iv.step;
       base = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
       /* Stop expanding iv base at the first ssa var referred by iv step.
 	 Ideally we should stop at any ssa var, because that's expensive
@@ -1153,7 +1145,7 @@  find_bivs (struct ivopts_data *data)
 	    step = fold_convert (type, step);
 	}
 
-      set_iv (data, PHI_RESULT (phi), base, step);
+      set_iv (data, PHI_RESULT (phi), base, step, iv.no_overflow);
       found = true;
     }
 
@@ -1256,7 +1248,7 @@  find_givs_in_stmt (struct ivopts_data *data, gimpl
   if (!find_givs_in_stmt_scev (data, stmt, &iv))
     return;
 
-  set_iv (data, gimple_assign_lhs (stmt), iv.base, iv.step);
+  set_iv (data, gimple_assign_lhs (stmt), iv.base, iv.step, iv.no_overflow);
 }
 
 /* Finds general ivs in basic block BB.  */
@@ -1614,6 +1606,7 @@  idx_find_step (tree base, tree *idx, void *data)
 {
   struct ifs_ivopts_data *dta = (struct ifs_ivopts_data *) data;
   struct iv *iv;
+  bool use_overflow_semantics = false;
   tree step, iv_base, iv_step, lbound, off;
   struct loop *loop = dta->ivopts_data->current_loop;
 
@@ -1673,9 +1666,12 @@  idx_find_step (tree base, tree *idx, void *data)
 
   iv_base = iv->base;
   iv_step = iv->step;
+  if (iv->no_overflow && nowrap_type_p (TREE_TYPE (iv_step)))
+    use_overflow_semantics = true;
+
   if (!convert_affine_scev (dta->ivopts_data->current_loop,
 			    sizetype, &iv_base, &iv_step, dta->stmt,
-			    false))
+			    use_overflow_semantics))
     {
       /* The index might wrap.  */
       return false;
Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	(revision 222758)
+++ gcc/tree-scalar-evolution.c	(working copy)
@@ -2145,7 +2145,7 @@  analyze_scalar_evolution_in_loop (struct loop *wrt
   /* We cannot just do
 
      tmp = analyze_scalar_evolution (use_loop, version);
-     ev = resolve_mixers (wrto_loop, tmp);
+     ev = resolve_mixers (wrto_loop, tmp, folded_casts);
 
      as resolve_mixers would query the scalar evolution with respect to
      wrto_loop.  For example, in the situation described in the function
@@ -2154,9 +2154,9 @@  analyze_scalar_evolution_in_loop (struct loop *wrt
 
      analyze_scalar_evolution (use_loop, version) = k2
 
-     and resolve_mixers (loop1, k2) finds that the value of k2 in loop 1
-     is 100, which is a wrong result, since we are interested in the
-     value in loop 3.
+     and resolve_mixers (loop1, k2, folded_casts) finds that the value of
+     k2 in loop 1 is 100, which is a wrong result, since we are interested
+     in the value in loop 3.
 
      Instead, we need to proceed from use_loop to wrto_loop loop by loop,
      each time checking that there is no evolution in the inner loop.  */
@@ -2166,11 +2166,8 @@  analyze_scalar_evolution_in_loop (struct loop *wrt
   while (1)
     {
       tmp = analyze_scalar_evolution (use_loop, ev);
-      ev = resolve_mixers (use_loop, tmp);
+      ev = resolve_mixers (use_loop, tmp, folded_casts);
 
-      if (folded_casts && tmp != ev)
-	*folded_casts = true;
-
       if (use_loop == wrto_loop)
 	return ev;
 
@@ -2292,7 +2289,7 @@  loop_closed_phi_def (tree var)
 }
 
 static tree instantiate_scev_r (basic_block, struct loop *, struct loop *,
-				tree, bool, int);
+				tree, bool *, int);
 
 /* Analyze all the parameters of the chrec, between INSTANTIATE_BELOW
    and EVOLUTION_LOOP, that were left under a symbolic form.
@@ -2301,9 +2298,10 @@  static tree instantiate_scev_r (basic_block, struc
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2312,7 +2310,7 @@  static tree
 instantiate_scev_name (basic_block instantiate_below,
 		       struct loop *evolution_loop, struct loop *inner_loop,
 		       tree chrec,
-		       bool fold_conversions,
+		       bool *fold_conversions,
 		       int size_expr)
 {
   tree res;
@@ -2406,9 +2404,10 @@  instantiate_scev_name (basic_block instantiate_bel
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2416,7 +2415,7 @@  instantiate_scev_name (basic_block instantiate_bel
 static tree
 instantiate_scev_poly (basic_block instantiate_below,
 		       struct loop *evolution_loop, struct loop *,
-		       tree chrec, bool fold_conversions, int size_expr)
+		       tree chrec, bool *fold_conversions, int size_expr)
 {
   tree op1;
   tree op0 = instantiate_scev_r (instantiate_below, evolution_loop,
@@ -2450,9 +2449,10 @@  instantiate_scev_poly (basic_block instantiate_bel
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2462,7 +2462,7 @@  instantiate_scev_binary (basic_block instantiate_b
 			 struct loop *evolution_loop, struct loop *inner_loop,
 			 tree chrec, enum tree_code code,
 			 tree type, tree c0, tree c1,
-			 bool fold_conversions, int size_expr)
+			 bool *fold_conversions, int size_expr)
 {
   tree op1;
   tree op0 = instantiate_scev_r (instantiate_below, evolution_loop, inner_loop,
@@ -2508,9 +2508,10 @@  instantiate_scev_binary (basic_block instantiate_b
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2518,7 +2519,7 @@  instantiate_scev_binary (basic_block instantiate_b
 static tree
 instantiate_array_ref (basic_block instantiate_below,
 		       struct loop *evolution_loop, struct loop *inner_loop,
-		       tree chrec, bool fold_conversions, int size_expr)
+		       tree chrec, bool *fold_conversions, int size_expr)
 {
   tree res;
   tree index = TREE_OPERAND (chrec, 1);
@@ -2545,9 +2546,10 @@  instantiate_array_ref (basic_block instantiate_bel
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2556,7 +2558,7 @@  static tree
 instantiate_scev_convert (basic_block instantiate_below,
 			  struct loop *evolution_loop, struct loop *inner_loop,
 			  tree chrec, tree type, tree op,
-			  bool fold_conversions, int size_expr)
+			  bool *fold_conversions, int size_expr)
 {
   tree op0 = instantiate_scev_r (instantiate_below, evolution_loop,
 				 inner_loop, op,
@@ -2567,19 +2569,21 @@  instantiate_scev_convert (basic_block instantiate_
 
   if (fold_conversions)
     {
-      tree tmp = chrec_convert_aggressive (type, op0);
+      tree tmp = chrec_convert_aggressive (type, op0, fold_conversions);
       if (tmp)
 	return tmp;
-    }
 
-  if (chrec && op0 == op)
-    return chrec;
+      /* If we used chrec_convert_aggressive, we can no longer assume that
+	 signed chrecs do not overflow, as chrec_convert does, so avoid
+	 calling it in that case.  */
+      if (*fold_conversions)
+	{
+	  if (chrec && op0 == op)
+	    return chrec;
 
-  /* If we used chrec_convert_aggressive, we can no longer assume that
-     signed chrecs do not overflow, as chrec_convert does, so avoid
-     calling it in that case.  */
-  if (fold_conversions)
-    return fold_convert (type, op0);
+	  return fold_convert (type, op0);
+	}
+    }
 
   return chrec_convert (type, op0, NULL);
 }
@@ -2593,9 +2597,10 @@  instantiate_scev_convert (basic_block instantiate_
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2605,7 +2610,7 @@  instantiate_scev_not (basic_block instantiate_belo
 		      struct loop *evolution_loop, struct loop *inner_loop,
 		      tree chrec,
 		      enum tree_code code, tree type, tree op,
-		      bool fold_conversions, int size_expr)
+		      bool *fold_conversions, int size_expr)
 {
   tree op0 = instantiate_scev_r (instantiate_below, evolution_loop,
 				 inner_loop, op,
@@ -2643,9 +2648,10 @@  instantiate_scev_not (basic_block instantiate_belo
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2654,7 +2660,7 @@  static tree
 instantiate_scev_3 (basic_block instantiate_below,
 		    struct loop *evolution_loop, struct loop *inner_loop,
 		    tree chrec,
-		    bool fold_conversions, int size_expr)
+		    bool *fold_conversions, int size_expr)
 {
   tree op1, op2;
   tree op0 = instantiate_scev_r (instantiate_below, evolution_loop,
@@ -2691,9 +2697,10 @@  instantiate_scev_3 (basic_block instantiate_below,
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2702,7 +2709,7 @@  static tree
 instantiate_scev_2 (basic_block instantiate_below,
 		    struct loop *evolution_loop, struct loop *inner_loop,
 		    tree chrec,
-		    bool fold_conversions, int size_expr)
+		    bool *fold_conversions, int size_expr)
 {
   tree op1;
   tree op0 = instantiate_scev_r (instantiate_below, evolution_loop,
@@ -2731,9 +2738,10 @@  instantiate_scev_2 (basic_block instantiate_below,
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2742,7 +2750,7 @@  static tree
 instantiate_scev_1 (basic_block instantiate_below,
 		    struct loop *evolution_loop, struct loop *inner_loop,
 		    tree chrec,
-		    bool fold_conversions, int size_expr)
+		    bool *fold_conversions, int size_expr)
 {
   tree op0 = instantiate_scev_r (instantiate_below, evolution_loop,
 				 inner_loop, TREE_OPERAND (chrec, 0),
@@ -2764,9 +2772,10 @@  instantiate_scev_1 (basic_block instantiate_below,
 
    CACHE is the cache of already instantiated values.
 
-   FOLD_CONVERSIONS should be set to true when the conversions that
-   may wrap in signed/pointer type are folded, as long as the value of
-   the chrec is preserved.
+   Variable pointed by FOLD_CONVERSIONS is set to TRUE when the
+   conversions that may wrap in signed/pointer type are folded, as long
+   as the value of the chrec is preserved.  If FOLD_CONVERSIONS is NULL
+   then we don't do such fold.
 
    SIZE_EXPR is used for computing the size of the expression to be
    instantiated, and to stop if it exceeds some limit.  */
@@ -2775,7 +2784,7 @@  static tree
 instantiate_scev_r (basic_block instantiate_below,
 		    struct loop *evolution_loop, struct loop *inner_loop,
 		    tree chrec,
-		    bool fold_conversions, int size_expr)
+		    bool *fold_conversions, int size_expr)
 {
   /* Give up if the expression is larger than the MAX that we allow.  */
   if (size_expr++ > PARAM_VALUE (PARAM_SCEV_MAX_EXPR_SIZE))
@@ -2900,7 +2909,7 @@  instantiate_scev (basic_block instantiate_below, s
     }
 
   res = instantiate_scev_r (instantiate_below, evolution_loop,
-			    NULL, chrec, false, 0);
+			    NULL, chrec, NULL, 0);
 
   if (destr)
     {
@@ -2924,9 +2933,10 @@  instantiate_scev (basic_block instantiate_below, s
    of an expression.  */
 
 tree
-resolve_mixers (struct loop *loop, tree chrec)
+resolve_mixers (struct loop *loop, tree chrec, bool *folded_casts)
 {
   bool destr = false;
+  bool fold_conversions = false;
   if (!global_cache)
     {
       global_cache = new instantiate_cache_type;
@@ -2934,8 +2944,11 @@  tree
     }
 
   tree ret = instantiate_scev_r (block_before_loop (loop), loop, NULL,
-				 chrec, true, 0);
+				 chrec, &fold_conversions, 0);
 
+  if (folded_casts && !*folded_casts)
+    *folded_casts = fold_conversions;
+
   if (destr)
     {
       delete global_cache;
@@ -3387,7 +3400,8 @@  scev_const_prop (void)
 	      && !INTEGRAL_TYPE_P (type))
 	    continue;
 
-	  ev = resolve_mixers (loop, analyze_scalar_evolution (loop, name));
+	  ev = resolve_mixers (loop, analyze_scalar_evolution (loop, name),
+			       NULL);
 	  if (!is_gimple_min_invariant (ev)
 	      || !may_propagate_copy (name, ev))
 	    continue;
Index: gcc/tree-scalar-evolution.h
===================================================================
--- gcc/tree-scalar-evolution.h	(revision 222758)
+++ gcc/tree-scalar-evolution.h	(working copy)
@@ -31,7 +31,7 @@  extern void scev_reset_htab (void);
 extern void scev_finalize (void);
 extern tree analyze_scalar_evolution (struct loop *, tree);
 extern tree instantiate_scev (basic_block, struct loop *, tree);
-extern tree resolve_mixers (struct loop *, tree);
+extern tree resolve_mixers (struct loop *, tree, bool *);
 extern void gather_stats_on_scev_database (void);
 extern unsigned int scev_const_prop (void);
 extern bool expression_expensive_p (tree);
Index: gcc/tree-chrec.c
===================================================================
--- gcc/tree-chrec.c	(revision 222758)
+++ gcc/tree-chrec.c	(working copy)
@@ -1178,8 +1178,6 @@  nb_vars_in_chrec (tree chrec)
     }
 }
 
-static tree chrec_convert_1 (tree, tree, gimple, bool);
-
 /* Converts BASE and STEP of affine scev to TYPE.  LOOP is the loop whose iv
    the scev corresponds to.  AT_STMT is the statement at that the scev is
    evaluated.  USE_OVERFLOW_SEMANTICS is true if this function should assume that
@@ -1254,8 +1252,7 @@  convert_affine_scev (struct loop *loop, tree type,
 				use_overflow_semantics))
     return false;
 
-  new_base = chrec_convert_1 (type, *base, at_stmt,
-			      use_overflow_semantics);
+  new_base = chrec_convert (type, *base, at_stmt, use_overflow_semantics);
   /* The step must be sign extended, regardless of the signedness
      of CT and TYPE.  This only needs to be handled specially when
      CT is unsigned -- to avoid e.g. unsigned char [100, +, 255]
@@ -1266,10 +1263,11 @@  convert_affine_scev (struct loop *loop, tree type,
   if (TYPE_PRECISION (step_type) > TYPE_PRECISION (ct) && TYPE_UNSIGNED (ct))
     {
       tree signed_ct = build_nonstandard_integer_type (TYPE_PRECISION (ct), 0);
-      new_step = chrec_convert_1 (signed_ct, new_step, at_stmt,
-                                  use_overflow_semantics);
+      new_step = chrec_convert (signed_ct, new_step, at_stmt,
+                                use_overflow_semantics);
     }
-  new_step = chrec_convert_1 (step_type, new_step, at_stmt, use_overflow_semantics);
+  new_step = chrec_convert (step_type, new_step, at_stmt,
+			    use_overflow_semantics);
 
   if (automatically_generated_chrec_p (new_base)
       || automatically_generated_chrec_p (new_step))
@@ -1306,36 +1304,6 @@  chrec_convert_rhs (tree type, tree chrec, gimple a
    determining a more accurate estimation of the number of iterations.
    By default AT_STMT could be safely set to NULL_TREE.
 
-   The following rule is always true: TREE_TYPE (chrec) ==
-   TREE_TYPE (CHREC_LEFT (chrec)) == TREE_TYPE (CHREC_RIGHT (chrec)).
-   An example of what could happen when adding two chrecs and the type
-   of the CHREC_RIGHT is different than CHREC_LEFT is:
-
-   {(uint) 0, +, (uchar) 10} +
-   {(uint) 0, +, (uchar) 250}
-
-   that would produce a wrong result if CHREC_RIGHT is not (uint):
-
-   {(uint) 0, +, (uchar) 4}
-
-   instead of
-
-   {(uint) 0, +, (uint) 260}
-*/
-
-tree
-chrec_convert (tree type, tree chrec, gimple at_stmt)
-{
-  return chrec_convert_1 (type, chrec, at_stmt, true);
-}
-
-/* Convert CHREC to TYPE.  When the analyzer knows the context in
-   which the CHREC is built, it sets AT_STMT to the statement that
-   contains the definition of the analyzed variable, otherwise the
-   conversion is less accurate: the information is used for
-   determining a more accurate estimation of the number of iterations.
-   By default AT_STMT could be safely set to NULL_TREE.
-
    USE_OVERFLOW_SEMANTICS is true if this function should assume that
    the rules for overflow of the given language apply (e.g., that signed
    arithmetics in C does not overflow) -- i.e., to use them to avoid unnecessary
@@ -1420,15 +1388,53 @@  keep_cast:
   return res;
 }
 
+/* Convert CHREC to TYPE.  When the analyzer knows the context in
+   which the CHREC is built, it sets AT_STMT to the statement that
+   contains the definition of the analyzed variable, otherwise the
+   conversion is less accurate: the information is used for
+   determining a more accurate estimation of the number of iterations.
+   By default AT_STMT could be safely set to NULL_TREE.
+
+   The following rule is always true: TREE_TYPE (chrec) ==
+   TREE_TYPE (CHREC_LEFT (chrec)) == TREE_TYPE (CHREC_RIGHT (chrec)).
+   An example of what could happen when adding two chrecs and the type
+   of the CHREC_RIGHT is different than CHREC_LEFT is:
+
+   {(uint) 0, +, (uchar) 10} +
+   {(uint) 0, +, (uchar) 250}
+
+   that would produce a wrong result if CHREC_RIGHT is not (uint):
+
+   {(uint) 0, +, (uchar) 4}
+
+   instead of
+
+   {(uint) 0, +, (uint) 260}
+
+   USE_OVERFLOW_SEMANTICS is true if this function should assume that
+   the rules for overflow of the given language apply (e.g., that signed
+   arithmetics in C does not overflow) -- i.e., to use them to avoid unnecessary
+   tests, but also to enforce that the result follows them.  */
+
+tree
+chrec_convert (tree type, tree chrec, gimple at_stmt,
+	       bool use_overflow_semantics)
+{
+  return chrec_convert_1 (type, chrec, at_stmt, use_overflow_semantics);
+}
+
 /* Convert CHREC to TYPE, without regard to signed overflows.  Returns the new
    chrec if something else than what chrec_convert would do happens, NULL_TREE
-   otherwise.  */
+   otherwise.  This function set TRUE to variable pointed by FOLD_CONVERSIONS
+   if the result chrec may overflow.  */
 
 tree
-chrec_convert_aggressive (tree type, tree chrec)
+chrec_convert_aggressive (tree type, tree chrec, bool *fold_conversions)
 {
   tree inner_type, left, right, lc, rc, rtype;
 
+  gcc_assert (fold_conversions != NULL);
+
   if (automatically_generated_chrec_p (chrec)
       || TREE_CODE (chrec) != POLYNOMIAL_CHREC)
     return NULL_TREE;
@@ -1437,17 +1443,33 @@  tree
   if (TYPE_PRECISION (type) > TYPE_PRECISION (inner_type))
     return NULL_TREE;
 
+  if (useless_type_conversion_p (type, inner_type))
+    return NULL_TREE;
+
+  if (evolution_function_is_affine_p (chrec))
+    {
+      tree base, step;
+      struct loop *loop;
+
+      loop = get_chrec_loop (chrec);
+      base = CHREC_LEFT (chrec);
+      step = CHREC_RIGHT (chrec);
+      if (convert_affine_scev (loop, type, &base, &step, NULL, true))
+	return build_polynomial_chrec (loop->num, base, step);
+    }
   rtype = POINTER_TYPE_P (type) ? sizetype : type;
 
   left = CHREC_LEFT (chrec);
   right = CHREC_RIGHT (chrec);
-  lc = chrec_convert_aggressive (type, left);
+  lc = chrec_convert_aggressive (type, left, fold_conversions);
   if (!lc)
     lc = chrec_convert (type, left, NULL);
-  rc = chrec_convert_aggressive (rtype, right);
+  rc = chrec_convert_aggressive (rtype, right, fold_conversions);
   if (!rc)
     rc = chrec_convert (rtype, right, NULL);
 
+  *fold_conversions = true;
+
   return build_polynomial_chrec (CHREC_VARIABLE (chrec), lc, rc);
 }
 
Index: gcc/tree-chrec.h
===================================================================
--- gcc/tree-chrec.h	(revision 222758)
+++ gcc/tree-chrec.h	(working copy)
@@ -59,9 +59,9 @@  enum ev_direction scev_direction (const_tree);
 extern tree chrec_fold_plus (tree, tree, tree);
 extern tree chrec_fold_minus (tree, tree, tree);
 extern tree chrec_fold_multiply (tree, tree, tree);
-extern tree chrec_convert (tree, tree, gimple);
+extern tree chrec_convert (tree, tree, gimple, bool = true);
 extern tree chrec_convert_rhs (tree, tree, gimple);
-extern tree chrec_convert_aggressive (tree, tree);
+extern tree chrec_convert_aggressive (tree, tree, bool *);
 
 /* Operations.  */
 extern tree chrec_apply (unsigned, tree, tree);
Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c	(revision 222758)
+++ gcc/tree-ssa-loop-niter.c	(working copy)
@@ -3773,61 +3773,39 @@  nowrap_type_p (tree type)
   return false;
 }
 
-/* Return false only when the induction variable BASE + STEP * I is
-   known to not overflow: i.e. when the number of iterations is small
-   enough with respect to the step and initial condition in order to
-   keep the evolution confined in TYPEs bounds.  Return true when the
-   iv is known to overflow or when the property is not computable.
+/* Return true only when the number of iterations for LOOP is small enough
+   with respect to the step and initial condition in order to keep the
+   evolution confined in TYPEs bounds.  The evolution is defined by BASE
+   and STEP.  Otherwise return false.
 
-   USE_OVERFLOW_SEMANTICS is true if this function should assume that
-   the rules for overflow of the given language apply (e.g., that signed
-   arithmetics in C does not overflow).  */
+   TODO: Given below program, this function should be able to prove that
+   i doesn't overflow or wrap.
 
-bool
-scev_probably_wraps_p (tree base, tree step,
-		       gimple at_stmt, struct loop *loop,
-		       bool use_overflow_semantics)
-{
-  tree delta, step_abs;
-  tree unsigned_type, valid_niter;
-  tree type = TREE_TYPE (step);
-  tree e;
-  widest_int niter;
-  struct nb_iter_bound *bound;
+     int *a;
+     int foo (signed char s, signed char l)
+     {
+       int sum = 0;
+       signed char i;
 
-  /* FIXME: We really need something like
-     http://gcc.gnu.org/ml/gcc-patches/2005-06/msg02025.html.
+       for (i = s; i < l; i++)
+         sum += a[i];
 
-     We used to test for the following situation that frequently appears
-     during address arithmetics:
+       return sum;
+     }
 
-       D.1621_13 = (long unsigned intD.4) D.1620_12;
-       D.1622_14 = D.1621_13 * 8;
-       D.1623_15 = (doubleD.29 *) D.1622_14;
+   In order to do this, we need to do loop niter analysis just like in
+   tree-ssa-loop-niter.c or to find a way to use result of the analysis.  */
 
-     And derived that the sequence corresponding to D_14
-     can be proved to not wrap because it is used for computing a
-     memory access; however, this is not really the case -- for example,
-     if D_12 = (unsigned char) [254,+,1], then D_14 has values
-     2032, 2040, 0, 8, ..., but the code is still legal.  */
+static bool
+loop_exits_before_overflow (tree base, tree step,
+			    gimple at_stmt, struct loop *loop)
+{
+  widest_int niter;
+  struct nb_iter_bound *bound;
+  tree e, delta, step_abs;
+  tree type = TREE_TYPE (step);
+  tree unsigned_type, valid_niter;
 
-  if (chrec_contains_undetermined (base)
-      || chrec_contains_undetermined (step))
-    return true;
-
-  if (integer_zerop (step))
-    return false;
-
-  /* If we can use the fact that signed and pointer arithmetics does not
-     wrap, we are done.  */
-  if (use_overflow_semantics && nowrap_type_p (TREE_TYPE (base)))
-    return false;
-
-  /* To be able to use estimates on number of iterations of the loop,
-     we must have an upper bound on the absolute value of the step.  */
-  if (TREE_CODE (step) != INTEGER_CST)
-    return true;
-
   /* Don't issue signed overflow warnings.  */
   fold_defer_overflow_warnings ();
 
@@ -3865,7 +3843,7 @@  nowrap_type_p (tree type)
       && integer_nonzerop (e))
     {
       fold_undefer_and_ignore_overflow_warnings ();
-      return false;
+      return true;
     }
   if (at_stmt)
     for (bound = loop->bounds; bound; bound = bound->next)
@@ -3873,12 +3851,65 @@  nowrap_type_p (tree type)
 	if (n_of_executions_at_most (at_stmt, bound, valid_niter))
 	  {
 	    fold_undefer_and_ignore_overflow_warnings ();
-	    return false;
+	    return true;
 	  }
       }
 
   fold_undefer_and_ignore_overflow_warnings ();
+  return false;
+}
 
+/* Return false only when the induction variable BASE + STEP * I is
+   known to not overflow: i.e. when the number of iterations is small
+   enough with respect to the step and initial condition in order to
+   keep the evolution confined in TYPEs bounds.  Return true when the
+   iv is known to overflow or when the property is not computable.
+
+   USE_OVERFLOW_SEMANTICS is true if this function should assume that
+   the rules for overflow of the given language apply (e.g., that signed
+   arithmetics in C does not overflow).  */
+
+bool
+scev_probably_wraps_p (tree base, tree step,
+		       gimple at_stmt, struct loop *loop,
+		       bool use_overflow_semantics)
+{
+  /* FIXME: We really need something like
+     http://gcc.gnu.org/ml/gcc-patches/2005-06/msg02025.html.
+
+     We used to test for the following situation that frequently appears
+     during address arithmetics:
+
+       D.1621_13 = (long unsigned intD.4) D.1620_12;
+       D.1622_14 = D.1621_13 * 8;
+       D.1623_15 = (doubleD.29 *) D.1622_14;
+
+     And derived that the sequence corresponding to D_14
+     can be proved to not wrap because it is used for computing a
+     memory access; however, this is not really the case -- for example,
+     if D_12 = (unsigned char) [254,+,1], then D_14 has values
+     2032, 2040, 0, 8, ..., but the code is still legal.  */
+
+  if (chrec_contains_undetermined (base)
+      || chrec_contains_undetermined (step))
+    return true;
+
+  if (integer_zerop (step))
+    return false;
+
+  /* If we can use the fact that signed and pointer arithmetics does not
+     wrap, we are done.  */
+  if (use_overflow_semantics && nowrap_type_p (TREE_TYPE (base)))
+    return false;
+
+  /* To be able to use estimates on number of iterations of the loop,
+     we must have an upper bound on the absolute value of the step.  */
+  if (TREE_CODE (step) != INTEGER_CST)
+    return true;
+
+  if (loop_exits_before_overflow (base, step, at_stmt, loop))
+    return false;
+
   /* At this point we still don't have a proof that the iv does not
      overflow: give up.  */
   return true;