diff mbox

Improve IVOPT to handle outside and inside loop iv uses differently in GCC

Message ID 001901cedad5$a9866710$fc933530$@arm.com
State New
Headers show

Commit Message

Bin Cheng Nov. 6, 2013, 9:50 a.m. UTC
Hi,
GCC IVOPT has a problem that it doesn't differentiate between iv uses
outside of loop from inside ones.  It computes cost for outside iv use just
like inside ones, which is wrong because outside iv use should be computed
along loop exit edge and the cost should be amortized against loop iteration
number.  Lastly, the computation of outside iv use is inserted in loop,
rather along loop exit edge.

This is interesting since usually outside iv use should be handled
differently, or it hurts optimization in several ways like:
1) Wrong iv candidate is chosen because of inaccurate cost.
2) Extra computation in loop itself is redundant.
3) Extra code computing outside iv use in loop may increases register
pressure because both iv variables before and after stepping could be alive
at same time.
4) IVOPT generates code that it expects to stay as is, passes like DOM tends
to break this because of the extra computation.  This hurts targets with
auto-increment support more.

This patch fixes the problem.  Bootstrap and test on x86/x86_64/arm.
Richard, Zdenek,  does this look reasonable?

Thanks,
bin


gcc/testsuite/ChangeLog
2013-11-06  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test.

2013-11-06  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (iv_use_p, iv_cand_p): Move around.
	(iv_use_location): New.
	(struct iv): Remove have_use_for and use_id.  New fields
	inside_use, outside_uses_vec and use_loc.
	(struct iv_use): New fields exit_edge and outside_use_p.
	(struct edge_info, edge_info_p): New.
	(struct ivopts_data): New fields alloc_uses_vecs, changed_bbs,
	edge_map and edge_obstack.
	(init_edge_info, get_edge_info): New.
	(dump_use): Dump outside/inside information for iv use.
	(tree_ssa_iv_optimize_init): Init new fields.
	(alloc_iv): Init new fields.  Remove have_use_for and use_id.
	(record_use): New parameter.  Record information for outside loop
	iv use.
	(find_interesting_uses_op): New parameter.  Handle inside and
	outside loop iv uses.
	(find_interesting_uses_cond, idx_record_use): Pass new argument.
	(find_interesting_uses_address): Likewise.
	(find_interesting_uses_stmt, create_new_iv): likewise.
	(find_interesting_uses_outside): Rename exit to exit_edge.
	New parameter normal_edge_p.  Pass new argument.
	(find_interesting_uses): Find iv uses in two passes.
	(get_computation): Compute cost at right position for iv use.
	(determine_use_iv_cost_generic): Ajust cost for outside loop iv use.
	(rewrite_use_outside_of_loop): New.
	(rewrite_use): Call rewrite_use_outside_of_loop.
	(remove_unused_ivs): Keep computation only for inner iv use.
	(free_loop_data):  Reset outside_uses_vec in various iv structures.
	Free alloc_uses_vecs and edge_map.
	(tree_ssa_iv_optimize_finalize): Free and reset.
	(tree_ssa_iv_optimize_loop): Create edge_map.
	(tree_ssa_iv_optimize): Call rewrite_into_loop_closed_ssa if
	necessary.

Comments

Bin Cheng Nov. 21, 2013, 9:04 a.m. UTC | #1
Ping and CC Zdenek with the right email.

Thanks,
bin

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of bin.cheng
> Sent: Wednesday, November 06, 2013 5:51 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Biener; ook@ucw.cz
> Subject: [PATCH GCC]Improve IVOPT to handle outside and inside loop iv
> uses differently in GCC
> 
> Hi,
> GCC IVOPT has a problem that it doesn't differentiate between iv uses
> outside of loop from inside ones.  It computes cost for outside iv use
just like
> inside ones, which is wrong because outside iv use should be computed
> along loop exit edge and the cost should be amortized against loop
iteration
> number.  Lastly, the computation of outside iv use is inserted in loop,
rather
> along loop exit edge.
> 
> This is interesting since usually outside iv use should be handled
differently,
> or it hurts optimization in several ways like:
> 1) Wrong iv candidate is chosen because of inaccurate cost.
> 2) Extra computation in loop itself is redundant.
> 3) Extra code computing outside iv use in loop may increases register
> pressure because both iv variables before and after stepping could be
alive
> at same time.
> 4) IVOPT generates code that it expects to stay as is, passes like DOM
tends
> to break this because of the extra computation.  This hurts targets with
auto-
> increment support more.
> 
> This patch fixes the problem.  Bootstrap and test on x86/x86_64/arm.
> Richard, Zdenek,  does this look reasonable?
> 
> Thanks,
> bin
> 
> 
> gcc/testsuite/ChangeLog
> 2013-11-06  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test.
> 
> 2013-11-06  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* tree-ssa-loop-ivopts.c (iv_use_p, iv_cand_p): Move around.
> 	(iv_use_location): New.
> 	(struct iv): Remove have_use_for and use_id.  New fields
> 	inside_use, outside_uses_vec and use_loc.
> 	(struct iv_use): New fields exit_edge and outside_use_p.
> 	(struct edge_info, edge_info_p): New.
> 	(struct ivopts_data): New fields alloc_uses_vecs, changed_bbs,
> 	edge_map and edge_obstack.
> 	(init_edge_info, get_edge_info): New.
> 	(dump_use): Dump outside/inside information for iv use.
> 	(tree_ssa_iv_optimize_init): Init new fields.
> 	(alloc_iv): Init new fields.  Remove have_use_for and use_id.
> 	(record_use): New parameter.  Record information for outside loop
> 	iv use.
> 	(find_interesting_uses_op): New parameter.  Handle inside and
> 	outside loop iv uses.
> 	(find_interesting_uses_cond, idx_record_use): Pass new argument.
> 	(find_interesting_uses_address): Likewise.
> 	(find_interesting_uses_stmt, create_new_iv): likewise.
> 	(find_interesting_uses_outside): Rename exit to exit_edge.
> 	New parameter normal_edge_p.  Pass new argument.
> 	(find_interesting_uses): Find iv uses in two passes.
> 	(get_computation): Compute cost at right position for iv use.
> 	(determine_use_iv_cost_generic): Ajust cost for outside loop iv use.
> 	(rewrite_use_outside_of_loop): New.
> 	(rewrite_use): Call rewrite_use_outside_of_loop.
> 	(remove_unused_ivs): Keep computation only for inner iv use.
> 	(free_loop_data):  Reset outside_uses_vec in various iv structures.
> 	Free alloc_uses_vecs and edge_map.
> 	(tree_ssa_iv_optimize_finalize): Free and reset.
> 	(tree_ssa_iv_optimize_loop): Create edge_map.
> 	(tree_ssa_iv_optimize): Call rewrite_into_loop_closed_ssa if
> 	necessary.
diff mbox

Patch

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c	(revision 0)
@@ -0,0 +1,25 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts" } */
+
+struct header {
+    unsigned long size;
+    struct header **ptr;
+};
+
+extern unsigned long top;
+
+int foo (struct header *start, int size)
+{
+  struct header *hd = start;
+  top = 0;
+  while (hd != 0 && top+1 < size)
+    {
+      ++top;
+      hd = (hd->ptr)[4];
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "iv2tmp.* = .* + \[0-9\]*" "ivopts" } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 204117)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -56,6 +56,24 @@  along with GCC; see the file COPYING3.  If not see
    4) The trees are transformed to use the new variables, the dead code is
       removed.
 
+   Induction variable is possibly used outside of loop.  The algorithm
+   handles outside iv use like:
+      -- Marks iv use outside of loop by recording the corresponding exit
+	 edge along which the iv is used.  If an induction variable is
+	 used both inside and outside of loop, outside use will be ignored,
+	 as if it is escalated into inside one.
+      -- Computes the use cost against location of the end of source block
+	 of the exit edge, and amortize the cost against loop iteration
+	 number.
+      -- Inserts computation of outside iv use along the exit edge, rather
+	 than in basic block of current loop.
+   TODO: For degenerable phi node with identical arguments, if computations
+   of arguments along each exit edge are identical, the algorithm inserts
+   duplicate gimples for each phi argument.  In this case, only one version
+   of computation should be inserted at the beginning of the basic block.
+   Unfortunately, it's hard to know whether the computations are identical
+   before acutally rewrting the uses.
+
    All of this is done loop by loop.  Doing it globally is theoretically
    possible, it might give a better performance and it might enable us
    to decide costs more precisely, but getting all the interactions right
@@ -125,6 +143,19 @@  avg_loop_niter (struct loop *loop)
   return niter;
 }
 
+/* The types used by the induction variable optimizations.  */
+
+typedef struct iv_use *iv_use_p;
+typedef struct iv_cand *iv_cand_p;
+
+/* Indicate what kind of use recorded for struct iv.  */
+enum iv_use_location
+{
+  NONE_LOC,	/* iv has no use for it.  */
+  OUTSIDE_LOOP,	/* iv has outside use for it.  */
+  INSIDE_LOOP	/* iv has inside use for it.  */
+};
+
 /* Representation of the induction variable.  */
 struct iv
 {
@@ -132,9 +163,12 @@  struct iv
   tree base_object;	/* A memory object to that the induction variable points.  */
   tree step;		/* Step of the iv (constant only).  */
   tree ssa_name;	/* The ssa name with the value.  */
+  iv_use_p inside_use;	/* The id of inside loop use.  */
+  vec<iv_use_p> *outside_uses_vec;
+			/* The pointers to outside loop uses.  */
+  enum iv_use_location use_loc;
+			/* What kind of use for this iv.  */
   bool biv_p;		/* Is it a biv?  */
-  bool have_use_for;	/* Do we already have a use for it?  */
-  unsigned use_id;	/* The identifier in the use if it is the case.  */
 };
 
 /* Per-ssa version information (induction variable descriptions, etc.).  */
@@ -200,6 +234,8 @@  struct iv_use
 
   struct iv_cand *selected;
 			/* The selected candidate.  */
+  edge exit_edge;	/* Exit edge for outside loop uses.  */
+  bool outside_use_p;	/* True for outside loop uses.  */
 };
 
 /* The position where the iv is computed.  */
@@ -243,12 +279,16 @@  struct iv_inv_expr_ent
   hashval_t hash;
 };
 
-/* The data used by the induction variable optimizations.  */
+/* Exit edge information.  */
+struct edge_info
+{
+  edge exit_edge;
+  edge split_edge;
+  basic_block split_bb;
+};
 
-typedef struct iv_use *iv_use_p;
+typedef struct edge_info *edge_info_p;
 
-typedef struct iv_cand *iv_cand_p;
-
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : typed_free_remove <iv_inv_expr_ent>
@@ -306,12 +346,27 @@  struct ivopts_data
   /* The uses of induction variables.  */
   vec<iv_use_p> iv_uses;
 
+  /* The uses vectors allocated for iv.  These vectors have to be
+     recorded and released via this field, because there are lots
+     of pointer copies for iv structure in this optimization.  */
+  vec<void *> alloc_uses_vecs;
+
   /* The candidates.  */
   vec<iv_cand_p> iv_candidates;
 
   /* A bitmap of important candidates.  */
   bitmap important_candidates;
 
+  /* A bitmap of basic blocks containing use which does not satisfy
+     loop closed ssa property.  */
+  bitmap changed_bbs;
+
+  /* Pointer map embodying a mapping from exit edge to its edge_info.  */
+  struct pointer_map_t *edge_map;
+
+  /* Obstack for exit edge information.  */
+  struct obstack edge_obstack;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -447,6 +502,38 @@  iv_cand (struct ivopts_data *data, unsigned i)
   return data->iv_candidates[i];
 }
 
+/* Add an entry for edge EXIT_EDGE to the edge-to-edge_info mapping
+   if not exist yet.  */
+static struct edge_info *
+init_edge_info (struct ivopts_data *data, edge exit_edge)
+{
+  void **p = pointer_map_contains (data->edge_map, exit_edge);
+  edge_info_p c;
+
+  if (p)
+    return (edge_info_p)*p;
+
+  p = pointer_map_insert (data->edge_map, exit_edge);
+  c = (edge_info_p) obstack_alloc (&data->edge_obstack,
+				   sizeof (struct edge_info));
+  c->exit_edge = exit_edge;
+  c->split_edge = NULL;
+  c->split_bb = NULL;
+  *p = c;
+  return c;
+}
+
+/* Get edge_info for edge EXIT_EDGE, return NULL if it doesn't exist.  */
+static struct edge_info *
+get_edge_info (struct ivopts_data *data, edge exit_edge)
+{
+  void **p = pointer_map_contains (data->edge_map, exit_edge);
+  if (!p)
+    return NULL;
+
+  return (struct edge_info *) *p;
+}
+
 /* The single loop exit if it dominates the latch, NULL otherwise.  */
 
 edge
@@ -517,7 +604,8 @@  dump_use (FILE *file, struct iv_use *use)
   switch (use->type)
     {
     case USE_NONLINEAR_EXPR:
-      fprintf (file, "  generic\n");
+      fprintf (file, "  generic(%s)\n",
+	       use->outside_use_p ? "outside" : "inside");
       break;
 
     case USE_ADDRESS:
@@ -860,9 +948,13 @@  tree_ssa_iv_optimize_init (struct ivopts_data *dat
   data->version_info = XCNEWVEC (struct version_info, data->version_info_size);
   data->relevant = BITMAP_ALLOC (NULL);
   data->important_candidates = BITMAP_ALLOC (NULL);
+  data->changed_bbs = BITMAP_ALLOC (NULL);
+  data->edge_map = NULL;
+  gcc_obstack_init (&data->edge_obstack);
   data->max_inv_id = 0;
   data->niters = NULL;
   data->iv_uses.create (20);
+  data->alloc_uses_vecs.create (20);
   data->iv_candidates.create (20);
   data->inv_expr_tab.create (10);
   data->inv_expr_id = 0;
@@ -930,9 +1022,10 @@  alloc_iv (tree base, tree step)
   iv->base = base;
   iv->base_object = determine_base_object (base);
   iv->step = step;
+  iv->inside_use = NULL;
+  iv->outside_uses_vec = NULL;
+  iv->use_loc = NONE_LOC;
   iv->biv_p = false;
-  iv->have_use_for = false;
-  iv->use_id = 0;
   iv->ssa_name = NULL_TREE;
 
   return iv;
@@ -1193,11 +1286,13 @@  find_induction_variables (struct ivopts_data *data
   return true;
 }
 
-/* Records a use of type USE_TYPE at *USE_P in STMT whose value is IV.  */
+/* Records a use of type USE_TYPE at *USE_P in STMT whose value is IV.
+   EXIT_EDGE is the exit edge through which IV is used outside of loop,
+   which is NULL if this is a use inside loop.  */
 
 static struct iv_use *
 record_use (struct ivopts_data *data, tree *use_p, struct iv *iv,
-	    gimple stmt, enum use_type use_type)
+	    gimple stmt, enum use_type use_type, edge exit_edge)
 {
   struct iv_use *use = XCNEW (struct iv_use);
 
@@ -1205,6 +1300,8 @@  record_use (struct ivopts_data *data, tree *use_p,
   use->type = use_type;
   use->iv = iv;
   use->stmt = stmt;
+  use->outside_use_p = (exit_edge != NULL) ? true : false;
+  use->exit_edge = exit_edge;
   use->op_p = use_p;
   use->related_cands = BITMAP_ALLOC (NULL);
 
@@ -1212,6 +1309,10 @@  record_use (struct ivopts_data *data, tree *use_p,
      caller.  */
   iv->ssa_name = NULL_TREE;
 
+  /* Record the exit edge for rewriting.  */
+  if (exit_edge != NULL)
+    (void) init_edge_info (data, exit_edge);
+
   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_use (dump_file, use);
 
@@ -1247,10 +1348,12 @@  record_invariant (struct ivopts_data *data, tree o
   bitmap_set_bit (data->relevant, SSA_NAME_VERSION (op));
 }
 
-/* Checks whether the use OP is interesting and if so, records it.  */
+/* Checks whether the use OP is interesting and if so, records it.
+   EXIT_EDGE is the corresponding exit edge if we are looking for
+   outside loop use in OP.  */
 
 static struct iv_use *
-find_interesting_uses_op (struct ivopts_data *data, tree op)
+find_interesting_uses_op (struct ivopts_data *data, tree op, edge exit_edge)
 {
   struct iv *iv;
   struct iv *civ;
@@ -1264,12 +1367,35 @@  static struct iv_use *
   if (!iv)
     return NULL;
 
-  if (iv->have_use_for)
+  /* Outside use must be along normal exit edge.  */
+  if (exit_edge != NULL)
+    gcc_assert ((exit_edge->flags & (EDGE_COMPLEX | EDGE_FAKE)) == 0);
+
+  /* Just return if there is use inside loop.  */
+  if (iv->use_loc == INSIDE_LOOP)
     {
-      use = iv_use (data, iv->use_id);
+      gcc_assert (iv->inside_use != NULL);
+      gcc_assert (iv->inside_use->type == USE_NONLINEAR_EXPR);
+      return iv->inside_use;
+    }
 
-      gcc_assert (use->type == USE_NONLINEAR_EXPR);
-      return use;
+  /* Check if the outside use has already been recorded for this
+     exit edge.  */
+  if (iv->use_loc == OUTSIDE_LOOP && exit_edge != NULL)
+    {
+      unsigned int i;
+
+      gcc_assert (iv->outside_uses_vec != NULL);
+      gcc_assert (iv->outside_uses_vec->length () > 0);
+
+      FOR_EACH_VEC_ELT ((*iv->outside_uses_vec), i, use)
+	{
+	  gcc_assert (use->outside_use_p);
+	  gcc_assert (use->type == USE_NONLINEAR_EXPR);
+
+	  if (exit_edge == use->exit_edge)
+	    return use;
+	}
     }
 
   if (integer_zerop (iv->step))
@@ -1277,8 +1403,16 @@  static struct iv_use *
       record_invariant (data, op, true);
       return NULL;
     }
-  iv->have_use_for = true;
 
+  iv->use_loc = exit_edge ? OUTSIDE_LOOP : INSIDE_LOOP;
+
+  /* Allocate vector for outside uses and record it.  */
+  if (exit_edge != NULL && iv->outside_uses_vec == NULL)
+    {
+      vec_alloc (iv->outside_uses_vec, 4);
+      data->alloc_uses_vecs.safe_push ((void *)iv->outside_uses_vec);
+    }
+
   civ = XNEW (struct iv);
   *civ = *iv;
 
@@ -1286,8 +1420,11 @@  static struct iv_use *
   gcc_assert (gimple_code (stmt) == GIMPLE_PHI
 	      || is_gimple_assign (stmt));
 
-  use = record_use (data, NULL, civ, stmt, USE_NONLINEAR_EXPR);
-  iv->use_id = use->id;
+  use = record_use (data, NULL, civ, stmt, USE_NONLINEAR_EXPR, exit_edge);
+  if (iv->use_loc == OUTSIDE_LOOP)
+    iv->outside_uses_vec->safe_push (use);
+  else
+    iv->inside_use = use;
 
   return use;
 }
@@ -1368,14 +1505,14 @@  find_interesting_uses_cond (struct ivopts_data *da
 
   if (!extract_cond_operands (data, stmt, &var_p, &bound_p, &var_iv, NULL))
     {
-      find_interesting_uses_op (data, *var_p);
-      find_interesting_uses_op (data, *bound_p);
+      find_interesting_uses_op (data, *var_p, NULL);
+      find_interesting_uses_op (data, *bound_p, NULL);
       return;
     }
 
   civ = XNEW (struct iv);
   *civ = *var_iv;
-  record_use (data, NULL, civ, stmt, USE_COMPARE);
+  record_use (data, NULL, civ, stmt, USE_COMPARE, NULL);
 }
 
 /* Returns the outermost loop EXPR is obviously invariant in
@@ -1560,11 +1697,11 @@  idx_record_use (tree base, tree *idx,
 		void *vdata)
 {
   struct ivopts_data *data = (struct ivopts_data *) vdata;
-  find_interesting_uses_op (data, *idx);
+  find_interesting_uses_op (data, *idx, NULL);
   if (TREE_CODE (base) == ARRAY_REF || TREE_CODE (base) == ARRAY_RANGE_REF)
     {
-      find_interesting_uses_op (data, array_ref_element_size (base));
-      find_interesting_uses_op (data, array_ref_low_bound (base));
+      find_interesting_uses_op (data, array_ref_element_size (base), NULL);
+      find_interesting_uses_op (data, array_ref_low_bound (base), NULL);
     }
   return true;
 }
@@ -1831,7 +1968,7 @@  find_interesting_uses_address (struct ivopts_data
     }
 
   civ = alloc_iv (base, step);
-  record_use (data, op_p, civ, stmt, USE_ADDRESS);
+  record_use (data, op_p, civ, stmt, USE_ADDRESS, NULL);
   return;
 
 fail:
@@ -1897,7 +2034,7 @@  find_interesting_uses_stmt (struct ivopts_data *da
 	  if (REFERENCE_CLASS_P (*rhs))
 	    find_interesting_uses_address (data, stmt, rhs);
 	  else
-	    find_interesting_uses_op (data, *rhs);
+	    find_interesting_uses_op (data, *rhs, NULL);
 
 	  if (REFERENCE_CLASS_P (*lhs))
 	    find_interesting_uses_address (data, stmt, lhs);
@@ -1938,26 +2075,30 @@  find_interesting_uses_stmt (struct ivopts_data *da
       if (!iv)
 	continue;
 
-      find_interesting_uses_op (data, op);
+      find_interesting_uses_op (data, op, NULL);
     }
 }
 
 /* Finds interesting uses of induction variables outside of loops
-   on loop exit edge EXIT.  */
+   on loop exit edge EXIT_EDGE.  NORMAL_EDGE_P is true if EXIT_EDGE
+   is a normal edge.  */
 
 static void
-find_interesting_uses_outside (struct ivopts_data *data, edge exit)
+find_interesting_uses_outside (struct ivopts_data *data,
+			       edge exit_edge, bool normal_edge_p)
 {
   gimple phi;
   gimple_stmt_iterator psi;
   tree def;
 
-  for (psi = gsi_start_phis (exit->dest); !gsi_end_p (psi); gsi_next (&psi))
+  for (psi = gsi_start_phis (exit_edge->dest);
+       !gsi_end_p (psi); gsi_next (&psi))
     {
       phi = gsi_stmt (psi);
-      def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+      def = PHI_ARG_DEF_FROM_EDGE (phi, exit_edge);
       if (!virtual_operand_p (def))
-        find_interesting_uses_op (data, def);
+	find_interesting_uses_op (data, def,
+				  normal_edge_p ? exit_edge : NULL);
     }
 }
 
@@ -1976,6 +2117,11 @@  find_interesting_uses (struct ivopts_data *data)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Uses:\n\n");
 
+  /* Deliberately find uses in two passes with each for inside loop iv
+     uses and outside loop iv uses.  By this strategy, we don't need
+     to escalate outside use to inside one.
+
+     Find inside uses in the first pass.  */
   for (i = 0; i < data->current_loop->num_nodes; i++)
     {
       edge_iterator ei;
@@ -1983,8 +2129,11 @@  find_interesting_uses (struct ivopts_data *data)
 
       FOR_EACH_EDGE (e, ei, bb->succs)
 	if (e->dest != EXIT_BLOCK_PTR
+	    /* Outside use along abnormal exit edge is treated as
+	       inside one.  */
+	    && (e->flags & (EDGE_COMPLEX | EDGE_FAKE)) != 0
 	    && !flow_bb_inside_loop_p (data->current_loop, e->dest))
-	  find_interesting_uses_outside (data, e);
+	  find_interesting_uses_outside (data, e, false);
 
       for (bsi = gsi_start_phis (bb); !gsi_end_p (bsi); gsi_next (&bsi))
 	find_interesting_uses_stmt (data, gsi_stmt (bsi));
@@ -1993,6 +2142,19 @@  find_interesting_uses (struct ivopts_data *data)
 	  find_interesting_uses_stmt (data, gsi_stmt (bsi));
     }
 
+  /* Find outside uses in the second pass.  */
+  for (i = 0; i < data->current_loop->num_nodes; i++)
+    {
+      edge_iterator ei;
+      bb = body[i];
+
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	if (e->dest != EXIT_BLOCK_PTR
+	    && (e->flags & (EDGE_COMPLEX | EDGE_FAKE)) == 0
+	    && !flow_bb_inside_loop_p (data->current_loop, e->dest))
+	  find_interesting_uses_outside (data, e, true);
+    }
+
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       bitmap_iterator bi;
@@ -3097,7 +3259,17 @@  get_computation_at (struct loop *loop,
 static tree
 get_computation (struct loop *loop, struct iv_use *use, struct iv_cand *cand)
 {
-  return get_computation_at (loop, use, cand, use->stmt);
+  gimple at = NULL;
+
+  /* Outside use should be computed at the end of source block of exit
+     edge.  */
+  if (use->outside_use_p)
+    at = last_stmt (use->exit_edge->src);
+
+  if (at == NULL)
+    at = use->stmt;
+
+  return get_computation_at (loop, use, cand, at);
 }
 
 /* Adjust the cost COST for being in loop setup rather than loop body.
@@ -4242,6 +4414,8 @@  determine_use_iv_cost_generic (struct ivopts_data
 
   cost = get_computation_cost (data, use, cand, false, &depends_on,
                                NULL, &inv_expr_id);
+  if (use->outside_use_p)
+    cost.cost = adjust_setup_cost (data, cost.cost);
 
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE, ERROR_MARK,
                    inv_expr_id);
@@ -6093,7 +6267,7 @@  create_new_iv (struct ivopts_data *data, struct iv
       name_info (data, cand->var_after)->preserve_biv = true;
 
       /* Rewrite the increment so that it uses var_before directly.  */
-      find_interesting_uses_op (data, cand->var_after)->selected = cand;
+      find_interesting_uses_op (data, cand->var_after, NULL)->selected = cand;
       return;
     }
 
@@ -6239,6 +6413,139 @@  rewrite_use_nonlinear_expr (struct ivopts_data *da
     }
 }
 
+/* Rewrites outside USE (definition of iv used in a nonlinear expression
+   outside of loop) using candidate CAND.  The implementation is like
+   rewrite_use_nonlinear_expr except the exit edge will be split and
+   the computation of USE will be inserted in the newly created basic
+   block.  */
+
+static void
+rewrite_use_outside_of_loop (struct ivopts_data *data,
+			     struct iv_use *use, struct iv_cand *cand)
+{
+  tree comp;
+  tree op, tgt, new_tgt;
+  gimple ass;
+  gimple_stmt_iterator bsi;
+  edge e;
+  basic_block new_bb;
+  struct edge_info *einfo;
+
+  gcc_assert (use->outside_use_p && use->exit_edge != NULL);
+  /* An important special case -- if we are asked to express value of
+     the original iv by itself, just exit; there is no need to
+     introduce a new computation (that might also need casting the
+     variable to unsigned and back).  */
+  if (cand->pos == IP_ORIGINAL
+      && cand->incremented_at == use->stmt)
+    {
+      enum tree_code stmt_code;
+
+      gcc_assert (is_gimple_assign (use->stmt));
+      gcc_assert (gimple_assign_lhs (use->stmt) == cand->var_after);
+
+      /* Check whether we may leave the computation unchanged.
+	 This is the case only if it does not rely on other
+	 computations in the loop -- otherwise, the computation
+	 we rely upon may be removed in remove_unused_ivs,
+	 thus leading to ICE.  */
+      stmt_code = gimple_assign_rhs_code (use->stmt);
+      if (stmt_code == PLUS_EXPR
+	  || stmt_code == MINUS_EXPR
+	  || stmt_code == POINTER_PLUS_EXPR)
+	{
+	  if (gimple_assign_rhs1 (use->stmt) == cand->var_before)
+	    op = gimple_assign_rhs2 (use->stmt);
+	  else if (gimple_assign_rhs2 (use->stmt) == cand->var_before)
+	    op = gimple_assign_rhs1 (use->stmt);
+	  else
+	    op = NULL_TREE;
+	}
+      else
+	op = NULL_TREE;
+
+      if (op && expr_invariant_in_loop_p (data->current_loop, op))
+	return;
+    }
+
+  comp = get_computation (data->current_loop, use, cand);
+  gcc_assert (comp != NULL_TREE);
+
+  switch (gimple_code (use->stmt))
+    {
+    case GIMPLE_PHI:
+      tgt = PHI_RESULT (use->stmt);
+      break;
+
+    case GIMPLE_ASSIGN:
+      tgt = gimple_assign_lhs (use->stmt);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  einfo = get_edge_info (data, use->exit_edge);
+  gcc_assert (einfo->exit_edge == use->exit_edge);
+  if (einfo->split_edge == NULL)
+    {
+      gcc_assert (einfo->split_bb == NULL);
+      new_bb = split_edge (use->exit_edge);
+      e = single_succ_edge (new_bb);
+      einfo->split_bb = new_bb;
+      einfo->split_edge = e;
+    }
+  else
+    {
+      gcc_assert (einfo->split_bb != NULL);
+      new_bb = einfo->split_bb;
+      e = einfo->split_edge;
+    }
+  bsi = gsi_last_bb (new_bb);
+  /* Record basic blocks containing use which does not satisfy loop
+     closed ssa property.  */
+  bitmap_set_bit (data->changed_bbs, new_bb->index);
+  bitmap_set_bit (data->changed_bbs, e->dest->index);
+
+  if (!valid_gimple_rhs_p (comp))
+    {
+      comp = force_gimple_operand_gsi (&bsi, comp, true, NULL_TREE,
+				       false, GSI_CONTINUE_LINKING);
+      if (POINTER_TYPE_P (TREE_TYPE (tgt)))
+	{
+	  duplicate_ssa_name_ptr_info (comp, SSA_NAME_PTR_INFO (tgt));
+	  /* As this isn't a plain copy we have to reset alignment
+	     information.  */
+	  if (SSA_NAME_PTR_INFO (comp))
+	    mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (comp));
+	}
+    }
+  new_tgt = make_temp_ssa_name (TREE_TYPE (tgt), NULL, "iv2tmp");
+
+  if (POINTER_TYPE_P (TREE_TYPE (tgt)))
+    {
+      duplicate_ssa_name_ptr_info (new_tgt, SSA_NAME_PTR_INFO (tgt));
+      /* As this isn't a plain copy we have to reset alignment
+	 information.  */
+      if (SSA_NAME_PTR_INFO (new_tgt))
+	mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (new_tgt));
+    }
+
+  ass = gimple_build_assign (new_tgt, comp);
+  gsi_insert_after (&bsi, ass, GSI_CONTINUE_LINKING);
+
+  for (bsi = gsi_start_phis (e->dest); !gsi_end_p (bsi); gsi_next (&bsi))
+    {
+      gimple phi = gsi_stmt (bsi);
+      tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
+      if (def != tgt)
+	continue;
+
+      SET_PHI_ARG_DEF (phi, e->dest_idx, new_tgt);
+      update_stmt (phi);
+    }
+}
+
 /* Performs a peephole optimization to reorder the iv update statement with
    a mem ref to enable instruction combining in later phases. The mem ref uses
    the iv value before the update, so the reordering transformation requires
@@ -6420,7 +6727,14 @@  rewrite_use (struct ivopts_data *data, struct iv_u
   switch (use->type)
     {
       case USE_NONLINEAR_EXPR:
-	rewrite_use_nonlinear_expr (data, use, cand);
+	if (use->outside_use_p)
+	  {
+	    gcc_assert (use->exit_edge != NULL);
+	    rewrite_use_outside_of_loop (data, use, cand);
+	  }
+	else
+	  rewrite_use_nonlinear_expr (data, use, cand);
+
 	break;
 
       case USE_ADDRESS:
@@ -6477,11 +6791,12 @@  remove_unused_ivs (struct ivopts_data *data)
       if (info->iv
 	  && !integer_zerop (info->iv->step)
 	  && !info->inv_id
-	  && !info->iv->have_use_for
+	  /* Outside use is computated on exit edge.  */
+	  && info->iv->use_loc != INSIDE_LOOP
 	  && !info->preserve_biv)
 	{
 	  bitmap_set_bit (toremove, SSA_NAME_VERSION (info->iv->ssa_name));
-	  
+
 	  tree def = info->iv->ssa_name;
 
 	  if (MAY_HAVE_DEBUG_STMTS && SSA_NAME_DEF_STMT (def))
@@ -6625,6 +6940,9 @@  free_loop_data (struct ivopts_data *data)
       struct version_info *info;
 
       info = ver_info (data, i);
+      if (info->iv != NULL)
+	info->iv->outside_uses_vec = NULL;
+
       free (info->iv);
       info->iv = NULL;
       info->has_nonlin_use = false;
@@ -6637,6 +6955,8 @@  free_loop_data (struct ivopts_data *data)
   for (i = 0; i < n_iv_uses (data); i++)
     {
       struct iv_use *use = iv_use (data, i);
+      if (use->iv != NULL)
+	use->iv->outside_uses_vec = NULL;
 
       free (use->iv);
       BITMAP_FREE (use->related_cands);
@@ -6648,9 +6968,19 @@  free_loop_data (struct ivopts_data *data)
     }
   data->iv_uses.truncate (0);
 
+  for (i = 0; i < data->alloc_uses_vecs.length (); i++)
+    {
+      vec<iv_use_p> *uses_vec = (vec<iv_use_p> *)data->alloc_uses_vecs[i];
+      gcc_assert (uses_vec != NULL);
+      vec_free (uses_vec);
+    }
+  data->alloc_uses_vecs.truncate (0);
+
   for (i = 0; i < n_iv_cands (data); i++)
     {
       struct iv_cand *cand = iv_cand (data, i);
+      if (cand->iv != NULL)
+	cand->iv->outside_uses_vec = NULL;
 
       free (cand->iv);
       if (cand->depends_on)
@@ -6666,6 +6996,12 @@  free_loop_data (struct ivopts_data *data)
       data->version_info = XCNEWVEC (struct version_info, data->version_info_size);
     }
 
+  if (data->edge_map != NULL)
+    {
+      pointer_map_destroy (data->edge_map);
+      data->edge_map = NULL;
+    }
+
   data->max_inv_id = 0;
 
   FOR_EACH_VEC_ELT (decl_rtl_to_reset, i, obj)
@@ -6687,9 +7023,13 @@  tree_ssa_iv_optimize_finalize (struct ivopts_data
   free (data->version_info);
   BITMAP_FREE (data->relevant);
   BITMAP_FREE (data->important_candidates);
+  BITMAP_FREE (data->changed_bbs);
+  data->edge_map = NULL;
+  obstack_free (&data->edge_obstack, NULL);
 
   decl_rtl_to_reset.release ();
   data->iv_uses.release ();
+  data->alloc_uses_vecs.release ();
   data->iv_candidates.release ();
   data->inv_expr_tab.dispose ();
 }
@@ -6726,6 +7066,8 @@  tree_ssa_iv_optimize_loop (struct ivopts_data *dat
   gcc_assert (!data->niters);
   data->current_loop = loop;
   data->speed = optimize_loop_for_speed_p (loop);
+  /* Allocate the mapping from edge to edge_info for this loop.  */
+  data->edge_map = pointer_map_create ();
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -6812,6 +7154,12 @@  tree_ssa_iv_optimize (void)
 	flow_loop_dump (loop, dump_file, NULL, 1);
 
       tree_ssa_iv_optimize_loop (&data, loop);
+      /* Update ssa form if it's changed.  */
+      if (!bitmap_empty_p (data.changed_bbs))
+	{
+	  rewrite_into_loop_closed_ssa (data.changed_bbs, TODO_update_ssa);
+	  bitmap_clear (data.changed_bbs);
+	}
     }
 
   tree_ssa_iv_optimize_finalize (&data);