diff mbox series

unroll: Avoid unnecessary tail loops for constant niters

Message ID mpt1r7txare.fsf@arm.com
State New
Headers show
Series unroll: Avoid unnecessary tail loops for constant niters | expand

Commit Message

Richard Sandiford July 20, 2021, 3:31 p.m. UTC
unroll and jam can decide to unroll the outer loop of a nest like:

  for (int j = 0; j < n; ++j)
    for (int i = 0; i < n; ++i)
      x[i] += __builtin_expf (y[j][i]);

It then uses a tail loop to handle any left-over iterations.

However, the code is structured so that this tail loop is always used.
If n is a multiple of the unroll factor UF, the final UF iterations will
use the tail loop rather than the unrolled loop.

“Fixing” that for variable loop counts would mean introducing another
runtime test: a branch around the tail loop if there are no more
iterations.  There's at least an argument that the overhead of doing
that test might not pay for itself.

But we use this structure even if the iteration count is provably
a multiple of UF at compile time.  E.g. with s/n/100/ and an
unroll factor of 2, the first 98 iterations use the unrolled loop
and the final 2 iterations use the original loop.

This patch makes the unroller avoid a tail loop in that case.
The end result seemed easier to follow if variables were declared
at the point of initialisation, so that it's more obvious which
ones are meaningful even when there's no tail loop.

Tested on aarch64-linux-gnu so far, will test on x86_64-linux-gnu too.
OK to install if testing passes?

Richard


gcc/
	* tree-ssa-loop-manip.c (determine_exit_conditions): Return a null
	exit condition if no tail loop is needed, and if the original exit
	condition should therefore be kept as-is.
	(tree_transform_and_unroll_loop): Handle that case here too.

gcc/testsuite/
	* gcc.dg/unroll-9.c: New test/
---
 gcc/testsuite/gcc.dg/unroll-9.c |  12 ++
 gcc/tree-ssa-loop-manip.c       | 306 +++++++++++++++++---------------
 2 files changed, 176 insertions(+), 142 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/unroll-9.c

Comments

Richard Biener July 20, 2021, 6:01 p.m. UTC | #1
On July 20, 2021 5:31:17 PM GMT+02:00, Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>unroll and jam can decide to unroll the outer loop of a nest like:
>
>  for (int j = 0; j < n; ++j)
>    for (int i = 0; i < n; ++i)
>      x[i] += __builtin_expf (y[j][i]);
>
>It then uses a tail loop to handle any left-over iterations.
>
>However, the code is structured so that this tail loop is always used.
>If n is a multiple of the unroll factor UF, the final UF iterations
>will
>use the tail loop rather than the unrolled loop.
>
>“Fixing” that for variable loop counts would mean introducing another
>runtime test: a branch around the tail loop if there are no more
>iterations.  There's at least an argument that the overhead of doing
>that test might not pay for itself.
>
>But we use this structure even if the iteration count is provably
>a multiple of UF at compile time.  E.g. with s/n/100/ and an
>unroll factor of 2, the first 98 iterations use the unrolled loop
>and the final 2 iterations use the original loop.
>
>This patch makes the unroller avoid a tail loop in that case.
>The end result seemed easier to follow if variables were declared
>at the point of initialisation, so that it's more obvious which
>ones are meaningful even when there's no tail loop.
>
>Tested on aarch64-linux-gnu so far, will test on x86_64-linux-gnu too.
>OK to install if testing passes?

Ok. 

Richard. 

>Richard
>
>
>gcc/
>	* tree-ssa-loop-manip.c (determine_exit_conditions): Return a null
>	exit condition if no tail loop is needed, and if the original exit
>	condition should therefore be kept as-is.
>	(tree_transform_and_unroll_loop): Handle that case here too.
>
>gcc/testsuite/
>	* gcc.dg/unroll-9.c: New test/
>---
> gcc/testsuite/gcc.dg/unroll-9.c |  12 ++
> gcc/tree-ssa-loop-manip.c       | 306 +++++++++++++++++---------------
> 2 files changed, 176 insertions(+), 142 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/unroll-9.c
>
>diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
>index 28ae1316fa0..41f9872ca10 100644
>--- a/gcc/tree-ssa-loop-manip.c
>+++ b/gcc/tree-ssa-loop-manip.c
>@@ -997,8 +997,10 @@ can_unroll_loop_p (class loop *loop, unsigned
>factor,
>/* Determines the conditions that control execution of LOOP unrolled
>FACTOR
>    times.  DESC is number of iterations of LOOP.  ENTER_COND is set to
>    condition that must be true if the main loop can be entered.
>+   If the loop does not always iterate an exact multiple of FACTOR
>times,
>EXIT_BASE, EXIT_STEP, EXIT_CMP and EXIT_BOUND are set to values
>describing
>-   how the exit from the unrolled loop should be controlled.  */
>+   how the exit from the unrolled loop should be controlled. 
>Otherwise,
>+   the trees are set to null and EXIT_CMP is set to ERROR_MARK.  */
> 
> static void
>determine_exit_conditions (class loop *loop, class tree_niter_desc
>*desc,
>@@ -1079,6 +1081,16 @@ determine_exit_conditions (class loop *loop,
>class tree_niter_desc *desc,
>   assum = fold_build2 (cmp, boolean_type_node, base, bound);
>   cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, assum, cond);
> 
>+  if (integer_nonzerop (cond)
>+      && integer_zerop (desc->may_be_zero))
>+    {
>+      /* Convert the latch count to an iteration count.  */
>+      tree niter = fold_build2 (PLUS_EXPR, type, desc->niter,
>+				build_one_cst (type));
>+      if (multiple_of_p (type, niter, bigstep))
>+	return;
>+    }
>+
>cond = force_gimple_operand (unshare_expr (cond), &stmts, false,
>NULL_TREE);
>   if (stmts)
>  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
>@@ -1234,137 +1246,138 @@ tree_transform_and_unroll_loop (class loop
>*loop, unsigned factor,
> 				transform_callback transform,
> 				void *data)
> {
>-  gcond *exit_if;
>-  tree ctr_before, ctr_after;
>-  tree enter_main_cond, exit_base, exit_step, exit_bound;
>-  enum tree_code exit_cmp;
>-  gphi *phi_old_loop, *phi_new_loop, *phi_rest;
>-  gphi_iterator psi_old_loop, psi_new_loop;
>-  tree init, next, new_init;
>-  class loop *new_loop;
>-  basic_block rest, exit_bb;
>-  edge old_entry, new_entry, old_latch, precond_edge, new_exit;
>-  edge new_nonexit, e;
>-  gimple_stmt_iterator bsi;
>-  use_operand_p op;
>-  bool ok;
>-  unsigned i;
>-  profile_probability prob, prob_entry, scale_unrolled;
>-  profile_count freq_e, freq_h;
>   gcov_type new_est_niter = niter_for_unrolled_loop (loop, factor);
>unsigned irr = loop_preheader_edge (loop)->flags &
>EDGE_IRREDUCIBLE_LOOP;
>-  auto_vec<edge> to_remove;
> 
>+  enum tree_code exit_cmp;
>+  tree enter_main_cond, exit_base, exit_step, exit_bound;
>   determine_exit_conditions (loop, desc, factor,
> 			     &enter_main_cond, &exit_base, &exit_step,
> 			     &exit_cmp, &exit_bound);
>+  bool single_loop_p = !exit_base;
> 
>/* Let us assume that the unrolled loop is quite likely to be entered. 
>*/
>+  profile_probability prob_entry;
>   if (integer_nonzerop (enter_main_cond))
>     prob_entry = profile_probability::always ();
>   else
>     prob_entry = profile_probability::guessed_always ()
> 			.apply_scale (PROB_UNROLLED_LOOP_ENTERED, 100);
> 
>-  /* The values for scales should keep profile consistent, and
>somewhat close
>-     to correct.
>-
>-     TODO: The current value of SCALE_REST makes it appear that the
>loop that
>-     is created by splitting the remaining iterations of the unrolled
>loop is
>-     executed the same number of times as the original loop, and with
>the same
>-     frequencies, which is obviously wrong.  This does not appear to
>cause
>-     problems, so we do not bother with fixing it for now.  To make
>the profile
>-     correct, we would need to change the probability of the exit edge
>of the
>-     loop, and recompute the distribution of frequencies in its body
>because
>-     of this change (scale the frequencies of blocks before and after
>the exit
>-     by appropriate factors).  */
>-  scale_unrolled = prob_entry;
>-
>-  new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
>-			   prob_entry.invert (), scale_unrolled,
>-			   profile_probability::guessed_always (),
>-			   true);
>-  gcc_assert (new_loop != NULL);
>-  update_ssa (TODO_update_ssa);
>-
>-  /* Prepare the cfg and update the phi nodes.  Move the loop exit to
>the
>-     loop latch (and make its condition dummy, for the moment).  */
>-  rest = loop_preheader_edge (new_loop)->src;
>-  precond_edge = single_pred_edge (rest);
>-  split_edge (loop_latch_edge (loop));
>-  exit_bb = single_pred (loop->latch);
>-
>-  /* Since the exit edge will be removed, the frequency of all the
>blocks
>-     in the loop that are dominated by it must be scaled by
>-     1 / (1 - exit->probability).  */
>-  if (exit->probability.initialized_p ())
>-    scale_dominated_blocks_in_loop (loop, exit->src,
>-				    /* We are scaling up here so probability
>-				       does not fit.  */
>-				    loop->header->count,
>-				    loop->header->count
>-				    - loop->header->count.apply_probability
>-					 (exit->probability));
>-
>-  bsi = gsi_last_bb (exit_bb);
>-  exit_if = gimple_build_cond (EQ_EXPR, integer_zero_node,
>-			       integer_zero_node,
>-			       NULL_TREE, NULL_TREE);
>-
>-  gsi_insert_after (&bsi, exit_if, GSI_NEW_STMT);
>-  new_exit = make_edge (exit_bb, rest, EDGE_FALSE_VALUE | irr);
>-  rescan_loop_exit (new_exit, true, false);
>-
>-  /* Set the probability of new exit to the same of the old one.  Fix
>-     the frequency of the latch block, by scaling it back by
>-     1 - exit->probability.  */
>-  new_exit->probability = exit->probability;
>-  new_nonexit = single_pred_edge (loop->latch);
>-  new_nonexit->probability = exit->probability.invert ();
>-  new_nonexit->flags = EDGE_TRUE_VALUE;
>-  if (new_nonexit->probability.initialized_p ())
>-    scale_bbs_frequencies (&loop->latch, 1, new_nonexit->probability);
>-
>-  old_entry = loop_preheader_edge (loop);
>-  new_entry = loop_preheader_edge (new_loop);
>-  old_latch = loop_latch_edge (loop);
>-  for (psi_old_loop = gsi_start_phis (loop->header),
>-       psi_new_loop = gsi_start_phis (new_loop->header);
>-       !gsi_end_p (psi_old_loop);
>-       gsi_next (&psi_old_loop), gsi_next (&psi_new_loop))
>+  gcond *exit_if = nullptr;
>+  class loop *new_loop = nullptr;
>+  basic_block rest;
>+  edge new_exit;
>+  if (!single_loop_p)
>     {
>-      phi_old_loop = psi_old_loop.phi ();
>-      phi_new_loop = psi_new_loop.phi ();
>-
>-      init = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_entry);
>-      op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_new_loop, new_entry);
>-      gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR
>(op)));
>-      next = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_latch);
>-
>-      /* Prefer using original variable as a base for the new ssa
>name.
>-	 This is necessary for virtual ops, and useful in order to avoid
>-	 losing debug info for real ops.  */
>-      if (TREE_CODE (next) == SSA_NAME
>-	  && useless_type_conversion_p (TREE_TYPE (next),
>-					TREE_TYPE (init)))
>-	new_init = copy_ssa_name (next);
>-      else if (TREE_CODE (init) == SSA_NAME
>-	       && useless_type_conversion_p (TREE_TYPE (init),
>-					     TREE_TYPE (next)))
>-	new_init = copy_ssa_name (init);
>-      else if (useless_type_conversion_p (TREE_TYPE (next), TREE_TYPE
>(init)))
>-	new_init = make_temp_ssa_name (TREE_TYPE (next), NULL, "unrinittmp");
>-      else
>-	new_init = make_temp_ssa_name (TREE_TYPE (init), NULL, "unrinittmp");
>+      /* The values for scales should keep profile consistent, and
>somewhat
>+	 close to correct.
>+
>+	 TODO: The current value of SCALE_REST makes it appear that the loop
>+	 that is created by splitting the remaining iterations of the
>unrolled
>+	 loop is executed the same number of times as the original loop, and
>+	 with the same frequencies, which is obviously wrong.  This does not
>+	 appear to cause problems, so we do not bother with fixing it for
>now.
>+	 To make the profile correct, we would need to change the probability
>+	 of the exit edge of the loop, and recompute the distribution of
>+	 frequencies in its body because of this change (scale the
>frequencies
>+	 of blocks before and after the exit by appropriate factors).  */
>+      profile_probability scale_unrolled = prob_entry;
>+      new_loop = loop_version (loop, enter_main_cond, NULL,
>prob_entry,
>+			       prob_entry.invert (), scale_unrolled,
>+			       profile_probability::guessed_always (),
>+			       true);
>+      gcc_assert (new_loop != NULL);
>+      update_ssa (TODO_update_ssa);
>+
>+      /* Prepare the cfg and update the phi nodes.  Move the loop exit
>to the
>+	 loop latch (and make its condition dummy, for the moment).  */
>+      rest = loop_preheader_edge (new_loop)->src;
>+      edge precond_edge = single_pred_edge (rest);
>+      split_edge (loop_latch_edge (loop));
>+      basic_block exit_bb = single_pred (loop->latch);
>+
>+      /* Since the exit edge will be removed, the frequency of all the
>blocks
>+	 in the loop that are dominated by it must be scaled by
>+	 1 / (1 - exit->probability).  */
>+      if (exit->probability.initialized_p ())
>+	scale_dominated_blocks_in_loop (loop, exit->src,
>+					/* We are scaling up here so
>+					   probability does not fit.  */
>+					loop->header->count,
>+					loop->header->count
>+					- loop->header->count.apply_probability
>+					    (exit->probability));
>+
>+      gimple_stmt_iterator bsi = gsi_last_bb (exit_bb);
>+      exit_if = gimple_build_cond (EQ_EXPR, integer_zero_node,
>+				   integer_zero_node,
>+				   NULL_TREE, NULL_TREE);
>+
>+      gsi_insert_after (&bsi, exit_if, GSI_NEW_STMT);
>+      new_exit = make_edge (exit_bb, rest, EDGE_FALSE_VALUE | irr);
>+      rescan_loop_exit (new_exit, true, false);
>+
>+      /* Set the probability of new exit to the same of the old one. 
>Fix
>+	 the frequency of the latch block, by scaling it back by
>+	 1 - exit->probability.  */
>+      new_exit->probability = exit->probability;
>+      edge new_nonexit = single_pred_edge (loop->latch);
>+      new_nonexit->probability = exit->probability.invert ();
>+      new_nonexit->flags = EDGE_TRUE_VALUE;
>+      if (new_nonexit->probability.initialized_p ())
>+	scale_bbs_frequencies (&loop->latch, 1, new_nonexit->probability);
>+
>+      edge old_entry = loop_preheader_edge (loop);
>+      edge new_entry = loop_preheader_edge (new_loop);
>+      edge old_latch = loop_latch_edge (loop);
>+      for (gphi_iterator psi_old_loop = gsi_start_phis (loop->header),
>+	     psi_new_loop = gsi_start_phis (new_loop->header);
>+	   !gsi_end_p (psi_old_loop);
>+	   gsi_next (&psi_old_loop), gsi_next (&psi_new_loop))
>+	{
>+	  gphi *phi_old_loop = psi_old_loop.phi ();
>+	  gphi *phi_new_loop = psi_new_loop.phi ();
>+
>+	  tree init = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_entry);
>+	  use_operand_p op
>+	    = PHI_ARG_DEF_PTR_FROM_EDGE (phi_new_loop, new_entry);
>+	  gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
>+	  tree next = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_latch);
>+
>+	  /* Prefer using original variable as a base for the new ssa name.
>+	     This is necessary for virtual ops, and useful in order to avoid
>+	     losing debug info for real ops.  */
>+	  tree new_init;
>+	  if (TREE_CODE (next) == SSA_NAME
>+	      && useless_type_conversion_p (TREE_TYPE (next),
>+					    TREE_TYPE (init)))
>+	    new_init = copy_ssa_name (next);
>+	  else if (TREE_CODE (init) == SSA_NAME
>+		   && useless_type_conversion_p (TREE_TYPE (init),
>+						 TREE_TYPE (next)))
>+	    new_init = copy_ssa_name (init);
>+	  else if (useless_type_conversion_p (TREE_TYPE (next),
>+					      TREE_TYPE (init)))
>+	    new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
>+					   "unrinittmp");
>+	  else
>+	    new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
>+					   "unrinittmp");
> 
>-      phi_rest = create_phi_node (new_init, rest);
>+	  gphi *phi_rest = create_phi_node (new_init, rest);
>+	  add_phi_arg (phi_rest, init, precond_edge, UNKNOWN_LOCATION);
>+	  add_phi_arg (phi_rest, next, new_exit, UNKNOWN_LOCATION);
>+	  SET_USE (op, new_init);
>+	}
> 
>-      add_phi_arg (phi_rest, init, precond_edge, UNKNOWN_LOCATION);
>-      add_phi_arg (phi_rest, next, new_exit, UNKNOWN_LOCATION);
>-      SET_USE (op, new_init);
>+      remove_path (exit);
>+    }
>+  else
>+    {
>+      new_exit = exit;
>+      rest = exit->dest;
>     }
>-
>-  remove_path (exit);
> 
>   /* Transform the loop.  */
>   if (transform)
>@@ -1376,57 +1389,66 @@ tree_transform_and_unroll_loop (class loop
>*loop, unsigned factor,
>   bitmap_ones (wont_exit);
>   bitmap_clear_bit (wont_exit, factor - 1);
> 
>-  ok = gimple_duplicate_loop_to_header_edge
>+  auto_vec<edge> to_remove;
>+  bool ok = gimple_duplicate_loop_to_header_edge
> 	  (loop, loop_latch_edge (loop), factor - 1,
> 	   wont_exit, new_exit, &to_remove, DLTHE_FLAG_UPDATE_FREQ);
>   gcc_assert (ok);
> 
>-  FOR_EACH_VEC_ELT (to_remove, i, e)
>+  for (edge e : to_remove)
>     {
>       ok = remove_path (e);
>       gcc_assert (ok);
>     }
>   update_ssa (TODO_update_ssa);
> 
>-  /* Ensure that the frequencies in the loop match the new estimated
>-     number of iterations, and change the probability of the new
>-     exit edge.  */
>-
>-  freq_h = loop->header->count;
>-  freq_e = (loop_preheader_edge (loop))->count ();
>-  if (freq_h.nonzero_p ())
>+  if (!single_loop_p)
>     {
>-      /* Avoid dropping loop body profile counter to 0 because of zero
>count
>-	 in loop's preheader.  */
>-      if (freq_h.nonzero_p () && !(freq_e == profile_count::zero ()))
>-        freq_e = freq_e.force_nonzero ();
>-      scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
>+      /* Ensure that the frequencies in the loop match the new
>estimated
>+	 number of iterations, and change the probability of the new
>+	 exit edge.  */
>+
>+      profile_count freq_h = loop->header->count;
>+      profile_count freq_e = (loop_preheader_edge (loop))->count ();
>+      if (freq_h.nonzero_p ())
>+	{
>+	  /* Avoid dropping loop body profile counter to 0 because of zero
>+	     count in loop's preheader.  */
>+	  if (freq_h.nonzero_p () && !(freq_e == profile_count::zero ()))
>+	    freq_e = freq_e.force_nonzero ();
>+	  scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
>+	}
>     }
> 
>-  exit_bb = single_pred (loop->latch);
>+  basic_block exit_bb = single_pred (loop->latch);
>   new_exit = find_edge (exit_bb, rest);
>   new_exit->probability = profile_probability::always ()
> 				.apply_scale (1, new_est_niter + 1);
> 
>-  rest->count += new_exit->count ();
>+  if (!single_loop_p)
>+    rest->count += new_exit->count ();
> 
>-  new_nonexit = single_pred_edge (loop->latch);
>-  prob = new_nonexit->probability;
>+  edge new_nonexit = single_pred_edge (loop->latch);
>+  profile_probability prob = new_nonexit->probability;
>   new_nonexit->probability = new_exit->probability.invert ();
>   prob = new_nonexit->probability / prob;
>   if (prob.initialized_p ())
>     scale_bbs_frequencies (&loop->latch, 1, prob);
> 
>-  /* Finally create the new counter for number of iterations and add
>the new
>-     exit instruction.  */
>-  bsi = gsi_last_nondebug_bb (exit_bb);
>-  exit_if = as_a <gcond *> (gsi_stmt (bsi));
>-  create_iv (exit_base, exit_step, NULL_TREE, loop,
>-	     &bsi, false, &ctr_before, &ctr_after);
>-  gimple_cond_set_code (exit_if, exit_cmp);
>-  gimple_cond_set_lhs (exit_if, ctr_after);
>-  gimple_cond_set_rhs (exit_if, exit_bound);
>-  update_stmt (exit_if);
>+  if (!single_loop_p)
>+    {
>+      /* Finally create the new counter for number of iterations and
>add
>+	 the new exit instruction.  */
>+      tree ctr_before, ctr_after;
>+      gimple_stmt_iterator bsi = gsi_last_nondebug_bb (exit_bb);
>+      exit_if = as_a <gcond *> (gsi_stmt (bsi));
>+      create_iv (exit_base, exit_step, NULL_TREE, loop,
>+		 &bsi, false, &ctr_before, &ctr_after);
>+      gimple_cond_set_code (exit_if, exit_cmp);
>+      gimple_cond_set_lhs (exit_if, ctr_after);
>+      gimple_cond_set_rhs (exit_if, exit_bound);
>+      update_stmt (exit_if);
>+    }
> 
>   checking_verify_flow_info ();
>   checking_verify_loop_structure ();
>diff --git a/gcc/testsuite/gcc.dg/unroll-9.c
>b/gcc/testsuite/gcc.dg/unroll-9.c
>new file mode 100644
>index 00000000000..2d65ec3691d
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/unroll-9.c
>@@ -0,0 +1,12 @@
>+/* { dg-options "-O3 -fdump-tree-unrolljam -fno-math-errno" } */
>+
>+void
>+f (float *restrict x, float y[100][100])
>+{
>+  for (int j = 0; j < 100; ++j)
>+    for (int i = 0; i < 100; ++i)
>+      x[i] += __builtin_expf (y[j][i]);
>+}
>+
>+/* The loop should be unrolled 2 times, without a tail loop.  */
>+/* { dg-final { scan-tree-dump-times "__builtin_expf" 2 "unrolljam" }
>} */
diff mbox series

Patch

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 28ae1316fa0..41f9872ca10 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -997,8 +997,10 @@  can_unroll_loop_p (class loop *loop, unsigned factor,
 /* Determines the conditions that control execution of LOOP unrolled FACTOR
    times.  DESC is number of iterations of LOOP.  ENTER_COND is set to
    condition that must be true if the main loop can be entered.
+   If the loop does not always iterate an exact multiple of FACTOR times,
    EXIT_BASE, EXIT_STEP, EXIT_CMP and EXIT_BOUND are set to values describing
-   how the exit from the unrolled loop should be controlled.  */
+   how the exit from the unrolled loop should be controlled.  Otherwise,
+   the trees are set to null and EXIT_CMP is set to ERROR_MARK.  */
 
 static void
 determine_exit_conditions (class loop *loop, class tree_niter_desc *desc,
@@ -1079,6 +1081,16 @@  determine_exit_conditions (class loop *loop, class tree_niter_desc *desc,
   assum = fold_build2 (cmp, boolean_type_node, base, bound);
   cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, assum, cond);
 
+  if (integer_nonzerop (cond)
+      && integer_zerop (desc->may_be_zero))
+    {
+      /* Convert the latch count to an iteration count.  */
+      tree niter = fold_build2 (PLUS_EXPR, type, desc->niter,
+				build_one_cst (type));
+      if (multiple_of_p (type, niter, bigstep))
+	return;
+    }
+
   cond = force_gimple_operand (unshare_expr (cond), &stmts, false, NULL_TREE);
   if (stmts)
     gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
@@ -1234,137 +1246,138 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
 				transform_callback transform,
 				void *data)
 {
-  gcond *exit_if;
-  tree ctr_before, ctr_after;
-  tree enter_main_cond, exit_base, exit_step, exit_bound;
-  enum tree_code exit_cmp;
-  gphi *phi_old_loop, *phi_new_loop, *phi_rest;
-  gphi_iterator psi_old_loop, psi_new_loop;
-  tree init, next, new_init;
-  class loop *new_loop;
-  basic_block rest, exit_bb;
-  edge old_entry, new_entry, old_latch, precond_edge, new_exit;
-  edge new_nonexit, e;
-  gimple_stmt_iterator bsi;
-  use_operand_p op;
-  bool ok;
-  unsigned i;
-  profile_probability prob, prob_entry, scale_unrolled;
-  profile_count freq_e, freq_h;
   gcov_type new_est_niter = niter_for_unrolled_loop (loop, factor);
   unsigned irr = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
-  auto_vec<edge> to_remove;
 
+  enum tree_code exit_cmp;
+  tree enter_main_cond, exit_base, exit_step, exit_bound;
   determine_exit_conditions (loop, desc, factor,
 			     &enter_main_cond, &exit_base, &exit_step,
 			     &exit_cmp, &exit_bound);
+  bool single_loop_p = !exit_base;
 
   /* Let us assume that the unrolled loop is quite likely to be entered.  */
+  profile_probability prob_entry;
   if (integer_nonzerop (enter_main_cond))
     prob_entry = profile_probability::always ();
   else
     prob_entry = profile_probability::guessed_always ()
 			.apply_scale (PROB_UNROLLED_LOOP_ENTERED, 100);
 
-  /* The values for scales should keep profile consistent, and somewhat close
-     to correct.
-
-     TODO: The current value of SCALE_REST makes it appear that the loop that
-     is created by splitting the remaining iterations of the unrolled loop is
-     executed the same number of times as the original loop, and with the same
-     frequencies, which is obviously wrong.  This does not appear to cause
-     problems, so we do not bother with fixing it for now.  To make the profile
-     correct, we would need to change the probability of the exit edge of the
-     loop, and recompute the distribution of frequencies in its body because
-     of this change (scale the frequencies of blocks before and after the exit
-     by appropriate factors).  */
-  scale_unrolled = prob_entry;
-
-  new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
-			   prob_entry.invert (), scale_unrolled,
-			   profile_probability::guessed_always (),
-			   true);
-  gcc_assert (new_loop != NULL);
-  update_ssa (TODO_update_ssa);
-
-  /* Prepare the cfg and update the phi nodes.  Move the loop exit to the
-     loop latch (and make its condition dummy, for the moment).  */
-  rest = loop_preheader_edge (new_loop)->src;
-  precond_edge = single_pred_edge (rest);
-  split_edge (loop_latch_edge (loop));
-  exit_bb = single_pred (loop->latch);
-
-  /* Since the exit edge will be removed, the frequency of all the blocks
-     in the loop that are dominated by it must be scaled by
-     1 / (1 - exit->probability).  */
-  if (exit->probability.initialized_p ())
-    scale_dominated_blocks_in_loop (loop, exit->src,
-				    /* We are scaling up here so probability
-				       does not fit.  */
-				    loop->header->count,
-				    loop->header->count
-				    - loop->header->count.apply_probability
-					 (exit->probability));
-
-  bsi = gsi_last_bb (exit_bb);
-  exit_if = gimple_build_cond (EQ_EXPR, integer_zero_node,
-			       integer_zero_node,
-			       NULL_TREE, NULL_TREE);
-
-  gsi_insert_after (&bsi, exit_if, GSI_NEW_STMT);
-  new_exit = make_edge (exit_bb, rest, EDGE_FALSE_VALUE | irr);
-  rescan_loop_exit (new_exit, true, false);
-
-  /* Set the probability of new exit to the same of the old one.  Fix
-     the frequency of the latch block, by scaling it back by
-     1 - exit->probability.  */
-  new_exit->probability = exit->probability;
-  new_nonexit = single_pred_edge (loop->latch);
-  new_nonexit->probability = exit->probability.invert ();
-  new_nonexit->flags = EDGE_TRUE_VALUE;
-  if (new_nonexit->probability.initialized_p ())
-    scale_bbs_frequencies (&loop->latch, 1, new_nonexit->probability);
-
-  old_entry = loop_preheader_edge (loop);
-  new_entry = loop_preheader_edge (new_loop);
-  old_latch = loop_latch_edge (loop);
-  for (psi_old_loop = gsi_start_phis (loop->header),
-       psi_new_loop = gsi_start_phis (new_loop->header);
-       !gsi_end_p (psi_old_loop);
-       gsi_next (&psi_old_loop), gsi_next (&psi_new_loop))
+  gcond *exit_if = nullptr;
+  class loop *new_loop = nullptr;
+  basic_block rest;
+  edge new_exit;
+  if (!single_loop_p)
     {
-      phi_old_loop = psi_old_loop.phi ();
-      phi_new_loop = psi_new_loop.phi ();
-
-      init = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_entry);
-      op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_new_loop, new_entry);
-      gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
-      next = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_latch);
-
-      /* Prefer using original variable as a base for the new ssa name.
-	 This is necessary for virtual ops, and useful in order to avoid
-	 losing debug info for real ops.  */
-      if (TREE_CODE (next) == SSA_NAME
-	  && useless_type_conversion_p (TREE_TYPE (next),
-					TREE_TYPE (init)))
-	new_init = copy_ssa_name (next);
-      else if (TREE_CODE (init) == SSA_NAME
-	       && useless_type_conversion_p (TREE_TYPE (init),
-					     TREE_TYPE (next)))
-	new_init = copy_ssa_name (init);
-      else if (useless_type_conversion_p (TREE_TYPE (next), TREE_TYPE (init)))
-	new_init = make_temp_ssa_name (TREE_TYPE (next), NULL, "unrinittmp");
-      else
-	new_init = make_temp_ssa_name (TREE_TYPE (init), NULL, "unrinittmp");
+      /* The values for scales should keep profile consistent, and somewhat
+	 close to correct.
+
+	 TODO: The current value of SCALE_REST makes it appear that the loop
+	 that is created by splitting the remaining iterations of the unrolled
+	 loop is executed the same number of times as the original loop, and
+	 with the same frequencies, which is obviously wrong.  This does not
+	 appear to cause problems, so we do not bother with fixing it for now.
+	 To make the profile correct, we would need to change the probability
+	 of the exit edge of the loop, and recompute the distribution of
+	 frequencies in its body because of this change (scale the frequencies
+	 of blocks before and after the exit by appropriate factors).  */
+      profile_probability scale_unrolled = prob_entry;
+      new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
+			       prob_entry.invert (), scale_unrolled,
+			       profile_probability::guessed_always (),
+			       true);
+      gcc_assert (new_loop != NULL);
+      update_ssa (TODO_update_ssa);
+
+      /* Prepare the cfg and update the phi nodes.  Move the loop exit to the
+	 loop latch (and make its condition dummy, for the moment).  */
+      rest = loop_preheader_edge (new_loop)->src;
+      edge precond_edge = single_pred_edge (rest);
+      split_edge (loop_latch_edge (loop));
+      basic_block exit_bb = single_pred (loop->latch);
+
+      /* Since the exit edge will be removed, the frequency of all the blocks
+	 in the loop that are dominated by it must be scaled by
+	 1 / (1 - exit->probability).  */
+      if (exit->probability.initialized_p ())
+	scale_dominated_blocks_in_loop (loop, exit->src,
+					/* We are scaling up here so
+					   probability does not fit.  */
+					loop->header->count,
+					loop->header->count
+					- loop->header->count.apply_probability
+					    (exit->probability));
+
+      gimple_stmt_iterator bsi = gsi_last_bb (exit_bb);
+      exit_if = gimple_build_cond (EQ_EXPR, integer_zero_node,
+				   integer_zero_node,
+				   NULL_TREE, NULL_TREE);
+
+      gsi_insert_after (&bsi, exit_if, GSI_NEW_STMT);
+      new_exit = make_edge (exit_bb, rest, EDGE_FALSE_VALUE | irr);
+      rescan_loop_exit (new_exit, true, false);
+
+      /* Set the probability of new exit to the same of the old one.  Fix
+	 the frequency of the latch block, by scaling it back by
+	 1 - exit->probability.  */
+      new_exit->probability = exit->probability;
+      edge new_nonexit = single_pred_edge (loop->latch);
+      new_nonexit->probability = exit->probability.invert ();
+      new_nonexit->flags = EDGE_TRUE_VALUE;
+      if (new_nonexit->probability.initialized_p ())
+	scale_bbs_frequencies (&loop->latch, 1, new_nonexit->probability);
+
+      edge old_entry = loop_preheader_edge (loop);
+      edge new_entry = loop_preheader_edge (new_loop);
+      edge old_latch = loop_latch_edge (loop);
+      for (gphi_iterator psi_old_loop = gsi_start_phis (loop->header),
+	     psi_new_loop = gsi_start_phis (new_loop->header);
+	   !gsi_end_p (psi_old_loop);
+	   gsi_next (&psi_old_loop), gsi_next (&psi_new_loop))
+	{
+	  gphi *phi_old_loop = psi_old_loop.phi ();
+	  gphi *phi_new_loop = psi_new_loop.phi ();
+
+	  tree init = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_entry);
+	  use_operand_p op
+	    = PHI_ARG_DEF_PTR_FROM_EDGE (phi_new_loop, new_entry);
+	  gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR (op)));
+	  tree next = PHI_ARG_DEF_FROM_EDGE (phi_old_loop, old_latch);
+
+	  /* Prefer using original variable as a base for the new ssa name.
+	     This is necessary for virtual ops, and useful in order to avoid
+	     losing debug info for real ops.  */
+	  tree new_init;
+	  if (TREE_CODE (next) == SSA_NAME
+	      && useless_type_conversion_p (TREE_TYPE (next),
+					    TREE_TYPE (init)))
+	    new_init = copy_ssa_name (next);
+	  else if (TREE_CODE (init) == SSA_NAME
+		   && useless_type_conversion_p (TREE_TYPE (init),
+						 TREE_TYPE (next)))
+	    new_init = copy_ssa_name (init);
+	  else if (useless_type_conversion_p (TREE_TYPE (next),
+					      TREE_TYPE (init)))
+	    new_init = make_temp_ssa_name (TREE_TYPE (next), NULL,
+					   "unrinittmp");
+	  else
+	    new_init = make_temp_ssa_name (TREE_TYPE (init), NULL,
+					   "unrinittmp");
 
-      phi_rest = create_phi_node (new_init, rest);
+	  gphi *phi_rest = create_phi_node (new_init, rest);
+	  add_phi_arg (phi_rest, init, precond_edge, UNKNOWN_LOCATION);
+	  add_phi_arg (phi_rest, next, new_exit, UNKNOWN_LOCATION);
+	  SET_USE (op, new_init);
+	}
 
-      add_phi_arg (phi_rest, init, precond_edge, UNKNOWN_LOCATION);
-      add_phi_arg (phi_rest, next, new_exit, UNKNOWN_LOCATION);
-      SET_USE (op, new_init);
+      remove_path (exit);
+    }
+  else
+    {
+      new_exit = exit;
+      rest = exit->dest;
     }
-
-  remove_path (exit);
 
   /* Transform the loop.  */
   if (transform)
@@ -1376,57 +1389,66 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
   bitmap_ones (wont_exit);
   bitmap_clear_bit (wont_exit, factor - 1);
 
-  ok = gimple_duplicate_loop_to_header_edge
+  auto_vec<edge> to_remove;
+  bool ok = gimple_duplicate_loop_to_header_edge
 	  (loop, loop_latch_edge (loop), factor - 1,
 	   wont_exit, new_exit, &to_remove, DLTHE_FLAG_UPDATE_FREQ);
   gcc_assert (ok);
 
-  FOR_EACH_VEC_ELT (to_remove, i, e)
+  for (edge e : to_remove)
     {
       ok = remove_path (e);
       gcc_assert (ok);
     }
   update_ssa (TODO_update_ssa);
 
-  /* Ensure that the frequencies in the loop match the new estimated
-     number of iterations, and change the probability of the new
-     exit edge.  */
-
-  freq_h = loop->header->count;
-  freq_e = (loop_preheader_edge (loop))->count ();
-  if (freq_h.nonzero_p ())
+  if (!single_loop_p)
     {
-      /* Avoid dropping loop body profile counter to 0 because of zero count
-	 in loop's preheader.  */
-      if (freq_h.nonzero_p () && !(freq_e == profile_count::zero ()))
-        freq_e = freq_e.force_nonzero ();
-      scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
+      /* Ensure that the frequencies in the loop match the new estimated
+	 number of iterations, and change the probability of the new
+	 exit edge.  */
+
+      profile_count freq_h = loop->header->count;
+      profile_count freq_e = (loop_preheader_edge (loop))->count ();
+      if (freq_h.nonzero_p ())
+	{
+	  /* Avoid dropping loop body profile counter to 0 because of zero
+	     count in loop's preheader.  */
+	  if (freq_h.nonzero_p () && !(freq_e == profile_count::zero ()))
+	    freq_e = freq_e.force_nonzero ();
+	  scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
+	}
     }
 
-  exit_bb = single_pred (loop->latch);
+  basic_block exit_bb = single_pred (loop->latch);
   new_exit = find_edge (exit_bb, rest);
   new_exit->probability = profile_probability::always ()
 				.apply_scale (1, new_est_niter + 1);
 
-  rest->count += new_exit->count ();
+  if (!single_loop_p)
+    rest->count += new_exit->count ();
 
-  new_nonexit = single_pred_edge (loop->latch);
-  prob = new_nonexit->probability;
+  edge new_nonexit = single_pred_edge (loop->latch);
+  profile_probability prob = new_nonexit->probability;
   new_nonexit->probability = new_exit->probability.invert ();
   prob = new_nonexit->probability / prob;
   if (prob.initialized_p ())
     scale_bbs_frequencies (&loop->latch, 1, prob);
 
-  /* Finally create the new counter for number of iterations and add the new
-     exit instruction.  */
-  bsi = gsi_last_nondebug_bb (exit_bb);
-  exit_if = as_a <gcond *> (gsi_stmt (bsi));
-  create_iv (exit_base, exit_step, NULL_TREE, loop,
-	     &bsi, false, &ctr_before, &ctr_after);
-  gimple_cond_set_code (exit_if, exit_cmp);
-  gimple_cond_set_lhs (exit_if, ctr_after);
-  gimple_cond_set_rhs (exit_if, exit_bound);
-  update_stmt (exit_if);
+  if (!single_loop_p)
+    {
+      /* Finally create the new counter for number of iterations and add
+	 the new exit instruction.  */
+      tree ctr_before, ctr_after;
+      gimple_stmt_iterator bsi = gsi_last_nondebug_bb (exit_bb);
+      exit_if = as_a <gcond *> (gsi_stmt (bsi));
+      create_iv (exit_base, exit_step, NULL_TREE, loop,
+		 &bsi, false, &ctr_before, &ctr_after);
+      gimple_cond_set_code (exit_if, exit_cmp);
+      gimple_cond_set_lhs (exit_if, ctr_after);
+      gimple_cond_set_rhs (exit_if, exit_bound);
+      update_stmt (exit_if);
+    }
 
   checking_verify_flow_info ();
   checking_verify_loop_structure ();
diff --git a/gcc/testsuite/gcc.dg/unroll-9.c b/gcc/testsuite/gcc.dg/unroll-9.c
new file mode 100644
index 00000000000..2d65ec3691d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/unroll-9.c
@@ -0,0 +1,12 @@ 
+/* { dg-options "-O3 -fdump-tree-unrolljam -fno-math-errno" } */
+
+void
+f (float *restrict x, float y[100][100])
+{
+  for (int j = 0; j < 100; ++j)
+    for (int i = 0; i < 100; ++i)
+      x[i] += __builtin_expf (y[j][i]);
+}
+
+/* The loop should be unrolled 2 times, without a tail loop.  */
+/* { dg-final { scan-tree-dump-times "__builtin_expf" 2 "unrolljam" } } */