diff mbox series

loop: Fix profile updates after unrolling [PR102385]

Message ID mptv92b4mnd.fsf@arm.com
State New
Headers show
Series loop: Fix profile updates after unrolling [PR102385] | expand

Commit Message

Richard Sandiford Oct. 5, 2021, 1:38 p.m. UTC
In g:62acc72a957b5614 I'd stopped the unroller from using
an epilogue loop in cases where the iteration count was
known to be a multiple of the unroll factor.  The epilogue
and non-epilogue cases still shared this (preexisting) code
to update the edge frequencies:

  basic_block exit_bb = single_pred (loop->latch);
  new_exit = find_edge (exit_bb, rest);
  new_exit->probability = profile_probability::always ()
                               .apply_scale (1, new_est_niter + 1);
  [etc]

But of course (in hindsight) that only makes sense for the
epilogue case, where we've already moved the main loop's exit edge
to be a sibling of the latch edge.  For the non-epilogue case,
the exit edge stays (and needs to stay) in its original position.

I don't really understand what the code is trying to do for
the epilogue case.  It has:

      /* Ensure that the frequencies in the loop match the new estimated
	 number of iterations, and change the probability of the new
	 exit edge.  */

      profile_count freq_h = loop->header->count;
      profile_count freq_e = (loop_preheader_edge (loop))->count ();
      if (freq_h.nonzero_p ())
	{
          ...
	  scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
	}

Here, freq_e.probability_in (freq_h) is freq_e / freq_h, so for the
header block, this has the effect of:

  new header count = freq_h * (freq_e / freq_h)

i.e. we say that the header executes exactly as often as the
preheader edge, which would only make sense if the loop never
iterates.  Also, after setting the probability of the nonexit edge
(correctly) to new_est_niter / (new_est_niter + 1), the code does:

    scale_bbs_frequencies (&loop->latch, 1, prob);

for this new probability.  I think that only makes sense if the
nonexit edge was previously unconditional (100%).  But the code
carefully preserved the probability of the original exit edge
when creating the new one.

All I'm trying to do here though is fix the mess I created
and get the probabilities right for the non-epilogue case.
Things are simpler there since we don't have to worry about
loop versioning.  Hopefully the comments explain the approach.

The function's current interface implies that it can cope with
multiple exit edges and that the function only needs the iteration
count relative to one of those edges in order to work correctly.
In practice that's not the case: it assumes there is exactly one
exit edge and all current callers also ensure that the exit test
dominates the latch.  I think the function is easier to follow
if we remove the implied generality.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
	PR tree-optimization/102385
	* predict.h (change_edge_frequency): Declare.
	* predict.c (change_edge_frequency): New function.
	* tree-ssa-loop-manip.h (tree_transform_and_unroll_loop): Remove
	edge argument.
	(tree_unroll_loop): Likewise.
	* gimple-loop-jam.c (tree_loop_unroll_and_jam): Update accordingly.
	* tree-predcom.c (pcom_worker::tree_predictive_commoning_loop):
	Likewise.
	* tree-ssa-loop-manip.c (tree_unroll_loop): Likewise.
	(tree_transform_and_unroll_loop): Likewise.  Use single_dom_exit
	to retrieve the exit edges.  Make all the old profile update code
	conditional on !single_loop_p -- the case it was written for --
	and use a different approach for the single-loop case.

gcc/testsuite/
	* testsuite/gcc.dg/pr102385.c: New test.
---
 gcc/gimple-loop-jam.c           |   3 +-
 gcc/predict.c                   |  37 +++++++++++
 gcc/predict.h                   |   1 +
 gcc/testsuite/gcc.dg/pr102385.c |  14 ++++
 gcc/tree-predcom.c              |   3 +-
 gcc/tree-ssa-loop-manip.c       | 111 ++++++++++++++++++++++++--------
 gcc/tree-ssa-loop-manip.h       |   5 +-
 gcc/tree-ssa-loop-prefetch.c    |   3 +-
 8 files changed, 140 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102385.c

Comments

Richard Biener Oct. 8, 2021, 7:14 a.m. UTC | #1
On Tue, Oct 5, 2021 at 3:39 PM Richard Sandiford via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> In g:62acc72a957b5614 I'd stopped the unroller from using
> an epilogue loop in cases where the iteration count was
> known to be a multiple of the unroll factor.  The epilogue
> and non-epilogue cases still shared this (preexisting) code
> to update the edge frequencies:
>
>   basic_block exit_bb = single_pred (loop->latch);
>   new_exit = find_edge (exit_bb, rest);
>   new_exit->probability = profile_probability::always ()
>                                .apply_scale (1, new_est_niter + 1);
>   [etc]
>
> But of course (in hindsight) that only makes sense for the
> epilogue case, where we've already moved the main loop's exit edge
> to be a sibling of the latch edge.  For the non-epilogue case,
> the exit edge stays (and needs to stay) in its original position.
>
> I don't really understand what the code is trying to do for
> the epilogue case.  It has:
>
>       /* Ensure that the frequencies in the loop match the new estimated
>          number of iterations, and change the probability of the new
>          exit edge.  */
>
>       profile_count freq_h = loop->header->count;
>       profile_count freq_e = (loop_preheader_edge (loop))->count ();
>       if (freq_h.nonzero_p ())
>         {
>           ...
>           scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
>         }
>
> Here, freq_e.probability_in (freq_h) is freq_e / freq_h, so for the
> header block, this has the effect of:
>
>   new header count = freq_h * (freq_e / freq_h)
>
> i.e. we say that the header executes exactly as often as the
> preheader edge, which would only make sense if the loop never
> iterates.  Also, after setting the probability of the nonexit edge
> (correctly) to new_est_niter / (new_est_niter + 1), the code does:
>
>     scale_bbs_frequencies (&loop->latch, 1, prob);
>
> for this new probability.  I think that only makes sense if the
> nonexit edge was previously unconditional (100%).  But the code
> carefully preserved the probability of the original exit edge
> when creating the new one.
>
> All I'm trying to do here though is fix the mess I created
> and get the probabilities right for the non-epilogue case.
> Things are simpler there since we don't have to worry about
> loop versioning.  Hopefully the comments explain the approach.
>
> The function's current interface implies that it can cope with
> multiple exit edges and that the function only needs the iteration
> count relative to one of those edges in order to work correctly.
> In practice that's not the case: it assumes there is exactly one
> exit edge and all current callers also ensure that the exit test
> dominates the latch.  I think the function is easier to follow
> if we remove the implied generality.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         PR tree-optimization/102385
>         * predict.h (change_edge_frequency): Declare.
>         * predict.c (change_edge_frequency): New function.
>         * tree-ssa-loop-manip.h (tree_transform_and_unroll_loop): Remove
>         edge argument.
>         (tree_unroll_loop): Likewise.
>         * gimple-loop-jam.c (tree_loop_unroll_and_jam): Update accordingly.
>         * tree-predcom.c (pcom_worker::tree_predictive_commoning_loop):
>         Likewise.
>         * tree-ssa-loop-manip.c (tree_unroll_loop): Likewise.
>         (tree_transform_and_unroll_loop): Likewise.  Use single_dom_exit
>         to retrieve the exit edges.  Make all the old profile update code
>         conditional on !single_loop_p -- the case it was written for --
>         and use a different approach for the single-loop case.
>
> gcc/testsuite/
>         * testsuite/gcc.dg/pr102385.c: New test.
> ---
>  gcc/gimple-loop-jam.c           |   3 +-
>  gcc/predict.c                   |  37 +++++++++++
>  gcc/predict.h                   |   1 +
>  gcc/testsuite/gcc.dg/pr102385.c |  14 ++++
>  gcc/tree-predcom.c              |   3 +-
>  gcc/tree-ssa-loop-manip.c       | 111 ++++++++++++++++++++++++--------
>  gcc/tree-ssa-loop-manip.h       |   5 +-
>  gcc/tree-ssa-loop-prefetch.c    |   3 +-
>  8 files changed, 140 insertions(+), 37 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr102385.c
>
> diff --git a/gcc/predict.h b/gcc/predict.h
> index 8860cafa31c..4df51bd615c 100644
> --- a/gcc/predict.h
> +++ b/gcc/predict.h
> @@ -100,6 +100,7 @@ extern void rebuild_frequencies (void);
>  extern void report_predictor_hitrates (void);
>  extern void force_edge_cold (edge, bool);
>  extern void propagate_unlikely_bbs_forward (void);
> +extern void change_edge_frequency (edge, profile_probability);
>
>  extern void add_reg_br_prob_note (rtx_insn *, profile_probability);
>
> diff --git a/gcc/predict.c b/gcc/predict.c
> index d9c7249831e..68b11135680 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -4481,6 +4481,43 @@ force_edge_cold (edge e, bool impossible)
>      }
>  }
>
> +/* Change E's probability to NEW_E_PROB, redistributing the probabilities
> +   of other outgoing edges proportionally.
> +
> +   Note that this function does not change the profile counts of any
> +   basic blocks.  The caller must do that instead, using whatever
> +   information it has about the region that needs updating.  */
> +
> +void
> +change_edge_frequency (edge e, profile_probability new_e_prob)
> +{
> +  profile_probability old_e_prob = e->probability;
> +  profile_probability old_other_prob = old_e_prob.invert ();
> +  profile_probability new_other_prob = new_e_prob.invert ();
> +
> +  e->probability = new_e_prob;
> +  profile_probability cumulative_prob = new_e_prob;
> +
> +  unsigned int num_other = EDGE_COUNT (e->src->succs) - 1;
> +  edge other_e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (other_e, ei, e->src->succs)
> +    if (other_e != e)
> +      {
> +       num_other -= 1;
> +       if (num_other == 0)
> +         /* Ensure that the probabilities add up to 1 without
> +            rounding error.  */
> +         other_e->probability = cumulative_prob.invert ();
> +       else
> +         {
> +           other_e->probability /= old_other_prob;
> +           other_e->probability *= new_other_prob;
> +           cumulative_prob += other_e->probability;
> +         }
> +      }
> +}
> +
>  #if CHECKING_P
>
>  namespace selftest {
> diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> index 86fc118b6be..5e2d7fa11a4 100644
> --- a/gcc/tree-ssa-loop-manip.h
> +++ b/gcc/tree-ssa-loop-manip.h
> @@ -50,10 +50,9 @@ extern bool can_unroll_loop_p (class loop *loop, unsigned factor,
>                                class tree_niter_desc *niter);
>  extern gcov_type niter_for_unrolled_loop (class loop *, unsigned);
>  extern void tree_transform_and_unroll_loop (class loop *, unsigned,
> -                                           edge, class tree_niter_desc *,
> +                                           tree_niter_desc *,
>                                             transform_callback, void *);
> -extern void tree_unroll_loop (class loop *, unsigned,
> -                             edge, class tree_niter_desc *);
> +extern void tree_unroll_loop (class loop *, unsigned, tree_niter_desc *);
>  extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
>  extern unsigned int loop_invariant_motion_in_fun (function *, bool);
>
> diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
> index 6b195d1914f..8f1cf4d44a2 100644
> --- a/gcc/tree-predcom.c
> +++ b/gcc/tree-predcom.c
> @@ -3397,8 +3397,7 @@ pcom_worker::tree_predictive_commoning_loop (bool allow_unroll_p)
>          the phi nodes in execute_pred_commoning_cbck.  A bit hacky.  */
>        replace_phis_by_defined_names (m_chains);
>
> -      edge exit = single_dom_exit (m_loop);
> -      tree_transform_and_unroll_loop (m_loop, unroll_factor, exit, &desc,
> +      tree_transform_and_unroll_loop (m_loop, unroll_factor, &desc,
>                                       execute_pred_commoning_cbck, &dta);
>        eliminate_temp_copies (m_loop, tmp_vars);
>      }
> diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
> index d212e391489..611d3805304 100644
> --- a/gcc/gimple-loop-jam.c
> +++ b/gcc/gimple-loop-jam.c
> @@ -587,8 +587,7 @@ tree_loop_unroll_and_jam (void)
>                              "applying unroll and jam with factor %d\n",
>                              unroll_factor);
>           initialize_original_copy_tables ();
> -         tree_unroll_loop (outer, unroll_factor, single_dom_exit (outer),
> -                           &desc);
> +         tree_unroll_loop (outer, unroll_factor, &desc);
>           free_original_copy_tables ();
>           fuse_loops (outer->inner);
>           todo |= TODO_cleanup_cfg;
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 85977e23245..04b2b33c1ba 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -1962,8 +1962,7 @@ loop_prefetch_arrays (class loop *loop)
>       iterations so that we do not issue superfluous prefetches.  */
>    if (unroll_factor != 1)
>      {
> -      tree_unroll_loop (loop, unroll_factor,
> -                       single_dom_exit (loop), &desc);
> +      tree_unroll_loop (loop, unroll_factor, &desc);
>        unrolled = true;
>      }
>
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index c7a2f67b129..2d576bfa391 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -1182,8 +1182,9 @@ niter_for_unrolled_loop (class loop *loop, unsigned factor)
>    return new_est_niter;
>  }
>
> -/* Unroll LOOP FACTOR times.  DESC describes number of iterations of LOOP.
> -   EXIT is the exit of the loop to that DESC corresponds.
> +/* Unroll LOOP FACTOR times.  LOOP is known to have a single exit edge
> +   whose source block dominates the latch.  DESC describes the number of
> +   iterations of LOOP.
>
>     If N is number of iterations of the loop and MAY_BE_ZERO is the condition
>     under that loop exits in the first iteration even if N != 0,
> @@ -1241,7 +1242,7 @@ niter_for_unrolled_loop (class loop *loop, unsigned factor)
>
>  void
>  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
> -                               edge exit, class tree_niter_desc *desc,
> +                               class tree_niter_desc *desc,
>                                 transform_callback transform,
>                                 void *data)
>  {
> @@ -1265,10 +1266,11 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>
>    gcond *exit_if = nullptr;
>    class loop *new_loop = nullptr;
> -  basic_block rest;
>    edge new_exit;
>    if (!single_loop_p)
>      {
> +      edge exit = single_dom_exit (loop);
> +
>        /* The values for scales should keep profile consistent, and somewhat
>          close to correct.
>
> @@ -1291,7 +1293,7 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>
>        /* Prepare the cfg and update the phi nodes.  Move the loop exit to the
>          loop latch (and make its condition dummy, for the moment).  */
> -      rest = loop_preheader_edge (new_loop)->src;
> +      basic_block rest = loop_preheader_edge (new_loop)->src;
>        edge precond_edge = single_pred_edge (rest);
>        split_edge (loop_latch_edge (loop));
>        basic_block exit_bb = single_pred (loop->latch);
> @@ -1373,10 +1375,7 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>        remove_path (exit);
>      }
>    else
> -    {
> -      new_exit = exit;
> -      rest = exit->dest;
> -    }
> +    new_exit = single_dom_exit (loop);
>
>    /* Transform the loop.  */
>    if (transform)
> @@ -1401,6 +1400,7 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>      }
>    update_ssa (TODO_update_ssa);
>
> +  new_exit = single_dom_exit (loop);
>    if (!single_loop_p)
>      {
>        /* Ensure that the frequencies in the loop match the new estimated
> @@ -1417,29 +1417,24 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>             freq_e = freq_e.force_nonzero ();
>           scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
>         }
> -    }
>
> -  basic_block exit_bb = single_pred (loop->latch);
> -  new_exit = find_edge (exit_bb, rest);
> -  new_exit->probability = profile_probability::always ()
> -                               .apply_scale (1, new_est_niter + 1);
> +      basic_block rest = new_exit->dest;
> +      new_exit->probability = profile_probability::always ()
> +       .apply_scale (1, new_est_niter + 1);
>
> -  if (!single_loop_p)
> -    rest->count += new_exit->count ();
> +      rest->count += new_exit->count ();
>
> -  edge new_nonexit = single_pred_edge (loop->latch);
> -  profile_probability prob = new_nonexit->probability;
> -  new_nonexit->probability = new_exit->probability.invert ();
> -  prob = new_nonexit->probability / prob;
> -  if (prob.initialized_p ())
> -    scale_bbs_frequencies (&loop->latch, 1, prob);
> +      edge new_nonexit = single_pred_edge (loop->latch);
> +      profile_probability prob = new_nonexit->probability;
> +      new_nonexit->probability = new_exit->probability.invert ();
> +      prob = new_nonexit->probability / prob;
> +      if (prob.initialized_p ())
> +       scale_bbs_frequencies (&loop->latch, 1, prob);
>
> -  if (!single_loop_p)
> -    {
>        /* Finally create the new counter for number of iterations and add
>          the new exit instruction.  */
>        tree ctr_before, ctr_after;
> -      gimple_stmt_iterator bsi = gsi_last_nondebug_bb (exit_bb);
> +      gimple_stmt_iterator bsi = gsi_last_nondebug_bb (new_exit->src);
>        exit_if = as_a <gcond *> (gsi_stmt (bsi));
>        create_iv (exit_base, exit_step, NULL_TREE, loop,
>                  &bsi, false, &ctr_before, &ctr_after);
> @@ -1448,6 +1443,67 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>        gimple_cond_set_rhs (exit_if, exit_bound);
>        update_stmt (exit_if);
>      }
> +  else
> +    {
> +      /* gimple_duplicate_loop_to_header_edge has adjusted the loop body's
> +        original profile counts in line with the unroll factor.  However,
> +        the old counts might not have been consistent with the old
> +        iteration count.
> +
> +        Therefore, if the iteration count is known exactly, make sure that the
> +        profile counts of the loop header (and any other blocks that might be
> +        executed in the final iteration) are consistent with the combination
> +        of (a) the incoming profile count and (b) the new iteration count.  */
> +      profile_count in_count = loop_preheader_edge (loop)->count ();
> +      profile_count old_header_count = loop->header->count;
> +      if (in_count.nonzero_p ()
> +         && old_header_count.nonzero_p ()
> +         && TREE_CODE (desc->niter) == INTEGER_CST)
> +       {
> +         /* The + 1 converts latch counts to iteration counts.  */
> +         profile_count new_header_count
> +           = (in_count.apply_scale (new_est_niter + 1, 1));
> +         basic_block *body = get_loop_body (loop);
> +         scale_bbs_frequencies_profile_count (body, loop->num_nodes,
> +                                              new_header_count,
> +                                              old_header_count);
> +         free (body);
> +       }
> +
> +      /* gimple_duplicate_loop_to_header_edge discarded FACTOR - 1
> +        exit edges and adjusted the loop body's profile counts for the
> +        new probabilities of the remaining non-exit edges.  However,
> +        the remaining exit edge still has the same probability as it
> +        did before, even though it is now more likely.
> +
> +        Therefore, all blocks executed after a failed exit test now have
> +        a profile count that is too high, and the sum of the profile counts
> +        for the header's incoming edges is greater than the profile count
> +        of the header itself.
> +
> +        Adjust the profile counts of all code in the loop body after
> +        the exit test so that the sum of the counts on entry to the
> +        header agree.  */
> +      profile_count old_latch_count = loop_latch_edge (loop)->count ();
> +      profile_count new_latch_count = loop->header->count - in_count;
> +      if (old_latch_count.nonzero_p () && new_latch_count.nonzero_p ())
> +       scale_dominated_blocks_in_loop (loop, new_exit->src, new_latch_count,
> +                                       old_latch_count);
> +
> +      /* Set the probability of the exit edge based on NEW_EST_NITER
> +        (which estimates latch counts rather than iteration counts).
> +        Update the probabilities of other edges to match.
> +
> +        If the profile counts are large enough to give the required
> +        precision, the updates above will have made
> +
> +           e->dest->count / e->src->count ~= new e->probability
> +
> +        for every outgoing edge e of NEW_EXIT->src.  */
> +      profile_probability new_exit_prob = profile_probability::always ()
> +       .apply_scale (1, new_est_niter + 1);
> +      change_edge_frequency (new_exit, new_exit_prob);
> +    }
>
>    checking_verify_flow_info ();
>    checking_verify_loop_structure ();
> @@ -1461,10 +1517,9 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>
>  void
>  tree_unroll_loop (class loop *loop, unsigned factor,
> -                 edge exit, class tree_niter_desc *desc)
> +                 class tree_niter_desc *desc)
>  {
> -  tree_transform_and_unroll_loop (loop, factor, exit, desc,
> -                                 NULL, NULL);
> +  tree_transform_and_unroll_loop (loop, factor, desc, NULL, NULL);
>  }
>
>  /* Rewrite the phi node at position PSI in function of the main
> diff --git a/gcc/testsuite/gcc.dg/pr102385.c b/gcc/testsuite/gcc.dg/pr102385.c
> new file mode 100644
> index 00000000000..1339540c722
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr102385.c
> @@ -0,0 +1,14 @@
> +/* { dg-options "-Wall -Wextra -O2 -fno-toplevel-reorder -fno-tree-ch -fno-tree-dce -fno-tree-dominator-opts -fno-tree-dse -fno-tree-loop-ivcanon -fpredictive-commoning" } */
> +
> +short a, b;
> +int c[9];
> +void(d)() {}
> +void e() {
> +  a = 0;
> +  for (; a <= 4; a++) {
> +    short *f = &b;
> +    c[a] || (*f = 0);
> +    d(c[a + 2]);
> +  }
> +}
> +int main() {return 0;}
diff mbox series

Patch

diff --git a/gcc/predict.h b/gcc/predict.h
index 8860cafa31c..4df51bd615c 100644
--- a/gcc/predict.h
+++ b/gcc/predict.h
@@ -100,6 +100,7 @@  extern void rebuild_frequencies (void);
 extern void report_predictor_hitrates (void);
 extern void force_edge_cold (edge, bool);
 extern void propagate_unlikely_bbs_forward (void);
+extern void change_edge_frequency (edge, profile_probability);
 
 extern void add_reg_br_prob_note (rtx_insn *, profile_probability);
 
diff --git a/gcc/predict.c b/gcc/predict.c
index d9c7249831e..68b11135680 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -4481,6 +4481,43 @@  force_edge_cold (edge e, bool impossible)
     }
 }
 
+/* Change E's probability to NEW_E_PROB, redistributing the probabilities
+   of other outgoing edges proportionally.
+
+   Note that this function does not change the profile counts of any
+   basic blocks.  The caller must do that instead, using whatever
+   information it has about the region that needs updating.  */
+
+void
+change_edge_frequency (edge e, profile_probability new_e_prob)
+{
+  profile_probability old_e_prob = e->probability;
+  profile_probability old_other_prob = old_e_prob.invert ();
+  profile_probability new_other_prob = new_e_prob.invert ();
+
+  e->probability = new_e_prob;
+  profile_probability cumulative_prob = new_e_prob;
+
+  unsigned int num_other = EDGE_COUNT (e->src->succs) - 1;
+  edge other_e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (other_e, ei, e->src->succs)
+    if (other_e != e)
+      {
+	num_other -= 1;
+	if (num_other == 0)
+	  /* Ensure that the probabilities add up to 1 without
+	     rounding error.  */
+	  other_e->probability = cumulative_prob.invert ();
+	else
+	  {
+	    other_e->probability /= old_other_prob;
+	    other_e->probability *= new_other_prob;
+	    cumulative_prob += other_e->probability;
+	  }
+      }
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index 86fc118b6be..5e2d7fa11a4 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -50,10 +50,9 @@  extern bool can_unroll_loop_p (class loop *loop, unsigned factor,
 			       class tree_niter_desc *niter);
 extern gcov_type niter_for_unrolled_loop (class loop *, unsigned);
 extern void tree_transform_and_unroll_loop (class loop *, unsigned,
-					    edge, class tree_niter_desc *,
+					    tree_niter_desc *,
 					    transform_callback, void *);
-extern void tree_unroll_loop (class loop *, unsigned,
-			      edge, class tree_niter_desc *);
+extern void tree_unroll_loop (class loop *, unsigned, tree_niter_desc *);
 extern tree canonicalize_loop_ivs (class loop *, tree *, bool);
 extern unsigned int loop_invariant_motion_in_fun (function *, bool);
 
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 6b195d1914f..8f1cf4d44a2 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -3397,8 +3397,7 @@  pcom_worker::tree_predictive_commoning_loop (bool allow_unroll_p)
 	 the phi nodes in execute_pred_commoning_cbck.  A bit hacky.  */
       replace_phis_by_defined_names (m_chains);
 
-      edge exit = single_dom_exit (m_loop);
-      tree_transform_and_unroll_loop (m_loop, unroll_factor, exit, &desc,
+      tree_transform_and_unroll_loop (m_loop, unroll_factor, &desc,
 				      execute_pred_commoning_cbck, &dta);
       eliminate_temp_copies (m_loop, tmp_vars);
     }
diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
index d212e391489..611d3805304 100644
--- a/gcc/gimple-loop-jam.c
+++ b/gcc/gimple-loop-jam.c
@@ -587,8 +587,7 @@  tree_loop_unroll_and_jam (void)
 			     "applying unroll and jam with factor %d\n",
 			     unroll_factor);
 	  initialize_original_copy_tables ();
-	  tree_unroll_loop (outer, unroll_factor, single_dom_exit (outer),
-			    &desc);
+	  tree_unroll_loop (outer, unroll_factor, &desc);
 	  free_original_copy_tables ();
 	  fuse_loops (outer->inner);
 	  todo |= TODO_cleanup_cfg;
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 85977e23245..04b2b33c1ba 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -1962,8 +1962,7 @@  loop_prefetch_arrays (class loop *loop)
      iterations so that we do not issue superfluous prefetches.  */
   if (unroll_factor != 1)
     {
-      tree_unroll_loop (loop, unroll_factor,
-			single_dom_exit (loop), &desc);
+      tree_unroll_loop (loop, unroll_factor, &desc);
       unrolled = true;
     }
 
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index c7a2f67b129..2d576bfa391 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -1182,8 +1182,9 @@  niter_for_unrolled_loop (class loop *loop, unsigned factor)
   return new_est_niter;
 }
 
-/* Unroll LOOP FACTOR times.  DESC describes number of iterations of LOOP.
-   EXIT is the exit of the loop to that DESC corresponds.
+/* Unroll LOOP FACTOR times.  LOOP is known to have a single exit edge
+   whose source block dominates the latch.  DESC describes the number of
+   iterations of LOOP.
 
    If N is number of iterations of the loop and MAY_BE_ZERO is the condition
    under that loop exits in the first iteration even if N != 0,
@@ -1241,7 +1242,7 @@  niter_for_unrolled_loop (class loop *loop, unsigned factor)
 
 void
 tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
-				edge exit, class tree_niter_desc *desc,
+				class tree_niter_desc *desc,
 				transform_callback transform,
 				void *data)
 {
@@ -1265,10 +1266,11 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
 
   gcond *exit_if = nullptr;
   class loop *new_loop = nullptr;
-  basic_block rest;
   edge new_exit;
   if (!single_loop_p)
     {
+      edge exit = single_dom_exit (loop);
+
       /* The values for scales should keep profile consistent, and somewhat
 	 close to correct.
 
@@ -1291,7 +1293,7 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
 
       /* Prepare the cfg and update the phi nodes.  Move the loop exit to the
 	 loop latch (and make its condition dummy, for the moment).  */
-      rest = loop_preheader_edge (new_loop)->src;
+      basic_block rest = loop_preheader_edge (new_loop)->src;
       edge precond_edge = single_pred_edge (rest);
       split_edge (loop_latch_edge (loop));
       basic_block exit_bb = single_pred (loop->latch);
@@ -1373,10 +1375,7 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
       remove_path (exit);
     }
   else
-    {
-      new_exit = exit;
-      rest = exit->dest;
-    }
+    new_exit = single_dom_exit (loop);
 
   /* Transform the loop.  */
   if (transform)
@@ -1401,6 +1400,7 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
     }
   update_ssa (TODO_update_ssa);
 
+  new_exit = single_dom_exit (loop);
   if (!single_loop_p)
     {
       /* Ensure that the frequencies in the loop match the new estimated
@@ -1417,29 +1417,24 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
 	    freq_e = freq_e.force_nonzero ();
 	  scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
 	}
-    }
 
-  basic_block exit_bb = single_pred (loop->latch);
-  new_exit = find_edge (exit_bb, rest);
-  new_exit->probability = profile_probability::always ()
-				.apply_scale (1, new_est_niter + 1);
+      basic_block rest = new_exit->dest;
+      new_exit->probability = profile_probability::always ()
+	.apply_scale (1, new_est_niter + 1);
 
-  if (!single_loop_p)
-    rest->count += new_exit->count ();
+      rest->count += new_exit->count ();
 
-  edge new_nonexit = single_pred_edge (loop->latch);
-  profile_probability prob = new_nonexit->probability;
-  new_nonexit->probability = new_exit->probability.invert ();
-  prob = new_nonexit->probability / prob;
-  if (prob.initialized_p ())
-    scale_bbs_frequencies (&loop->latch, 1, prob);
+      edge new_nonexit = single_pred_edge (loop->latch);
+      profile_probability prob = new_nonexit->probability;
+      new_nonexit->probability = new_exit->probability.invert ();
+      prob = new_nonexit->probability / prob;
+      if (prob.initialized_p ())
+	scale_bbs_frequencies (&loop->latch, 1, prob);
 
-  if (!single_loop_p)
-    {
       /* Finally create the new counter for number of iterations and add
 	 the new exit instruction.  */
       tree ctr_before, ctr_after;
-      gimple_stmt_iterator bsi = gsi_last_nondebug_bb (exit_bb);
+      gimple_stmt_iterator bsi = gsi_last_nondebug_bb (new_exit->src);
       exit_if = as_a <gcond *> (gsi_stmt (bsi));
       create_iv (exit_base, exit_step, NULL_TREE, loop,
 		 &bsi, false, &ctr_before, &ctr_after);
@@ -1448,6 +1443,67 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
       gimple_cond_set_rhs (exit_if, exit_bound);
       update_stmt (exit_if);
     }
+  else
+    {
+      /* gimple_duplicate_loop_to_header_edge has adjusted the loop body's
+	 original profile counts in line with the unroll factor.  However,
+	 the old counts might not have been consistent with the old
+	 iteration count.
+
+	 Therefore, if the iteration count is known exactly, make sure that the
+	 profile counts of the loop header (and any other blocks that might be
+	 executed in the final iteration) are consistent with the combination
+	 of (a) the incoming profile count and (b) the new iteration count.  */
+      profile_count in_count = loop_preheader_edge (loop)->count ();
+      profile_count old_header_count = loop->header->count;
+      if (in_count.nonzero_p ()
+	  && old_header_count.nonzero_p ()
+	  && TREE_CODE (desc->niter) == INTEGER_CST)
+	{
+	  /* The + 1 converts latch counts to iteration counts.  */
+	  profile_count new_header_count
+	    = (in_count.apply_scale (new_est_niter + 1, 1));
+	  basic_block *body = get_loop_body (loop);
+	  scale_bbs_frequencies_profile_count (body, loop->num_nodes,
+					       new_header_count,
+					       old_header_count);
+	  free (body);
+	}
+
+      /* gimple_duplicate_loop_to_header_edge discarded FACTOR - 1
+	 exit edges and adjusted the loop body's profile counts for the
+	 new probabilities of the remaining non-exit edges.  However,
+	 the remaining exit edge still has the same probability as it
+	 did before, even though it is now more likely.
+
+	 Therefore, all blocks executed after a failed exit test now have
+	 a profile count that is too high, and the sum of the profile counts
+	 for the header's incoming edges is greater than the profile count
+	 of the header itself.
+
+	 Adjust the profile counts of all code in the loop body after
+	 the exit test so that the sum of the counts on entry to the
+	 header agree.  */
+      profile_count old_latch_count = loop_latch_edge (loop)->count ();
+      profile_count new_latch_count = loop->header->count - in_count;
+      if (old_latch_count.nonzero_p () && new_latch_count.nonzero_p ())
+	scale_dominated_blocks_in_loop (loop, new_exit->src, new_latch_count,
+					old_latch_count);
+
+      /* Set the probability of the exit edge based on NEW_EST_NITER
+	 (which estimates latch counts rather than iteration counts).
+	 Update the probabilities of other edges to match.
+
+	 If the profile counts are large enough to give the required
+	 precision, the updates above will have made
+
+	    e->dest->count / e->src->count ~= new e->probability
+
+	 for every outgoing edge e of NEW_EXIT->src.  */
+      profile_probability new_exit_prob = profile_probability::always ()
+	.apply_scale (1, new_est_niter + 1);
+      change_edge_frequency (new_exit, new_exit_prob);
+    }
 
   checking_verify_flow_info ();
   checking_verify_loop_structure ();
@@ -1461,10 +1517,9 @@  tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
 
 void
 tree_unroll_loop (class loop *loop, unsigned factor,
-		  edge exit, class tree_niter_desc *desc)
+		  class tree_niter_desc *desc)
 {
-  tree_transform_and_unroll_loop (loop, factor, exit, desc,
-				  NULL, NULL);
+  tree_transform_and_unroll_loop (loop, factor, desc, NULL, NULL);
 }
 
 /* Rewrite the phi node at position PSI in function of the main
diff --git a/gcc/testsuite/gcc.dg/pr102385.c b/gcc/testsuite/gcc.dg/pr102385.c
new file mode 100644
index 00000000000..1339540c722
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102385.c
@@ -0,0 +1,14 @@ 
+/* { dg-options "-Wall -Wextra -O2 -fno-toplevel-reorder -fno-tree-ch -fno-tree-dce -fno-tree-dominator-opts -fno-tree-dse -fno-tree-loop-ivcanon -fpredictive-commoning" } */
+
+short a, b;
+int c[9];
+void(d)() {}
+void e() {
+  a = 0;
+  for (; a <= 4; a++) {
+    short *f = &b;
+    c[a] || (*f = 0);
+    d(c[a + 2]);
+  }
+}
+int main() {return 0;}