
predcom: Adjust some unnecessary update_ssa calls

Message ID 6f695498-7f72-fe2f-b88b-4240f0d4569a@linux.ibm.com
State New
Series predcom: Adjust some unnecessary update_ssa calls

Commit Message

Kewen.Lin June 2, 2021, 9:29 a.m. UTC
Hi,

As Richi suggested in PR100794, this patch removes some
unnecessary update_ssa calls with the flag
TODO_update_ssa_only_virtuals and does some related refactoring.

Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu; also built
cleanly on Power9 ppc64le with --with-build-config=bootstrap-O3,
and passed full SPEC2017 builds on both P8 and P9 with
{-O3, -Ofast} + {,-funroll-loops}.

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-predcom.c (execute_pred_commoning): Remove update_ssa call.
	(tree_predictive_commoning_loop): Factor some cleanup code into
	the lambda function cleanup, remove the scev_reset call, and
	adjust the return value.
	(tree_predictive_commoning): Adjust for different changed values,
	only set flag TODO_update_ssa_only_virtuals if changed.
	(pass_data pass_data_predcom): Remove TODO_update_ssa_only_virtuals
	from todo_flags_finish.
gcc/tree-predcom.c | 50 ++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

Comments

Richard Biener June 7, 2021, 2:46 p.m. UTC | #1
On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> As Richi suggested in PR100794, this patch is to remove
> some unnecessary update_ssa calls with flag
> TODO_update_ssa_only_virtuals, also do some refactoring.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, built well
> on Power9 ppc64le with --with-build-config=bootstrap-O3,
> and passed both P8 and P9 SPEC2017 full build with
> {-O3, -Ofast} + {,-funroll-loops}.
>
> Is it ok for trunk?

LGTM, minor comment on the fancy C++:

+  auto cleanup = [&]() {
+    release_chains (chains);
+    free_data_refs (datarefs);
+    BITMAP_FREE (looparound_phis);
+    free_affine_expand_cache (&name_expansions);
+  };

+      cleanup ();
+      return 0;

so that could have been

  class cleanup {
     ~cleanup()
        {
          release_chains (chains);
          free_data_refs (datarefs);
          BITMAP_FREE (looparound_phis);
          free_affine_expand_cache (&name_expansions);
        }
  } cleanup;

?  Or some other means of registering a RAII-style cleanup?
I mean, we can't wrap it all in

  try {...}
  finally {...}

because C++ doesn't have finally.

OK with this tiny part of the C++ refactoring delayed, but we can also simply
discuss best options.  At least for looparound_phis a good cleanup would
be to pass the bitmap around and use an auto_bitmap local to
tree_predictive_commoning_loop ...

Thanks,
Richard.

> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * tree-predcom.c (execute_pred_commoning): Remove update_ssa call.
>         (tree_predictive_commoning_loop): Factor some cleanup stuffs into
>         lambda function cleanup, remove scev_reset call, and adjust return
>         value.
>         (tree_predictive_commoning): Adjust for different changed values,
>         only set flag TODO_update_ssa_only_virtuals if changed.
>         (pass_data pass_data_predcom): Remove TODO_update_ssa_only_virtuals
>         from todo_flags_finish.
>
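For illustration, here is a minimal, self-contained sketch of the kind of
RAII-style cleanup being discussed above: a guard object whose destructor
runs a cleanup lambda on every return path, standing in for the missing
"finally".  The scope_guard type and the process/main functions are
hypothetical and not part of the posted patch; GCC-specific calls are
replaced with plain C++ so the snippet compiles on its own.

  #include <cstdio>
  #include <cstdlib>

  /* Hypothetical guard: runs the stored callable when it goes out of
     scope, much like a "finally" block would.  */
  template <typename F>
  class scope_guard
  {
  public:
    explicit scope_guard (F f) : func (f) {}
    ~scope_guard () { func (); }

    /* Copying the guard would run the cleanup twice, so forbid it.  */
    scope_guard (const scope_guard &) = delete;
    scope_guard &operator= (const scope_guard &) = delete;

  private:
    F func;
  };

  static int
  process (bool bail_out_early)
  {
    void *buf = std::malloc (128);
    auto cleanup = [&] () { std::free (buf); std::printf ("cleaned up\n"); };
    scope_guard<decltype (cleanup)> guard (cleanup);

    if (bail_out_early)
      return 0;		/* The guard still frees BUF here ...  */

    /* ... real work using BUF would go here ...  */
    return 1;		/* ... and here.  */
  }

  int
  main ()
  {
    process (true);
    process (false);
    return 0;
  }

Kewen's draft further down in the thread takes the other route mentioned
here: the state itself moves into a class (loop_pcom_info) whose destructor
releases it, so no separate guard object is needed.
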
Martin Sebor June 7, 2021, 3:20 p.m. UTC | #2
On 6/7/21 8:46 AM, Richard Biener via Gcc-patches wrote:
> On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> As Richi suggested in PR100794, this patch is to remove
>> some unnecessary update_ssa calls with flag
>> TODO_update_ssa_only_virtuals, also do some refactoring.
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, built well
>> on Power9 ppc64le with --with-build-config=bootstrap-O3,
>> and passed both P8 and P9 SPEC2017 full build with
>> {-O3, -Ofast} + {,-funroll-loops}.
>>
>> Is it ok for trunk?
> 
> LGTM, minor comment on the fancy C++:
> 
> +  auto cleanup = [&]() {
> +    release_chains (chains);
> +    free_data_refs (datarefs);
> +    BITMAP_FREE (looparound_phis);
> +    free_affine_expand_cache (&name_expansions);
> +  };
> 
> +      cleanup ();
> +      return 0;
> 
> so that could have been
> 
>    class cleanup {
>       ~cleanup()
>          {
>            release_chains (chains);
>            free_data_refs (datarefs);
>            BITMAP_FREE (looparound_phis);
>            free_affine_expand_cache (&name_expansions);
>          }
>    } cleanup;
> 
> ?  Or some other means of adding registering a RAII-style cleanup?

I agree this would be better than invoking the cleanup lambda
explicitly.

Going a step further would be to encapsulate all the data in a class
(eliminating the static variables), make
tree_predictive_commoning_loop() a member function of it (along with
whatever other functions it calls), and have the dtor take care of
the cleanup.

Of course, if the data types were classes with ctors and dtors
(including, for example, auto_vec rather than vec for chains)
the cleanup would just happen and none of the explicit calls
would be necessary.

Martin

> I mean, we can't wrap it all in
> 
>    try {...}
>    finally {...}
> 
> because C++ doesn't have finally.
> 
> OK with this tiny part of the C++ refactoring delayed, but we can also simply
> discuss best options.  At least for looparound_phis a good cleanup would
> be to pass the bitmap around and use auto_bitmap local to
> tree_predictive_commoning_loop ...
> 
> Thanks,
> Richard.
> 
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>          * tree-predcom.c (execute_pred_commoning): Remove update_ssa call.
>>          (tree_predictive_commoning_loop): Factor some cleanup stuffs into
>>          lambda function cleanup, remove scev_reset call, and adjust return
>>          value.
>>          (tree_predictive_commoning): Adjust for different changed values,
>>          only set flag TODO_update_ssa_only_virtuals if changed.
>>          (pass_data pass_data_predcom): Remove TODO_update_ssa_only_virtuals
>>          from todo_flags_finish.
>>
Kewen.Lin June 8, 2021, 9:30 a.m. UTC | #3
on 2021/6/7 10:46 PM, Richard Biener wrote:
> On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> As Richi suggested in PR100794, this patch is to remove
>> some unnecessary update_ssa calls with flag
>> TODO_update_ssa_only_virtuals, also do some refactoring.
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, built well
>> on Power9 ppc64le with --with-build-config=bootstrap-O3,
>> and passed both P8 and P9 SPEC2017 full build with
>> {-O3, -Ofast} + {,-funroll-loops}.
>>
>> Is it ok for trunk?
> 
> LGTM, minor comment on the fancy C++:
> 
> +  auto cleanup = [&]() {
> +    release_chains (chains);
> +    free_data_refs (datarefs);
> +    BITMAP_FREE (looparound_phis);
> +    free_affine_expand_cache (&name_expansions);
> +  };
> 
> +      cleanup ();
> +      return 0;
> 
> so that could have been
> 
>   class cleanup {
>      ~cleanup()
>         {
>           release_chains (chains);
>           free_data_refs (datarefs);
>           BITMAP_FREE (looparound_phis);
>           free_affine_expand_cache (&name_expansions);
>         }
>   } cleanup;
> 
> ?  Or some other means of adding registering a RAII-style cleanup?
> I mean, we can't wrap it all in
> 
>   try {...}
>   finally {...}
> 
> because C++ doesn't have finally.
> 
> OK with this tiny part of the C++ refactoring delayed, but we can also simply
> discuss best options.  At least for looparound_phis a good cleanup would
> be to pass the bitmap around and use auto_bitmap local to
> tree_predictive_commoning_loop ...
> 

Thanks Richi!  One draft (not ready for review) is attached for further
discussion.  It follows the idea of RAII-style cleanup.  I noticed that
Martin suggested going a step further and making tree_predictive_commoning_loop
and its callees members of one class (thanks, Martin).  Since there are not
many work functions written in this kind of C++ style, I want to double-check
which option you guys prefer.

One point you might have seen is that making tree_predictive_commoning_loop
and its callees member functions of one class would avoid passing the bitmap
looparound_phis around everywhere, which is what the draft currently does.  :)

BR,
Kewen
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index ac1674d5486..75acc342c5a 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -375,13 +375,40 @@ struct component
   struct component *next;
 };
 
-/* Bitmap of ssa names defined by looparound phi nodes covered by chains.  */
+typedef hash_map<tree, name_expansion *> tree_expand_map_t;
+static void release_chains (vec<chain_p> chains);
 
-static bitmap looparound_phis;
+/* Class for predictive commoning data structure for one LOOP.  */
+class loop_pcom_info
+{
+public:
+  loop_pcom_info (loop_p l)
+    : loop (l), datarefs (vNULL), dependences (vNULL), chains (vNULL),
+      cache (NULL)
+  {
+    dependences.create (10);
+    datarefs.create (10);
+  }
 
-/* Cache used by tree_to_aff_combination_expand.  */
+  ~loop_pcom_info ()
+  {
+    free_data_refs (datarefs);
+    free_dependence_relations (dependences);
+    release_chains (chains);
+    free_affine_expand_cache (&cache);
+  }
 
-static hash_map<tree, name_expansion *> *name_expansions;
+  /* The pointer to the given loop.  */
+  loop_p loop;
+  /* All data references.  */
+  vec<data_reference_p> datarefs;
+  /* All data dependences.  */
+  vec<ddr_p> dependences;
+  /* All chains.  */
+  vec<chain_p> chains;
+  /* Cache used by tree_to_aff_combination_expand.  */
+  tree_expand_map_t *cache;
+};
 
 /* Dumps data reference REF to FILE.  */
 
@@ -673,13 +700,13 @@ suitable_reference_p (struct data_reference *a, enum ref_step_type *ref_step)
 /* Stores DR_OFFSET (DR) + DR_INIT (DR) to OFFSET.  */
 
 static void
-aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset)
+aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset,
+			   tree_expand_map_t **cache_ptr)
 {
   tree type = TREE_TYPE (DR_OFFSET (dr));
   aff_tree delta;
 
-  tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset,
-				  &name_expansions);
+  tree_to_aff_combination_expand (DR_OFFSET (dr), type, offset, cache_ptr);
   aff_combination_const (&delta, type, wi::to_poly_widest (DR_INIT (dr)));
   aff_combination_add (offset, &delta);
 }
@@ -692,7 +719,7 @@ aff_combination_dr_offset (struct data_reference *dr, aff_tree *offset)
 
 static bool
 determine_offset (struct data_reference *a, struct data_reference *b,
-		  poly_widest_int *off)
+		  poly_widest_int *off, tree_expand_map_t **cache_ptr)
 {
   aff_tree diff, baseb, step;
   tree typea, typeb;
@@ -720,13 +747,13 @@ determine_offset (struct data_reference *a, struct data_reference *b,
 
   /* Compare the offsets of the addresses, and check whether the difference
      is a multiple of step.  */
-  aff_combination_dr_offset (a, &diff);
-  aff_combination_dr_offset (b, &baseb);
+  aff_combination_dr_offset (a, &diff, cache_ptr);
+  aff_combination_dr_offset (b, &baseb, cache_ptr);
   aff_combination_scale (&baseb, -1);
   aff_combination_add (&diff, &baseb);
 
-  tree_to_aff_combination_expand (DR_STEP (a), TREE_TYPE (DR_STEP (a)),
-				  &step, &name_expansions);
+  tree_to_aff_combination_expand (DR_STEP (a), TREE_TYPE (DR_STEP (a)), &step,
+				  cache_ptr);
   return aff_combination_constant_multiple_p (&diff, &step, off);
 }
 
@@ -750,9 +777,9 @@ last_always_executed_block (class loop *loop)
 /* Splits dependence graph on DATAREFS described by DEPENDS to components.  */
 
 static struct component *
-split_data_refs_to_components (class loop *loop,
-			       vec<data_reference_p> datarefs,
-			       vec<ddr_p> depends)
+split_data_refs_to_components (class loop *loop, vec<data_reference_p> datarefs,
+			       vec<ddr_p> depends,
+			       tree_expand_map_t **cache_ptr)
 {
   unsigned i, n = datarefs.length ();
   unsigned ca, ia, ib, bad;
@@ -827,7 +854,7 @@ split_data_refs_to_components (class loop *loop,
       if (DR_IS_READ (dra) && DR_IS_READ (drb))
 	{
 	  if (ia == bad || ib == bad
-	      || !determine_offset (dra, drb, &dummy_off))
+	      || !determine_offset (dra, drb, &dummy_off, cache_ptr))
 	    continue;
 	}
       /* If A is read and B write or vice versa and there is unsuitable
@@ -842,7 +869,7 @@ split_data_refs_to_components (class loop *loop,
 	      bitmap_set_bit (no_store_store_comps, ib);
 	      continue;
 	    }
-	  else if (!determine_offset (dra, drb, &dummy_off))
+	  else if (!determine_offset (dra, drb, &dummy_off, cache_ptr))
 	    {
 	      bitmap_set_bit (no_store_store_comps, ib);
 	      merge_comps (comp_father, comp_size, bad, ia);
@@ -856,7 +883,7 @@ split_data_refs_to_components (class loop *loop,
 	      bitmap_set_bit (no_store_store_comps, ia);
 	      continue;
 	    }
-	  else if (!determine_offset (dra, drb, &dummy_off))
+	  else if (!determine_offset (dra, drb, &dummy_off, cache_ptr))
 	    {
 	      bitmap_set_bit (no_store_store_comps, ia);
 	      merge_comps (comp_father, comp_size, bad, ib);
@@ -865,7 +892,7 @@ split_data_refs_to_components (class loop *loop,
 	}
       else if (DR_IS_WRITE (dra) && DR_IS_WRITE (drb)
 	       && ia != bad && ib != bad
-	       && !determine_offset (dra, drb, &dummy_off))
+	       && !determine_offset (dra, drb, &dummy_off, cache_ptr))
 	{
 	  merge_comps (comp_father, comp_size, bad, ia);
 	  merge_comps (comp_father, comp_size, bad, ib);
@@ -949,7 +976,7 @@ end:
    loop.  */
 
 static bool
-suitable_component_p (class loop *loop, struct component *comp)
+suitable_component_p (class loop *loop, struct component *comp, tree_expand_map_t **cache_ptr)
 {
   unsigned i;
   dref a, first;
@@ -980,7 +1007,7 @@ suitable_component_p (class loop *loop, struct component *comp)
       /* Polynomial offsets are no use, since we need to know the
 	 gap between iteration numbers at compile time.  */
       poly_widest_int offset;
-      if (!determine_offset (first->ref, a->ref, &offset)
+      if (!determine_offset (first->ref, a->ref, &offset, cache_ptr)
 	  || !offset.is_constant (&a->offset))
 	return false;
 
@@ -1005,14 +1032,15 @@ suitable_component_p (class loop *loop, struct component *comp)
    the beginning of this file.  LOOP is the current loop.  */
 
 static struct component *
-filter_suitable_components (class loop *loop, struct component *comps)
+filter_suitable_components (class loop *loop, struct component *comps,
+			    tree_expand_map_t **cache_ptr)
 {
   struct component **comp, *act;
 
   for (comp = &comps; *comp; )
     {
       act = *comp;
-      if (suitable_component_p (loop, act))
+      if (suitable_component_p (loop, act, cache_ptr))
 	comp = &act->next;
       else
 	{
@@ -1206,8 +1234,8 @@ name_for_ref (dref ref)
    iterations of the innermost enclosing loop).  */
 
 static bool
-valid_initializer_p (struct data_reference *ref,
-		     unsigned distance, struct data_reference *root)
+valid_initializer_p (struct data_reference *ref, unsigned distance,
+		     struct data_reference *root, tree_expand_map_t **cache_ptr)
 {
   aff_tree diff, base, step;
   poly_widest_int off;
@@ -1228,13 +1256,13 @@ valid_initializer_p (struct data_reference *ref,
 
   /* Verify that this index of REF is equal to the root's index at
      -DISTANCE-th iteration.  */
-  aff_combination_dr_offset (root, &diff);
-  aff_combination_dr_offset (ref, &base);
+  aff_combination_dr_offset (root, &diff, cache_ptr);
+  aff_combination_dr_offset (ref, &base, cache_ptr);
   aff_combination_scale (&base, -1);
   aff_combination_add (&diff, &base);
 
   tree_to_aff_combination_expand (DR_STEP (root), TREE_TYPE (DR_STEP (root)),
-				  &step, &name_expansions);
+				  &step, cache_ptr);
   if (!aff_combination_constant_multiple_p (&diff, &step, &off))
     return false;
 
@@ -1250,7 +1278,8 @@ valid_initializer_p (struct data_reference *ref,
    is the root of the current chain.  */
 
 static gphi *
-find_looparound_phi (class loop *loop, dref ref, dref root)
+find_looparound_phi (class loop *loop, dref ref, dref root,
+		     tree_expand_map_t **cache_ptr)
 {
   tree name, init, init_ref;
   gphi *phi = NULL;
@@ -1303,7 +1332,7 @@ find_looparound_phi (class loop *loop, dref ref, dref root)
 			     init_stmt))
     return NULL;
 
-  if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref))
+  if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref, cache_ptr))
     return NULL;
 
   return phi;
@@ -1339,7 +1368,8 @@ insert_looparound_copy (chain_p chain, dref ref, gphi *phi)
    (also, it may allow us to combine chains together).  */
 
 static void
-add_looparound_copies (class loop *loop, chain_p chain)
+add_looparound_copies (class loop *loop, chain_p chain, bitmap looparound_phis,
+		       tree_expand_map_t **cache_ptr)
 {
   unsigned i;
   dref ref, root = get_chain_root (chain);
@@ -1350,7 +1380,7 @@ add_looparound_copies (class loop *loop, chain_p chain)
 
   FOR_EACH_VEC_ELT (chain->refs, i, ref)
     {
-      phi = find_looparound_phi (loop, ref, root);
+      phi = find_looparound_phi (loop, ref, root, cache_ptr);
       if (!phi)
 	continue;
 
@@ -1364,9 +1394,9 @@ add_looparound_copies (class loop *loop, chain_p chain)
    loop.  */
 
 static void
-determine_roots_comp (class loop *loop,
-		      struct component *comp,
-		      vec<chain_p> *chains)
+determine_roots_comp (class loop *loop, struct component *comp,
+		      vec<chain_p> *chains, bitmap looparound_phis,
+		      tree_expand_map_t **cache_ptr)
 {
   unsigned i;
   dref a;
@@ -1422,7 +1452,7 @@ determine_roots_comp (class loop *loop,
 	{
 	  if (nontrivial_chain_p (chain))
 	    {
-	      add_looparound_copies (loop, chain);
+	      add_looparound_copies (loop, chain, looparound_phis, cache_ptr);
 	      chains->safe_push (chain);
 	    }
 	  else
@@ -1443,7 +1473,7 @@ determine_roots_comp (class loop *loop,
 
   if (nontrivial_chain_p (chain))
     {
-      add_looparound_copies (loop, chain);
+      add_looparound_copies (loop, chain, looparound_phis, cache_ptr);
       chains->safe_push (chain);
     }
   else
@@ -1454,13 +1484,14 @@ determine_roots_comp (class loop *loop,
    separates the references to CHAINS.  LOOP is the current loop.  */
 
 static void
-determine_roots (class loop *loop,
-		 struct component *comps, vec<chain_p> *chains)
+determine_roots (class loop *loop, struct component *comps,
+		 vec<chain_p> *chains, bitmap looparound_phis,
+		 tree_expand_map_t **cache_ptr)
 {
   struct component *comp;
 
   for (comp = comps; comp; comp = comp->next)
-    determine_roots_comp (loop, comp, chains);
+    determine_roots_comp (loop, comp, chains, looparound_phis, cache_ptr);
 }
 
 /* Replace the reference in statement STMT with temporary variable
@@ -2028,7 +2059,7 @@ execute_load_motion (class loop *loop, chain_p chain, bitmap tmp_vars)
    such statement, or more statements, NULL is returned.  */
 
 static gimple *
-single_nonlooparound_use (tree name)
+single_nonlooparound_use (tree name, bitmap looparound_phis)
 {
   use_operand_p use;
   imm_use_iterator it;
@@ -2063,7 +2094,7 @@ single_nonlooparound_use (tree name)
    used.  */
 
 static void
-remove_stmt (gimple *stmt)
+remove_stmt (gimple *stmt, bitmap looparound_phis)
 {
   tree name;
   gimple *next;
@@ -2072,7 +2103,7 @@ remove_stmt (gimple *stmt)
   if (gimple_code (stmt) == GIMPLE_PHI)
     {
       name = PHI_RESULT (stmt);
-      next = single_nonlooparound_use (name);
+      next = single_nonlooparound_use (name, looparound_phis);
       reset_debug_uses (stmt);
       psi = gsi_for_stmt (stmt);
       remove_phi_node (&psi, true);
@@ -2094,7 +2125,7 @@ remove_stmt (gimple *stmt)
       name = gimple_assign_lhs (stmt);
       if (TREE_CODE (name) == SSA_NAME)
 	{
-	  next = single_nonlooparound_use (name);
+	  next = single_nonlooparound_use (name, looparound_phis);
 	  reset_debug_uses (stmt);
 	}
       else
@@ -2122,7 +2153,7 @@ remove_stmt (gimple *stmt)
 
 static void
 execute_pred_commoning_chain (class loop *loop, chain_p chain,
-			      bitmap tmp_vars)
+			      bitmap tmp_vars, bitmap looparound_phis)
 {
   unsigned i;
   dref a;
@@ -2172,7 +2203,7 @@ execute_pred_commoning_chain (class loop *loop, chain_p chain,
 	      if (last_store_p)
 		last_store_p = false;
 	      else
-		remove_stmt (a->stmt);
+		remove_stmt (a->stmt, looparound_phis);
 
 	      continue;
 	    }
@@ -2252,8 +2283,8 @@ determine_unroll_factor (vec<chain_p> chains)
    Uids of the newly created temporary variables are marked in TMP_VARS.  */
 
 static void
-execute_pred_commoning (class loop *loop, vec<chain_p> chains,
-			bitmap tmp_vars)
+execute_pred_commoning (class loop *loop, vec<chain_p> chains, bitmap tmp_vars,
+			bitmap looparound_phis)
 {
   chain_p chain;
   unsigned i;
@@ -2263,7 +2294,7 @@ execute_pred_commoning (class loop *loop, vec<chain_p> chains,
       if (chain->type == CT_INVARIANT)
 	execute_load_motion (loop, chain, tmp_vars);
       else
-	execute_pred_commoning_chain (loop, chain, tmp_vars);
+	execute_pred_commoning_chain (loop, chain, tmp_vars, looparound_phis);
     }
 
   FOR_EACH_VEC_ELT (chains, i, chain)
@@ -2277,7 +2308,7 @@ execute_pred_commoning (class loop *loop, vec<chain_p> chains,
 	  dref a;
 	  unsigned j;
 	  for (j = 1; chain->refs.iterate (j, &a); j++)
-	    remove_stmt (a->stmt);
+	    remove_stmt (a->stmt, looparound_phis);
 	}
     }
 }
@@ -2330,6 +2361,7 @@ struct epcc_data
 {
   vec<chain_p> chains;
   bitmap tmp_vars;
+  bitmap looparound_phis;
 };
 
 static void
@@ -2341,7 +2373,8 @@ execute_pred_commoning_cbck (class loop *loop, void *data)
      tree_transform_and_unroll_loop (see detailed description in
      tree_predictive_commoning_loop).  */
   replace_names_by_phis (dta->chains);
-  execute_pred_commoning (loop, dta->chains, dta->tmp_vars);
+  execute_pred_commoning (loop, dta->chains, dta->tmp_vars,
+			  dta->looparound_phis);
 }
 
 /* Base NAME and all the names in the chain of phi nodes that use it
@@ -2434,7 +2467,7 @@ chain_can_be_combined_p (chain_p chain)
    statement.  */
 
 static gimple *
-find_use_stmt (tree *name)
+find_use_stmt (tree *name, bitmap looparound_phis)
 {
   gimple *stmt;
   tree rhs, lhs;
@@ -2442,7 +2475,7 @@ find_use_stmt (tree *name)
   /* Skip over assignments.  */
   while (1)
     {
-      stmt = single_nonlooparound_use (*name);
+      stmt = single_nonlooparound_use (*name, looparound_phis);
       if (!stmt)
 	return NULL;
 
@@ -2487,7 +2520,7 @@ may_reassociate_p (tree type, enum tree_code code)
    is stored in DISTANCE.  */
 
 static gimple *
-find_associative_operation_root (gimple *stmt, unsigned *distance)
+find_associative_operation_root (gimple *stmt, unsigned *distance, bitmap looparound_phis)
 {
   tree lhs;
   gimple *next;
@@ -2503,7 +2536,7 @@ find_associative_operation_root (gimple *stmt, unsigned *distance)
       lhs = gimple_assign_lhs (stmt);
       gcc_assert (TREE_CODE (lhs) == SSA_NAME);
 
-      next = find_use_stmt (&lhs);
+      next = find_use_stmt (&lhs, looparound_phis);
       if (!next
 	  || gimple_assign_rhs_code (next) != code)
 	break;
@@ -2524,25 +2557,25 @@ find_associative_operation_root (gimple *stmt, unsigned *distance)
    NAME2.  */
 
 static gimple *
-find_common_use_stmt (tree *name1, tree *name2)
+find_common_use_stmt (tree *name1, tree *name2, bitmap looparound_phis)
 {
   gimple *stmt1, *stmt2;
 
-  stmt1 = find_use_stmt (name1);
+  stmt1 = find_use_stmt (name1, looparound_phis);
   if (!stmt1)
     return NULL;
 
-  stmt2 = find_use_stmt (name2);
+  stmt2 = find_use_stmt (name2, looparound_phis);
   if (!stmt2)
     return NULL;
 
   if (stmt1 == stmt2)
     return stmt1;
 
-  stmt1 = find_associative_operation_root (stmt1, NULL);
+  stmt1 = find_associative_operation_root (stmt1, NULL, looparound_phis);
   if (!stmt1)
     return NULL;
-  stmt2 = find_associative_operation_root (stmt2, NULL);
+  stmt2 = find_associative_operation_root (stmt2, NULL, looparound_phis);
   if (!stmt2)
     return NULL;
 
@@ -2555,7 +2588,7 @@ find_common_use_stmt (tree *name1, tree *name2)
 
 static bool
 combinable_refs_p (dref r1, dref r2,
-		   enum tree_code *code, bool *swap, tree *rslt_type)
+		   enum tree_code *code, bool *swap, tree *rslt_type, bitmap looparound_phis)
 {
   enum tree_code acode;
   bool aswap;
@@ -2567,7 +2600,7 @@ combinable_refs_p (dref r1, dref r2,
   name2 = name_for_ref (r2);
   gcc_assert (name1 != NULL_TREE && name2 != NULL_TREE);
 
-  stmt = find_common_use_stmt (&name1, &name2);
+  stmt = find_common_use_stmt (&name1, &name2, looparound_phis);
 
   if (!stmt
       /* A simple post-dominance check - make sure the combination
@@ -2623,7 +2656,7 @@ remove_name_from_operation (gimple *stmt, tree op)
    are combined in a single statement, and returns this statement.  */
 
 static gimple *
-reassociate_to_the_same_stmt (tree name1, tree name2)
+reassociate_to_the_same_stmt (tree name1, tree name2, bitmap looparound_phis)
 {
   gimple *stmt1, *stmt2, *root1, *root2, *s1, *s2;
   gassign *new_stmt, *tmp_stmt;
@@ -2633,10 +2666,10 @@ reassociate_to_the_same_stmt (tree name1, tree name2)
   tree type = TREE_TYPE (name1);
   gimple_stmt_iterator bsi;
 
-  stmt1 = find_use_stmt (&name1);
-  stmt2 = find_use_stmt (&name2);
-  root1 = find_associative_operation_root (stmt1, &dist1);
-  root2 = find_associative_operation_root (stmt2, &dist2);
+  stmt1 = find_use_stmt (&name1, looparound_phis);
+  stmt2 = find_use_stmt (&name2, looparound_phis);
+  root1 = find_associative_operation_root (stmt1, &dist1, looparound_phis);
+  root2 = find_associative_operation_root (stmt2, &dist2, looparound_phis);
   code = gimple_assign_rhs_code (stmt1);
 
   gcc_assert (root1 && root2 && root1 == root2
@@ -2651,22 +2684,22 @@ reassociate_to_the_same_stmt (tree name1, tree name2)
 
   while (dist1 > dist2)
     {
-      s1 = find_use_stmt (&r1);
+      s1 = find_use_stmt (&r1, looparound_phis);
       r1 = gimple_assign_lhs (s1);
       dist1--;
     }
   while (dist2 > dist1)
     {
-      s2 = find_use_stmt (&r2);
+      s2 = find_use_stmt (&r2, looparound_phis);
       r2 = gimple_assign_lhs (s2);
       dist2--;
     }
 
   while (s1 != s2)
     {
-      s1 = find_use_stmt (&r1);
+      s1 = find_use_stmt (&r1, looparound_phis);
       r1 = gimple_assign_lhs (s1);
-      s2 = find_use_stmt (&r2);
+      s2 = find_use_stmt (&r2, looparound_phis);
       r2 = gimple_assign_lhs (s2);
     }
 
@@ -2708,25 +2741,25 @@ reassociate_to_the_same_stmt (tree name1, tree name2)
    the expression so that they are used in the same statement.  */
 
 static gimple *
-stmt_combining_refs (dref r1, dref r2)
+stmt_combining_refs (dref r1, dref r2, bitmap looparound_phis)
 {
   gimple *stmt1, *stmt2;
   tree name1 = name_for_ref (r1);
   tree name2 = name_for_ref (r2);
 
-  stmt1 = find_use_stmt (&name1);
-  stmt2 = find_use_stmt (&name2);
+  stmt1 = find_use_stmt (&name1, looparound_phis);
+  stmt2 = find_use_stmt (&name2, looparound_phis);
   if (stmt1 == stmt2)
     return stmt1;
 
-  return reassociate_to_the_same_stmt (name1, name2);
+  return reassociate_to_the_same_stmt (name1, name2, looparound_phis);
 }
 
 /* Tries to combine chains CH1 and CH2 together.  If this succeeds, the
    description of the new chain is returned, otherwise we return NULL.  */
 
 static chain_p
-combine_chains (chain_p ch1, chain_p ch2)
+combine_chains (chain_p ch1, chain_p ch2, bitmap looparound_phis)
 {
   dref r1, r2, nw;
   enum tree_code op = ERROR_MARK;
@@ -2749,7 +2782,7 @@ combine_chains (chain_p ch1, chain_p ch2)
       if (r1->distance != r2->distance)
 	return NULL;
 
-      if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type))
+      if (!combinable_refs_p (r1, r2, &op, &swap, &rslt_type, looparound_phis))
 	return NULL;
     }
 
@@ -2768,7 +2801,7 @@ combine_chains (chain_p ch1, chain_p ch2)
 	       && ch2->refs.iterate (i, &r2)); i++)
     {
       nw = XCNEW (class dref_d);
-      nw->stmt = stmt_combining_refs (r1, r2);
+      nw->stmt = stmt_combining_refs (r1, r2, looparound_phis);
       nw->distance = r1->distance;
 
       new_chain->refs.safe_push (nw);
@@ -2817,7 +2850,8 @@ pcom_stmt_dominates_stmt_p (gimple *s1, gimple *s2)
 /* Try to combine the CHAINS in LOOP.  */
 
 static void
-try_combine_chains (class loop *loop, vec<chain_p> *chains)
+try_combine_chains (class loop *loop, vec<chain_p> *chains,
+		    bitmap looparound_phis)
 {
   unsigned i, j;
   chain_p ch1, ch2, cch;
@@ -2839,7 +2873,7 @@ try_combine_chains (class loop *loop, vec<chain_p> *chains)
 	  if (!chain_can_be_combined_p (ch2))
 	    continue;
 
-	  cch = combine_chains (ch1, ch2);
+	  cch = combine_chains (ch1, ch2, looparound_phis);
 	  if (cch)
 	    {
 	      worklist.safe_push (cch);
@@ -3172,6 +3206,7 @@ insert_init_seqs (class loop *loop, vec<chain_p> chains)
       }
 }
 
+
 /* Performs predictive commoning for LOOP.  Sets bit 1<<1 of return value
    if LOOP was unrolled; Sets bit 1<<2 of return value if loop closed ssa
    form was corrupted.  Non-zero return value indicates some changes were
@@ -3180,10 +3215,7 @@ insert_init_seqs (class loop *loop, vec<chain_p> chains)
 static unsigned
 tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
 {
-  vec<data_reference_p> datarefs;
-  vec<ddr_p> dependences;
   struct component *components;
-  vec<chain_p> chains = vNULL;
   unsigned unroll_factor = 0;
   class tree_niter_desc desc;
   bool unroll = false, loop_closed_ssa = false;
@@ -3202,31 +3234,23 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
 
   /* Find the data references and split them into components according to their
      dependence relations.  */
+  loop_pcom_info info (loop);
   auto_vec<loop_p, 3> loop_nest;
-  dependences.create (10);
-  datarefs.create (10);
-  if (! compute_data_dependences_for_loop (loop, true, &loop_nest, &datarefs,
-					   &dependences))
+  if (!compute_data_dependences_for_loop (loop, true, &loop_nest,
+					  &info.datarefs, &info.dependences))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "Cannot analyze data dependencies\n");
-      free_data_refs (datarefs);
-      free_dependence_relations (dependences);
       return 0;
     }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
-    dump_data_dependence_relations (dump_file, dependences);
+    dump_data_dependence_relations (dump_file, info.dependences);
 
-  components = split_data_refs_to_components (loop, datarefs, dependences);
-  loop_nest.release ();
-  free_dependence_relations (dependences);
+  components = split_data_refs_to_components (loop, info.datarefs,
+					      info.dependences, &info.cache);
   if (!components)
-    {
-      free_data_refs (datarefs);
-      free_affine_expand_cache (&name_expansions);
       return 0;
-    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -3235,47 +3259,40 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
     }
 
   /* Find the suitable components and split them into chains.  */
-  components = filter_suitable_components (loop, components);
+  components = filter_suitable_components (loop, components, &info.cache);
 
   auto_bitmap tmp_vars;
-  looparound_phis = BITMAP_ALLOC (NULL);
-  determine_roots (loop, components, &chains);
+  /* Bitmap of ssa names defined by looparound phi nodes covered by chains.  */
+  auto_bitmap looparound_phis;
+  determine_roots (loop, components, &info.chains, looparound_phis, &info.cache);
   release_components (components);
 
-  auto cleanup = [&]() {
-    release_chains (chains);
-    free_data_refs (datarefs);
-    BITMAP_FREE (looparound_phis);
-    free_affine_expand_cache (&name_expansions);
-  };
-
-  if (!chains.exists ())
+  if (!info.chains.exists ())
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file,
 		 "Predictive commoning failed: no suitable chains\n");
-      cleanup ();
       return 0;
     }
 
-  prepare_initializers (loop, chains);
-  loop_closed_ssa = prepare_finalizers (loop, chains);
+  prepare_initializers (loop, info.chains);
+  loop_closed_ssa = prepare_finalizers (loop, info.chains);
 
   /* Try to combine the chains that are always worked with together.  */
-  try_combine_chains (loop, &chains);
+  try_combine_chains (loop, &info.chains, looparound_phis);
 
-  insert_init_seqs (loop, chains);
+  insert_init_seqs (loop, info.chains);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       fprintf (dump_file, "Before commoning:\n\n");
-      dump_chains (dump_file, chains);
+      dump_chains (dump_file, info.chains);
     }
 
   if (allow_unroll_p)
     /* Determine the unroll factor, and if the loop should be unrolled, ensure
        that its number of iterations is divisible by the factor.  */
-    unroll_factor = determine_unroll_factor (chains);
+    unroll_factor = determine_unroll_factor (info.chains);
 
   if (unroll_factor > 1)
     unroll = can_unroll_loop_p (loop, unroll_factor, &desc);
@@ -3289,8 +3306,9 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "Unrolling %u times.\n", unroll_factor);
 
-      dta.chains = chains;
+      dta.chains = info.chains;
       dta.tmp_vars = tmp_vars;
+      dta.looparound_phis = looparound_phis;
 
       /* Cfg manipulations performed in tree_transform_and_unroll_loop before
 	 execute_pred_commoning_cbck is called may cause phi nodes to be
@@ -3298,7 +3316,7 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
 	 statements.  To fix this, we store the ssa names defined by the
 	 phi nodes here instead of the phi nodes themselves, and restore
 	 the phi nodes in execute_pred_commoning_cbck.  A bit hacky.  */
-      replace_phis_by_defined_names (chains);
+      replace_phis_by_defined_names (info.chains);
 
       edge exit = single_dom_exit (loop);
       tree_transform_and_unroll_loop (loop, unroll_factor, exit, &desc,
@@ -3310,11 +3328,9 @@ tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file,
 		 "Executing predictive commoning without unrolling.\n");
-      execute_pred_commoning (loop, chains, tmp_vars);
+      execute_pred_commoning (loop, info.chains, tmp_vars, looparound_phis);
     }
 
-  cleanup ();
-
   return (unroll ? 2 : 1) | (loop_closed_ssa ? 4 : 1);
 }
Richard Biener June 8, 2021, 11:02 a.m. UTC | #4
On Tue, Jun 8, 2021 at 11:31 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/6/7 10:46 PM, Richard Biener wrote:
> > On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> As Richi suggested in PR100794, this patch is to remove
> >> some unnecessary update_ssa calls with flag
> >> TODO_update_ssa_only_virtuals, also do some refactoring.
> >>
> >> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
> >> x86_64-redhat-linux and aarch64-linux-gnu, built well
> >> on Power9 ppc64le with --with-build-config=bootstrap-O3,
> >> and passed both P8 and P9 SPEC2017 full build with
> >> {-O3, -Ofast} + {,-funroll-loops}.
> >>
> >> Is it ok for trunk?
> >
> > LGTM, minor comment on the fancy C++:
> >
> > +  auto cleanup = [&]() {
> > +    release_chains (chains);
> > +    free_data_refs (datarefs);
> > +    BITMAP_FREE (looparound_phis);
> > +    free_affine_expand_cache (&name_expansions);
> > +  };
> >
> > +      cleanup ();
> > +      return 0;
> >
> > so that could have been
> >
> >   class cleanup {
> >      ~cleanup()
> >         {
> >           release_chains (chains);
> >           free_data_refs (datarefs);
> >           BITMAP_FREE (looparound_phis);
> >           free_affine_expand_cache (&name_expansions);
> >         }
> >   } cleanup;
> >
> > ?  Or some other means of adding registering a RAII-style cleanup?
> > I mean, we can't wrap it all in
> >
> >   try {...}
> >   finally {...}
> >
> > because C++ doesn't have finally.
> >
> > OK with this tiny part of the C++ refactoring delayed, but we can also simply
> > discuss best options.  At least for looparound_phis a good cleanup would
> > be to pass the bitmap around and use auto_bitmap local to
> > tree_predictive_commoning_loop ...
> >
>
> Thanks Richi!  One draft (not ready for review) is attached for the further
> discussion.  It follows the idea of RAII-style cleanup.  I noticed that
> Martin suggested stepping forward to make tree_predictive_commoning_loop
> and its callees into one class (Thanks Martin), since there are not many
> this kind of C++-style work functions, I want to double confirm which option
> do you guys prefer?
>
> One point you might have seen is that to make tree_predictive_commoning_loop
> and its callees as member functions of one class can avoid to pass bitmap
> looparound_phis all around what's in the draft.  :)

Such general cleanup is of course desired - Giuliano started some of it within
GSoC two years ago in an attempt to thread the compilation process.  The
cleanup then helps to get rid of global state, which of course interferes here
(and avoids unnecessary use of TLS vars).

So yes, encapsulating global state into a class and making accessors
member functions is something that is desired (but a lot of mechanical
work).

Thanks
Richard.

> BR,
> Kewen
>
Richard Biener June 8, 2021, 11:09 a.m. UTC | #5
On Tue, Jun 8, 2021 at 1:02 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Jun 8, 2021 at 11:31 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >
> > on 2021/6/7 10:46 PM, Richard Biener wrote:
> > > On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> As Richi suggested in PR100794, this patch is to remove
> > >> some unnecessary update_ssa calls with flag
> > >> TODO_update_ssa_only_virtuals, also do some refactoring.
> > >>
> > >> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
> > >> x86_64-redhat-linux and aarch64-linux-gnu, built well
> > >> on Power9 ppc64le with --with-build-config=bootstrap-O3,
> > >> and passed both P8 and P9 SPEC2017 full build with
> > >> {-O3, -Ofast} + {,-funroll-loops}.
> > >>
> > >> Is it ok for trunk?
> > >
> > > LGTM, minor comment on the fancy C++:
> > >
> > > +  auto cleanup = [&]() {
> > > +    release_chains (chains);
> > > +    free_data_refs (datarefs);
> > > +    BITMAP_FREE (looparound_phis);
> > > +    free_affine_expand_cache (&name_expansions);
> > > +  };
> > >
> > > +      cleanup ();
> > > +      return 0;
> > >
> > > so that could have been
> > >
> > >   class cleanup {
> > >      ~cleanup()
> > >         {
> > >           release_chains (chains);
> > >           free_data_refs (datarefs);
> > >           BITMAP_FREE (looparound_phis);
> > >           free_affine_expand_cache (&name_expansions);
> > >         }
> > >   } cleanup;
> > >
> > > ?  Or some other means of adding registering a RAII-style cleanup?
> > > I mean, we can't wrap it all in
> > >
> > >   try {...}
> > >   finally {...}
> > >
> > > because C++ doesn't have finally.
> > >
> > > OK with this tiny part of the C++ refactoring delayed, but we can also simply
> > > discuss best options.  At least for looparound_phis a good cleanup would
> > > be to pass the bitmap around and use auto_bitmap local to
> > > tree_predictive_commoning_loop ...
> > >
> >
> > Thanks Richi!  One draft (not ready for review) is attached for the further
> > discussion.  It follows the idea of RAII-style cleanup.  I noticed that
> > Martin suggested stepping forward to make tree_predictive_commoning_loop
> > and its callees into one class (Thanks Martin), since there are not many
> > this kind of C++-style work functions, I want to double confirm which option
> > do you guys prefer?
> >
> > One point you might have seen is that to make tree_predictive_commoning_loop
> > and its callees as member functions of one class can avoid to pass bitmap
> > looparound_phis all around what's in the draft.  :)
>
> Such general cleanup is of course desired - Giuliano started some of it within
> GSoC two years ago in the attempt to thread the compilation process.  The
> cleanup then helps to get rid of global state which of course interferes here
> (and avoids unnecessary use of TLS vars).
>
> So yes, encapsulating global state into a class and making accessors
> member functions is something that is desired (but a lot of mechanical
> work).

Btw, the patch you posted is OK with me as well; it achieves the global
state removal, too.

Richard.

> Thanks
> Richard.
>
> > BR,
> > Kewen
> >
Martin Sebor June 8, 2021, 2:26 p.m. UTC | #6
On 6/8/21 3:30 AM, Kewen.Lin wrote:
> on 2021/6/7 10:46 PM, Richard Biener wrote:
>> On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>
>>> Hi,
>>>
>>> As Richi suggested in PR100794, this patch is to remove
>>> some unnecessary update_ssa calls with flag
>>> TODO_update_ssa_only_virtuals, also do some refactoring.
>>>
>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
>>> x86_64-redhat-linux and aarch64-linux-gnu, built well
>>> on Power9 ppc64le with --with-build-config=bootstrap-O3,
>>> and passed both P8 and P9 SPEC2017 full build with
>>> {-O3, -Ofast} + {,-funroll-loops}.
>>>
>>> Is it ok for trunk?
>>
>> LGTM, minor comment on the fancy C++:
>>
>> +  auto cleanup = [&]() {
>> +    release_chains (chains);
>> +    free_data_refs (datarefs);
>> +    BITMAP_FREE (looparound_phis);
>> +    free_affine_expand_cache (&name_expansions);
>> +  };
>>
>> +      cleanup ();
>> +      return 0;
>>
>> so that could have been
>>
>>    class cleanup {
>>       ~cleanup()
>>          {
>>            release_chains (chains);
>>            free_data_refs (datarefs);
>>            BITMAP_FREE (looparound_phis);
>>            free_affine_expand_cache (&name_expansions);
>>          }
>>    } cleanup;
>>
>> ?  Or some other means of adding registering a RAII-style cleanup?
>> I mean, we can't wrap it all in
>>
>>    try {...}
>>    finally {...}
>>
>> because C++ doesn't have finally.
>>
>> OK with this tiny part of the C++ refactoring delayed, but we can also simply
>> discuss best options.  At least for looparound_phis a good cleanup would
>> be to pass the bitmap around and use auto_bitmap local to
>> tree_predictive_commoning_loop ...
>>
> 
> Thanks Richi!  One draft (not ready for review) is attached for the further
> discussion.  It follows the idea of RAII-style cleanup.  I noticed that
> Martin suggested stepping forward to make tree_predictive_commoning_loop
> and its callees into one class (Thanks Martin), since there are not many
> this kind of C++-style work functions, I want to double confirm which option
> do you guys prefer?

I meant that not necessarily as something to include in this patch
but as a suggestion for a future improvement.  If you'd like to
tackle it at any point that would be great of course :)  In any
event, thanks for double-checking!

The attached patch looks good to me as well (more for the sake of
style than anything else: declaring the class copy ctor and copy
assignment = delete would make it clear it's not meant to be
copied, although in this case it's unlikely to make a practical
difference).

> 
> One point you might have seen is that to make tree_predictive_commoning_loop
> and its callees as member functions of one class can avoid to pass bitmap
> looparound_phis all around what's in the draft.  :)

Yes, that would simplify the interfaces of all the functions that
the info members are passed to as arguments.

Martin

> 
> BR,
> Kewen
>
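As a closing illustration of Martin's suggestion, here is a distilled,
hypothetical version of the draft's per-loop info class with the copy
operations deleted.  The member types are plain C++ stand-ins for GCC's
vec, bitmap and hash_map, so the sketch is self-contained and is not the
real tree-predcom.c code.

  #include <map>
  #include <set>
  #include <vector>

  /* Stand-in for the draft's loop_pcom_info: it owns all per-loop state,
     and the members' own destructors release everything on scope exit,
     so no explicit cleanup calls are needed.  */
  class loop_info
  {
  public:
    explicit loop_info (int loop_num) : num (loop_num) {}

    /* Martin's point: deleting the copy operations documents that the
       object owns its resources and is not meant to be copied.  */
    loop_info (const loop_info &) = delete;
    loop_info &operator= (const loop_info &) = delete;

    int num;
    std::vector<int> datarefs;		/* stand-in for vec<data_reference_p> */
    std::vector<int> dependences;	/* stand-in for vec<ddr_p> */
    std::vector<int> chains;		/* stand-in for vec<chain_p> */
    std::set<int> looparound_phis;	/* stand-in for the bitmap */
    std::map<int, int> expansion_cache;	/* stand-in for the expansion cache */
  };

  int
  main ()
  {
    loop_info info (1);
    info.datarefs.push_back (42);
    /* loop_info copy = info;   // would not compile: copying is deleted.  */
    return 0;
  }
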

Patch

diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 5482f50198a..02f911a08bb 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -2280,8 +2280,6 @@  execute_pred_commoning (class loop *loop, vec<chain_p> chains,
 	    remove_stmt (a->stmt);
 	}
     }
-
-  update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 /* For each reference in CHAINS, if its defining statement is
@@ -3174,9 +3172,10 @@  insert_init_seqs (class loop *loop, vec<chain_p> chains)
       }
 }
 
-/* Performs predictive commoning for LOOP.  Sets bit 1<<0 of return value
-   if LOOP was unrolled; Sets bit 1<<1 of return value if loop closed ssa
-   form was corrupted.  */
+/* Performs predictive commoning for LOOP.  Sets bit 1<<1 of return value
+   if LOOP was unrolled; Sets bit 1<<2 of return value if loop closed ssa
+   form was corrupted.  Non-zero return value indicates some changes were
+   applied to this loop.  */
 
 static unsigned
 tree_predictive_commoning_loop (class loop *loop)
@@ -3188,7 +3187,6 @@  tree_predictive_commoning_loop (class loop *loop)
   unsigned unroll_factor;
   class tree_niter_desc desc;
   bool unroll = false, loop_closed_ssa = false;
-  edge exit;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Processing loop %d\n",  loop->num);
@@ -3244,13 +3242,22 @@  tree_predictive_commoning_loop (class loop *loop)
   determine_roots (loop, components, &chains);
   release_components (components);
 
+  auto cleanup = [&]() {
+    release_chains (chains);
+    free_data_refs (datarefs);
+    BITMAP_FREE (looparound_phis);
+    free_affine_expand_cache (&name_expansions);
+  };
+
   if (!chains.exists ())
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file,
 		 "Predictive commoning failed: no suitable chains\n");
-      goto end;
+      cleanup ();
+      return 0;
     }
+
   prepare_initializers (loop, chains);
   loop_closed_ssa = prepare_finalizers (loop, chains);
 
@@ -3268,10 +3275,8 @@  tree_predictive_commoning_loop (class loop *loop)
   /* Determine the unroll factor, and if the loop should be unrolled, ensure
      that its number of iterations is divisible by the factor.  */
   unroll_factor = determine_unroll_factor (chains);
-  scev_reset ();
   unroll = (unroll_factor > 1
 	    && can_unroll_loop_p (loop, unroll_factor, &desc));
-  exit = single_dom_exit (loop);
 
   /* Execute the predictive commoning transformations, and possibly unroll the
      loop.  */
@@ -3285,8 +3290,6 @@  tree_predictive_commoning_loop (class loop *loop)
       dta.chains = chains;
       dta.tmp_vars = tmp_vars;
 
-      update_ssa (TODO_update_ssa_only_virtuals);
-
       /* Cfg manipulations performed in tree_transform_and_unroll_loop before
 	 execute_pred_commoning_cbck is called may cause phi nodes to be
 	 reallocated, which is a problem since CHAINS may point to these
@@ -3295,6 +3298,7 @@  tree_predictive_commoning_loop (class loop *loop)
 	 the phi nodes in execute_pred_commoning_cbck.  A bit hacky.  */
       replace_phis_by_defined_names (chains);
 
+      edge exit = single_dom_exit (loop);
       tree_transform_and_unroll_loop (loop, unroll_factor, exit, &desc,
 				      execute_pred_commoning_cbck, &dta);
       eliminate_temp_copies (loop, tmp_vars);
@@ -3307,14 +3311,9 @@  tree_predictive_commoning_loop (class loop *loop)
       execute_pred_commoning (loop, chains, tmp_vars);
     }
 
-end: ;
-  release_chains (chains);
-  free_data_refs (datarefs);
-  BITMAP_FREE (looparound_phis);
+  cleanup ();
 
-  free_affine_expand_cache (&name_expansions);
-
-  return (unroll ? 1 : 0) | (loop_closed_ssa ? 2 : 0);
+  return (unroll ? 2 : 1) | (loop_closed_ssa ? 4 : 1);
 }
 
 /* Runs predictive commoning.  */
@@ -3335,12 +3334,19 @@  tree_predictive_commoning (void)
 
   if (changed > 0)
     {
-      scev_reset ();
+      ret = TODO_update_ssa_only_virtuals;
 
+      /* Some loop(s) got unrolled.  */
       if (changed > 1)
-	rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+	{
+	  scev_reset ();
 
-      ret = TODO_cleanup_cfg;
+	  /* Need to fix up loop closed SSA.  */
+	  if (changed >= 4)
+	    rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+
+	  ret |= TODO_cleanup_cfg;
+	}
     }
 
   return ret;
@@ -3369,7 +3375,7 @@  const pass_data pass_data_predcom =
   0, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_update_ssa_only_virtuals, /* todo_flags_finish */
+  0, /* todo_flags_finish */
 };
 
 class pass_predcom : public gimple_opt_pass