diff mbox series

Loop-ch improvements, part 1

Message ID ZK1PLZaV457x2pTt@kam.mff.cuni.cz
State New
Headers show
Series Loop-ch improvements, part 1 | expand

Commit Message

Jan Hubicka July 11, 2023, 12:46 p.m. UTC
Hi,
this patch improves profile update in loop-ch to handle situation where duplicated header
has loop invariant test.  In this case we konw that all count of the exit edge belongs to
the duplicated loop header edge and can update probabilities accordingly.
Since we also do all the work to track this information from analysis to duplicaiton
I also added code to turn those conditionals to constants so we do not need later
jump threading pass to clean up.

This made me to work out that the propagatoin was buggy in few aspects
 1) it handled every PHI as PHI in header and incorrectly assigned some PHIs
    to be IV-like when they are not
 2) it did not check for novops calls that are not required to return same
    value on every invocation.
 3) I also added check for asm statement since those are not necessarily
    reproducible either.

I would like to do more changes, but tried to prevent this patch from
snowballing.  The analysis of what statements will remain after duplication can
be improved.  I think we should use ranger query for other than first basic
block, too and possibly drop the IV heuristics then.  Also it seems that a lot
of this logic is pretty much same to analysis in peeling pass, so unifying this
would be nice.

I also think I should move the profile update out of
gimple_duplicate_sese_region (it is now very specific to ch) and rename it,
since those regions are singe entry multiple exit.

Bootstrapped/regtsted x86_64-linux, OK?

Honza

gcc/ChangeLog:

	* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES
	parameter and rewrite profile updating code to handle edges elimination.
	* tree-cfg.h (gimple_duplicate_sese_region): Update prototpe.
	* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
	(loop_iv_derived_p): New function.
	(should_duplicate_loop_header_p): Track invariant exit edges; fix handling
	of PHIs and propagation of IV derived variables.
	(ch_base::copy_headers): Pass around the invariant edges hash set.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/loop-ch-profile-1.c: Remove xfail.

Comments

Richard Biener July 12, 2023, 9:36 a.m. UTC | #1
On Tue, 11 Jul 2023, Jan Hubicka wrote:

> Hi,
> this patch improves profile update in loop-ch to handle situation where duplicated header
> has loop invariant test.  In this case we konw that all count of the exit edge belongs to
> the duplicated loop header edge and can update probabilities accordingly.
> Since we also do all the work to track this information from analysis to duplicaiton
> I also added code to turn those conditionals to constants so we do not need later
> jump threading pass to clean up.
> 
> This made me to work out that the propagatoin was buggy in few aspects
>  1) it handled every PHI as PHI in header and incorrectly assigned some PHIs
>     to be IV-like when they are not
>  2) it did not check for novops calls that are not required to return same
>     value on every invocation.
>  3) I also added check for asm statement since those are not necessarily
>     reproducible either.
> 
> I would like to do more changes, but tried to prevent this patch from
> snowballing.  The analysis of what statements will remain after duplication can
> be improved.  I think we should use ranger query for other than first basic
> block, too and possibly drop the IV heuristics then.  Also it seems that a lot
> of this logic is pretty much same to analysis in peeling pass, so unifying this
> would be nice.

Indeed.

> I also think I should move the profile update out of
> gimple_duplicate_sese_region (it is now very specific to ch) and rename it,
> since those regions are singe entry multiple exit.

Please.  Maybe move it to tree-ssa-loop-ch.cc as well.

> Bootstrapped/regtsted x86_64-linux, OK?

OK, thanks for improving this.

Richard.

> Honza
> 
> gcc/ChangeLog:
> 
> 	* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES
> 	parameter and rewrite profile updating code to handle edges elimination.
> 	* tree-cfg.h (gimple_duplicate_sese_region): Update prototpe.
> 	* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
> 	(loop_iv_derived_p): New function.
> 	(should_duplicate_loop_header_p): Track invariant exit edges; fix handling
> 	of PHIs and propagation of IV derived variables.
> 	(ch_base::copy_headers): Pass around the invariant edges hash set.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/tree-ssa/loop-ch-profile-1.c: Remove xfail.
> 
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index 4989906706c..3879fb7c4c1 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -6661,14 +6661,16 @@ add_phi_args_after_copy (basic_block *region_copy, unsigned n_region,
>     true otherwise.
>  
>     ELIMINATED_EDGE is an edge that is known to be removed in the dupicated
> -   region.  */
> +   region.  ORIG_ELIMINATED_EDGES, if non-NULL is set of edges known to be
> +   removed from the original region.  */
>  
>  bool
>  gimple_duplicate_sese_region (edge entry, edge exit,
>  			      basic_block *region, unsigned n_region,
>  			      basic_block *region_copy,
>  			      bool update_dominance,
> -			      edge eliminated_edge)
> +			      edge eliminated_edge,
> +			      hash_set <edge> *orig_eliminated_edges)
>  {
>    unsigned i;
>    bool free_region_copy = false, copying_header = false;
> @@ -6747,7 +6749,8 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>  	    split_edge_bb_loc (entry), update_dominance);
>    if (total_count.initialized_p () && entry_count.initialized_p ())
>      {
> -      if (!eliminated_edge)
> +      if (!eliminated_edge
> +	  && (!orig_eliminated_edges || orig_eliminated_edges->is_empty ()))
>  	{
>  	  scale_bbs_frequencies_profile_count (region, n_region,
>  					       total_count - entry_count,
> @@ -6765,7 +6768,7 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>  		 if (cond1)         <- this condition will become false
>  				       and we update probabilities
>  		   goto loop_exit;
> -		 if (cond2)
> +		 if (cond2)         <- this condition is loop invariant
>  		   goto loop_exit;
>  		 goto loop_header   <- this will be redirected to loop.
>  	       // region_copy_end
> @@ -6776,6 +6779,7 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>  		       if (cond1)   <- we need to update probabbility here
>  			 goto loop_exit;
>  		       if (cond2)   <- and determine scaling factor here.
> +				       moreover cond2 is now always true
>  			 goto loop_exit;
>  		       else
>  			 goto loop;
> @@ -6785,53 +6789,84 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>  	     but only consumer so far is tree-ssa-loop-ch and it uses only this
>  	     to handle the common case of peeling headers which have
>  	     conditionals known to be always true upon entry.  */
> -	  gcc_assert (eliminated_edge->src == region[0]
> -		      && EDGE_COUNT (region[0]->succs) == 2
> -		      && copying_header);
> -
> -	  edge e, e_copy, eliminated_edge_copy;
> -	  if (EDGE_SUCC (region[0], 0) == eliminated_edge)
> -	    {
> -	      e = EDGE_SUCC (region[0], 1);
> -	      e_copy = EDGE_SUCC (region_copy[0], 1);
> -	      eliminated_edge_copy = EDGE_SUCC (region_copy[0], 0);
> -	    }
> -	  else
> +	  gcc_checking_assert (copying_header);
> +	  for (unsigned int i = 0; i < n_region; i++)
>  	    {
> -	      e = EDGE_SUCC (region[0], 0);
> -	      e_copy = EDGE_SUCC (region_copy[0], 0);
> -	      eliminated_edge_copy = EDGE_SUCC (region_copy[0], 1);
> -	    }
> -	  gcc_checking_assert (e != e_copy
> -			       && eliminated_edge_copy != eliminated_edge
> -			       && eliminated_edge_copy->dest
> -				  == eliminated_edge->dest);
> -
> +	      edge exit_e, exit_e_copy, e, e_copy;
> +	      if (EDGE_COUNT (region[i]->succs) == 1)
> +		{
> +		  region_copy[i]->count = entry_count;
> +		  region[i]->count -= entry_count;
> +		  continue;
> +		}
>  
> -	  /* Handle first basic block in duplicated region as in the
> -	     non-eliminating case.  */
> -	  scale_bbs_frequencies_profile_count (region_copy, n_region,
> -					       entry_count, total_count);
> -	  /* Now update redirecting eliminated edge to the other edge.
> -	     Actual CFG update is done by caller.  */
> -	  e_copy->probability = profile_probability::always ();
> -	  eliminated_edge_copy->probability = profile_probability::never ();
> -	  /* Header copying is a special case of jump threading, so use
> -	     common code to update loop body exit condition.  */
> -	  update_bb_profile_for_threading (region[0], e_copy->count (), e);
> -	  /* If we duplicated more conditionals first scale the profile of
> -	     rest of the preheader.  Then work out the probability of
> -	     entering the loop and scale rest of the loop.  */
> -	  if (n_region > 1)
> -	    {
> -	      scale_bbs_frequencies_profile_count (region_copy + 1,
> -						   n_region - 1,
> -						   e_copy->count (),
> -						   region_copy[1]->count);
> -	      scale_bbs_frequencies_profile_count (region + 1, n_region - 1,
> -						   e->count (),
> -						   region[1]->count);
> +	      gcc_checking_assert (EDGE_COUNT (region[i]->succs) == 2);
> +	      if (loop_exit_edge_p (region[0]->loop_father,
> +				    EDGE_SUCC (region[i], 0)))
> +		{
> +		  exit_e = EDGE_SUCC (region[i], 0);
> +		  exit_e_copy = EDGE_SUCC (region_copy[i], 0);
> +		  e = EDGE_SUCC (region[i], 1);
> +		  e_copy = EDGE_SUCC (region_copy[i], 1);
> +		}
> +	      else
> +		{
> +		  exit_e = EDGE_SUCC (region[i], 1);
> +		  exit_e_copy = EDGE_SUCC (region_copy[i], 1);
> +		  e = EDGE_SUCC (region[i], 0);
> +		  e_copy = EDGE_SUCC (region_copy[i], 0);
> +		}
> +	      gcc_assert (i == n_region - 1
> +			  || (e->dest == region[i + 1]
> +			      && e_copy->dest == region_copy[i + 1]));
> +	      region_copy[i]->count = entry_count;
> +	      profile_count exit_e_count = exit_e->count ();
> +	      if (eliminated_edge == exit_e)
> +		{
> +		  /* Update profile and the conditional.
> +		     CFG update is done by caller.  */
> +		  e_copy->probability = profile_probability::always ();
> +		  exit_e_copy->probability = profile_probability::never ();
> +		  gcond *cond_stmt
> +			  = as_a <gcond *>(*gsi_last_bb (region_copy[i]));
> +		  if (e_copy->flags & EDGE_TRUE_VALUE)
> +		    gimple_cond_make_true (cond_stmt);
> +		  else
> +		    gimple_cond_make_false (cond_stmt);
> +		  update_stmt (cond_stmt);
> +		  /* Header copying is a special case of jump threading, so use
> +		     common code to update loop body exit condition.  */
> +		  update_bb_profile_for_threading (region[i], entry_count, e);
> +		  eliminated_edge = NULL;
> +		}
> +	      else
> +		region[i]->count -= region_copy[i]->count;
> +	      if (orig_eliminated_edges->contains (exit_e))
> +		{
> +		  orig_eliminated_edges->remove (exit_e);
> +		  /* All exits will happen in exit_e_copy which is out of the
> +		     loop, so increase probability accordingly.
> +		     If the edge is eliminated_edge we already corrected
> +		     profile above.  */
> +		  if (entry_count.nonzero_p () && eliminated_edge != exit_e)
> +		    set_edge_probability_and_rescale_others
> +			    (exit_e_copy, exit_e_count.probability_in
> +								(entry_count));
> +		  /* Eliminate in-loop conditional.  */
> +		  e->probability = profile_probability::always ();
> +		  exit_e->probability = profile_probability::never ();
> +		  gcond *cond_stmt = as_a <gcond *>(*gsi_last_bb (region[i]));
> +		  if (e->flags & EDGE_TRUE_VALUE)
> +		    gimple_cond_make_true (cond_stmt);
> +		  else
> +		    gimple_cond_make_false (cond_stmt);
> +		  update_stmt (cond_stmt);
> +		}
> +	      entry_count = e_copy->count ();
>  	    }
> +	  /* Be sure that we seen all edges we are supposed to update.  */
> +	  gcc_checking_assert (!eliminated_edge
> +			       && orig_eliminated_edges->is_empty ());
>  	}
>      }
>  
> diff --git a/gcc/tree-cfg.h b/gcc/tree-cfg.h
> index b9ccd58c3db..a7cc37f3b97 100644
> --- a/gcc/tree-cfg.h
> +++ b/gcc/tree-cfg.h
> @@ -70,7 +70,8 @@ extern void add_phi_args_after_copy_bb (basic_block);
>  extern void add_phi_args_after_copy (basic_block *, unsigned, edge);
>  extern basic_block split_edge_bb_loc (edge);
>  extern bool gimple_duplicate_sese_region (edge, edge, basic_block *, unsigned,
> -					basic_block *, bool, edge);
> +					basic_block *, bool, edge,
> +				       	hash_set <edge> *);
>  extern bool gimple_duplicate_sese_tail (edge, edge, basic_block *, unsigned,
>  				      basic_block *);
>  extern void gather_blocks_in_sese_region (basic_block entry, basic_block exit,
> diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
> index 72792cec21f..cae6266be46 100644
> --- a/gcc/tree-ssa-loop-ch.cc
> +++ b/gcc/tree-ssa-loop-ch.cc
> @@ -97,13 +97,42 @@ static_loop_exit (class loop *l, gimple_ranger *ranger)
>    return r == desired_static_range ? ret_e : NULL;
>  }
>  
> +/* Return true if OP is invariant.  */
> +
> +static bool
> +loop_invariant_op_p (class loop *loop,
> +		     tree op)
> +{
> +  if (is_gimple_min_invariant (op))
> +    return true;
> +  if (SSA_NAME_IS_DEFAULT_DEF (op)
> +      || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op))))
> +    return true;
> +  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 2;
> +}
> +
> +/* Return true if OP looks like it is derived from IV.  */
> +
> +static bool
> +loop_iv_derived_p (class loop *loop,
> +		    tree op)
> +{
> +  /* Always check for invariant first.  */
> +  gcc_checking_assert (!is_gimple_min_invariant (op)
> +		       && !SSA_NAME_IS_DEFAULT_DEF (op)
> +		       && flow_bb_inside_loop_p (loop,
> +			       gimple_bb (SSA_NAME_DEF_STMT (op))));
> +  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
> +}
> +
>  /* Check whether we should duplicate HEADER of LOOP.  At most *LIMIT
>     instructions should be duplicated, limit is decreased by the actual
>     amount.  */
>  
>  static bool
>  should_duplicate_loop_header_p (basic_block header, class loop *loop,
> -				int *limit)
> +				int *limit,
> +				hash_set <edge> *invariant_exits)
>  {
>    gimple_stmt_iterator bsi;
>  
> @@ -152,15 +181,20 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
>  
>    for (gphi_iterator psi = gsi_start_phis (header); !gsi_end_p (psi);
>         gsi_next (&psi))
> -    {
> -      gphi *phi = psi.phi ();
> -      tree res = gimple_phi_result (phi);
> -      if (INTEGRAL_TYPE_P (TREE_TYPE (res))
> -	  || POINTER_TYPE_P (TREE_TYPE (res)))
> -	gimple_set_uid (phi, 1 /* IV */);
> -      else
> -	gimple_set_uid (phi, 0);
> -    }
> +    /* If this is actual loop header PHIs indicate that the SSA_NAME
> +       may be IV.  Otherwise just give up.  */
> +    if (header == loop->header)
> +      {
> +	gphi *phi = psi.phi ();
> +	tree res = gimple_phi_result (phi);
> +	if (INTEGRAL_TYPE_P (TREE_TYPE (res))
> +	    || POINTER_TYPE_P (TREE_TYPE (res)))
> +	  gimple_set_uid (phi, 1 /* IV */);
> +	else
> +	  gimple_set_uid (phi, 0);
> +      }
> +    else
> +      gimple_set_uid (psi.phi (), 0);
>  
>    /* Count number of instructions and punt on calls.
>       Populate stmts INV/IV flag to later apply heuristics to the
> @@ -201,21 +235,26 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
>        /* Classify the stmt based on whether its computation is based
>           on a IV or whether it is invariant in the loop.  */
>        gimple_set_uid (last, 0);
> -      if (!gimple_vuse (last))
> +      if (!gimple_vuse (last)
> +	  && gimple_code (last) != GIMPLE_ASM
> +	  && (gimple_code (last) != GIMPLE_CALL
> +	      || gimple_call_flags (last) & ECF_CONST))
>  	{
>  	  bool inv = true;
> -	  bool iv = false;
> +	  bool iv = true;
>  	  ssa_op_iter i;
>  	  tree op;
>  	  FOR_EACH_SSA_TREE_OPERAND (op, last, i, SSA_OP_USE)
> -	    if (!SSA_NAME_IS_DEFAULT_DEF (op)
> -		&& flow_bb_inside_loop_p (loop,
> -					  gimple_bb (SSA_NAME_DEF_STMT (op))))
> +	    if (!loop_invariant_op_p (loop, op))
>  	      {
> -		if (!(gimple_uid (SSA_NAME_DEF_STMT (op)) & 2 /* INV */))
> +		if (!loop_iv_derived_p (loop, op))
> +		  {
> +		    inv = false;
> +		    iv = false;
> +		    break;
> +		  }
> +		else
>  		  inv = false;
> -		if (gimple_uid (SSA_NAME_DEF_STMT (op)) & 1 /* IV */)
> -		  iv = true;
>  	      }
>  	  gimple_set_uid (last, (iv ? 1 : 0) | (inv ? 2 : 0));
>  	}
> @@ -225,16 +264,32 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
>       the loop further.  Unless this is the original loop header.  */
>    tree lhs = gimple_cond_lhs (last);
>    tree rhs = gimple_cond_rhs (last);
> +  bool lhs_invariant = loop_invariant_op_p (loop, lhs);
> +  bool rhs_invariant = loop_invariant_op_p (loop, rhs);
> +  if (lhs_invariant && rhs_invariant)
> +    {
> +      if (invariant_exits)
> +	{
> +	  edge e;
> +	  edge_iterator ei;
> +	  FOR_EACH_EDGE (e, ei, header->succs)
> +	    if (loop_exit_edge_p (loop, e))
> +	      {
> +		if (dump_file && (dump_flags & TDF_DETAILS))
> +		  fprintf (dump_file,
> +			   "    Will elliminate invariant exit %i->%i\n",
> +			   e->src->index, e->dest->index);
> +		invariant_exits->add (e);
> +	      }
> +	}
> +      return true;
> +    }
> +  /* TODO: This is heuristics that claims that IV based ocnditionals will
> +     likely be optimized out in duplicated header.  We could use ranger
> +     query instead to tell this more precisely.  */
>    if (header != loop->header
> -      && ((TREE_CODE (lhs) == SSA_NAME
> -	   && !SSA_NAME_IS_DEFAULT_DEF (lhs)
> -	   && flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (lhs)))
> -	   && gimple_uid (SSA_NAME_DEF_STMT (lhs)) == 0)
> -	  || (TREE_CODE (rhs) == SSA_NAME
> -	      && !SSA_NAME_IS_DEFAULT_DEF (rhs)
> -	      && flow_bb_inside_loop_p (loop,
> -					gimple_bb (SSA_NAME_DEF_STMT (rhs)))
> -	      && gimple_uid (SSA_NAME_DEF_STMT (rhs)) == 0)))
> +      && ((!lhs_invariant && !loop_iv_derived_p (loop, lhs))
> +	  || (!rhs_invariant && !loop_iv_derived_p (loop, rhs))))
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	fprintf (dump_file,
> @@ -452,7 +507,7 @@ ch_base::copy_headers (function *fun)
>  	  continue;
>  	}
>  
> -      if (should_duplicate_loop_header_p (header, loop, &remaining_limit))
> +      if (should_duplicate_loop_header_p (header, loop, &remaining_limit, NULL))
>  	candidates.safe_push ({loop, static_exit});
>      }
>    /* Do not use ranger after we change the IL and not have updated SSA.  */
> @@ -482,10 +537,12 @@ ch_base::copy_headers (function *fun)
>        profile_count entry_count = profile_count::zero ();
>        edge e;
>        edge_iterator ei;
> +      hash_set <edge> invariant_exits;
>        FOR_EACH_EDGE (e, ei, loop->header->preds)
>  	if (e->src != loop->latch)
>  	  entry_count += e->count ();
> -      while (should_duplicate_loop_header_p (header, loop, &remaining_limit))
> +      while (should_duplicate_loop_header_p (header, loop, &remaining_limit,
> +					     &invariant_exits))
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
>  	    fprintf (dump_file, "    Will duplicate bb %i\n", header->index);
> @@ -537,7 +594,8 @@ ch_base::copy_headers (function *fun)
>  
>        propagate_threaded_block_debug_into (exit->dest, entry->dest);
>        if (!gimple_duplicate_sese_region (entry, exit, bbs, n_bbs, copied_bbs,
> -					 true, candidate.static_exit))
> +					 true, candidate.static_exit,
> +					 &invariant_exits))
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
>  	    fprintf (dump_file, "Duplication failed.\n");
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c
> index 16340868abf..bfb0f17284d 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c
> @@ -9,4 +9,4 @@ void test(int v, int q)
>  /* { dg-final { scan-tree-dump-not "Invalid sum" "ch2"} } */
>  /* dom2 optimizes out the redundant test for loop invariant v/q
>     which leads to inconsistent profile.  */
> -/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized"  { xfail *-*-* }} } */
> +/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
>
diff mbox series

Patch

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 4989906706c..3879fb7c4c1 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -6661,14 +6661,16 @@  add_phi_args_after_copy (basic_block *region_copy, unsigned n_region,
    true otherwise.
 
    ELIMINATED_EDGE is an edge that is known to be removed in the dupicated
-   region.  */
+   region.  ORIG_ELIMINATED_EDGES, if non-NULL is set of edges known to be
+   removed from the original region.  */
 
 bool
 gimple_duplicate_sese_region (edge entry, edge exit,
 			      basic_block *region, unsigned n_region,
 			      basic_block *region_copy,
 			      bool update_dominance,
-			      edge eliminated_edge)
+			      edge eliminated_edge,
+			      hash_set <edge> *orig_eliminated_edges)
 {
   unsigned i;
   bool free_region_copy = false, copying_header = false;
@@ -6747,7 +6749,8 @@  gimple_duplicate_sese_region (edge entry, edge exit,
 	    split_edge_bb_loc (entry), update_dominance);
   if (total_count.initialized_p () && entry_count.initialized_p ())
     {
-      if (!eliminated_edge)
+      if (!eliminated_edge
+	  && (!orig_eliminated_edges || orig_eliminated_edges->is_empty ()))
 	{
 	  scale_bbs_frequencies_profile_count (region, n_region,
 					       total_count - entry_count,
@@ -6765,7 +6768,7 @@  gimple_duplicate_sese_region (edge entry, edge exit,
 		 if (cond1)         <- this condition will become false
 				       and we update probabilities
 		   goto loop_exit;
-		 if (cond2)
+		 if (cond2)         <- this condition is loop invariant
 		   goto loop_exit;
 		 goto loop_header   <- this will be redirected to loop.
 	       // region_copy_end
@@ -6776,6 +6779,7 @@  gimple_duplicate_sese_region (edge entry, edge exit,
 		       if (cond1)   <- we need to update probabbility here
 			 goto loop_exit;
 		       if (cond2)   <- and determine scaling factor here.
+				       moreover cond2 is now always true
 			 goto loop_exit;
 		       else
 			 goto loop;
@@ -6785,53 +6789,84 @@  gimple_duplicate_sese_region (edge entry, edge exit,
 	     but only consumer so far is tree-ssa-loop-ch and it uses only this
 	     to handle the common case of peeling headers which have
 	     conditionals known to be always true upon entry.  */
-	  gcc_assert (eliminated_edge->src == region[0]
-		      && EDGE_COUNT (region[0]->succs) == 2
-		      && copying_header);
-
-	  edge e, e_copy, eliminated_edge_copy;
-	  if (EDGE_SUCC (region[0], 0) == eliminated_edge)
-	    {
-	      e = EDGE_SUCC (region[0], 1);
-	      e_copy = EDGE_SUCC (region_copy[0], 1);
-	      eliminated_edge_copy = EDGE_SUCC (region_copy[0], 0);
-	    }
-	  else
+	  gcc_checking_assert (copying_header);
+	  for (unsigned int i = 0; i < n_region; i++)
 	    {
-	      e = EDGE_SUCC (region[0], 0);
-	      e_copy = EDGE_SUCC (region_copy[0], 0);
-	      eliminated_edge_copy = EDGE_SUCC (region_copy[0], 1);
-	    }
-	  gcc_checking_assert (e != e_copy
-			       && eliminated_edge_copy != eliminated_edge
-			       && eliminated_edge_copy->dest
-				  == eliminated_edge->dest);
-
+	      edge exit_e, exit_e_copy, e, e_copy;
+	      if (EDGE_COUNT (region[i]->succs) == 1)
+		{
+		  region_copy[i]->count = entry_count;
+		  region[i]->count -= entry_count;
+		  continue;
+		}
 
-	  /* Handle first basic block in duplicated region as in the
-	     non-eliminating case.  */
-	  scale_bbs_frequencies_profile_count (region_copy, n_region,
-					       entry_count, total_count);
-	  /* Now update redirecting eliminated edge to the other edge.
-	     Actual CFG update is done by caller.  */
-	  e_copy->probability = profile_probability::always ();
-	  eliminated_edge_copy->probability = profile_probability::never ();
-	  /* Header copying is a special case of jump threading, so use
-	     common code to update loop body exit condition.  */
-	  update_bb_profile_for_threading (region[0], e_copy->count (), e);
-	  /* If we duplicated more conditionals first scale the profile of
-	     rest of the preheader.  Then work out the probability of
-	     entering the loop and scale rest of the loop.  */
-	  if (n_region > 1)
-	    {
-	      scale_bbs_frequencies_profile_count (region_copy + 1,
-						   n_region - 1,
-						   e_copy->count (),
-						   region_copy[1]->count);
-	      scale_bbs_frequencies_profile_count (region + 1, n_region - 1,
-						   e->count (),
-						   region[1]->count);
+	      gcc_checking_assert (EDGE_COUNT (region[i]->succs) == 2);
+	      if (loop_exit_edge_p (region[0]->loop_father,
+				    EDGE_SUCC (region[i], 0)))
+		{
+		  exit_e = EDGE_SUCC (region[i], 0);
+		  exit_e_copy = EDGE_SUCC (region_copy[i], 0);
+		  e = EDGE_SUCC (region[i], 1);
+		  e_copy = EDGE_SUCC (region_copy[i], 1);
+		}
+	      else
+		{
+		  exit_e = EDGE_SUCC (region[i], 1);
+		  exit_e_copy = EDGE_SUCC (region_copy[i], 1);
+		  e = EDGE_SUCC (region[i], 0);
+		  e_copy = EDGE_SUCC (region_copy[i], 0);
+		}
+	      gcc_assert (i == n_region - 1
+			  || (e->dest == region[i + 1]
+			      && e_copy->dest == region_copy[i + 1]));
+	      region_copy[i]->count = entry_count;
+	      profile_count exit_e_count = exit_e->count ();
+	      if (eliminated_edge == exit_e)
+		{
+		  /* Update profile and the conditional.
+		     CFG update is done by caller.  */
+		  e_copy->probability = profile_probability::always ();
+		  exit_e_copy->probability = profile_probability::never ();
+		  gcond *cond_stmt
+			  = as_a <gcond *>(*gsi_last_bb (region_copy[i]));
+		  if (e_copy->flags & EDGE_TRUE_VALUE)
+		    gimple_cond_make_true (cond_stmt);
+		  else
+		    gimple_cond_make_false (cond_stmt);
+		  update_stmt (cond_stmt);
+		  /* Header copying is a special case of jump threading, so use
+		     common code to update loop body exit condition.  */
+		  update_bb_profile_for_threading (region[i], entry_count, e);
+		  eliminated_edge = NULL;
+		}
+	      else
+		region[i]->count -= region_copy[i]->count;
+	      if (orig_eliminated_edges->contains (exit_e))
+		{
+		  orig_eliminated_edges->remove (exit_e);
+		  /* All exits will happen in exit_e_copy which is out of the
+		     loop, so increase probability accordingly.
+		     If the edge is eliminated_edge we already corrected
+		     profile above.  */
+		  if (entry_count.nonzero_p () && eliminated_edge != exit_e)
+		    set_edge_probability_and_rescale_others
+			    (exit_e_copy, exit_e_count.probability_in
+								(entry_count));
+		  /* Eliminate in-loop conditional.  */
+		  e->probability = profile_probability::always ();
+		  exit_e->probability = profile_probability::never ();
+		  gcond *cond_stmt = as_a <gcond *>(*gsi_last_bb (region[i]));
+		  if (e->flags & EDGE_TRUE_VALUE)
+		    gimple_cond_make_true (cond_stmt);
+		  else
+		    gimple_cond_make_false (cond_stmt);
+		  update_stmt (cond_stmt);
+		}
+	      entry_count = e_copy->count ();
 	    }
+	  /* Be sure that we seen all edges we are supposed to update.  */
+	  gcc_checking_assert (!eliminated_edge
+			       && orig_eliminated_edges->is_empty ());
 	}
     }
 
diff --git a/gcc/tree-cfg.h b/gcc/tree-cfg.h
index b9ccd58c3db..a7cc37f3b97 100644
--- a/gcc/tree-cfg.h
+++ b/gcc/tree-cfg.h
@@ -70,7 +70,8 @@  extern void add_phi_args_after_copy_bb (basic_block);
 extern void add_phi_args_after_copy (basic_block *, unsigned, edge);
 extern basic_block split_edge_bb_loc (edge);
 extern bool gimple_duplicate_sese_region (edge, edge, basic_block *, unsigned,
-					basic_block *, bool, edge);
+					basic_block *, bool, edge,
+				       	hash_set <edge> *);
 extern bool gimple_duplicate_sese_tail (edge, edge, basic_block *, unsigned,
 				      basic_block *);
 extern void gather_blocks_in_sese_region (basic_block entry, basic_block exit,
diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
index 72792cec21f..cae6266be46 100644
--- a/gcc/tree-ssa-loop-ch.cc
+++ b/gcc/tree-ssa-loop-ch.cc
@@ -97,13 +97,42 @@  static_loop_exit (class loop *l, gimple_ranger *ranger)
   return r == desired_static_range ? ret_e : NULL;
 }
 
+/* Return true if OP is invariant.  */
+
+static bool
+loop_invariant_op_p (class loop *loop,
+		     tree op)
+{
+  if (is_gimple_min_invariant (op))
+    return true;
+  if (SSA_NAME_IS_DEFAULT_DEF (op)
+      || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op))))
+    return true;
+  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 2;
+}
+
+/* Return true if OP looks like it is derived from IV.  */
+
+static bool
+loop_iv_derived_p (class loop *loop,
+		    tree op)
+{
+  /* Always check for invariant first.  */
+  gcc_checking_assert (!is_gimple_min_invariant (op)
+		       && !SSA_NAME_IS_DEFAULT_DEF (op)
+		       && flow_bb_inside_loop_p (loop,
+			       gimple_bb (SSA_NAME_DEF_STMT (op))));
+  return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
+}
+
 /* Check whether we should duplicate HEADER of LOOP.  At most *LIMIT
    instructions should be duplicated, limit is decreased by the actual
    amount.  */
 
 static bool
 should_duplicate_loop_header_p (basic_block header, class loop *loop,
-				int *limit)
+				int *limit,
+				hash_set <edge> *invariant_exits)
 {
   gimple_stmt_iterator bsi;
 
@@ -152,15 +181,20 @@  should_duplicate_loop_header_p (basic_block header, class loop *loop,
 
   for (gphi_iterator psi = gsi_start_phis (header); !gsi_end_p (psi);
        gsi_next (&psi))
-    {
-      gphi *phi = psi.phi ();
-      tree res = gimple_phi_result (phi);
-      if (INTEGRAL_TYPE_P (TREE_TYPE (res))
-	  || POINTER_TYPE_P (TREE_TYPE (res)))
-	gimple_set_uid (phi, 1 /* IV */);
-      else
-	gimple_set_uid (phi, 0);
-    }
+    /* If this is actual loop header PHIs indicate that the SSA_NAME
+       may be IV.  Otherwise just give up.  */
+    if (header == loop->header)
+      {
+	gphi *phi = psi.phi ();
+	tree res = gimple_phi_result (phi);
+	if (INTEGRAL_TYPE_P (TREE_TYPE (res))
+	    || POINTER_TYPE_P (TREE_TYPE (res)))
+	  gimple_set_uid (phi, 1 /* IV */);
+	else
+	  gimple_set_uid (phi, 0);
+      }
+    else
+      gimple_set_uid (psi.phi (), 0);
 
   /* Count number of instructions and punt on calls.
      Populate stmts INV/IV flag to later apply heuristics to the
@@ -201,21 +235,26 @@  should_duplicate_loop_header_p (basic_block header, class loop *loop,
       /* Classify the stmt based on whether its computation is based
          on a IV or whether it is invariant in the loop.  */
       gimple_set_uid (last, 0);
-      if (!gimple_vuse (last))
+      if (!gimple_vuse (last)
+	  && gimple_code (last) != GIMPLE_ASM
+	  && (gimple_code (last) != GIMPLE_CALL
+	      || gimple_call_flags (last) & ECF_CONST))
 	{
 	  bool inv = true;
-	  bool iv = false;
+	  bool iv = true;
 	  ssa_op_iter i;
 	  tree op;
 	  FOR_EACH_SSA_TREE_OPERAND (op, last, i, SSA_OP_USE)
-	    if (!SSA_NAME_IS_DEFAULT_DEF (op)
-		&& flow_bb_inside_loop_p (loop,
-					  gimple_bb (SSA_NAME_DEF_STMT (op))))
+	    if (!loop_invariant_op_p (loop, op))
 	      {
-		if (!(gimple_uid (SSA_NAME_DEF_STMT (op)) & 2 /* INV */))
+		if (!loop_iv_derived_p (loop, op))
+		  {
+		    inv = false;
+		    iv = false;
+		    break;
+		  }
+		else
 		  inv = false;
-		if (gimple_uid (SSA_NAME_DEF_STMT (op)) & 1 /* IV */)
-		  iv = true;
 	      }
 	  gimple_set_uid (last, (iv ? 1 : 0) | (inv ? 2 : 0));
 	}
@@ -225,16 +264,32 @@  should_duplicate_loop_header_p (basic_block header, class loop *loop,
      the loop further.  Unless this is the original loop header.  */
   tree lhs = gimple_cond_lhs (last);
   tree rhs = gimple_cond_rhs (last);
+  bool lhs_invariant = loop_invariant_op_p (loop, lhs);
+  bool rhs_invariant = loop_invariant_op_p (loop, rhs);
+  if (lhs_invariant && rhs_invariant)
+    {
+      if (invariant_exits)
+	{
+	  edge e;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (e, ei, header->succs)
+	    if (loop_exit_edge_p (loop, e))
+	      {
+		if (dump_file && (dump_flags & TDF_DETAILS))
+		  fprintf (dump_file,
+			   "    Will elliminate invariant exit %i->%i\n",
+			   e->src->index, e->dest->index);
+		invariant_exits->add (e);
+	      }
+	}
+      return true;
+    }
+  /* TODO: This is heuristics that claims that IV based ocnditionals will
+     likely be optimized out in duplicated header.  We could use ranger
+     query instead to tell this more precisely.  */
   if (header != loop->header
-      && ((TREE_CODE (lhs) == SSA_NAME
-	   && !SSA_NAME_IS_DEFAULT_DEF (lhs)
-	   && flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (lhs)))
-	   && gimple_uid (SSA_NAME_DEF_STMT (lhs)) == 0)
-	  || (TREE_CODE (rhs) == SSA_NAME
-	      && !SSA_NAME_IS_DEFAULT_DEF (rhs)
-	      && flow_bb_inside_loop_p (loop,
-					gimple_bb (SSA_NAME_DEF_STMT (rhs)))
-	      && gimple_uid (SSA_NAME_DEF_STMT (rhs)) == 0)))
+      && ((!lhs_invariant && !loop_iv_derived_p (loop, lhs))
+	  || (!rhs_invariant && !loop_iv_derived_p (loop, rhs))))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file,
@@ -452,7 +507,7 @@  ch_base::copy_headers (function *fun)
 	  continue;
 	}
 
-      if (should_duplicate_loop_header_p (header, loop, &remaining_limit))
+      if (should_duplicate_loop_header_p (header, loop, &remaining_limit, NULL))
 	candidates.safe_push ({loop, static_exit});
     }
   /* Do not use ranger after we change the IL and not have updated SSA.  */
@@ -482,10 +537,12 @@  ch_base::copy_headers (function *fun)
       profile_count entry_count = profile_count::zero ();
       edge e;
       edge_iterator ei;
+      hash_set <edge> invariant_exits;
       FOR_EACH_EDGE (e, ei, loop->header->preds)
 	if (e->src != loop->latch)
 	  entry_count += e->count ();
-      while (should_duplicate_loop_header_p (header, loop, &remaining_limit))
+      while (should_duplicate_loop_header_p (header, loop, &remaining_limit,
+					     &invariant_exits))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "    Will duplicate bb %i\n", header->index);
@@ -537,7 +594,8 @@  ch_base::copy_headers (function *fun)
 
       propagate_threaded_block_debug_into (exit->dest, entry->dest);
       if (!gimple_duplicate_sese_region (entry, exit, bbs, n_bbs, copied_bbs,
-					 true, candidate.static_exit))
+					 true, candidate.static_exit,
+					 &invariant_exits))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "Duplication failed.\n");
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c
index 16340868abf..bfb0f17284d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-ch-profile-1.c
@@ -9,4 +9,4 @@  void test(int v, int q)
 /* { dg-final { scan-tree-dump-not "Invalid sum" "ch2"} } */
 /* dom2 optimizes out the redundant test for loop invariant v/q
    which leads to inconsistent profile.  */
-/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized"  { xfail *-*-* }} } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */