diff mbox series

[2/4] ipa-cp: Propagation boost for recursion generated values

Message ID 3068c6c4ee451244031d8198d663de6e614f28f9.1629805719.git.mjambor@suse.cz
State New
Headers show
Series IPA-CP profile feedback handling fixes | expand

Commit Message

Martin Jambor Aug. 20, 2021, 5:43 p.m. UTC
Recursive call graph edges, even when they are hot and important for
the compiled program, can never have frequency bigger than one, even
when the actual time savings in the next recursion call are not
realized just once but depend on the depth of recursion.  The current
IPA-CP effect propagation code did not take that into account and just
used the frequency, thus severely underestimating the effect.

This patch artificially boosts values taking part in such calls.  If a
value feeds into itself through a recursive call, the frequency of the
edge is multiplied by a parameter with default value of 6, basically
assuming that the recursion will take place 6 times.  This value can
of course be subject to change.

Moreover, values which do not feed into themselves but which were
generated for a self-recursive call with an arithmetic
pass-function (aka the 548.exchange "hack" which however is generally
applicable for recursive functions which count the recursion depth in
a parameter) have the edge frequency multiplied as many times as there
are generated values in the chain.  In essence, we will assume they
are all useful.

This patch partially fixes the current situation when we fail to
optimize 548.exchange with PGO.  In the benchmark one recursive edge
count overwhelmingly dominates all other counts in the program and so
we fail to perform the first cloning (for the nonrecursive entry call)
because it looks totally insignificant.

gcc/ChangeLog:

2021-07-16  Martin Jambor  <mjambor@suse.cz>

	* params.opt (ipa-cp-recursive-freq-factor): New.
	* ipa-cp.c (ipcp_value): Switch to inline initialization.  New members
	scc_no, self_recursion_generated_level, same_scc and
	self_recursion_generated_p.
	(ipcp_lattice::add_value): Replaced parameter unlimited with
	same_lat_gen_level, usit it determine limit of values and store it to
	the value.
	(ipcp_lattice<valtype>::print): Dump the new fileds.
	(allocate_and_init_ipcp_value): Take same_lat_gen_level as a new
	parameter and store it to the new value.
	(self_recursively_generated_p): Removed.
	(propagate_vals_across_arith_jfunc): Use self_recursion_generated_p
	instead of self_recursively_generated_p, store self generation level
	to such values.
	(value_topo_info<valtype>::add_val): Set scc_no.
	(value_topo_info<valtype>::propagate_effects): Multiply frequencies of
	recursively feeding values and self generated values by appropriate
	new factors.
---
 gcc/ipa-cp.c   | 161 ++++++++++++++++++++++++-------------------------
 gcc/params.opt |   4 ++
 2 files changed, 84 insertions(+), 81 deletions(-)

Comments

Jan Hubicka Oct. 6, 2021, 3:49 p.m. UTC | #1
> Recursive call graph edges, even when they are hot and important for
> the compiled program, can never have frequency bigger than one, even
> when the actual time savings in the next recursion call are not
> realized just once but depend on the depth of recursion.  The current
> IPA-CP effect propagation code did not take that into account and just
> used the frequency, thus severely underestimating the effect.
> 
> This patch artificially boosts values taking part in such calls.  If a
> value feeds into itself through a recursive call, the frequency of the
> edge is multiplied by a parameter with default value of 6, basically
> assuming that the recursion will take place 6 times.  This value can
> of course be subject to change.
> 
> Moreover, values which do not feed into themselves but which were
> generated for a self-recursive call with an arithmetic
> pass-function (aka the 548.exchange "hack" which however is generally
> applicable for recursive functions which count the recursion depth in
> a parameter) have the edge frequency multiplied as many times as there
> are generated values in the chain.  In essence, we will assume they
> are all useful.
> 
> This patch partially fixes the current situation when we fail to
> optimize 548.exchange with PGO.  In the benchmark one recursive edge
> count overwhelmingly dominates all other counts in the program and so
> we fail to perform the first cloning (for the nonrecursive entry call)
> because it looks totally insignificant.
> 
> gcc/ChangeLog:
> 
> 2021-07-16  Martin Jambor  <mjambor@suse.cz>
> 
> 	* params.opt (ipa-cp-recursive-freq-factor): New.
> 	* ipa-cp.c (ipcp_value): Switch to inline initialization.  New members
> 	scc_no, self_recursion_generated_level, same_scc and
> 	self_recursion_generated_p.
> 	(ipcp_lattice::add_value): Replaced parameter unlimited with
> 	same_lat_gen_level, usit it determine limit of values and store it to
> 	the value.
> 	(ipcp_lattice<valtype>::print): Dump the new fileds.
> 	(allocate_and_init_ipcp_value): Take same_lat_gen_level as a new
> 	parameter and store it to the new value.
> 	(self_recursively_generated_p): Removed.
> 	(propagate_vals_across_arith_jfunc): Use self_recursion_generated_p
> 	instead of self_recursively_generated_p, store self generation level
> 	to such values.
> 	(value_topo_info<valtype>::add_val): Set scc_no.
> 	(value_topo_info<valtype>::propagate_effects): Multiply frequencies of
> 	recursively feeding values and self generated values by appropriate
> 	new factors.

If you boost every self fed value by factor of 6, I wonder how quickly
we run into exponential explosion of the cost (since the frequency
should be close to 1 and 6^9=10077696....

I think it would be more robust to simply assume that the job will
distribute evenly across the clones.  How hard is to implement that?

Honza
> ---
>  gcc/ipa-cp.c   | 161 ++++++++++++++++++++++++-------------------------
>  gcc/params.opt |   4 ++
>  2 files changed, 84 insertions(+), 81 deletions(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 55b9216337f..b987d975793 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -184,30 +184,52 @@ public:
>    /* The actual value for the given parameter.  */
>    valtype value;
>    /* The list of sources from which this value originates.  */
> -  ipcp_value_source <valtype> *sources;
> +  ipcp_value_source <valtype> *sources = nullptr;
>    /* Next pointers in a linked list of all values in a lattice.  */
> -  ipcp_value *next;
> +  ipcp_value *next = nullptr;
>    /* Next pointers in a linked list of values in a strongly connected component
>       of values. */
> -  ipcp_value *scc_next;
> +  ipcp_value *scc_next = nullptr;
>    /* Next pointers in a linked list of SCCs of values sorted topologically
>       according their sources.  */
> -  ipcp_value  *topo_next;
> +  ipcp_value  *topo_next = nullptr;
>    /* A specialized node created for this value, NULL if none has been (so far)
>       created.  */
> -  cgraph_node *spec_node;
> +  cgraph_node *spec_node = nullptr;
>    /* Depth first search number and low link for topological sorting of
>       values.  */
> -  int dfs, low_link;
> +  int dfs = 0;
> +  int low_link = 0;
> +  /* SCC number to identify values which recursively feed into each other.
> +     Values in the same SCC have the same SCC number.  */
> +  int scc_no = 0;
> +  /* Non zero if the value is generated from another value in the same lattice
> +     for a self-recursive call, the actual number is how many times the
> +     operation has been performed.  In the unlikely event of the value being
> +     present in two chains fo self-recursive value generation chains, it is the
> +     maximum.  */
> +  unsigned self_recursion_generated_level = 0;
>    /* True if this value is currently on the topo-sort stack.  */
> -  bool on_stack;
> -
> -  ipcp_value()
> -    : sources (0), next (0), scc_next (0), topo_next (0),
> -      spec_node (0), dfs (0), low_link (0), on_stack (false) {}
> +  bool on_stack = false;
>  
>    void add_source (cgraph_edge *cs, ipcp_value *src_val, int src_idx,
>  		   HOST_WIDE_INT offset);
> +
> +  /* Return true if both THIS value and O feed into each other.  */
> +
> +  bool same_scc (const ipcp_value<valtype> *o)
> +  {
> +    return o->scc_no == scc_no;
> +  }
> +
> +/* Return true, if a this value has been generated for a self-recursive call as
> +   a result of an arithmetic pass-through jump-function acting on a value in
> +   the same lattice function.  */
> +
> +  bool self_recursion_generated_p ()
> +  {
> +    return self_recursion_generated_level > 0;
> +  }
>  };
>  
>  /* Lattice describing potential values of a formal parameter of a function, or
> @@ -239,7 +261,7 @@ public:
>  		  ipcp_value<valtype> *src_val = NULL,
>  		  int src_idx = 0, HOST_WIDE_INT offset = -1,
>  		  ipcp_value<valtype> **val_p = NULL,
> -		  bool unlimited = false);
> +		  unsigned same_lat_gen_level = 0);
>    void print (FILE * f, bool dump_sources, bool dump_benefits);
>  };
>  
> @@ -498,7 +520,11 @@ ipcp_lattice<valtype>::print (FILE * f, bool dump_sources, bool dump_benefits)
>  	{
>  	  ipcp_value_source<valtype> *s;
>  
> -	  fprintf (f, " [from:");
> +	  if (val->self_recursion_generated_p ())
> +	    fprintf (f, " [self_gen(%i), from:",
> +		     val->self_recursion_generated_level);
> +	  else
> +	    fprintf (f, " [scc: %i, from:", val->scc_no);
>  	  for (s = val->sources; s; s = s->next)
>  	    fprintf (f, " %i(%f)", s->cs->caller->order,
>  		     s->cs->sreal_frequency ().to_double ());
> @@ -1837,12 +1863,13 @@ ipcp_value<valtype>::add_source (cgraph_edge *cs, ipcp_value *src_val,
>     SOURCE and clear all other fields.  */
>  
>  static ipcp_value<tree> *
> -allocate_and_init_ipcp_value (tree source)
> +allocate_and_init_ipcp_value (tree cst, unsigned same_lat_gen_level)
>  {
>    ipcp_value<tree> *val;
>  
>    val = new (ipcp_cst_values_pool.allocate ()) ipcp_value<tree>();
> -  val->value = source;
> +  val->value = cst;
> +  val->self_recursion_generated_level = same_lat_gen_level;
>    return val;
>  }
>  
> @@ -1850,14 +1877,15 @@ allocate_and_init_ipcp_value (tree source)
>     value to SOURCE and clear all other fields.  */
>  
>  static ipcp_value<ipa_polymorphic_call_context> *
> -allocate_and_init_ipcp_value (ipa_polymorphic_call_context source)
> +allocate_and_init_ipcp_value (ipa_polymorphic_call_context ctx,
> +			      unsigned same_lat_gen_level)
>  {
>    ipcp_value<ipa_polymorphic_call_context> *val;
>  
> -  // TODO
>    val = new (ipcp_poly_ctx_values_pool.allocate ())
>      ipcp_value<ipa_polymorphic_call_context>();
> -  val->value = source;
> +  val->value = ctx;
> +  val->self_recursion_generated_level = same_lat_gen_level;
>    return val;
>  }
>  
> @@ -1865,8 +1893,12 @@ allocate_and_init_ipcp_value (ipa_polymorphic_call_context source)
>     SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same
>     meaning.  OFFSET -1 means the source is scalar and not a part of an
>     aggregate.  If non-NULL, VAL_P records address of existing or newly added
> -   ipcp_value.  UNLIMITED means whether value count should not exceed the limit
> -   given by PARAM_IPA_CP_VALUE_LIST_SIZE.  */
> +   ipcp_value.
> +
> +   If the value is generated for a self-recursive call as a result of an
> +   arithmetic pass-through jump-function acting on a value in the same lattice,
> +   SAME_LAT_GEN_LEVEL must be the length of such chain, otherwise it must be
> +   zero.  If it is non-zero, PARAM_IPA_CP_VALUE_LIST_SIZE limit is ignored.  */
>  
>  template <typename valtype>
>  bool
> @@ -1874,7 +1906,7 @@ ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
>  				  ipcp_value<valtype> *src_val,
>  				  int src_idx, HOST_WIDE_INT offset,
>  				  ipcp_value<valtype> **val_p,
> -				  bool unlimited)
> +				  unsigned same_lat_gen_level)
>  {
>    ipcp_value<valtype> *val, *last_val = NULL;
>  
> @@ -1890,6 +1922,9 @@ ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
>  	if (val_p)
>  	  *val_p = val;
>  
> +	if (val->self_recursion_generated_level < same_lat_gen_level)
> +	  val->self_recursion_generated_level = same_lat_gen_level;
> +
>  	if (ipa_edge_within_scc (cs))
>  	  {
>  	    ipcp_value_source<valtype> *s;
> @@ -1904,7 +1939,7 @@ ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
>  	return false;
>        }
>  
> -  if (!unlimited && values_count == opt_for_fn (cs->caller->decl,
> +  if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl,
>  						param_ipa_cp_value_list_size))
>      {
>        /* We can only free sources, not the values themselves, because sources
> @@ -1923,7 +1958,7 @@ ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
>      }
>  
>    values_count++;
> -  val = allocate_and_init_ipcp_value (newval);
> +  val = allocate_and_init_ipcp_value (newval, same_lat_gen_level);
>    val->add_source (cs, src_val, src_idx, offset);
>    val->next = NULL;
>  
> @@ -1940,60 +1975,6 @@ ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
>    return true;
>  }
>  
> -/* Return true, if a ipcp_value VAL is orginated from parameter value of
> -   self-feeding recursive function via some kind of pass-through jump
> -   function.  */
> -
> -static bool
> -self_recursively_generated_p (ipcp_value<tree> *val)
> -{
> -  class ipa_node_params *info = NULL;
> -
> -  for (ipcp_value_source<tree> *src = val->sources; src; src = src->next)
> -    {
> -      cgraph_edge *cs = src->cs;
> -
> -      if (!src->val || cs->caller != cs->callee->function_symbol ())
> -	return false;
> -
> -      if (src->val == val)
> -	continue;
> -
> -      if (!info)
> -	info = ipa_node_params_sum->get (cs->caller);
> -
> -      class ipcp_param_lattices *plats = ipa_get_parm_lattices (info,
> -								src->index);
> -      ipcp_lattice<tree> *src_lat;
> -      ipcp_value<tree> *src_val;
> -
> -      if (src->offset == -1)
> -	src_lat = &plats->itself;
> -      else
> -	{
> -	  struct ipcp_agg_lattice *src_aglat;
> -
> -	  for (src_aglat = plats->aggs; src_aglat; src_aglat = src_aglat->next)
> -	    if (src_aglat->offset == src->offset)
> -	      break;
> -
> -	  if (!src_aglat)
> -	    return false;
> -
> -	  src_lat = src_aglat;
> -	}
> -
> -      for (src_val = src_lat->values; src_val; src_val = src_val->next)
> -	if (src_val == val)
> -	  break;
> -
> -      if (!src_val)
> -	return false;
> -    }
> -
> -  return true;
> -}
> -
>  /* A helper function that returns result of operation specified by OPCODE on
>     the value of SRC_VAL.  If non-NULL, OPND1_TYPE is expected type for the
>     value of SRC_VAL.  If the operation is binary, OPND2 is a constant value
> @@ -2068,7 +2049,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,
>  	     source, this is absolutely conservative, but could avoid explosion
>  	     of lattice's value space, especially when one recursive function
>  	     calls another recursive.  */
> -	  if (self_recursively_generated_p (src_val))
> +	  if (src_val->self_recursion_generated_p ())
>  	    {
>  	      ipcp_value_source<tree> *s;
>  
> @@ -2096,7 +2077,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,
>  		break;
>  
>  	      ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
> -					  src_offset, &src_val, true);
> +					  src_offset, &src_val, j);
>  	      gcc_checking_assert (src_val);
>  	    }
>  	}
> @@ -2108,7 +2089,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,
>  	/* Now we do not use self-recursively generated value as propagation
>  	   source, otherwise it is easy to make value space of normal lattice
>  	   overflow.  */
> -	if (self_recursively_generated_p (src_val))
> +	if (src_val->self_recursion_generated_p ())
>  	  {
>  	    ret |= dest_lat->set_contains_variable ();
>  	    continue;
> @@ -3732,6 +3713,7 @@ value_topo_info<valtype>::add_val (ipcp_value<valtype> *cur_val)
>  	  v = stack;
>  	  stack = v->topo_next;
>  	  v->on_stack = false;
> +	  v->scc_no = cur_val->dfs;
>  
>  	  v->scc_next = scc_list;
>  	  scc_list = v;
> @@ -3905,8 +3887,25 @@ value_topo_info<valtype>::propagate_effects ()
>  		    else
>  		      continue;
>  		  }
> +
> +		int special_factor = 1;
> +		if (val->same_scc (src->val))
> +		  special_factor
> +		    = opt_for_fn(src->cs->caller->decl,
> +				 param_ipa_cp_recursive_freq_factor);
> +		else if (val->self_recursion_generated_p ()
> +			 && (src->cs->callee->function_symbol ()
> +			     == src->cs->caller))
> +		  {
> +		    int max_recur_gen_depth
> +		      = opt_for_fn(src->cs->caller->decl,
> +				   param_ipa_cp_max_recursive_depth);
> +		    special_factor = max_recur_gen_depth
> +		      - val->self_recursion_generated_level + 1;
> +		  }
> +
>  		src->val->prop_time_benefit
> -		  += time * src->cs->sreal_frequency ();
> +		  += time * special_factor * src->cs->sreal_frequency ();
>  	      }
>  
>  	  if (size < INT_MAX)
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 92b003e38cb..8d772309407 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -266,6 +266,10 @@ Maximum depth of recursive cloning for self-recursive function.
>  Common Joined UInteger Var(param_ipa_cp_min_recursive_probability) Init(2) Param Optimization
>  Recursive cloning only when the probability of call being executed exceeds the parameter.
>  
> +-param=ipa-cp-recursive-freq-factor=
> +Common Joined UInteger Var(param_ipa_cp_recursive_freq_factor) Init(6) Param Optimization
> +When propagating IPA-CP effect estimates, multiply frequencies of recursive edges that that bring back an unchanged value by this factor.
> +
>  -param=ipa-cp-recursion-penalty=
>  Common Joined UInteger Var(param_ipa_cp_recursion_penalty) Init(40) IntegerRange(0, 100) Param Optimization
>  Percentage penalty the recursive functions will receive when they are evaluated for cloning.
> -- 
> 2.32.0
>
Martin Jambor Oct. 7, 2021, 2:59 p.m. UTC | #2
Hi,

On Wed, Oct 06 2021, Jan Hubicka wrote:
>> Recursive call graph edges, even when they are hot and important for
>> the compiled program, can never have frequency bigger than one, even
>> when the actual time savings in the next recursion call are not
>> realized just once but depend on the depth of recursion.  The current
>> IPA-CP effect propagation code did not take that into account and just
>> used the frequency, thus severely underestimating the effect.
>> 
>> This patch artificially boosts values taking part in such calls.  If a
>> value feeds into itself through a recursive call, the frequency of the
>> edge is multiplied by a parameter with default value of 6, basically
>> assuming that the recursion will take place 6 times.  This value can
>> of course be subject to change.
>> 
>> Moreover, values which do not feed into themselves but which were
>> generated for a self-recursive call with an arithmetic
>> pass-function (aka the 548.exchange "hack" which however is generally
>> applicable for recursive functions which count the recursion depth in
>> a parameter) have the edge frequency multiplied as many times as there
>> are generated values in the chain.  In essence, we will assume they
>> are all useful.
>> 
>> This patch partially fixes the current situation when we fail to
>> optimize 548.exchange with PGO.  In the benchmark one recursive edge
>> count overwhelmingly dominates all other counts in the program and so
>> we fail to perform the first cloning (for the nonrecursive entry call)
>> because it looks totally insignificant.
>> 
>> gcc/ChangeLog:
>> 
>> 2021-07-16  Martin Jambor  <mjambor@suse.cz>
>> 
>> 	* params.opt (ipa-cp-recursive-freq-factor): New.
>> 	* ipa-cp.c (ipcp_value): Switch to inline initialization.  New members
>> 	scc_no, self_recursion_generated_level, same_scc and
>> 	self_recursion_generated_p.
>> 	(ipcp_lattice::add_value): Replaced parameter unlimited with
>> 	same_lat_gen_level, usit it determine limit of values and store it to
>> 	the value.
>> 	(ipcp_lattice<valtype>::print): Dump the new fileds.
>> 	(allocate_and_init_ipcp_value): Take same_lat_gen_level as a new
>> 	parameter and store it to the new value.
>> 	(self_recursively_generated_p): Removed.
>> 	(propagate_vals_across_arith_jfunc): Use self_recursion_generated_p
>> 	instead of self_recursively_generated_p, store self generation level
>> 	to such values.
>> 	(value_topo_info<valtype>::add_val): Set scc_no.
>> 	(value_topo_info<valtype>::propagate_effects): Multiply frequencies of
>> 	recursively feeding values and self generated values by appropriate
>> 	new factors.
>
> If you boost every self fed value by factor of 6, I wonder how quickly
> we run into exponential explosion of the cost (since the frequency
> should be close to 1 and 6^9=10077696....

The factor of six is applied once for an entire SCC, so we'd reach this
huge number only if there was a chain of nine different recursive
functions - with this patch we assume each one will recurse six times,
so the result is indeed a huge execution count estimate.

The constant is not used for the "self generated" values like those in
exchange, those are handled by the else branch below.  For those we
expect the recursion happens 8 times, because that is how many values we
generate, but the boost is different depending on the recursion depth.

>
> I think it would be more robust to simply assume that the job will
>distribute evenly across the clones.  How hard is to implement that?

This is not an update of counters.  The code tries to estimate execution
time improvement that is will be possible in callees if we clone for
this particular value and so is based on call graph edge frequencies (so
that if in a callee we can save 5 units of time and the frequency is 5,
we estimate we will save 25).  The code has the advantage that it is
universal for both situations when profile feedback is and is not
available.

Martin
Jan Hubicka Oct. 7, 2021, 3:25 p.m. UTC | #3
> Hi,
> >
> > If you boost every self fed value by factor of 6, I wonder how quickly
> > we run into exponential explosion of the cost (since the frequency
> > should be close to 1 and 6^9=10077696....
> 
> The factor of six is applied once for an entire SCC, so we'd reach this
> huge number only if there was a chain of nine different recursive
> functions - with this patch we assume each one will recurse six times,
> so the result is indeed a huge execution count estimate.
> 
> The constant is not used for the "self generated" values like those in
> exchange, those are handled by the else branch below.  For those we
> expect the recursion happens 8 times, because that is how many values we
> generate, but the boost is different depending on the recursion depth.
> 
> >
> > I think it would be more robust to simply assume that the job will
> >distribute evenly across the clones.  How hard is to implement that?
> 
> This is not an update of counters.  The code tries to estimate execution
> time improvement that is will be possible in callees if we clone for
> this particular value and so is based on call graph edge frequencies (so
> that if in a callee we can save 5 units of time and the frequency is 5,
> we estimate we will save 25).  The code has the advantage that it is
> universal for both situations when profile feedback is and is not
> available.

I guess the patch is OK then.

Thanks,
Honza
> 
> Martin
>
diff mbox series

Patch

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 55b9216337f..b987d975793 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -184,30 +184,52 @@  public:
   /* The actual value for the given parameter.  */
   valtype value;
   /* The list of sources from which this value originates.  */
-  ipcp_value_source <valtype> *sources;
+  ipcp_value_source <valtype> *sources = nullptr;
   /* Next pointers in a linked list of all values in a lattice.  */
-  ipcp_value *next;
+  ipcp_value *next = nullptr;
   /* Next pointers in a linked list of values in a strongly connected component
      of values. */
-  ipcp_value *scc_next;
+  ipcp_value *scc_next = nullptr;
   /* Next pointers in a linked list of SCCs of values sorted topologically
      according their sources.  */
-  ipcp_value  *topo_next;
+  ipcp_value  *topo_next = nullptr;
   /* A specialized node created for this value, NULL if none has been (so far)
      created.  */
-  cgraph_node *spec_node;
+  cgraph_node *spec_node = nullptr;
   /* Depth first search number and low link for topological sorting of
      values.  */
-  int dfs, low_link;
+  int dfs = 0;
+  int low_link = 0;
+  /* SCC number to identify values which recursively feed into each other.
+     Values in the same SCC have the same SCC number.  */
+  int scc_no = 0;
+  /* Non zero if the value is generated from another value in the same lattice
+     for a self-recursive call, the actual number is how many times the
+     operation has been performed.  In the unlikely event of the value being
+     present in two chains fo self-recursive value generation chains, it is the
+     maximum.  */
+  unsigned self_recursion_generated_level = 0;
   /* True if this value is currently on the topo-sort stack.  */
-  bool on_stack;
-
-  ipcp_value()
-    : sources (0), next (0), scc_next (0), topo_next (0),
-      spec_node (0), dfs (0), low_link (0), on_stack (false) {}
+  bool on_stack = false;
 
   void add_source (cgraph_edge *cs, ipcp_value *src_val, int src_idx,
 		   HOST_WIDE_INT offset);
+
+  /* Return true if both THIS value and O feed into each other.  */
+
+  bool same_scc (const ipcp_value<valtype> *o)
+  {
+    return o->scc_no == scc_no;
+  }
+
+/* Return true, if a this value has been generated for a self-recursive call as
+   a result of an arithmetic pass-through jump-function acting on a value in
+   the same lattice function.  */
+
+  bool self_recursion_generated_p ()
+  {
+    return self_recursion_generated_level > 0;
+  }
 };
 
 /* Lattice describing potential values of a formal parameter of a function, or
@@ -239,7 +261,7 @@  public:
 		  ipcp_value<valtype> *src_val = NULL,
 		  int src_idx = 0, HOST_WIDE_INT offset = -1,
 		  ipcp_value<valtype> **val_p = NULL,
-		  bool unlimited = false);
+		  unsigned same_lat_gen_level = 0);
   void print (FILE * f, bool dump_sources, bool dump_benefits);
 };
 
@@ -498,7 +520,11 @@  ipcp_lattice<valtype>::print (FILE * f, bool dump_sources, bool dump_benefits)
 	{
 	  ipcp_value_source<valtype> *s;
 
-	  fprintf (f, " [from:");
+	  if (val->self_recursion_generated_p ())
+	    fprintf (f, " [self_gen(%i), from:",
+		     val->self_recursion_generated_level);
+	  else
+	    fprintf (f, " [scc: %i, from:", val->scc_no);
 	  for (s = val->sources; s; s = s->next)
 	    fprintf (f, " %i(%f)", s->cs->caller->order,
 		     s->cs->sreal_frequency ().to_double ());
@@ -1837,12 +1863,13 @@  ipcp_value<valtype>::add_source (cgraph_edge *cs, ipcp_value *src_val,
    SOURCE and clear all other fields.  */
 
 static ipcp_value<tree> *
-allocate_and_init_ipcp_value (tree source)
+allocate_and_init_ipcp_value (tree cst, unsigned same_lat_gen_level)
 {
   ipcp_value<tree> *val;
 
   val = new (ipcp_cst_values_pool.allocate ()) ipcp_value<tree>();
-  val->value = source;
+  val->value = cst;
+  val->self_recursion_generated_level = same_lat_gen_level;
   return val;
 }
 
@@ -1850,14 +1877,15 @@  allocate_and_init_ipcp_value (tree source)
    value to SOURCE and clear all other fields.  */
 
 static ipcp_value<ipa_polymorphic_call_context> *
-allocate_and_init_ipcp_value (ipa_polymorphic_call_context source)
+allocate_and_init_ipcp_value (ipa_polymorphic_call_context ctx,
+			      unsigned same_lat_gen_level)
 {
   ipcp_value<ipa_polymorphic_call_context> *val;
 
-  // TODO
   val = new (ipcp_poly_ctx_values_pool.allocate ())
     ipcp_value<ipa_polymorphic_call_context>();
-  val->value = source;
+  val->value = ctx;
+  val->self_recursion_generated_level = same_lat_gen_level;
   return val;
 }
 
@@ -1865,8 +1893,12 @@  allocate_and_init_ipcp_value (ipa_polymorphic_call_context source)
    SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same
    meaning.  OFFSET -1 means the source is scalar and not a part of an
    aggregate.  If non-NULL, VAL_P records address of existing or newly added
-   ipcp_value.  UNLIMITED means whether value count should not exceed the limit
-   given by PARAM_IPA_CP_VALUE_LIST_SIZE.  */
+   ipcp_value.
+
+   If the value is generated for a self-recursive call as a result of an
+   arithmetic pass-through jump-function acting on a value in the same lattice,
+   SAME_LAT_GEN_LEVEL must be the length of such chain, otherwise it must be
+   zero.  If it is non-zero, PARAM_IPA_CP_VALUE_LIST_SIZE limit is ignored.  */
 
 template <typename valtype>
 bool
@@ -1874,7 +1906,7 @@  ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
 				  ipcp_value<valtype> *src_val,
 				  int src_idx, HOST_WIDE_INT offset,
 				  ipcp_value<valtype> **val_p,
-				  bool unlimited)
+				  unsigned same_lat_gen_level)
 {
   ipcp_value<valtype> *val, *last_val = NULL;
 
@@ -1890,6 +1922,9 @@  ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
 	if (val_p)
 	  *val_p = val;
 
+	if (val->self_recursion_generated_level < same_lat_gen_level)
+	  val->self_recursion_generated_level = same_lat_gen_level;
+
 	if (ipa_edge_within_scc (cs))
 	  {
 	    ipcp_value_source<valtype> *s;
@@ -1904,7 +1939,7 @@  ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
 	return false;
       }
 
-  if (!unlimited && values_count == opt_for_fn (cs->caller->decl,
+  if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl,
 						param_ipa_cp_value_list_size))
     {
       /* We can only free sources, not the values themselves, because sources
@@ -1923,7 +1958,7 @@  ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
     }
 
   values_count++;
-  val = allocate_and_init_ipcp_value (newval);
+  val = allocate_and_init_ipcp_value (newval, same_lat_gen_level);
   val->add_source (cs, src_val, src_idx, offset);
   val->next = NULL;
 
@@ -1940,60 +1975,6 @@  ipcp_lattice<valtype>::add_value (valtype newval, cgraph_edge *cs,
   return true;
 }
 
-/* Return true, if a ipcp_value VAL is orginated from parameter value of
-   self-feeding recursive function via some kind of pass-through jump
-   function.  */
-
-static bool
-self_recursively_generated_p (ipcp_value<tree> *val)
-{
-  class ipa_node_params *info = NULL;
-
-  for (ipcp_value_source<tree> *src = val->sources; src; src = src->next)
-    {
-      cgraph_edge *cs = src->cs;
-
-      if (!src->val || cs->caller != cs->callee->function_symbol ())
-	return false;
-
-      if (src->val == val)
-	continue;
-
-      if (!info)
-	info = ipa_node_params_sum->get (cs->caller);
-
-      class ipcp_param_lattices *plats = ipa_get_parm_lattices (info,
-								src->index);
-      ipcp_lattice<tree> *src_lat;
-      ipcp_value<tree> *src_val;
-
-      if (src->offset == -1)
-	src_lat = &plats->itself;
-      else
-	{
-	  struct ipcp_agg_lattice *src_aglat;
-
-	  for (src_aglat = plats->aggs; src_aglat; src_aglat = src_aglat->next)
-	    if (src_aglat->offset == src->offset)
-	      break;
-
-	  if (!src_aglat)
-	    return false;
-
-	  src_lat = src_aglat;
-	}
-
-      for (src_val = src_lat->values; src_val; src_val = src_val->next)
-	if (src_val == val)
-	  break;
-
-      if (!src_val)
-	return false;
-    }
-
-  return true;
-}
-
 /* A helper function that returns result of operation specified by OPCODE on
    the value of SRC_VAL.  If non-NULL, OPND1_TYPE is expected type for the
    value of SRC_VAL.  If the operation is binary, OPND2 is a constant value
@@ -2068,7 +2049,7 @@  propagate_vals_across_arith_jfunc (cgraph_edge *cs,
 	     source, this is absolutely conservative, but could avoid explosion
 	     of lattice's value space, especially when one recursive function
 	     calls another recursive.  */
-	  if (self_recursively_generated_p (src_val))
+	  if (src_val->self_recursion_generated_p ())
 	    {
 	      ipcp_value_source<tree> *s;
 
@@ -2096,7 +2077,7 @@  propagate_vals_across_arith_jfunc (cgraph_edge *cs,
 		break;
 
 	      ret |= dest_lat->add_value (cstval, cs, src_val, src_idx,
-					  src_offset, &src_val, true);
+					  src_offset, &src_val, j);
 	      gcc_checking_assert (src_val);
 	    }
 	}
@@ -2108,7 +2089,7 @@  propagate_vals_across_arith_jfunc (cgraph_edge *cs,
 	/* Now we do not use self-recursively generated value as propagation
 	   source, otherwise it is easy to make value space of normal lattice
 	   overflow.  */
-	if (self_recursively_generated_p (src_val))
+	if (src_val->self_recursion_generated_p ())
 	  {
 	    ret |= dest_lat->set_contains_variable ();
 	    continue;
@@ -3732,6 +3713,7 @@  value_topo_info<valtype>::add_val (ipcp_value<valtype> *cur_val)
 	  v = stack;
 	  stack = v->topo_next;
 	  v->on_stack = false;
+	  v->scc_no = cur_val->dfs;
 
 	  v->scc_next = scc_list;
 	  scc_list = v;
@@ -3905,8 +3887,25 @@  value_topo_info<valtype>::propagate_effects ()
 		    else
 		      continue;
 		  }
+
+		int special_factor = 1;
+		if (val->same_scc (src->val))
+		  special_factor
+		    = opt_for_fn(src->cs->caller->decl,
+				 param_ipa_cp_recursive_freq_factor);
+		else if (val->self_recursion_generated_p ()
+			 && (src->cs->callee->function_symbol ()
+			     == src->cs->caller))
+		  {
+		    int max_recur_gen_depth
+		      = opt_for_fn(src->cs->caller->decl,
+				   param_ipa_cp_max_recursive_depth);
+		    special_factor = max_recur_gen_depth
+		      - val->self_recursion_generated_level + 1;
+		  }
+
 		src->val->prop_time_benefit
-		  += time * src->cs->sreal_frequency ();
+		  += time * special_factor * src->cs->sreal_frequency ();
 	      }
 
 	  if (size < INT_MAX)
diff --git a/gcc/params.opt b/gcc/params.opt
index 92b003e38cb..8d772309407 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -266,6 +266,10 @@  Maximum depth of recursive cloning for self-recursive function.
 Common Joined UInteger Var(param_ipa_cp_min_recursive_probability) Init(2) Param Optimization
 Recursive cloning only when the probability of call being executed exceeds the parameter.
 
+-param=ipa-cp-recursive-freq-factor=
+Common Joined UInteger Var(param_ipa_cp_recursive_freq_factor) Init(6) Param Optimization
+When propagating IPA-CP effect estimates, multiply frequencies of recursive edges that that bring back an unchanged value by this factor.
+
 -param=ipa-cp-recursion-penalty=
 Common Joined UInteger Var(param_ipa_cp_recursion_penalty) Init(40) IntegerRange(0, 100) Param Optimization
 Percentage penalty the recursive functions will receive when they are evaluated for cloning.