ipa-cp heuristic tweak

Message ID 20150329154320.GA67588@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka March 29, 2015, 3:43 p.m. UTC
Hi,
this patch improves crafty performance by avoiding an ipa-cp clone of the
Search function that specializes the first iteration of the recursion.
The patch is by Martin; I only tested it and cleaned up the code in
count_callers and set_single_call_flag.

Bootstrapped/regtested x86_64-linux, committed.
	PR ipa/65478
	* params.def (PARAM_IPA_CP_RECURSION_PENALTY): New.
	(PARAM_IPA_CP_SINGLE_CALL_PENALTY): Likewise.
	* ipa-prop.h (ipa_node_params): New flags node_within_scc and
	node_calling_single_call.
	* ipa-cp.c (count_callers): New function.
	(set_single_call_flag): Likewise.
	(initialize_node_lattices): Count callers and call set_single_call_flag
	if necessary.
	(incorporate_penalties): New function.
	(good_cloning_opportunity_p): Use it, dump new flags.
	(propagate_constants_topo): Set node_within_scc flag if appropriate.
	* doc/invoke.texi (ipa-cp-recursion-penalty,
	ipa-cp-single-call-penalty): Document.
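
To illustrate the effect of the two new parameters: the penalties simply scale
the cloning evaluation down before it is compared against
ipa-cp-eval-threshold (500 by default).  Below is a minimal standalone sketch
of that arithmetic with the default values hard-coded; the names here are
invented for the example, the real implementation is incorporate_penalties in
the patch:

  /* Illustration only, not GCC code: mirrors the penalty arithmetic with the
     default values of ipa-cp-recursion-penalty (40) and
     ipa-cp-single-call-penalty (15).  */
  #include <stdint.h>
  #include <stdio.h>

  static int64_t
  apply_penalties (int64_t evaluation, int within_scc, int single_call)
  {
    if (within_scc)
      evaluation = evaluation * (100 - 40) / 100;
    if (single_call)
      evaluation = evaluation * (100 - 15) / 100;
    return evaluation;
  }

  int
  main (void)
  {
    /* A recursive candidate with evaluation 700 drops to 420, below the
       default threshold of 500, so it is no longer considered worth cloning.  */
    printf ("%lld\n", (long long) apply_penalties (700, 1, 0));
    return 0;
  }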

Comments

Martin Jambor March 30, 2015, 12:11 p.m. UTC | #1
Hi,

On Sun, Mar 29, 2015 at 05:43:20PM +0200, Jan Hubicka wrote:
> Hi,

thanks for committing the patch, I'm just wondering why...

> this patch improve crafty performance by avoiding ipa-cp clonning of
> Search function that specializes the first iteration of the recursion.
> The patch is by Martin, I only tested it and cleaned up code in count_callers
> and set_single_call_flag
> 
> Bootstrapped/regtested x86_64-linux, comitted.
> 	PR ipa/65478
> 	* params.def (PARAM_IPA_CP_RECURSION_PENALTY) : New.
> 	(PARAM_IPA_CP_SINGLE_CALL_PENALTY): Likewise.
> 	* ipa-prop.h (ipa_node_params): New flags node_within_scc and
> 	node_calling_single_call.
> 	* ipa-cp.c (count_callers): New function.
> 	(set_single_call_flag): Likewise.
> 	(initialize_node_lattices): Count callers and set single_flag_call if
> 	necessary.
> 	(incorporate_penalties): New function.
> 	(good_cloning_opportunity_p): Use it, dump new flags.
> 	(propagate_constants_topo): Set node_within_scc flag if appropriate.
> 	* doc/invoke.texi (ipa-cp-recursion-penalty,
> 	ipa-cp-single-call-pentalty): Document.
> Index: params.def
> ===================================================================
> --- params.def	(revision 221757)
> +++ params.def	(working copy)
> @@ -999,6 +999,18 @@ DEFPARAM (PARAM_IPA_CP_EVAL_THRESHOLD,
>  	  "beneficial to clone.",
>  	  500, 0, 0)
>  
> +DEFPARAM (PARAM_IPA_CP_RECURSION_PENALTY,
> +	  "ipa-cp-recursion-penalty",
> +	  "Percentage penalty the recursive functions will receive when they "
> +	  "are evaluated for cloning.",
> +	  40, 0, 100)
> +
> +DEFPARAM (PARAM_IPA_CP_SINGLE_CALL_PENALTY,
> +	  "ipa-cp-single-call-penalty",
> +	  "Percentage penalty functions containg a single call to another "
> +	  "function will receive when they are evaluated for cloning.",
> +	  15, 0, 100)
> +
>  DEFPARAM (PARAM_IPA_MAX_AGG_ITEMS,
>  	  "ipa-max-agg-items",
>  	  "Maximum number of aggregate content items for a parameter in "
> Index: ipa-prop.h
> ===================================================================
> --- ipa-prop.h	(revision 221757)
> +++ ipa-prop.h	(working copy)
> @@ -330,6 +330,10 @@ struct ipa_node_params
>    /* Node has been completely replaced by clones and will be removed after
>       ipa-cp is finished.  */
>    unsigned node_dead : 1;
> +  /* Node is involved in a recursion, potentionally indirect.  */
> +  unsigned node_within_scc : 1;
> +  /* Node is calling a private function called only once.  */
> +  unsigned node_calling_single_call : 1;
>  };
>  
>  /* ipa_node_params access functions.  Please use these to access fields that
> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c	(revision 221757)
> +++ ipa-cp.c	(working copy)
> @@ -811,6 +811,41 @@ set_all_contains_variable (struct ipcp_p
>    return ret;
>  }
>  
> +/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA
> +   points to by the number of callers to NODE.  */
> +
> +static bool
> +count_callers (cgraph_node *node, void *data)
> +{
> +  int *caller_count = (int *) data;
> +
> +  for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
> +    /* Local thunks can be handled transparently, but if the thunk can not
> +       be optimized out, count it as a real use.  */
> +    if (!cs->caller->thunk.thunk_p || !cs->caller->local.local)
> +      ++*caller_count;

...we don't recurse for local thunks and do not count their number of
callers here.  I'm not sure whether it can ever happen in practice, but if
we have a local function that is called multiple times but always
through one particular local thunk, we should end up with caller_count
greater than one, yet with this code we will not, should we?  Such a
function certainly should not be considered called just once by the inliner.

Similarly, if we ever have a local node with a non-local thunk (would
that be legal and make sense?), then we should try not to end up with
caller_count == 1, because the inliner will also never consider that
function called just once.  So I'd suggest incrementing the caller count
by two instead in that case.

But maybe I just don't understand what "handled transparently" in the
comment means.

Thanks for clearing this up,

Martin
Jan Hubicka March 30, 2015, 5:05 p.m. UTC | #2
> > Index: ipa-cp.c
> > ===================================================================
> > --- ipa-cp.c	(revision 221757)
> > +++ ipa-cp.c	(working copy)
> > @@ -811,6 +811,41 @@ set_all_contains_variable (struct ipcp_p
> >    return ret;
> >  }
> >  
> > +/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA
> > +   points to by the number of callers to NODE.  */
> > +
> > +static bool
> > +count_callers (cgraph_node *node, void *data)
> > +{
> > +  int *caller_count = (int *) data;
> > +
> > +  for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
> > +    /* Local thunks can be handled transparently, but if the thunk can not
> > +       be optimized out, count it as a real use.  */
> > +    if (!cs->caller->thunk.thunk_p || !cs->caller->local.local)
> > +      ++*caller_count;
> 
> ...we don't recurse for local thunks and do not count their number of
> callers here.  I'm not sure if it can ever happen in practice but if
> we have a local function that is called multiple times but always
> through one particular local thunk, we should end with caller_count
> greater than one, and with this code we will not?  Such function
> certainly should not be considered called just once by the inliner.

The inliner won't inline into thunks at all currently, but the function is
called through call_for_symbol_thunks_and_aliases, so the recursion into
thunks already happens in that wrapper.
> 
> Similarly, if we ever have a local node with a non-local thunk (would
> that be legal and make sense?), then we should try not ending up with

We currently keep the local flag in sync between thunks and functions
(otherwise the i386 calling conventions would break).  But indeed we may add
a check here to avoid depending on this detail.

Honza
> caller_count == 1, because inliner will also never consider that
> function just called once.  So I'd suggest incrementing the caller by
> two instead in that case.
> 
> But maybe I just don't understand what "handled transparently" in the
> comment means.
> 
> Thanks for clearing this up,
> 
> Martin
Ilya Enkovich March 31, 2015, 12:32 p.m. UTC | #3
2015-03-29 18:43 GMT+03:00 Jan Hubicka <hubicka@ucw.cz>:
> Hi,
> this patch improve crafty performance by avoiding ipa-cp clonning of
> Search function that specializes the first iteration of the recursion.
> The patch is by Martin, I only tested it and cleaned up code in count_callers
> and set_single_call_flag
>
> Bootstrapped/regtested x86_64-linux, comitted.
>         PR ipa/65478
>         * params.def (PARAM_IPA_CP_RECURSION_PENALTY) : New.
>         (PARAM_IPA_CP_SINGLE_CALL_PENALTY): Likewise.
>         * ipa-prop.h (ipa_node_params): New flags node_within_scc and
>         node_calling_single_call.
>         * ipa-cp.c (count_callers): New function.
>         (set_single_call_flag): Likewise.
>         (initialize_node_lattices): Count callers and set single_flag_call if
>         necessary.
>         (incorporate_penalties): New function.
>         (good_cloning_opportunity_p): Use it, dump new flags.
>         (propagate_constants_topo): Set node_within_scc flag if appropriate.
>         * doc/invoke.texi (ipa-cp-recursion-penalty,
>         ipa-cp-single-call-pentalty): Document.
> Index: params.def
> ===================================================================
> --- params.def  (revision 221757)
> +++ params.def  (working copy)
> @@ -999,6 +999,18 @@ DEFPARAM (PARAM_IPA_CP_EVAL_THRESHOLD,
>           "beneficial to clone.",
>           500, 0, 0)
>
> +DEFPARAM (PARAM_IPA_CP_RECURSION_PENALTY,
> +         "ipa-cp-recursion-penalty",
> +         "Percentage penalty the recursive functions will receive when they "
> +         "are evaluated for cloning.",
> +         40, 0, 100)
> +
> +DEFPARAM (PARAM_IPA_CP_SINGLE_CALL_PENALTY,
> +         "ipa-cp-single-call-penalty",
> +         "Percentage penalty functions containg a single call to another "
> +         "function will receive when they are evaluated for cloning.",
> +         15, 0, 100)
> +
>  DEFPARAM (PARAM_IPA_MAX_AGG_ITEMS,
>           "ipa-max-agg-items",
>           "Maximum number of aggregate content items for a parameter in "
> Index: ipa-prop.h
> ===================================================================
> --- ipa-prop.h  (revision 221757)
> +++ ipa-prop.h  (working copy)
> @@ -330,6 +330,10 @@ struct ipa_node_params
>    /* Node has been completely replaced by clones and will be removed after
>       ipa-cp is finished.  */
>    unsigned node_dead : 1;
> +  /* Node is involved in a recursion, potentionally indirect.  */
> +  unsigned node_within_scc : 1;
> +  /* Node is calling a private function called only once.  */
> +  unsigned node_calling_single_call : 1;
>  };
>
>  /* ipa_node_params access functions.  Please use these to access fields that
> Index: ipa-cp.c
> ===================================================================
> --- ipa-cp.c    (revision 221757)
> +++ ipa-cp.c    (working copy)
> @@ -811,6 +811,41 @@ set_all_contains_variable (struct ipcp_p
>    return ret;
>  }
>
> +/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA
> +   points to by the number of callers to NODE.  */
> +
> +static bool
> +count_callers (cgraph_node *node, void *data)
> +{
> +  int *caller_count = (int *) data;
> +
> +  for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
> +    /* Local thunks can be handled transparently, but if the thunk can not
> +       be optimized out, count it as a real use.  */
> +    if (!cs->caller->thunk.thunk_p || !cs->caller->local.local)
> +      ++*caller_count;
> +  return false;
> +}
> +
> +/* Worker of call_for_symbol_thunks_and_aliases, it is supposed to be called on
> +   the one caller of some other node.  Set the caller's corresponding flag.  */
> +
> +static bool
> +set_single_call_flag (cgraph_node *node, void *)
> +{
> +  cgraph_edge *cs = node->callers;
> +  /* Local thunks can be handled transparently, skip them.  */
> +  while (cs && cs->caller->thunk.thunk_p && cs->caller->local.local)
> +    cs = cs->next_caller;
> +  if (cs)
> +    {
> +      gcc_assert (!cs->next_caller);

This assert assumes the only non-thunk caller is always at the end of
the callers list.  Is that actually guaranteed?

> +      IPA_NODE_REF (cs->caller)->node_calling_single_call = true;
> +      return true;
> +    }
> +  return false;
> +}
> +
>  /* Initialize ipcp_lattices.  */


Thanks,
Ilya

Patch

Index: params.def
===================================================================
--- params.def	(revision 221757)
+++ params.def	(working copy)
@@ -999,6 +999,18 @@  DEFPARAM (PARAM_IPA_CP_EVAL_THRESHOLD,
 	  "beneficial to clone.",
 	  500, 0, 0)
 
+DEFPARAM (PARAM_IPA_CP_RECURSION_PENALTY,
+	  "ipa-cp-recursion-penalty",
+	  "Percentage penalty the recursive functions will receive when they "
+	  "are evaluated for cloning.",
+	  40, 0, 100)
+
+DEFPARAM (PARAM_IPA_CP_SINGLE_CALL_PENALTY,
+	  "ipa-cp-single-call-penalty",
+	  "Percentage penalty functions containg a single call to another "
+	  "function will receive when they are evaluated for cloning.",
+	  15, 0, 100)
+
 DEFPARAM (PARAM_IPA_MAX_AGG_ITEMS,
 	  "ipa-max-agg-items",
 	  "Maximum number of aggregate content items for a parameter in "
Index: ipa-prop.h
===================================================================
--- ipa-prop.h	(revision 221757)
+++ ipa-prop.h	(working copy)
@@ -330,6 +330,10 @@  struct ipa_node_params
   /* Node has been completely replaced by clones and will be removed after
      ipa-cp is finished.  */
   unsigned node_dead : 1;
+  /* Node is involved in a recursion, potentionally indirect.  */
+  unsigned node_within_scc : 1;
+  /* Node is calling a private function called only once.  */
+  unsigned node_calling_single_call : 1;
 };
 
 /* ipa_node_params access functions.  Please use these to access fields that
Index: ipa-cp.c
===================================================================
--- ipa-cp.c	(revision 221757)
+++ ipa-cp.c	(working copy)
@@ -811,6 +811,41 @@  set_all_contains_variable (struct ipcp_p
   return ret;
 }
 
+/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA
+   points to by the number of callers to NODE.  */
+
+static bool
+count_callers (cgraph_node *node, void *data)
+{
+  int *caller_count = (int *) data;
+
+  for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller)
+    /* Local thunks can be handled transparently, but if the thunk can not
+       be optimized out, count it as a real use.  */
+    if (!cs->caller->thunk.thunk_p || !cs->caller->local.local)
+      ++*caller_count;
+  return false;
+}
+
+/* Worker of call_for_symbol_thunks_and_aliases, it is supposed to be called on
+   the one caller of some other node.  Set the caller's corresponding flag.  */
+
+static bool
+set_single_call_flag (cgraph_node *node, void *)
+{
+  cgraph_edge *cs = node->callers;
+  /* Local thunks can be handled transparently, skip them.  */
+  while (cs && cs->caller->thunk.thunk_p && cs->caller->local.local)
+    cs = cs->next_caller;
+  if (cs)
+    {
+      gcc_assert (!cs->next_caller);
+      IPA_NODE_REF (cs->caller)->node_calling_single_call = true;
+      return true;
+    }
+  return false;
+}
+
 /* Initialize ipcp_lattices.  */
 
 static void
@@ -822,7 +857,17 @@  initialize_node_lattices (struct cgraph_
   int i;
 
   gcc_checking_assert (node->has_gimple_body_p ());
-  if (!cgraph_local_p (node))
+  if (cgraph_local_p (node))
+    {
+      int caller_count = 0;
+      node->call_for_symbol_thunks_and_aliases (count_callers, &caller_count,
+						true);
+      gcc_checking_assert (caller_count > 0);
+      if (caller_count == 1)
+	node->call_for_symbol_thunks_and_aliases (set_single_call_flag,
+						  NULL, true);
+    }
+  else
     {
       /* When cloning is allowed, we can assume that externally visible
 	 functions are not called.  We will compensate this by cloning
@@ -2105,6 +2150,24 @@  hint_time_bonus (inline_hints hints)
   return result;
 }
 
+/* If there is a reason to penalize the function described by INFO in the
+   cloning goodness evaluation, do so.  */
+
+static inline int64_t
+incorporate_penalties (ipa_node_params *info, int64_t evaluation)
+{
+  if (info->node_within_scc)
+    evaluation = (evaluation
+		  * (100 - PARAM_VALUE (PARAM_IPA_CP_RECURSION_PENALTY))) / 100;
+
+  if (info->node_calling_single_call)
+    evaluation = (evaluation
+		  * (100 - PARAM_VALUE (PARAM_IPA_CP_SINGLE_CALL_PENALTY)))
+      / 100;
+
+  return evaluation;
+}
+
 /* Return true if cloning NODE is a good idea, given the estimated TIME_BENEFIT
    and SIZE_COST and with the sum of frequencies of incoming edges to the
    potential new clone in FREQUENCIES.  */
@@ -2120,18 +2183,22 @@  good_cloning_opportunity_p (struct cgrap
 
   gcc_assert (size_cost > 0);
 
+  struct ipa_node_params *info = IPA_NODE_REF (node);
   if (max_count)
     {
       int factor = (count_sum * 1000) / max_count;
       int64_t evaluation = (((int64_t) time_benefit * factor)
 				    / size_cost);
+      evaluation = incorporate_penalties (info, evaluation);
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "     good_cloning_opportunity_p (time: %i, "
 		 "size: %i, count_sum: " HOST_WIDE_INT_PRINT_DEC
-		 ") -> evaluation: " "%"PRId64
+		 "%s%s) -> evaluation: " "%"PRId64
 		 ", threshold: %i\n",
 		 time_benefit, size_cost, (HOST_WIDE_INT) count_sum,
+		 info->node_within_scc ? ", scc" : "",
+		 info->node_calling_single_call ? ", single_call" : "",
 		 evaluation, PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD));
 
       return evaluation >= PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD);
@@ -2140,13 +2207,16 @@  good_cloning_opportunity_p (struct cgrap
     {
       int64_t evaluation = (((int64_t) time_benefit * freq_sum)
 				    / size_cost);
+      evaluation = incorporate_penalties (info, evaluation);
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "     good_cloning_opportunity_p (time: %i, "
-		 "size: %i, freq_sum: %i) -> evaluation: "
+		 "size: %i, freq_sum: %i%s%s) -> evaluation: "
 		 "%"PRId64 ", threshold: %i\n",
-		 time_benefit, size_cost, freq_sum, evaluation,
-		 PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD));
+		 time_benefit, size_cost, freq_sum,
+		 info->node_within_scc ? ", scc" : "",
+		 info->node_calling_single_call ? ", single_call" : "",
+		 evaluation, PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD));
 
       return evaluation >= PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD);
     }
@@ -2637,9 +2707,12 @@  propagate_constants_topo (struct ipa_top
 	  struct cgraph_edge *cs;
 
 	  for (cs = v->callees; cs; cs = cs->next_callee)
-	    if (ipa_edge_within_scc (cs)
-		&& propagate_constants_accross_call (cs))
-	      push_node_to_stack (topo, cs->callee->function_symbol ());
+	    if (ipa_edge_within_scc (cs))
+	      {
+		IPA_NODE_REF (v)->node_within_scc = true;
+		if (propagate_constants_accross_call (cs))
+		  push_node_to_stack (topo, cs->callee->function_symbol ());
+	      }
 	  v = pop_node_from_stack (topo);
 	}
 
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 221757)
+++ doc/invoke.texi	(working copy)
@@ -10834,6 +10834,15 @@  IPA-CP calculates its own score of cloni
 and performs those cloning opportunities with scores that exceed
 @option{ipa-cp-eval-threshold}.
 
+@item ipa-cp-recursion-penalty
+Percentage penalty the recursive functions will receive when they
+are evaluated for cloning.
+
+@item ipa-cp-single-call-penalty
+Percentage penalty functions containg a single call to another
+function will receive when they are evaluated for cloning.
+
+
 @item ipa-max-agg-items
 IPA-CP is also capable to propagate a number of scalar values passed
 in an aggregate. @option{ipa-max-agg-items} controls the maximum