Message ID | 20150329154320.GA67588@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
Hi, On Sun, Mar 29, 2015 at 05:43:20PM +0200, Jan Hubicka wrote: > Hi, thanks for committing the patch, I'm just wondering why... > this patch improve crafty performance by avoiding ipa-cp clonning of > Search function that specializes the first iteration of the recursion. > The patch is by Martin, I only tested it and cleaned up code in count_callers > and set_single_call_flag > > Bootstrapped/regtested x86_64-linux, comitted. > PR ipa/65478 > * params.def (PARAM_IPA_CP_RECURSION_PENALTY) : New. > (PARAM_IPA_CP_SINGLE_CALL_PENALTY): Likewise. > * ipa-prop.h (ipa_node_params): New flags node_within_scc and > node_calling_single_call. > * ipa-cp.c (count_callers): New function. > (set_single_call_flag): Likewise. > (initialize_node_lattices): Count callers and set single_flag_call if > necessary. > (incorporate_penalties): New function. > (good_cloning_opportunity_p): Use it, dump new flags. > (propagate_constants_topo): Set node_within_scc flag if appropriate. > * doc/invoke.texi (ipa-cp-recursion-penalty, > ipa-cp-single-call-pentalty): Document. > Index: params.def > =================================================================== > --- params.def (revision 221757) > +++ params.def (working copy) > @@ -999,6 +999,18 @@ DEFPARAM (PARAM_IPA_CP_EVAL_THRESHOLD, > "beneficial to clone.", > 500, 0, 0) > > +DEFPARAM (PARAM_IPA_CP_RECURSION_PENALTY, > + "ipa-cp-recursion-penalty", > + "Percentage penalty the recursive functions will receive when they " > + "are evaluated for cloning.", > + 40, 0, 100) > + > +DEFPARAM (PARAM_IPA_CP_SINGLE_CALL_PENALTY, > + "ipa-cp-single-call-penalty", > + "Percentage penalty functions containg a single call to another " > + "function will receive when they are evaluated for cloning.", > + 15, 0, 100) > + > DEFPARAM (PARAM_IPA_MAX_AGG_ITEMS, > "ipa-max-agg-items", > "Maximum number of aggregate content items for a parameter in " > Index: ipa-prop.h > =================================================================== > --- ipa-prop.h (revision 221757) > +++ ipa-prop.h (working copy) > @@ -330,6 +330,10 @@ struct ipa_node_params > /* Node has been completely replaced by clones and will be removed after > ipa-cp is finished. */ > unsigned node_dead : 1; > + /* Node is involved in a recursion, potentionally indirect. */ > + unsigned node_within_scc : 1; > + /* Node is calling a private function called only once. */ > + unsigned node_calling_single_call : 1; > }; > > /* ipa_node_params access functions. Please use these to access fields that > Index: ipa-cp.c > =================================================================== > --- ipa-cp.c (revision 221757) > +++ ipa-cp.c (working copy) > @@ -811,6 +811,41 @@ set_all_contains_variable (struct ipcp_p > return ret; > } > > +/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA > + points to by the number of callers to NODE. */ > + > +static bool > +count_callers (cgraph_node *node, void *data) > +{ > + int *caller_count = (int *) data; > + > + for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller) > + /* Local thunks can be handled transparently, but if the thunk can not > + be optimized out, count it as a real use. */ > + if (!cs->caller->thunk.thunk_p || !cs->caller->local.local) > + ++*caller_count; ...we don't recurse for local thunks and do not count their number of callers here. I'm not sure if it can ever happen in practice but if we have a local function that is called multiple times but always through one particular local thunk, we should end with caller_count greater than one, and with this code we will not? Such function certainly should not be considered called just once by the inliner. Similarly, if we ever have a local node with a non-local thunk (would that be legal and make sense?), then we should try not ending up with caller_count == 1, because inliner will also never consider that function just called once. So I'd suggest incrementing the caller by two instead in that case. But maybe I just don't understand what "handled transparently" in the comment means. Thanks for clearing this up, Martin
> > Index: ipa-cp.c > > =================================================================== > > --- ipa-cp.c (revision 221757) > > +++ ipa-cp.c (working copy) > > @@ -811,6 +811,41 @@ set_all_contains_variable (struct ipcp_p > > return ret; > > } > > > > +/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA > > + points to by the number of callers to NODE. */ > > + > > +static bool > > +count_callers (cgraph_node *node, void *data) > > +{ > > + int *caller_count = (int *) data; > > + > > + for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller) > > + /* Local thunks can be handled transparently, but if the thunk can not > > + be optimized out, count it as a real use. */ > > + if (!cs->caller->thunk.thunk_p || !cs->caller->local.local) > > + ++*caller_count; > > ...we don't recurse for local thunks and do not count their number of > callers here. I'm not sure if it can ever happen in practice but if > we have a local function that is called multiple times but always > through one particular local thunk, we should end with caller_count > greater than one, and with this code we will not? Such function > certainly should not be considered called just once by the inliner. Inliner won't inline to thunks at all currently, but you call the function by for_symbol_thunks_and_aliases, so the recursion to thunks already happens by this wrapper. > > Similarly, if we ever have a local node with a non-local thunk (would > that be legal and make sense?), then we should try not ending up with We currently keep local falg in sync in between thunks and functions (otherwise i386 calling convetions will break). But indeed we may add check here to avoid dependency on this detail. Honza > caller_count == 1, because inliner will also never consider that > function just called once. So I'd suggest incrementing the caller by > two instead in that case. > > But maybe I just don't understand what "handled transparently" in the > comment means. > > Thanks for clearing this up, > > Martin
2015-03-29 18:43 GMT+03:00 Jan Hubicka <hubicka@ucw.cz>: > Hi, > this patch improve crafty performance by avoiding ipa-cp clonning of > Search function that specializes the first iteration of the recursion. > The patch is by Martin, I only tested it and cleaned up code in count_callers > and set_single_call_flag > > Bootstrapped/regtested x86_64-linux, comitted. > PR ipa/65478 > * params.def (PARAM_IPA_CP_RECURSION_PENALTY) : New. > (PARAM_IPA_CP_SINGLE_CALL_PENALTY): Likewise. > * ipa-prop.h (ipa_node_params): New flags node_within_scc and > node_calling_single_call. > * ipa-cp.c (count_callers): New function. > (set_single_call_flag): Likewise. > (initialize_node_lattices): Count callers and set single_flag_call if > necessary. > (incorporate_penalties): New function. > (good_cloning_opportunity_p): Use it, dump new flags. > (propagate_constants_topo): Set node_within_scc flag if appropriate. > * doc/invoke.texi (ipa-cp-recursion-penalty, > ipa-cp-single-call-pentalty): Document. > Index: params.def > =================================================================== > --- params.def (revision 221757) > +++ params.def (working copy) > @@ -999,6 +999,18 @@ DEFPARAM (PARAM_IPA_CP_EVAL_THRESHOLD, > "beneficial to clone.", > 500, 0, 0) > > +DEFPARAM (PARAM_IPA_CP_RECURSION_PENALTY, > + "ipa-cp-recursion-penalty", > + "Percentage penalty the recursive functions will receive when they " > + "are evaluated for cloning.", > + 40, 0, 100) > + > +DEFPARAM (PARAM_IPA_CP_SINGLE_CALL_PENALTY, > + "ipa-cp-single-call-penalty", > + "Percentage penalty functions containg a single call to another " > + "function will receive when they are evaluated for cloning.", > + 15, 0, 100) > + > DEFPARAM (PARAM_IPA_MAX_AGG_ITEMS, > "ipa-max-agg-items", > "Maximum number of aggregate content items for a parameter in " > Index: ipa-prop.h > =================================================================== > --- ipa-prop.h (revision 221757) > +++ ipa-prop.h (working copy) > @@ -330,6 +330,10 @@ struct ipa_node_params > /* Node has been completely replaced by clones and will be removed after > ipa-cp is finished. */ > unsigned node_dead : 1; > + /* Node is involved in a recursion, potentionally indirect. */ > + unsigned node_within_scc : 1; > + /* Node is calling a private function called only once. */ > + unsigned node_calling_single_call : 1; > }; > > /* ipa_node_params access functions. Please use these to access fields that > Index: ipa-cp.c > =================================================================== > --- ipa-cp.c (revision 221757) > +++ ipa-cp.c (working copy) > @@ -811,6 +811,41 @@ set_all_contains_variable (struct ipcp_p > return ret; > } > > +/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA > + points to by the number of callers to NODE. */ > + > +static bool > +count_callers (cgraph_node *node, void *data) > +{ > + int *caller_count = (int *) data; > + > + for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller) > + /* Local thunks can be handled transparently, but if the thunk can not > + be optimized out, count it as a real use. */ > + if (!cs->caller->thunk.thunk_p || !cs->caller->local.local) > + ++*caller_count; > + return false; > +} > + > +/* Worker of call_for_symbol_thunks_and_aliases, it is supposed to be called on > + the one caller of some other node. Set the caller's corresponding flag. */ > + > +static bool > +set_single_call_flag (cgraph_node *node, void *) > +{ > + cgraph_edge *cs = node->callers; > + /* Local thunks can be handled transparently, skip them. */ > + while (cs && cs->caller->thunk.thunk_p && cs->caller->local.local) > + cs = cs->next_caller; > + if (cs) > + { > + gcc_assert (!cs->next_caller); This assert assumes the only non-thunk caller is always at the end of a callers list. Is it actually guaranteed? > + IPA_NODE_REF (cs->caller)->node_calling_single_call = true; > + return true; > + } > + return false; > +} > + > /* Initialize ipcp_lattices. */ Thanks, Ilya
Index: params.def =================================================================== --- params.def (revision 221757) +++ params.def (working copy) @@ -999,6 +999,18 @@ DEFPARAM (PARAM_IPA_CP_EVAL_THRESHOLD, "beneficial to clone.", 500, 0, 0) +DEFPARAM (PARAM_IPA_CP_RECURSION_PENALTY, + "ipa-cp-recursion-penalty", + "Percentage penalty the recursive functions will receive when they " + "are evaluated for cloning.", + 40, 0, 100) + +DEFPARAM (PARAM_IPA_CP_SINGLE_CALL_PENALTY, + "ipa-cp-single-call-penalty", + "Percentage penalty functions containg a single call to another " + "function will receive when they are evaluated for cloning.", + 15, 0, 100) + DEFPARAM (PARAM_IPA_MAX_AGG_ITEMS, "ipa-max-agg-items", "Maximum number of aggregate content items for a parameter in " Index: ipa-prop.h =================================================================== --- ipa-prop.h (revision 221757) +++ ipa-prop.h (working copy) @@ -330,6 +330,10 @@ struct ipa_node_params /* Node has been completely replaced by clones and will be removed after ipa-cp is finished. */ unsigned node_dead : 1; + /* Node is involved in a recursion, potentionally indirect. */ + unsigned node_within_scc : 1; + /* Node is calling a private function called only once. */ + unsigned node_calling_single_call : 1; }; /* ipa_node_params access functions. Please use these to access fields that Index: ipa-cp.c =================================================================== --- ipa-cp.c (revision 221757) +++ ipa-cp.c (working copy) @@ -811,6 +811,41 @@ set_all_contains_variable (struct ipcp_p return ret; } +/* Worker of call_for_symbol_thunks_and_aliases, increment the integer DATA + points to by the number of callers to NODE. */ + +static bool +count_callers (cgraph_node *node, void *data) +{ + int *caller_count = (int *) data; + + for (cgraph_edge *cs = node->callers; cs; cs = cs->next_caller) + /* Local thunks can be handled transparently, but if the thunk can not + be optimized out, count it as a real use. */ + if (!cs->caller->thunk.thunk_p || !cs->caller->local.local) + ++*caller_count; + return false; +} + +/* Worker of call_for_symbol_thunks_and_aliases, it is supposed to be called on + the one caller of some other node. Set the caller's corresponding flag. */ + +static bool +set_single_call_flag (cgraph_node *node, void *) +{ + cgraph_edge *cs = node->callers; + /* Local thunks can be handled transparently, skip them. */ + while (cs && cs->caller->thunk.thunk_p && cs->caller->local.local) + cs = cs->next_caller; + if (cs) + { + gcc_assert (!cs->next_caller); + IPA_NODE_REF (cs->caller)->node_calling_single_call = true; + return true; + } + return false; +} + /* Initialize ipcp_lattices. */ static void @@ -822,7 +857,17 @@ initialize_node_lattices (struct cgraph_ int i; gcc_checking_assert (node->has_gimple_body_p ()); - if (!cgraph_local_p (node)) + if (cgraph_local_p (node)) + { + int caller_count = 0; + node->call_for_symbol_thunks_and_aliases (count_callers, &caller_count, + true); + gcc_checking_assert (caller_count > 0); + if (caller_count == 1) + node->call_for_symbol_thunks_and_aliases (set_single_call_flag, + NULL, true); + } + else { /* When cloning is allowed, we can assume that externally visible functions are not called. We will compensate this by cloning @@ -2105,6 +2150,24 @@ hint_time_bonus (inline_hints hints) return result; } +/* If there is a reason to penalize the function described by INFO in the + cloning goodness evaluation, do so. */ + +static inline int64_t +incorporate_penalties (ipa_node_params *info, int64_t evaluation) +{ + if (info->node_within_scc) + evaluation = (evaluation + * (100 - PARAM_VALUE (PARAM_IPA_CP_RECURSION_PENALTY))) / 100; + + if (info->node_calling_single_call) + evaluation = (evaluation + * (100 - PARAM_VALUE (PARAM_IPA_CP_SINGLE_CALL_PENALTY))) + / 100; + + return evaluation; +} + /* Return true if cloning NODE is a good idea, given the estimated TIME_BENEFIT and SIZE_COST and with the sum of frequencies of incoming edges to the potential new clone in FREQUENCIES. */ @@ -2120,18 +2183,22 @@ good_cloning_opportunity_p (struct cgrap gcc_assert (size_cost > 0); + struct ipa_node_params *info = IPA_NODE_REF (node); if (max_count) { int factor = (count_sum * 1000) / max_count; int64_t evaluation = (((int64_t) time_benefit * factor) / size_cost); + evaluation = incorporate_penalties (info, evaluation); if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " good_cloning_opportunity_p (time: %i, " "size: %i, count_sum: " HOST_WIDE_INT_PRINT_DEC - ") -> evaluation: " "%"PRId64 + "%s%s) -> evaluation: " "%"PRId64 ", threshold: %i\n", time_benefit, size_cost, (HOST_WIDE_INT) count_sum, + info->node_within_scc ? ", scc" : "", + info->node_calling_single_call ? ", single_call" : "", evaluation, PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD)); return evaluation >= PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD); @@ -2140,13 +2207,16 @@ good_cloning_opportunity_p (struct cgrap { int64_t evaluation = (((int64_t) time_benefit * freq_sum) / size_cost); + evaluation = incorporate_penalties (info, evaluation); if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " good_cloning_opportunity_p (time: %i, " - "size: %i, freq_sum: %i) -> evaluation: " + "size: %i, freq_sum: %i%s%s) -> evaluation: " "%"PRId64 ", threshold: %i\n", - time_benefit, size_cost, freq_sum, evaluation, - PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD)); + time_benefit, size_cost, freq_sum, + info->node_within_scc ? ", scc" : "", + info->node_calling_single_call ? ", single_call" : "", + evaluation, PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD)); return evaluation >= PARAM_VALUE (PARAM_IPA_CP_EVAL_THRESHOLD); } @@ -2637,9 +2707,12 @@ propagate_constants_topo (struct ipa_top struct cgraph_edge *cs; for (cs = v->callees; cs; cs = cs->next_callee) - if (ipa_edge_within_scc (cs) - && propagate_constants_accross_call (cs)) - push_node_to_stack (topo, cs->callee->function_symbol ()); + if (ipa_edge_within_scc (cs)) + { + IPA_NODE_REF (v)->node_within_scc = true; + if (propagate_constants_accross_call (cs)) + push_node_to_stack (topo, cs->callee->function_symbol ()); + } v = pop_node_from_stack (topo); } Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 221757) +++ doc/invoke.texi (working copy) @@ -10834,6 +10834,15 @@ IPA-CP calculates its own score of cloni and performs those cloning opportunities with scores that exceed @option{ipa-cp-eval-threshold}. +@item ipa-cp-recursion-penalty +Percentage penalty the recursive functions will receive when they +are evaluated for cloning. + +@item ipa-cp-single-call-penalty +Percentage penalty functions containg a single call to another +function will receive when they are evaluated for cloning. + + @item ipa-max-agg-items IPA-CP is also capable to propagate a number of scalar values passed in an aggregate. @option{ipa-max-agg-items} controls the maximum