From patchwork Fri Mar 28 19:32:47 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 334867 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 7B01C140088 for ; Sat, 29 Mar 2014 06:33:03 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:message-id:mime-version:content-type; q=dns; s= default; b=NzbHT1VeHm9Z+/hm1Wpr2GJqoyXT6N8ELiN84KiKW8L/vqrKC/bWm um8ZgL7c6NO9MAmCO78MAuvY9mEspPuPvqV1P3XOMsLuZ3UnV+nYNRzYISYIWqpB R90cF/vV026ud4vYlziTgDhqQssn51O0ChXl55jj4+TukE9s2BPfr0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:subject:message-id:mime-version:content-type; s= default; bh=093XgLfs229UIkinSMF5AkvPaL4=; b=PQbsOuQZdW7YFsiIuKZ9 RhIc9yarPic1pyy5JMYNDE+o4FJ8LRzcTd+PKofEbZABh4Gh1YOR+UiSiG5EhsYu 6JEkU9D1VqgHyBfReMqIlLHZtXAGUC8ryNlNmEnaCdWwf+JtxDeSwXA20fSnTa4k 71A8klzzmaYTh/Ya3+hH9cc= Received: (qmail 32720 invoked by alias); 28 Mar 2014 19:32:56 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 32706 invoked by uid 89); 28 Mar 2014 19:32:55 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: nikam.ms.mff.cuni.cz Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Fri, 28 Mar 2014 19:32:51 +0000 Received: from occam.kam.mff.cuni.cz (occam.kam.mff.cuni.cz [195.113.17.166]) by nikam.ms.mff.cuni.cz (Postfix) with ESMTP id 0291F543F6E for ; Fri, 28 Mar 2014 20:32:48 +0100 (CET) Received: by occam.kam.mff.cuni.cz (Postfix, from userid 16202) id F09A222015B; Fri, 28 Mar 2014 20:32:47 +0100 (CET) Date: Fri, 28 Mar 2014 20:32:47 +0100 From: Jan Hubicka To: gcc-patches@gcc.gnu.org Subject: PR ipa/60243 (inliner being slow) Message-ID: <20140328193247.GB7504@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Hi, the inliner heuristic is organized as a greedy algorithm making inline decisions in order defined by badness until inline limits are hit. The tricky part is that the badness depends both on caller and callee (it is basically size/time metric, that depends on callee, but caller provide context via known values and predicates that may simplify callee body). So after each inlining decision, the badnesses of calls from function being inlined and calls of function being inlined into needs to be updated. This updating process is basically O(1) for evaluation of predicates + O(n_call_sites) for evaulation of call edges that are independent. This may produce non-linear behaviour in stupid cases where you have function with very many call sites you inline into that is tiself called very many times. Other case where we get non-linear is the side case of want_inline_small_function_p which makes function inlinable if the code of caller grows but the overall unit shrinks. The growth of the unit after inlining given function needs to be recomputed every time when function cahnges or one of its calls are modified. This patch solves those bottlenecks. The first case via computing min_size, that is a rough estimate of minimal growth of the function after inlining. This can be used to cut the expensive per-edge computations when the function is obvoiusly large (as it would be in case it have many call sites). The other change is smarther estimate about the growth of the unit: the unit can shrink only if the function have call sites that shrink the code and in that case those will be inlined anyway or if the offline copy is eliminated. Instead of always computing precise value I introduced growth_likely_positive that makes estimate on how many calls one can have and first just quickly counts the call edges. If there are many of them, there is no need for expensive calcualation. In addition of shoting estimate_growth from the testcase profile it also imroves Firefox LTO inliner times by about 40% or 20 seconds. We are still not well on compiling richards testcase. I get out of memory problems with early inlining enabled and many other issues. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza PR ipa/60243 * ipa-inline.c (want_inline_small_function_p): Short circuit large functions; reorganize to make cheap checks first. (inline_small_functions): Do not estimate growth when dumping; it is expensive. * ipa-inline.h (inline_summary): Add min_size. (growth_likely_positive): New function. * ipa-inline-analysis.c (dump_inline_summary): Add min_size. (set_cond_stmt_execution_predicate): Cleanup. (estimate_edge_size_and_time): Compute min_size. (estimate_calls_size_and_time): Likewise. (estimate_node_size_and_time): Likewise. (inline_update_overall_summary): Update min_size. (do_estimate_edge_time): Likewise. (do_estimate_edge_size): Update. (do_estimate_edge_hints): Update. (growth_likely_positive): New function. Index: ipa-inline.c =================================================================== --- ipa-inline.c (revision 208875) +++ ipa-inline.c (working copy) @@ -573,6 +573,24 @@ want_inline_small_function_p (struct cgr e->inline_failed = CIF_FUNCTION_NOT_INLINE_CANDIDATE; want_inline = false; } + /* Do fast and conservative check if the function can be good + inline cnadidate. At themoment we allow inline hints to + promote non-inline function to inline and we increase + MAX_INLINE_INSNS_SINGLE 16fold for inline functions. */ + else if (!DECL_DECLARED_INLINE_P (callee->decl) + && inline_summary (callee)->min_size - inline_edge_summary (e)->call_stmt_size + > MAX (MAX_INLINE_INSNS_SINGLE, MAX_INLINE_INSNS_AUTO)) + { + e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT; + want_inline = false; + } + else if (DECL_DECLARED_INLINE_P (callee->decl) + && inline_summary (callee)->min_size - inline_edge_summary (e)->call_stmt_size + > 16 * MAX_INLINE_INSNS_SINGLE) + { + e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT; + want_inline = false; + } else { int growth = estimate_edge_growth (e); @@ -585,56 +603,26 @@ want_inline_small_function_p (struct cgr hints suggests that inlining given function is very profitable. */ else if (DECL_DECLARED_INLINE_P (callee->decl) && growth >= MAX_INLINE_INSNS_SINGLE - && !big_speedup - && !(hints & (INLINE_HINT_indirect_call - | INLINE_HINT_loop_iterations - | INLINE_HINT_array_index - | INLINE_HINT_loop_stride))) + && ((!big_speedup + && !(hints & (INLINE_HINT_indirect_call + | INLINE_HINT_loop_iterations + | INLINE_HINT_array_index + | INLINE_HINT_loop_stride))) + || growth >= MAX_INLINE_INSNS_SINGLE * 16)) { e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT; want_inline = false; } - /* Before giving up based on fact that caller size will grow, allow - functions that are called few times and eliminating the offline - copy will lead to overall code size reduction. - Not all of these will be handled by subsequent inlining of functions - called once: in particular weak functions are not handled or funcitons - that inline to multiple calls but a lot of bodies is optimized out. - Finally we want to inline earlier to allow inlining of callbacks. - - This is slightly wrong on aggressive side: it is entirely possible - that function is called many times with a context where inlining - reduces code size and few times with a context where inlining increase - code size. Resoluting growth estimate will be negative even if it - would make more sense to keep offline copy and do not inline into the - call sites that makes the code size grow. - - When badness orders the calls in a way that code reducing calls come - first, this situation is not a problem at all: after inlining all - "good" calls, we will realize that keeping the function around is - better. */ - else if (growth <= MAX_INLINE_INSNS_SINGLE - /* Unlike for functions called once, we play unsafe with - COMDATs. We can allow that since we know functions - in consideration are small (and thus risk is small) and - moreover grow estimates already accounts that COMDAT - functions may or may not disappear when eliminated from - current unit. With good probability making aggressive - choice in all units is going to make overall program - smaller. - - Consequently we ask cgraph_can_remove_if_no_direct_calls_p - instead of - cgraph_will_be_removed_from_program_if_no_direct_calls */ - && !DECL_EXTERNAL (callee->decl) - && cgraph_can_remove_if_no_direct_calls_p (callee) - && estimate_growth (callee) <= 0) - ; else if (!DECL_DECLARED_INLINE_P (callee->decl) && !flag_inline_functions) { - e->inline_failed = CIF_NOT_DECLARED_INLINED; - want_inline = false; + /* growth_likely_positive is expensive, always test it last. */ + if (growth >= MAX_INLINE_INSNS_SINGLE + || growth_likely_positive (callee, growth)) + { + e->inline_failed = CIF_NOT_DECLARED_INLINED; + want_inline = false; + } } /* Apply MAX_INLINE_INSNS_AUTO limit for functions not declared inline Upgrade it to MAX_INLINE_INSNS_SINGLE when hints suggests that @@ -649,11 +637,18 @@ want_inline_small_function_p (struct cgr MAX_INLINE_INSNS_SINGLE) : MAX_INLINE_INSNS_AUTO)) { - e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT; - want_inline = false; + /* growth_likely_positive is expensive, always test it last. */ + if (growth >= MAX_INLINE_INSNS_SINGLE + || growth_likely_positive (callee, growth)) + { + e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT; + want_inline = false; + } } /* If call is cold, do not inline when function body would grow. */ - else if (!cgraph_maybe_hot_edge_p (e)) + else if (!cgraph_maybe_hot_edge_p (e) + && (growth >= MAX_INLINE_INSNS_SINGLE + || growth_likely_positive (callee, growth))) { e->inline_failed = CIF_UNLIKELY_CALL; want_inline = false; @@ -1723,14 +1718,12 @@ inline_small_functions (void) inline_summary (callee)->size); fprintf (dump_file, " to be inlined into %s/%i in %s:%i\n" - " Estimated growth after inlined into all is %+i insns.\n" " Estimated badness is %i, frequency %.2f.\n", edge->caller->name (), edge->caller->order, flag_wpa ? "unknown" : gimple_filename ((const_gimple) edge->call_stmt), flag_wpa ? -1 : gimple_lineno ((const_gimple) edge->call_stmt), - estimate_growth (callee), badness, edge->frequency / (double)CGRAPH_FREQ_BASE); if (edge->count) Index: ipa-inline.h =================================================================== --- ipa-inline.h (revision 208875) +++ ipa-inline.h (working copy) @@ -117,6 +117,8 @@ struct GTY(()) inline_summary int self_size; /* Time of the function body. */ int self_time; + /* Minimal size increase after inlining. */ + int min_size; /* False when there something makes inlining impossible (such as va_arg). */ unsigned inlinable : 1; @@ -220,6 +222,7 @@ void estimate_ipcp_clone_size_and_time ( vec, int *, int *, inline_hints *); int do_estimate_growth (struct cgraph_node *); +bool growth_likely_positive (struct cgraph_node *, int); void inline_merge_summary (struct cgraph_edge *edge); void inline_update_overall_summary (struct cgraph_node *node); int do_estimate_edge_size (struct cgraph_edge *edge); Index: ipa-inline-analysis.c =================================================================== --- ipa-inline-analysis.c (revision 208875) +++ ipa-inline-analysis.c (working copy) @@ -1397,6 +1397,7 @@ dump_inline_summary (FILE *f, struct cgr fprintf (f, " global time: %i\n", s->time); fprintf (f, " self size: %i\n", s->self_size); fprintf (f, " global size: %i\n", s->size); + fprintf (f, " min size: %i\n", s->min_size); fprintf (f, " self stack: %i\n", (int) s->estimated_self_stack_size); fprintf (f, " global stack: %i\n", (int) s->estimated_stack_size); @@ -1746,8 +1747,7 @@ set_cond_stmt_execution_predicate (struc if (this_code != ERROR_MARK) { struct predicate p = add_condition (summary, index, &aggpos, - e->flags & EDGE_TRUE_VALUE - ? code : inverted_code, + this_code, gimple_cond_rhs (last)); e->aux = pool_alloc (edge_predicate_pool); *(struct predicate *) e->aux = p; @@ -2991,10 +2991,15 @@ estimate_edge_devirt_benefit (struct cgr return isummary->inlinable; } -/* Increase SIZE and TIME for size and time needed to handle edge E. */ +/* Increase SIZE, MIN_SIZE (if non-NULL) and TIME for size and time needed to + handle edge E with probability PROB. + Set HINTS if edge may be devirtualized. + KNOWN_VALS, KNOWN_AGGS and KNOWN_BINFOS describe context of the call + site. */ static inline void -estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *time, +estimate_edge_size_and_time (struct cgraph_edge *e, int *size, int *min_size, + int *time, int prob, vec known_vals, vec known_binfos, @@ -3004,12 +3009,16 @@ estimate_edge_size_and_time (struct cgra struct inline_edge_summary *es = inline_edge_summary (e); int call_size = es->call_stmt_size; int call_time = es->call_stmt_time; + int cur_size; if (!e->callee && estimate_edge_devirt_benefit (e, &call_size, &call_time, known_vals, known_binfos, known_aggs) && hints && cgraph_maybe_hot_edge_p (e)) *hints |= INLINE_HINT_indirect_call; - *size += call_size * INLINE_SIZE_SCALE; + cur_size = call_size * INLINE_SIZE_SCALE; + *size += cur_size; + if (min_size) + *min_size += cur_size; *time += apply_probability ((gcov_type) call_time, prob) * e->frequency * (INLINE_TIME_SCALE / CGRAPH_FREQ_BASE); if (*time > MAX_TIME * INLINE_TIME_SCALE) @@ -3018,12 +3027,14 @@ estimate_edge_size_and_time (struct cgra -/* Increase SIZE and TIME for size and time needed to handle all calls in NODE. - POSSIBLE_TRUTHS, KNOWN_VALS and KNOWN_BINFOS describe context of the call - site. */ +/* Increase SIZE, MIN_SIZE and TIME for size and time needed to handle all + calls in NODE. + POSSIBLE_TRUTHS, KNOWN_VALS, KNOWN_AGGS and KNOWN_BINFOS describe context of + the call site. */ static void -estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time, +estimate_calls_size_and_time (struct cgraph_node *node, int *size, + int *min_size, int *time, inline_hints *hints, clause_t possible_truths, vec known_vals, @@ -3041,12 +3052,15 @@ estimate_calls_size_and_time (struct cgr { /* Predicates of calls shall not use NOT_CHANGED codes, sowe do not need to compute probabilities. */ - estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE, + estimate_edge_size_and_time (e, size, + es->predicate ? NULL : min_size, + time, REG_BR_PROB_BASE, known_vals, known_binfos, known_aggs, hints); } else - estimate_calls_size_and_time (e->callee, size, time, hints, + estimate_calls_size_and_time (e->callee, size, min_size, time, + hints, possible_truths, known_vals, known_binfos, known_aggs); @@ -3057,7 +3071,9 @@ estimate_calls_size_and_time (struct cgr struct inline_edge_summary *es = inline_edge_summary (e); if (!es->predicate || evaluate_predicate (es->predicate, possible_truths)) - estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE, + estimate_edge_size_and_time (e, size, + es->predicate ? NULL : min_size, + time, REG_BR_PROB_BASE, known_vals, known_binfos, known_aggs, hints); } @@ -3065,8 +3081,13 @@ estimate_calls_size_and_time (struct cgr /* Estimate size and time needed to execute NODE assuming - POSSIBLE_TRUTHS clause, and KNOWN_VALS and KNOWN_BINFOS information - about NODE's arguments. */ + POSSIBLE_TRUTHS clause, and KNOWN_VALS, KNOWN_AGGS and KNOWN_BINFOS + information about NODE's arguments. If non-NULL use also probability + information present in INLINE_PARAM_SUMMARY vector. + Additionally detemine hints determined by the context. Finally compute + minimal size needed for the call that is independent on the call context and + can be used for fast estimates. Return the values in RET_SIZE, + RET_MIN_SIZE, RET_TIME and RET_HINTS. */ static void estimate_node_size_and_time (struct cgraph_node *node, @@ -3074,7 +3095,7 @@ estimate_node_size_and_time (struct cgra vec known_vals, vec known_binfos, vec known_aggs, - int *ret_size, int *ret_time, + int *ret_size, int *ret_min_size, int *ret_time, inline_hints *ret_hints, vec inline_param_summary) @@ -3083,6 +3104,7 @@ estimate_node_size_and_time (struct cgra size_time_entry *e; int size = 0; int time = 0; + int min_size = 0; inline_hints hints = 0; int i; @@ -3128,6 +3150,8 @@ estimate_node_size_and_time (struct cgra gcc_checking_assert (time >= 0); } + gcc_checking_assert (true_predicate_p (&(*info->entry)[0].predicate)); + min_size = (*info->entry)[0].size; gcc_checking_assert (size >= 0); gcc_checking_assert (time >= 0); @@ -3145,12 +3169,13 @@ estimate_node_size_and_time (struct cgra if (DECL_DECLARED_INLINE_P (node->decl)) hints |= INLINE_HINT_declared_inline; - estimate_calls_size_and_time (node, &size, &time, &hints, possible_truths, + estimate_calls_size_and_time (node, &size, &min_size, &time, &hints, possible_truths, known_vals, known_binfos, known_aggs); gcc_checking_assert (size >= 0); gcc_checking_assert (time >= 0); time = RDIV (time, INLINE_TIME_SCALE); size = RDIV (size, INLINE_SIZE_SCALE); + min_size = RDIV (min_size, INLINE_SIZE_SCALE); if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "\n size:%i time:%i\n", (int) size, (int) time); @@ -3158,6 +3183,8 @@ estimate_node_size_and_time (struct cgra *ret_time = time; if (ret_size) *ret_size = size; + if (ret_min_size) + *ret_min_size = min_size; if (ret_hints) *ret_hints = hints; return; @@ -3182,7 +3209,7 @@ estimate_ipcp_clone_size_and_time (struc clause = evaluate_conditions_for_known_args (node, false, known_vals, known_aggs); estimate_node_size_and_time (node, clause, known_vals, known_binfos, - known_aggs, ret_size, ret_time, hints, vNULL); + known_aggs, ret_size, NULL, ret_time, hints, vNULL); } /* Translate all conditions from callee representation into caller @@ -3583,7 +3610,8 @@ inline_update_overall_summary (struct cg if (info->time > MAX_TIME * INLINE_TIME_SCALE) info->time = MAX_TIME * INLINE_TIME_SCALE; } - estimate_calls_size_and_time (node, &info->size, &info->time, NULL, + estimate_calls_size_and_time (node, &info->size, &info->min_size, + &info->time, NULL, ~(clause_t) (1 << predicate_false_condition), vNULL, vNULL, vNULL); info->time = (info->time + INLINE_TIME_SCALE / 2) / INLINE_TIME_SCALE; @@ -3628,6 +3656,7 @@ do_estimate_edge_time (struct cgraph_edg vec known_binfos; vec known_aggs; struct inline_edge_summary *es = inline_edge_summary (edge); + int min_size; callee = cgraph_function_or_thunk_node (edge->callee, NULL); @@ -3636,7 +3665,7 @@ do_estimate_edge_time (struct cgraph_edg &clause, &known_vals, &known_binfos, &known_aggs); estimate_node_size_and_time (callee, clause, known_vals, known_binfos, - known_aggs, &size, &time, &hints, es->param); + known_aggs, &size, &min_size, &time, &hints, es->param); known_vals.release (); known_binfos.release (); known_aggs.release (); @@ -3646,6 +3675,7 @@ do_estimate_edge_time (struct cgraph_edg /* When caching, update the cache entry. */ if (edge_growth_cache.exists ()) { + inline_summary (edge->callee)->min_size = min_size; if ((int) edge_growth_cache.length () <= edge->uid) edge_growth_cache.safe_grow_cleared (cgraph_edge_max_uid); edge_growth_cache[edge->uid].time = time + (time >= 0); @@ -3689,7 +3719,7 @@ do_estimate_edge_size (struct cgraph_edg &clause, &known_vals, &known_binfos, &known_aggs); estimate_node_size_and_time (callee, clause, known_vals, known_binfos, - known_aggs, &size, NULL, NULL, vNULL); + known_aggs, &size, NULL, NULL, NULL, vNULL); known_vals.release (); known_binfos.release (); known_aggs.release (); @@ -3728,7 +3758,7 @@ do_estimate_edge_hints (struct cgraph_ed &clause, &known_vals, &known_binfos, &known_aggs); estimate_node_size_and_time (callee, clause, known_vals, known_binfos, - known_aggs, NULL, NULL, &hints, vNULL); + known_aggs, NULL, NULL, NULL, &hints, vNULL); known_vals.release (); known_binfos.release (); known_aggs.release (); @@ -3848,6 +3878,55 @@ do_estimate_growth (struct cgraph_node * } +/* Make cheap estimation if growth of NODE is likely positive knowing + EDGE_GROWTH of one particular edge. + We assume that most of other edges will have similar growth + and skip computation if there are too many callers. */ + +bool +growth_likely_positive (struct cgraph_node *node, int edge_growth ATTRIBUTE_UNUSED) +{ + int max_callers; + int ret; + struct cgraph_edge *e; + gcc_checking_assert (edge_growth > 0); + + /* Unlike for functions called once, we play unsafe with + COMDATs. We can allow that since we know functions + in consideration are small (and thus risk is small) and + moreover grow estimates already accounts that COMDAT + functions may or may not disappear when eliminated from + current unit. With good probability making aggressive + choice in all units is going to make overall program + smaller. + + Consequently we ask cgraph_can_remove_if_no_direct_calls_p + instead of + cgraph_will_be_removed_from_program_if_no_direct_calls */ + if (DECL_EXTERNAL (node->decl) + || !cgraph_can_remove_if_no_direct_calls_p (node)) + return true; + + /* If there is cached value, just go ahead. */ + if ((int)node_growth_cache.length () > node->uid + && (ret = node_growth_cache[node->uid])) + return ret > 0; + if (!cgraph_will_be_removed_from_program_if_no_direct_calls (node) + && (!DECL_COMDAT (node->decl) + || !cgraph_can_remove_if_no_direct_calls_p (node))) + return true; + max_callers = inline_summary (node)->size * 4 / edge_growth + 2; + + for (e = node->callers; e; e = e->next_caller) + { + max_callers--; + if (!max_callers) + return true; + } + return estimate_growth (node) > 0; +} + + /* This function performs intraprocedural analysis in NODE that is required to inline indirect calls. */