From patchwork Thu Nov 21 15:13:03 2019
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 1198995
Date: Thu, 21 Nov 2019 16:13:03 +0100
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org
Subject: Fix nonlinearity in callee edges updates
Message-ID: <20191121151303.jgcj4wiugxbab5im@kam.mff.cuni.cz>

Hi,
this patch fixes the remaining quadratic behaviour when inlining into large
functions (in GCC the average function we inline into has 498 callees) by
avoiding re-checking the inlinability of call edges in the caller of the
inlined call that were not affected by the inlining.  This is not a complete
no-op, as can be seen, for example, on tramp3d.
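(Purely for illustration, not part of the patch or of GCC: a toy C++ sketch of
the update-cost difference described above.  Every name in it -- toy_edge,
edges_in_callee, the counters -- is invented for the sketch.)

/* Toy model only -- not GCC code.  It counts how many call edges receive a
   full inlinability re-check while we repeatedly inline into one large
   caller.  */
#include <cstdio>
#include <vector>

struct toy_edge { int edges_in_callee; };  /* edges inside the inlined body */

int main ()
{
  const int n_callees = 498;               /* average from the mail above */
  std::vector<toy_edge> caller_edges (n_callees, toy_edge { 3 });

  long old_rechecks = 0, new_rechecks = 0;
  for (int inlined = 0; inlined < n_callees; inlined++)
    {
      /* Old behaviour: after each inline decision, walk all remaining edges
         of the caller and re-check each of them -> quadratic overall.  */
      old_rechecks += n_callees - inlined;

      /* New behaviour: only edges inside the freshly created inline clone
         (UPDATE_SINCE) get the full re-check; unrelated edges at most get
         their heap key refreshed.  */
      new_rechecks += caller_edges[inlined].edges_in_callee;
    }

  std::printf ("full re-checks: old %ld, new %ld\n", old_rechecks, new_rechecks);
  return 0;
}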
It may happen that inlining into a function makes a completely unrelated call
in the same function inlinable, most commonly when it is a cold call to a
function whose other calls are hot and get inlined in the meantime.  Then
inlining the last call and eliminating the offline copy may become a win in
the cost metrics.  We inline such calls later, so the practical difference
seems to be mostly about reordering the inline decisions.  The resulting
tramp3d binary has the same size and runtime but a different function order.

I benchmarked GCC and Firefox and it seems that we do not lose anything
important here.  I will keep an eye on the performance impact of this.

The inliner now spends about 60% of its time updating costs and 15% on the
actual application of inline decisions to the callgraph.  That seems pretty
good and we are slowly getting back to the same WPA inline time as with
GCC 9:

-   27.78%     0.50%  lto1-wpa  lto1  [.] inline_small_functions
   - 27.28% inline_small_functions
      - 9.90% update_callee_keys
         - 5.57% update_edge_key
              4.62% edge_badness
           0.62% bitmap_bit_p
      - 7.20% update_caller_keys
         - 6.04% want_inline_small_function_p
            - 5.34% do_estimate_edge_size
               - 5.15% do_estimate_edge_time
                  - 2.70% ipa_call_context::estimate_size_and_time
                     - 1.74% estimate_calls_size_and_time_1
                        - 1.00% estimate_edge_devirt_benefit
                             0.92% ipa_get_indirect_edge_target_1
                  - 1.73% evaluate_properties_for_edge
                     - 0.96% evaluate_conditions_for_known_args
                          0.59% fold_binary_loc
           0.89% can_inline_edge_p
      - 4.08% inline_call
         - 1.58% clone_inlined_nodes
              1.15% cgraph_node::create_clone
           1.57% ipa_merge_fn_summary_after_inlining
      - 1.58% fibonacci_heap::extract_minimum_node
           fibonacci_heap::consolidate
      - 0.89% want_inline_small_function_p
         - 0.70% do_estimate_edge_size
              0.60% do_estimate_edge_time
      - 0.87% estimate_growth
         - 0.85% do_estimate_growth_1
              0.70% do_estimate_edge_size
        0.64% edge_badness

This is quite an extreme case (caused by the fact that we have a lot of
autogenerated code that calls functions which we now consider for inlining,
and we are now able to propagate aggregate jump functions and figure out
whether the resulting code will shrink or not grow too much).

Bootstrapped/regtested x86_64-linux.  Will commit it shortly.

Honza

	* ipa-inline.c (update_callee_keys): Add parameter UPDATE_SINCE.
	(resolve_noninline_speculation, inline_small_functions): Avoid
	redundant updates.

Index: ipa-inline.c
===================================================================
--- ipa-inline.c	(revision 278542)
+++ ipa-inline.c	(working copy)
@@ -1481,40 +1481,63 @@ update_caller_keys (edge_heap_t *heap, s
     }
 }
 
-/* Recompute HEAP nodes for each uninlined call in NODE.
+/* Recompute HEAP nodes for each uninlined call in NODE
+   If UPDATE_SINCE is non-NULL check if edges called within that function
+   are inlinable (typically UPDATE_SINCE is the inline clone we introduced
+   where all edges have new context).
+
    This is used when we know that edge badnesses are going only to increase
    (we introduced new call site) and thus all we need is to insert newly
    created edges into heap.  */
 static void
 update_callee_keys (edge_heap_t *heap, struct cgraph_node *node,
+                    struct cgraph_node *update_since,
                     bitmap updated_nodes)
 {
   struct cgraph_edge *e = node->callees;
+  bool check_inlinability = update_since == node;
 
   if (!e)
     return;
   while (true)
     if (!e->inline_failed && e->callee->callees)
-      e = e->callee->callees;
+      {
+        if (e->callee == update_since)
+          check_inlinability = true;
+        e = e->callee->callees;
+      }
     else
       {
         enum availability avail;
         struct cgraph_node *callee;
+        if (!check_inlinability)
+          {
+            if (e->aux
+                && !bitmap_bit_p (updated_nodes,
+                                  e->callee->ultimate_alias_target
+                                    (&avail, e->caller)->get_uid ()))
+              update_edge_key (heap, e);
+          }
         /* We do not reset callee growth cache here.  Since we added a new call,
            growth chould have just increased and consequentely badness metric
            don't need updating.  */
-        if (e->inline_failed
-            && (callee = e->callee->ultimate_alias_target (&avail, e->caller))
-            && ipa_fn_summaries->get (callee) != NULL
-            && ipa_fn_summaries->get (callee)->inlinable
-            && avail >= AVAIL_AVAILABLE
-            && !bitmap_bit_p (updated_nodes, callee->get_uid ()))
+        else if (e->inline_failed
+                 && (callee = e->callee->ultimate_alias_target (&avail,
+                                                                e->caller))
+                 && avail >= AVAIL_AVAILABLE
+                 && ipa_fn_summaries->get (callee) != NULL
+                 && ipa_fn_summaries->get (callee)->inlinable
+                 && !bitmap_bit_p (updated_nodes, callee->get_uid ()))
           {
             if (can_inline_edge_p (e, false)
                 && want_inline_small_function_p (e, false)
                 && can_inline_edge_by_limits_p (e, false))
-              update_edge_key (heap, e);
+              {
+                gcc_checking_assert (check_inlinability || can_inline_edge_p (e, false));
+                gcc_checking_assert (check_inlinability || e->aux);
+                update_edge_key (heap, e);
+              }
             else if (e->aux)
               {
                 report_inline_failed_reason (e);
@@ -1522,6 +1545,13 @@ update_callee_keys (edge_heap_t *heap, s
                 e->aux = NULL;
               }
           }
+        /* In case we redirected to unreachable node we only need to remove the
+           fibheap entry.  */
+        else if (e->aux)
+          {
+            heap->delete_node ((edge_heap_node_t *) e->aux);
+            e->aux = NULL;
+          }
         if (e->next_callee)
           e = e->next_callee;
         else
@@ -1530,6 +1560,8 @@ update_callee_keys (edge_heap_t *heap, s
               {
                 if (e->caller == node)
                   return;
+                if (e->caller == update_since)
+                  check_inlinability = false;
                 e = e->caller->callers;
               }
             while (!e->next_callee);
@@ -1820,7 +1852,7 @@ resolve_noninline_speculation (edge_heap
           ipa_update_overall_fn_summary (where);
           update_caller_keys (edge_heap, where,
                               updated_nodes, NULL);
-          update_callee_keys (edge_heap, where,
+          update_callee_keys (edge_heap, where, NULL,
                               updated_nodes);
         }
     }
@@ -1993,7 +2025,7 @@ inline_small_functions (void)
           reset_edge_caches (where);
           update_caller_keys (&edge_heap, where,
                               updated_nodes, NULL);
-          update_callee_keys (&edge_heap, where,
+          update_callee_keys (&edge_heap, where, NULL,
                               updated_nodes);
           bitmap_clear (updated_nodes);
         }
@@ -2124,6 +2156,8 @@ inline_small_functions (void)
           continue;
         }
 
+      profile_count old_count = callee->count;
+
       /* Heuristics for inlining small functions work poorly for
          recursive calls where we do effects similar to loop unrolling.
          When inlining such edge seems profitable, leave decision on
@@ -2147,7 +2181,7 @@ inline_small_functions (void)
             at once.  Consequently we need to update all callee keys.  */
          if (opt_for_fn (edge->caller->decl, flag_indirect_inlining))
            add_new_edges_to_heap (&edge_heap, new_indirect_edges);
-         update_callee_keys (&edge_heap, where, updated_nodes);
+         update_callee_keys (&edge_heap, where, where, updated_nodes);
          bitmap_clear (updated_nodes);
        }
      else
@@ -2197,9 +2231,12 @@ inline_small_functions (void)
              /* Wrapper penalty may be non-monotonous in this respect.
                 Fortunately it only affects small functions.  */
              && !wrapper_heuristics_may_apply (where, old_size))
-           update_callee_keys (&edge_heap, edge->callee, updated_nodes);
+           update_callee_keys (&edge_heap, edge->callee, edge->callee,
+                               updated_nodes);
          else
-           update_callee_keys (&edge_heap, where, updated_nodes);
+           update_callee_keys (&edge_heap, where,
+                               edge->callee,
+                               updated_nodes);
        }
      where = edge->caller;
      if (where->inlined_to)
@@ -2214,9 +2251,10 @@ inline_small_functions (void)
      update_caller_keys (&edge_heap, where, updated_nodes, NULL);
      /* Offline copy count has possibly changed, recompute if profile is
         available.  */
-     struct cgraph_node *n = cgraph_node::get (edge->callee->decl);
-     if (n != edge->callee && n->analyzed && n->count.ipa ().initialized_p ())
-       update_callee_keys (&edge_heap, n, updated_nodes);
+     struct cgraph_node *n
+       = cgraph_node::get (edge->callee->decl)->ultimate_alias_target ();
+     if (n != edge->callee && n->analyzed && !(n->count == old_count))
+       update_callee_keys (&edge_heap, n, NULL, updated_nodes);
      bitmap_clear (updated_nodes);
 
      if (dump_enabled_p ())