Date: Thu, 22 Oct 2020 11:48:20 +0200
From: Jan Hubicka
To: gary@amperecomputing.com, mjambor@suse.cz, mliska@suse.cz, jakub@redhat.com, gcc-patches@gcc.gnu.org
Subject: Materialize clones on demand
Message-ID: <20201022094820.GB97578@kam.mff.cuni.cz>

Hi,
this patch removes the pass that materializes all clones; clone
materialization is now done on demand.  The motivation is to shorten the
lifetime of function bodies in ltrans, which should noticeably reduce memory
use for highly parallel compilations of large programs (as Martin does) or
when partitioning is reduced or disabled.  For cc1 with one partition, memory
use seems to go down from 4 GB to roughly 1.5 GB (as seen in top, so this is
not particularly accurate).

This should also make get_body do the right thing at WPA time (still not a
good idea for a production patch).  I did not test this path.

Martin (Jambor), Jakub, there is one FIXME in ipa-param-manipulation.  We
seem to ICE when we redirect a call to a callee that is not yet materialized
(this should be possible to trigger on mainline with recursive callgraphs
too, but it definitely triggers on several testcases in the C testsuite if
the get_untransformed_body call is disabled).  It would be nice to fix this,
but I am not quite sure how the debug info adjustments here work.

Bootstrapped/regtested on x86_64-linux and also LTO-bootstrapped with release
checking.  I plan to commit it after a bit more testing.

Honza

gcc/ChangeLog:

2020-10-22  Jan Hubicka

	* cgraph.c (cgraph_node::get_untransformed_body): Perform lazy
	clone materialization.
	* cgraph.h (cgraph_node::materialize_clone): Declare.
	(symbol_table::materialize_all_clones): Remove.
	* cgraphclones.c (cgraph_materialize_clone): Turn into ...
	(cgraph_node::materialize_clone): ... this one; move here
	dumping from symbol_table::materialize_all_clones.
	(symbol_table::materialize_all_clones): Remove.
	* cgraphunit.c (mark_functions_to_output): Clear stmt references.
	(cgraph_node::expand): Initialize bitmaps early;
	do not call execute_all_ipa_transforms if there are no transforms.
	* ipa-inline-transform.c (save_inline_function_body): Fix formatting.
	(inline_transform): Materialize all clones before the function is
	modified.
	* ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
	Materialize the clone if needed.
	* ipa.c (class pass_materialize_all_clones): Remove.
	(make_pass_materialize_all_clones): Remove.
	* passes.c (execute_all_ipa_transforms): Materialize all clones.
	* passes.def: Remove pass_materialize_all_clones.
	* tree-pass.h (make_pass_materialize_all_clones): Remove.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 9480935ff84..35a0182b847 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -3872,7 +3872,7 @@ cgraph_node::function_or_virtual_thunk_symbol
 }
 
 /* When doing LTO, read cgraph_node's body from disk if it is not already
-   present.  */
+   present.  Also perform any necessary clone materializations.  */
 
 bool
 cgraph_node::get_untransformed_body (void)
@@ -3882,6 +3882,17 @@ cgraph_node::get_untransformed_body (void)
   size_t len;
   tree decl = this->decl;
 
+  /* See if there is clone to be materialized.
+     (inline clones does not need materialization, but we can be seeing
+      an inline clone of real clone).  */
+  cgraph_node *p = this;
+  for (cgraph_node *c = clone_of; c; c = c->clone_of)
+    {
+      if (c->decl != decl)
+        p->materialize_clone ();
+      p = c;
+    }
+
   /* Check if body is already there.  Either we have gimple body or
      the function is thunk and in that case we set DECL_ARGUMENTS.  */
   if (DECL_ARGUMENTS (decl) || gimple_has_body_p (decl))
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index c953a1b6711..d3279410c2e 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1152,6 +1152,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
      apply them.  */
   bool get_body (void);
 
+  void materialize_clone (void);
+
   /* Release memory used to represent body of function.
      Use this only for functions that are released before being translated to
      target code (i.e. RTL).  Functions that are compiled to RTL and beyond
@@ -2286,13 +2288,6 @@ public:
      functions inserted into callgraph already at construction time.  */
   void process_new_functions (void);
 
-  /* Once all functions from compilation unit are in memory, produce all clones
-     and update all calls.  We might also do this on demand if we don't want to
-     bring all functions to memory prior compilation, but current WHOPR
-     implementation does that and it is bit easier to keep everything right
-     in this order.  */
-  void materialize_all_clones (void);
-
   /* Register a symbol NODE.  */
   inline void register_symbol (symtab_node *node);
 
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index f920dcb4c29..07a51a58aef 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -1083,114 +1083,57 @@ void cgraph_node::remove_from_clone_tree ()
 
 /* Given virtual clone, turn it into actual clone.  */
 
-static void
-cgraph_materialize_clone (cgraph_node *node)
-{
-  bitmap_obstack_initialize (NULL);
-  node->former_clone_of = node->clone_of->decl;
-  if (node->clone_of->former_clone_of)
-    node->former_clone_of = node->clone_of->former_clone_of;
-  /* Copy the OLD_VERSION_NODE function tree to the new version.  */
-  tree_function_versioning (node->clone_of->decl, node->decl,
-                            node->clone.tree_map, node->clone.param_adjustments,
-                            true, NULL, NULL);
-  if (symtab->dump_file)
-    {
-      dump_function_to_file (node->clone_of->decl, symtab->dump_file,
-                             dump_flags);
-      dump_function_to_file (node->decl, symtab->dump_file, dump_flags);
-    }
-
-  cgraph_node *clone_of = node->clone_of;
-  /* Function is no longer clone.  */
-  node->remove_from_clone_tree ();
-  if (!clone_of->analyzed && !clone_of->clones)
-    {
-      clone_of->release_body ();
-      clone_of->remove_callees ();
-      clone_of->remove_all_references ();
-    }
-  bitmap_obstack_release (NULL);
-}
-
-/* Once all functions from compilation unit are in memory, produce all clones
-   and update all calls.  We might also do this on demand if we don't want to
-   bring all functions to memory prior compilation, but current WHOPR
-   implementation does that and it is a bit easier to keep everything right in
-   this order.  */
-
 void
-symbol_table::materialize_all_clones (void)
+cgraph_node::materialize_clone ()
 {
-  cgraph_node *node;
-  bool stabilized = false;
-
-
+  clone_of->get_untransformed_body ();
+  former_clone_of = clone_of->decl;
+  if (clone_of->former_clone_of)
+    former_clone_of = clone_of->former_clone_of;
   if (symtab->dump_file)
-    fprintf (symtab->dump_file, "Materializing clones\n");
-
-  cgraph_node::checking_verify_cgraph_nodes ();
-
-  /* We can also do topological order, but number of iterations should be
-     bounded by number of IPA passes since single IPA pass is probably not
-     going to create clones of clones it created itself.  */
-  while (!stabilized)
     {
-      stabilized = true;
-      FOR_EACH_FUNCTION (node)
+      fprintf (symtab->dump_file, "cloning %s to %s\n",
+               clone_of->dump_name (),
+               dump_name ());
+      if (clone.tree_map)
         {
-          if (node->clone_of && node->decl != node->clone_of->decl
-              && !gimple_has_body_p (node->decl))
+          fprintf (symtab->dump_file, " replace map:");
+          for (unsigned int i = 0;
+               i < vec_safe_length (clone.tree_map);
+               i++)
             {
-              if (!node->clone_of->clone_of)
-                node->clone_of->get_untransformed_body ();
-              if (gimple_has_body_p (node->clone_of->decl))
-                {
-                  if (symtab->dump_file)
-                    {
-                      fprintf (symtab->dump_file, "cloning %s to %s\n",
-                               node->clone_of->dump_name (),
-                               node->dump_name ());
-                      if (node->clone.tree_map)
-                        {
-                          unsigned int i;
-                          fprintf (symtab->dump_file, " replace map:");
-                          for (i = 0;
-                               i < vec_safe_length (node->clone.tree_map);
-                               i++)
-                            {
-                              ipa_replace_map *replace_info;
-                              replace_info = (*node->clone.tree_map)[i];
-                              fprintf (symtab->dump_file, "%s %i -> ",
-                                       i ? "," : "", replace_info->parm_num);
-                              print_generic_expr (symtab->dump_file,
-                                                  replace_info->new_tree);
-                            }
-                          fprintf (symtab->dump_file, "\n");
-                        }
-                      if (node->clone.param_adjustments)
-                        node->clone.param_adjustments->dump (symtab->dump_file);
-                    }
-                  cgraph_materialize_clone (node);
-                  stabilized = false;
-                }
+              ipa_replace_map *replace_info;
+              replace_info = (*clone.tree_map)[i];
+              fprintf (symtab->dump_file, "%s %i -> ",
+                       i ? "," : "", replace_info->parm_num);
+              print_generic_expr (symtab->dump_file,
+                                  replace_info->new_tree);
             }
+          fprintf (symtab->dump_file, "\n");
         }
+      if (clone.param_adjustments)
+        clone.param_adjustments->dump (symtab->dump_file);
     }
-  FOR_EACH_FUNCTION (node)
-    if (!node->analyzed && node->callees)
-      {
-        node->remove_callees ();
-        node->remove_all_references ();
-      }
-    else
-      node->clear_stmts_in_references ();
+  /* Copy the OLD_VERSION_NODE function tree to the new version.  */
+  tree_function_versioning (clone_of->decl, decl,
+                            clone.tree_map, clone.param_adjustments,
+                            true, NULL, NULL);
   if (symtab->dump_file)
-    fprintf (symtab->dump_file, "Materialization Call site updates done.\n");
-
-  cgraph_node::checking_verify_cgraph_nodes ();
+    {
+      dump_function_to_file (clone_of->decl, symtab->dump_file,
+                             dump_flags);
+      dump_function_to_file (decl, symtab->dump_file, dump_flags);
+    }
 
-  symtab->remove_unreachable_nodes (symtab->dump_file);
+  cgraph_node *this_clone_of = clone_of;
+  /* Function is no longer clone.  */
+  remove_from_clone_tree ();
+  if (!this_clone_of->analyzed && !this_clone_of->clones)
+    {
+      this_clone_of->release_body ();
+      this_clone_of->remove_callees ();
+      this_clone_of->remove_all_references ();
+    }
 }
 
 #include "gt-cgraphclones.h"
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 05713c28cf0..1e2262789dd 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1601,6 +1601,7 @@ mark_functions_to_output (void)
   FOR_EACH_FUNCTION (node)
     {
       tree decl = node->decl;
+      node->clear_stmts_in_references ();
 
       gcc_assert (!node->process || node->same_comdat_group);
       if (node->process)
@@ -2274,6 +2275,9 @@ cgraph_node::expand (void)
   announce_function (decl);
   process = 0;
   gcc_assert (lowered);
+
+  /* Initialize the default bitmap obstack.  */
+  bitmap_obstack_initialize (NULL);
   get_untransformed_body ();
 
   /* Generate RTL for the body of DECL.  */
@@ -2282,9 +2286,6 @@ cgraph_node::expand (void)
 
   gcc_assert (symtab->global_info_ready);
 
-  /* Initialize the default bitmap obstack.  */
-  bitmap_obstack_initialize (NULL);
-
   /* Initialize the RTL code for the function.  */
   saved_loc = input_location;
   input_location = DECL_SOURCE_LOCATION (decl);
@@ -2298,7 +2299,8 @@ cgraph_node::expand (void)
   bitmap_obstack_initialize (&reg_obstack); /* FIXME, only at RTL generation*/
 
   update_ssa (TODO_update_ssa_only_virtuals);
-  execute_all_ipa_transforms (false);
+  if (ipa_transforms_to_apply.exists ())
+    execute_all_ipa_transforms (false);
 
   /* Perform all tree transforms and optimizations.  */
 
diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index af2c2856aaa..f419df04961 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -644,16 +644,16 @@ save_inline_function_body (struct cgraph_node *node)
   tree_function_versioning (node->decl, first_clone->decl,
                             NULL, NULL, true, NULL, NULL);
 
-  /* The function will be short lived and removed after we inline all the clones,
-     but make it internal so we won't confuse ourself.  */
+  /* The function will be short lived and removed after we inline all the
+     clones, but make it internal so we won't confuse ourself.  */
   DECL_EXTERNAL (first_clone->decl) = 0;
   TREE_PUBLIC (first_clone->decl) = 0;
   DECL_COMDAT (first_clone->decl) = 0;
   first_clone->ipa_transforms_to_apply.release ();
 
   /* When doing recursive inlining, the clone may become unnecessary.
-     This is possible i.e. in the case when the recursive function is proved to be
-     non-throwing and the recursion happens only in the EH landing pad.
+     This is possible i.e. in the case when the recursive function is proved to
+     be non-throwing and the recursion happens only in the EH landing pad.
      We cannot remove the clone until we are done with saving the body.
      Remove it now.  */
   if (!first_clone->callers)
@@ -696,6 +696,14 @@ inline_transform (struct cgraph_node *node)
   if (cfun->after_inlining)
     return 0;
 
+  cgraph_node *next_clone;
+  for (cgraph_node *n = node->clones; n; n = next_clone)
+    {
+      next_clone = n->next_sibling_clone;
+      if (n->decl != node->decl)
+        n->materialize_clone ();
+    }
+
   /* We might need the body of this function so that we can expand
      it inline somewhere else.  */
   if (preserve_function_body_p (node))
diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 5fc0de56556..438f4bd5a68 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -783,6 +783,13 @@ ipa_param_adjustments::modify_call (gcall *stmt,
     {
       vec<tree, va_gc> **debug_args = NULL;
       unsigned i = 0;
+      cgraph_node *callee_node = cgraph_node::get (callee_decl);
+
+      /* FIXME: we don't seem to be able to insert debug args before clone
+         is materialized.  Materializing them early leads to extra memory
+         use.  */
+      if (callee_node->clone_of)
+        callee_node->get_untransformed_body ();
       for (tree old_parm = DECL_ARGUMENTS (old_decl);
            old_parm && i < old_nargs && ((int) i) < m_always_copy_start;
            old_parm = DECL_CHAIN (old_parm), i++)
diff --git a/gcc/ipa.c b/gcc/ipa.c
index 288b58cf73d..ab7256d857f 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -1386,43 +1386,3 @@ make_pass_ipa_single_use (gcc::context *ctxt)
   return new pass_ipa_single_use (ctxt);
 }
 
-/* Materialize all clones.  */
-
-namespace {
-
-const pass_data pass_data_materialize_all_clones =
-{
-  SIMPLE_IPA_PASS, /* type */
-  "materialize-all-clones", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
-  TV_IPA_OPT, /* tv_id */
-  0, /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  0, /* todo_flags_finish */
-};
-
-class pass_materialize_all_clones : public simple_ipa_opt_pass
-{
-public:
-  pass_materialize_all_clones (gcc::context *ctxt)
-    : simple_ipa_opt_pass (pass_data_materialize_all_clones, ctxt)
-  {}
-
-  /* opt_pass methods: */
-  virtual unsigned int execute (function *)
-    {
-      symtab->materialize_all_clones ();
-      return 0;
-    }
-
-}; // class pass_materialize_all_clones
-
-} // anon namespace
-
-simple_ipa_opt_pass *
-make_pass_materialize_all_clones (gcc::context *ctxt)
-{
-  return new pass_materialize_all_clones (ctxt);
-}
diff --git a/gcc/passes.c b/gcc/passes.c
index 6ff31ec37d7..1942b7cd1c3 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2271,6 +2271,14 @@ execute_all_ipa_transforms (bool do_not_collect)
     return;
   node = cgraph_node::get (current_function_decl);
 
+  cgraph_node *next_clone;
+  for (cgraph_node *n = node->clones; n; n = next_clone)
+    {
+      next_clone = n->next_sibling_clone;
+      if (n->decl != node->decl)
+        n->materialize_clone ();
+    }
+
   if (node->ipa_transforms_to_apply.exists ())
     {
       unsigned int i;
diff --git a/gcc/passes.def b/gcc/passes.def
index f865bdc19ac..cf15d8eafca 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -172,7 +172,6 @@ along with GCC; see the file COPYING3.  If not see
      passes are executed after partitioning and thus see just parts of the
      compiled unit.  */
   INSERT_PASSES_AFTER (all_late_ipa_passes)
-  NEXT_PASS (pass_materialize_all_clones);
   NEXT_PASS (pass_ipa_pta);
   NEXT_PASS (pass_omp_simd_clone);
   TERMINATE_PASS_LIST (all_late_ipa_passes)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 62e5b696cab..1e8badfe4be 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -519,8 +519,6 @@ extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_single_use (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_comdats (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_modref (gcc::context *ctxt);
-extern simple_ipa_opt_pass *make_pass_materialize_all_clones (gcc::context *
-                                                              ctxt);
 extern gimple_opt_pass *make_pass_cleanup_cfg_post_optimizing (gcc::context *ctxt);