Message ID: 20201022094820.GB97578@kam.mff.cuni.cz
State: New
Series: Materialize clones on demand
Hi,

On Thu, Oct 22 2020, Jan Hubicka wrote:
> Hi,
> this patch removes the pass to materialize all clones and instead this
> is now done on demand. The motivation is to reduce lifetime of function
> bodies in ltrans that should noticeably reduce memory use for highly
> parallel compilations of large programs (like Martin does) or with
> partitioning reduced/disabled. For cc1 with one partition the memory use
> seems to go down from 4gb to cca 1.5gb (seeing from top, so this is not
> particularly accurate).

Nice.

> This should also make get_body do the right thing at WPA time (still
> not a good idea for a production patch). I did not test this path.
>
> Martin (Jambor), Jakub, there is one FIXME in ipa-param-manipulation.
> We seem to ICE when we redirect to a call before the callee is
> materialized (this should be possible to trigger on mainline with
> recursive callgraphs too, but it definitely triggers on several
> testcases in the C testsuite if get_untransformed_body is disabled).
> It would be nice to fix this, but I am not quite sure how the debug
> info adjustments here work.

Well, the debug mappings are all based on PARM_DECLs. Unfortunately, I
cannot think of any quick fix now, though we might want to sit down and
try to revise the mechanism, also because of the debug info issues
described in PR 95343 and PR 93385. I'll keep this in mind and in my
notes.

I have one question regarding the patch itself:

> Bootstrapped/regtested x86_64-linux and also lto-bootstrapped with
> release checking. I plan to commit it after a bit more testing.
>
> Honza
>
> gcc/ChangeLog:
>
> 2020-10-22  Jan Hubicka  <hubicka@ucw.cz>
>
> 	* cgraph.c (cgraph_node::get_untransformed_body): Perform lazy
> 	clone materialization.
> 	* cgraph.h (cgraph_node::materialize_clone): Declare.
> 	(symbol_table::materialize_all_clones): Remove.
> 	* cgraphclones.c (cgraph_materialize_clone): Turn to ...
> 	(cgraph_node::materialize_clone): ... this one; move here
> 	dumping from symbol_table::materialize_all_clones.
> 	(symbol_table::materialize_all_clones): Remove.
> 	* cgraphunit.c (mark_functions_to_output): Clear stmt references.
> 	(cgraph_node::expand): Initialize bitmaps early;
> 	do not call execute_all_ipa_transforms if there are no transforms.
> 	* ipa-inline-transform.c (save_inline_function_body): Fix formatting.
> 	(inline_transform): Materialize all clones before function is modified.
> 	* ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
> 	Materialize clone if needed.
> 	* ipa.c (class pass_materialize_all_clones): Remove.
> 	(make_pass_materialize_all_clones): Remove.
> 	* passes.c (execute_all_ipa_transforms): Materialize all clones.
> 	* passes.def: Remove pass_materialize_all_clones.
> 	* tree-pass.h (make_pass_materialize_all_clones): Remove.
>
> [...]
>
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index 05713c28cf0..1e2262789dd 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -2298,7 +2299,8 @@ cgraph_node::expand (void)
>    bitmap_obstack_initialize (&reg_obstack); /* FIXME, only at RTL generation*/
>
>    update_ssa (TODO_update_ssa_only_virtuals);
> -  execute_all_ipa_transforms (false);
> +  if (ipa_transforms_to_apply.exists ())
> +    execute_all_ipa_transforms (false);

Can some function not have ipa_inline among the transforms_to_apply?

Martin
> > Bootstrapped/regtested x86_64-linux and also lto-bootstrapped with
> > release checking. I plan to commit it after a bit more testing.
> >
> > [...]
> >
> > @@ -2298,7 +2299,8 @@ cgraph_node::expand (void)
> >    bitmap_obstack_initialize (&reg_obstack); /* FIXME, only at RTL generation*/
> >
> >    update_ssa (TODO_update_ssa_only_virtuals);
> > -  execute_all_ipa_transforms (false);
> > +  if (ipa_transforms_to_apply.exists ())
> > +    execute_all_ipa_transforms (false);
>
> Can some function not have ipa_inline among the transforms_to_apply?

This is for the case of repeated execution. If you do get_body earlier,
the transforms are already applied.

Honza

>
> Martin
> Hi,
>
> On Thu, Oct 22 2020, Jan Hubicka wrote:
> > Hi,
> > this patch removes the pass to materialize all clones and instead this
> > is now done on demand. The motivation is to reduce lifetime of function
> > bodies in ltrans that should noticeably reduce memory use for highly
> > parallel compilations of large programs (like Martin does) or with
> > partitioning reduced/disabled. For cc1 with one partition the memory use
> > seems to go down from 4gb to cca 1.5gb (seeing from top, so this is not
> > particularly accurate).
>
> Nice.

Sadly this is only true w/o debug info. I collected memory usage stats
at the end of the ltrans stage and it is as follows:

 - after streaming in global stream: 126M GGC and 41M heap
 - after streaming symbol table: 373M GGC and 92M heap
 - after streaming in summaries: 394M GGC and 92M heap
   (only large summary seems to be ipa-cp transformation summary)
 - then compilation starts and memory goes slowly up to 3527M at the end
   of compilation

The following accounts for more than 1% GGC:

Time variable                              usr           sys          wall           GGC
 ipa inlining heuristics       :   6.99 (  0%)   4.62 (  1%)  11.17 (  1%)   241M (  1%)
 ipa lto gimple in             :  50.04 (  3%)  29.72 (  7%)  80.22 (  4%)  3129M ( 14%)
 ipa lto decl in               :   0.79 (  0%)   0.36 (  0%)   1.15 (  0%)   135M (  1%)
 ipa lto cgraph I/O            :   0.95 (  0%)   0.20 (  0%)   1.15 (  0%)   269M (  1%)
 cfg cleanup                   :  25.83 (  2%)   2.52 (  1%)  28.15 (  1%)   154M (  1%)
 df reg dead/unused notes      :  24.08 (  2%)   2.09 (  1%)  26.77 (  1%)   180M (  1%)
 alias analysis                :  16.94 (  1%)   1.05 (  0%)  17.71 (  1%)   383M (  2%)
 integration                   :  45.76 (  3%)  44.30 ( 11%)  88.99 (  5%)  2328M ( 10%)
 tree VRP                      :  41.38 (  3%)  15.67 (  4%)  57.71 (  3%)   560M (  2%)
 tree SSA rewrite              :   6.71 (  0%)   2.17 (  1%)   8.96 (  0%)   194M (  1%)
 tree SSA incremental          :  26.99 (  2%)   8.23 (  2%)  34.42 (  2%)   144M (  1%)
 tree operand scan             :  65.34 (  4%)  61.50 ( 15%) 127.02 (  7%)   886M (  4%)
 dominator optimization        :  41.53 (  3%)  13.56 (  3%)  55.78 (  3%)   407M (  2%)
 tree split crit edges         :   1.08 (  0%)   0.65 (  0%)   1.63 (  0%)   127M (  1%)
 tree PRE                      :  34.30 (  2%)  14.52 (  4%)  49.08 (  3%)   337M (  1%)
 tree code sinking             :   2.92 (  0%)   0.58 (  0%)   3.51 (  0%)   122M (  1%)
 tree iv optimization          :   6.71 (  0%)   1.19 (  0%)   8.46 (  0%)   133M (  1%)
 expand                        :  45.56 (  3%)   8.24 (  2%)  55.02 (  3%)  1980M (  9%)
 forward prop                  :  11.89 (  1%)   1.39 (  0%)  12.59 (  1%)   130M (  1%)
 dead store elim2              :  10.03 (  1%)   0.70 (  0%)  11.23 (  1%)   138M (  1%)
 loop init                     :  11.96 (  1%)   4.95 (  1%)  17.11 (  1%)   378M (  2%)
 CPROP                         :  22.63 (  2%)   2.78 (  1%)  25.19 (  1%)   359M (  2%)
 combiner                      :  41.39 (  3%)   2.57 (  1%)  43.30 (  2%)   558M (  2%)
 reload CSE regs               :  22.38 (  2%)   1.25 (  0%)  23.06 (  1%)   186M (  1%)
 final                         :  32.33 (  2%)   4.28 (  1%)  36.75 (  2%)  1105M (  5%)
 symout                        :  49.04 (  3%)   2.23 (  1%)  52.33 (  3%)  2517M ( 11%)
 var-tracking emit             :  33.26 (  2%)   1.02 (  0%)  34.35 (  2%)   582M (  3%)
 rest of compilation           :  38.05 (  3%)  15.61 (  4%)  52.42 (  3%)   114M (  1%)
 TOTAL                         :1486.02        408.79       1899.96       22512M

We seem to leak some hashtables:
dwarf2out.c:28850 (dwarf2out_init)                 31M: 23.8%    47M     19 :  0.0%  ggc
cselib.c:3137 (cselib_init)                        34M: 25.9%    34M  1514k: 17.3%  heap
tree-scalar-evolution.c:2984 (scev_initialize)     37M: 27.6%    50M   228k:  2.6%  ggc

and hashmaps:
ipa-reference.c:1133 (ipa_reference_read_optimiz 2047k:  3.0%  3071k     9 :  0.0%  heap
tree-ssa.c:60 (redirect_edge_var_map_add)        4125k:  6.1%  4126k  8190 :  0.1%  heap
alias.c:1200 (record_alias_subset)               4510k:  6.6%  4510k  4546 :  0.0%  ggc
ipa-prop.h:986 (ipcp_transformation_t)           8191k: 12.0%    11M    16 :  0.0%  ggc
dwarf2out.c:5957 (dwarf2out_register_external_di   47M: 72.2%    71M    12 :  0.0%  ggc

and hashsets:
ipa-devirt.c:3093 (possible_polymorphic_call_tar   15k:  0.9%    23k     8 :  0.0%  heap
ipa-devirt.c:1599 (add_type_duplicate)            412k: 22.2%   412k  4065 :  0.0%  heap
tree-ssa-threadbackward.c:40 (thread_jumps)      1432k: 77.0%  1433k   119k:  0.8%  heap

and vectors:
tree-ssa-structalias.c:5783 (push_fields_onto_fi    8   847k:  0.3%   976k  475621:  0.8%    17k    24k
tree-ssa-pre.c:334 (alloc_expression_id)           48  1125k:  0.4%  1187k  198336:  0.3%    23k    34k
tree-into-ssa.c:1787 (register_new_update_single    8  1196k:  0.5%  1264k  380385:  0.6%    24k    36k
ggc-page.c:1264 (add_finalizer)                     8  1232k:  0.5%  1848k      43:  0.0%    77k    81k
tree-ssa-structalias.c:1609 (topo_visit)            8  1302k:  0.5%  1328k  892964:  1.4%    27k    33k
graphds.c:254 (graphds_dfs)                         4  1469k:  0.6%  1675k 2101780:  3.4%    30k    34k
dominance.c:955 (get_dominated_to_depth)            8  2251k:  0.9%  2266k  685140:  1.1%    46k    50k
tree-ssa-structalias.c:410 (new_var_info)          32  2264k:  0.9%  2341k  330758:  0.5%    47k    63k
tree-ssa-structalias.c:3104 (process_constraint)   48  2376k:  0.9%  2606k  405451:  0.7%    49k    83k
symtab.c:612 (create_reference)                     8  3314k:  1.3%  4897k   75213:  0.1%   414k   612k
vec.h:1734 (copy)                                  48   233M: 90.5%   234M 6243163: 10.1%  4982k  5003k

However main problem is:
cfg.c:202 (connect_src)                          5745k:  0.2%   271M:  1.9%  1754k:  0.0%  1132k:  0.2%  7026k
cfg.c:212 (connect_dest)                         6307k:  0.2%   281M:  2.0% 10129k:  0.2%  2490k:  0.5%  7172k
varasm.c:3359 (build_constant_desc)              7387k:  0.2%      0 :  0.0%     0 :  0.0%     0 :  0.0%    51k
emit-rtl.c:486 (gen_raw_REG)                     7799k:  0.2%   215M:  1.5%    96 :  0.0%     0 :  0.0%  9502k
dwarf2cfi.c:2341 (add_cfis_to_fde)               8027k:  0.2%      0 :  0.0%  4906k:  0.1%  1405k:  0.3%    78k
emit-rtl.c:4074 (make_jump_insn_raw)             8239k:  0.2%    93M:  0.7%     0 :  0.0%     0 :  0.0%  1442k
tree-ssanames.c:308 (make_ssa_name_fn)           9130k:  0.2%   456M:  3.3%     0 :  0.0%     0 :  0.0%  6622k
gimple.c:1808 (gimple_copy)                      9508k:  0.3%   524M:  3.7%  8609k:  0.2%  2972k:  0.6%  7135k
tree-inline.c:4879 (expand_call_inline)          9590k:  0.3%    21M:  0.2%     0 :  0.0%     0 :  0.0%   328k
dwarf2cfi.c:418 (new_cfi)                          10M:  0.3%      0 :  0.0%     0 :  0.0%     0 :  0.0%   444k
cfg.c:266 (unchecked_make_edge)                    10M:  0.3%    60M:  0.4%   355M:  6.8%     0 :  0.0%  9083k
tree.c:1642 (wide_int_to_tree_1)                   10M:  0.3%  2313k:  0.0%     0 :  0.0%     0 :  0.0%   548k
stringpool.c:41 (stringpool_ggc_alloc)             10M:  0.3%  7055k:  0.0%     0 :  0.0%  2270k:  0.5%   588k
stringpool.c:63 (alloc_node)                       10M:  0.3%    12M:  0.1%     0 :  0.0%     0 :  0.0%   588k
tree-phinodes.c:119 (allocate_phi_node)            11M:  0.3%   153M:  1.1%     0 :  0.0%  3539k:  0.7%   340k
cgraph.c:289 (create_empty)                        12M:  0.3%      0 :  0.0%   109M:  2.1%     0 :  0.0%   371k
cfg.c:127 (alloc_block)                            14M:  0.4%   705M:  5.0%     0 :  0.0%     0 :  0.0%  7086k
tree-streamer-in.c:558 (streamer_read_tree_bitfi   22M:  0.6%    13k:  0.0%     0 :  0.0%    22k:  0.0%    64k
tree-inline.c:834 (remap_block)                    28M:  0.8%   159M:  1.1%     0 :  0.0%     0 :  0.0%  2009k
stringpool.c:79 (ggc_alloc_string)                 28M:  0.8%  5619k:  0.0%     0 :  0.0%  6658k:  1.4%  1785k
dwarf2out.c:11727 (add_ranges_num)                 32M:  0.9%      0 :  0.0%    32M:  0.6%   144 :  0.0%     20
tree-inline.c:5942 (copy_decl_to_var)              39M:  1.1%    51M:  0.4%     0 :  0.0%     0 :  0.0%   646k
tree-inline.c:5994 (copy_decl_no_change)           78M:  2.1%   270M:  1.9%     0 :  0.0%     0 :  0.0%  2497k
function.c:4438 (reorder_blocks_1)                 96M:  2.6%   101M:  0.7%     0 :  0.0%     0 :  0.0%  2109k
hash-table.h:802 (expand)                         142M:  3.9%    18M:  0.1%   198M:  3.8%    32M:  6.9%    38k
dwarf2out.c:10086 (new_loc_list)                  219M:  6.0%    11M:  0.1%     0 :  0.0%     0 :  0.0%  2955k
tree-streamer-in.c:637 (streamer_alloc_tree)      379M: 10.3%   426M:  3.0%     0 :  0.0%  4201k:  0.9%  9828k
dwarf2out.c:5702 (new_die_raw)                    434M: 11.8%      0 :  0.0%     0 :  0.0%     0 :  0.0%  5556k
dwarf2out.c:1383 (new_loc_descr)                  519M: 14.1%    12M:  0.1%  2880 :  0.0%     0 :  0.0%  6812k
dwarf2out.c:4420 (add_dwarf_attr)                 640M: 17.4%      0 :  0.0%    94M:  1.8%  4584k:  1.0%  3877k
toplev.c:906 (realloc_for_line_map)               768M: 20.8%      0 :  0.0%   767M: 14.6%   255M: 54.4%     33
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                        Leak          Garbage             Freed          Overhead      Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                    3689M:100.0%   14039M:100.0%     5254M:100.0%      470M:100.0%       391M
--------------------------------------------------------------------------------------------------------------------------------------------

Clearly some function bodies leak - I will try to figure out what. But
main problem is debug info.

I guess debug info for whole cc1plus is large, but it would be nice if
it was not in the garbage collector, for example :)

Honza
On Fri, 23 Oct 2020, Jan Hubicka wrote:

> Sadly this is only true w/o debug info. I collected memory usage stats
> at the end of the ltrans stage and it is as follows:
>
> [...]
>
> We seem to leak some hashtables:
> dwarf2out.c:28850 (dwarf2out_init)                 31M: 23.8%    47M     19 :  0.0%  ggc

that one likely keeps quite some memory live...

> cselib.c:3137 (cselib_init)                        34M: 25.9%    34M  1514k: 17.3%  heap
> tree-scalar-evolution.c:2984 (scev_initialize)     37M: 27.6%    50M   228k:  2.6%  ggc

Hmm, so we do

  scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);

and

  scalar_evolution_info->empty ();
  scalar_evolution_info = NULL;

to reclaim. ->empty () will IIRC at least allocate 7 elements which we
then eventually should reclaim during a GC walk - I guess the hashtable
statistics do not really handle GC reclaimed portions?

If there's a friendlier way of releasing a GC allocated hash-tab
we can switch to that. Note that in principle the hash-table doesn't
need to be GC allocated but it needs to be walked since it refers to
trees that might not be referenced in other ways.
> and hashmaps:
> ipa-reference.c:1133 (ipa_reference_read_optimiz 2047k:  3.0%  3071k     9 :  0.0%  heap
> tree-ssa.c:60 (redirect_edge_var_map_add)        4125k:  6.1%  4126k  8190 :  0.1%  heap

Similar to SCEV, probably mis-accounting?

> alias.c:1200 (record_alias_subset)               4510k:  6.6%  4510k  4546 :  0.0%  ggc
> ipa-prop.h:986 (ipcp_transformation_t)           8191k: 12.0%    11M    16 :  0.0%  ggc
> dwarf2out.c:5957 (dwarf2out_register_external_di   47M: 72.2%    71M    12 :  0.0%  ggc
>
> and hashsets:
> [...]
>
> and vectors:
> tree-ssa-structalias.c:5783 (push_fields_onto_fi    8   847k:  0.3%   976k  475621:  0.8%    17k    24k

Huh. It's an auto_vec<>

> [...]
> vec.h:1734 (copy)                                  48   233M: 90.5%   234M 6243163: 10.1%  4982k  5003k

Those all look OK to me, not sure why we even think there's a leak?

> However main problem is:
> [...]
>
> Clearly some function bodies leak - I will try to figure out what. But
> main problem is debug info.
> I guess debug info for whole cc1plus is large, but it would be nice if
> it was not in the garbage collector, for example :)

Well, we're building a DIE tree for the whole unit here so I'm not sure
what parts we can optimize. The structures may keep quite some stuff
on the tree side live through the decl -> DIE and block -> DIE maps
and the external_die_map used for LTO streaming (but if we lazily stream
bodies we do need to keep this map ... unless we add some
start/end-stream-body hooks and do the map per function. But then
we build the DIEs lazily as well so the query of the map is lazy :/)

Richard.
> > We seem to leak some hashtables:
> > dwarf2out.c:28850 (dwarf2out_init)                 31M: 23.8%    47M     19 :  0.0%  ggc
>
> that one likely keeps quite some memory live...

Yep, having in-memory dwarf2out for whole cc1plus eats a lot of memory
quite naturally.

> If there's a friendlier way of releasing a GC allocated hash-tab
> we can switch to that. Note that in principle the hash-table doesn't
> need to be GC allocated but it needs to be walked since it refers to
> trees that might not be referenced in other ways.

hashtable has a destructor that does ggc_free, so I think ggc_delete is
the right way to free.

> > and vectors:
> > tree-ssa-structalias.c:5783 (push_fields_onto_fi    8   847k:  0.3%   976k  475621:  0.8%    17k    24k
>
> Huh. It's an auto_vec<>

Hmm, those maybe get miscounted, I will check.

> > vec.h:1734 (copy)                                  48   233M: 90.5%   234M 6243163: 10.1%  4982k  5003k

Also I should annotate copy.

> Those all look OK to me, not sure why we even think there's a leak?

I think we do not need to hold references anymore (perhaps for aliases -
I will check). Also all function bodies should be freed by now.

> > However main problem is:
> > cfg.c:202 (connect_src)                          5745k:  0.2%   271M:  1.9%  1754k:  0.0%  1132k:  0.2%  7026k
> > [...]
> > cfg.c:266 (unchecked_make_edge)                    10M:  0.3%    60M:  0.4%   355M:  6.8%     0 :  0.0%  9083k

I think it is a bug to have a function body at the end of compilation -
will try to work out the reason for that.

> Well, we're building a DIE tree for the whole unit here so I'm not sure
> what parts we can optimize. The structures may keep quite some stuff
> on the tree side live through the decl -> DIE and block -> DIE maps
> and the external_die_map used for LTO streaming (but if we lazily stream
> bodies we do need to keep this map ... unless we add some
> start/end-stream-body hooks and do the map per function. But then
> we build the DIEs lazily as well so the query of the map is lazy :/)

Yep, not sure how much we could do here. Of course ggc_collect when
invoked will do quite a lot of walking to discover relatively few tree
references, but not sure if that can be solved by custom marking or so.

Honza

> Richard.
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imend
On Mon, 26 Oct 2020, Jan Hubicka wrote: > > > We seem to leak some hashtables: > > > dwarf2out.c:28850 (dwarf2out_init) 31M: 23.8% 47M 19 : 0.0% ggc > > > > that one likely keeps quite some memory live... > > Yep, having in-memory dwaf2out for whole cc1plus eats a lot of memory > quite naturally. OTOH the late debug shouldn't be so big ... > > > > > cselib.c:3137 (cselib_init) 34M: 25.9% 34M 1514k: 17.3% heap > > > tree-scalar-evolution.c:2984 (scev_initialize) 37M: 27.6% 50M 228k: 2.6% ggc > > > > Hmm, so we do > > > > scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100); > > > > and > > > > scalar_evolution_info->empty (); > > scalar_evolution_info = NULL; > > > > to reclaim. ->empty () will IIRC at least allocate 7 elements which we > > the eventually should reclaim during a GC walk - I guess the hashtable > > statistics do not really handle GC reclaimed portions? > > > > If there's a friendlier way of releasing a GC allocated hash-tab > > we can switch to that. Note that in principle the hash-table doesn't > > need to be GC allocated but it needs to be walked since it refers to > > trees that might not be referenced in other ways. > > hashtable has destructor that does ggc_free, so i think ggc_delete is > right way to free. Can you try if that helps? As said, in the end it's probably miscountings in the stats. > > > > > and hashmaps: > > > ipa-reference.c:1133 (ipa_reference_read_optimiz 2047k: 3.0% 3071k 9 : 0.0% heap > > > tree-ssa.c:60 (redirect_edge_var_map_add) 4125k: 6.1% 4126k 8190 : 0.1% heap > > > > Similar as SCEV, probably mis-accounting? 
> > > > > alias.c:1200 (record_alias_subset) 4510k: 6.6% 4510k 4546 : 0.0% ggc > > > ipa-prop.h:986 (ipcp_transformation_t) 8191k: 12.0% 11M 16 : 0.0% ggc > > > dwarf2out.c:5957 (dwarf2out_register_external_di 47M: 72.2% 71M 12 : 0.0% ggc > > > > > > and hashsets: > > > ipa-devirt.c:3093 (possible_polymorphic_call_tar 15k: 0.9% 23k 8 : 0.0% heap > > > ipa-devirt.c:1599 (add_type_duplicate) 412k: 22.2% 412k 4065 : 0.0% heap > > > tree-ssa-threadbackward.c:40 (thread_jumps) 1432k: 77.0% 1433k 119k: 0.8% heap > > > > > > and vectors: > > > tree-ssa-structalias.c:5783 (push_fields_onto_fi 8 847k: 0.3% 976k 475621: 0.8% 17k 24k > > > > Huh. It's an auto_vec<> > > Hmm, those maybe gets miscounted, i will check. > > > > > tree-ssa-pre.c:334 (alloc_expression_id) 48 1125k: 0.4% 1187k 198336: 0.3% 23k 34k > > > tree-into-ssa.c:1787 (register_new_update_single 8 1196k: 0.5% 1264k 380385: 0.6% 24k 36k > > > ggc-page.c:1264 (add_finalizer) 8 1232k: 0.5% 1848k 43: 0.0% 77k 81k > > > tree-ssa-structalias.c:1609 (topo_visit) 8 1302k: 0.5% 1328k 892964: 1.4% 27k 33k > > > graphds.c:254 (graphds_dfs) 4 1469k: 0.6% 1675k 2101780: 3.4% 30k 34k > > > dominance.c:955 (get_dominated_to_depth) 8 2251k: 0.9% 2266k 685140: 1.1% 46k 50k > > > tree-ssa-structalias.c:410 (new_var_info) 32 2264k: 0.9% 2341k 330758: 0.5% 47k 63k > > > tree-ssa-structalias.c:3104 (process_constraint) 48 2376k: 0.9% 2606k 405451: 0.7% 49k 83k > > > symtab.c:612 (create_reference) 8 3314k: 1.3% 4897k 75213: 0.1% 414k 612k > > > vec.h:1734 (copy) 48 233M:90.5% 234M 6243163:10.1% 4982k 5003k > > Also I should annotate copy. Yeah, some missing annotations might cause issues. > > > > Those all look OK to me, not sure why we even think there's a leak? > > I think we do not need to hold references anymore (perhaps for aliases - > i will check). Also all function bodies should be freed by now. 
> > > > However main problem is > > > cfg.c:202 (connect_src) 5745k: 0.2% 271M: 1.9% 1754k: 0.0% 1132k: 0.2% 7026k > > > cfg.c:212 (connect_dest) 6307k: 0.2% 281M: 2.0% 10129k: 0.2% 2490k: 0.5% 7172k > > > varasm.c:3359 (build_constant_desc) 7387k: 0.2% 0 : 0.0% 0 : 0.0% 0 : 0.0% 51k > > > emit-rtl.c:486 (gen_raw_REG) 7799k: 0.2% 215M: 1.5% 96 : 0.0% 0 : 0.0% 9502k > > > dwarf2cfi.c:2341 (add_cfis_to_fde) 8027k: 0.2% 0 : 0.0% 4906k: 0.1% 1405k: 0.3% 78k > > > emit-rtl.c:4074 (make_jump_insn_raw) 8239k: 0.2% 93M: 0.7% 0 : 0.0% 0 : 0.0% 1442k > > > tree-ssanames.c:308 (make_ssa_name_fn) 9130k: 0.2% 456M: 3.3% 0 : 0.0% 0 : 0.0% 6622k > > > gimple.c:1808 (gimple_copy) 9508k: 0.3% 524M: 3.7% 8609k: 0.2% 2972k: 0.6% 7135k > > > tree-inline.c:4879 (expand_call_inline) 9590k: 0.3% 21M: 0.2% 0 : 0.0% 0 : 0.0% 328k > > > dwarf2cfi.c:418 (new_cfi) 10M: 0.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 444k > > > cfg.c:266 (unchecked_make_edge) 10M: 0.3% 60M: 0.4% 355M: 6.8% 0 : 0.0% 9083k > I think it is a bug to have a function body at the end of compilation - will > try to work out the reason for that. 
> > > tree.c:1642 (wide_int_to_tree_1) 10M: 0.3% 2313k: 0.0% 0 : 0.0% 0 : 0.0% 548k > > > stringpool.c:41 (stringpool_ggc_alloc) 10M: 0.3% 7055k: 0.0% 0 : 0.0% 2270k: 0.5% 588k > > > stringpool.c:63 (alloc_node) 10M: 0.3% 12M: 0.1% 0 : 0.0% 0 : 0.0% 588k > > > tree-phinodes.c:119 (allocate_phi_node) 11M: 0.3% 153M: 1.1% 0 : 0.0% 3539k: 0.7% 340k > > > cgraph.c:289 (create_empty) 12M: 0.3% 0 : 0.0% 109M: 2.1% 0 : 0.0% 371k > > > cfg.c:127 (alloc_block) 14M: 0.4% 705M: 5.0% 0 : 0.0% 0 : 0.0% 7086k > > > tree-streamer-in.c:558 (streamer_read_tree_bitfi 22M: 0.6% 13k: 0.0% 0 : 0.0% 22k: 0.0% 64k > > > tree-inline.c:834 (remap_block) 28M: 0.8% 159M: 1.1% 0 : 0.0% 0 : 0.0% 2009k > > > stringpool.c:79 (ggc_alloc_string) 28M: 0.8% 5619k: 0.0% 0 : 0.0% 6658k: 1.4% 1785k > > > dwarf2out.c:11727 (add_ranges_num) 32M: 0.9% 0 : 0.0% 32M: 0.6% 144 : 0.0% 20 > > > tree-inline.c:5942 (copy_decl_to_var) 39M: 1.1% 51M: 0.4% 0 : 0.0% 0 : 0.0% 646k > > > tree-inline.c:5994 (copy_decl_no_change) 78M: 2.1% 270M: 1.9% 0 : 0.0% 0 : 0.0% 2497k > > > function.c:4438 (reorder_blocks_1) 96M: 2.6% 101M: 0.7% 0 : 0.0% 0 : 0.0% 2109k > > > hash-table.h:802 (expand) 142M: 3.9% 18M: 0.1% 198M: 3.8% 32M: 6.9% 38k > > > dwarf2out.c:10086 (new_loc_list) 219M: 6.0% 11M: 0.1% 0 : 0.0% 0 : 0.0% 2955k > > > tree-streamer-in.c:637 (streamer_alloc_tree) 379M: 10.3% 426M: 3.0% 0 : 0.0% 4201k: 0.9% 9828k > > > dwarf2out.c:5702 (new_die_raw) 434M: 11.8% 0 : 0.0% 0 : 0.0% 0 : 0.0% 5556k > > > dwarf2out.c:1383 (new_loc_descr) 519M: 14.1% 12M: 0.1% 2880 : 0.0% 0 : 0.0% 6812k > > > dwarf2out.c:4420 (add_dwarf_attr) 640M: 17.4% 0 : 0.0% 94M: 1.8% 4584k: 1.0% 3877k > > > toplev.c:906 (realloc_for_line_map) 768M: 20.8% 0 : 0.0% 767M: 14.6% 255M: 54.4% 33 > > > -------------------------------------------------------------------------------------------------------------------------------------------- > > > GGC memory Leak Garbage Freed Overhead Times > > > 
-------------------------------------------------------------------------------------------------------------------------------------------- > > > Total 3689M:100.0% 14039M:100.0% 5254M:100.0% 470M:100.0% 391M > > > -------------------------------------------------------------------------------------------------------------------------------------------- > > > > > > Clearly some function bodies leak - I will try to figure out what. But > > > main problem is debug info. > > > I guess debug info for whole cc1plus is large, but it would be nice if > > > it was not in the garbage collector, for example :) > > > > Well, we're building a DIE tree for the whole unit here so I'm not sure > > what parts we can optimize. The structures may keep quite some stuff > > on the tree side live through the decl -> DIE and block -> DIE maps > > and the external_die_map used for LTO streaming (but if we lazily stream > > bodies we do need to keep this map ... unless we add some > > start/end-stream-body hooks and doing the map per function. But then > > we build the DIEs lazily as well so the query of the map is lazy :/) > > Yep, not sure how much we could do here. Of course ggc_collect when > invoked will do quite a lot of walking to discover relatively few tree > references, but not sure if that can be solved by custom marking or so. In principle the late DIE creation code can remove entries from the external_die_map map, but not sure how much that helps (might also cause re-allocation of it if we shrink it). It might help quite a bit for references to BLOCKs. Maybe you can try the following simple patch ... 
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index ba93a6c3d81..350cc5d443c 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -5974,6 +5974,7 @@ maybe_create_die_with_external_ref (tree decl) const char *sym = desc->sym; unsigned HOST_WIDE_INT off = desc->off; + external_die_map->remove (decl); in_lto_p = false; dw_die_ref die = (TREE_CODE (decl) == BLOCK > Hona > > > > Richard. > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > Germany; GF: Felix Imend >
> > > > > > > cselib.c:3137 (cselib_init) 34M: 25.9% 34M 1514k: 17.3% heap > > > > tree-scalar-evolution.c:2984 (scev_initialize) 37M: 27.6% 50M 228k: 2.6% ggc > > > > > > Hmm, so we do > > > > > > scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100); > > > > > > and > > > > > > scalar_evolution_info->empty (); > > > scalar_evolution_info = NULL; > > > > > > to reclaim. ->empty () will IIRC at least allocate 7 elements which we > > > then eventually should reclaim during a GC walk - I guess the hashtable > > > statistics do not really handle GC reclaimed portions? > > > > > > If there's a friendlier way of releasing a GC allocated hash-tab > > > we can switch to that. Note that in principle the hash-table doesn't > > > need to be GC allocated but it needs to be walked since it refers to > > > trees that might not be referenced in other ways. > > > > hashtable has destructor that does ggc_free, so i think ggc_delete is > > right way to free. > > Can you try if that helps? As said, in the end it's probably > miscounting in the stats. I do not think we are miscounting here. empty () really allocates a small hashtable and leaves it alone. It should be ggc_delete. I will test it. > > > > > > > > and hashmaps: > > > > ipa-reference.c:1133 (ipa_reference_read_optimiz 2047k: 3.0% 3071k 9 : 0.0% heap > > > > tree-ssa.c:60 (redirect_edge_var_map_add) 4125k: 6.1% 4126k 8190 : 0.1% heap > > > > > > Similar as SCEV, probably mis-accounting? 
> > > > > > > alias.c:1200 (record_alias_subset) 4510k: 6.6% 4510k 4546 : 0.0% ggc > > > > ipa-prop.h:986 (ipcp_transformation_t) 8191k: 12.0% 11M 16 : 0.0% ggc > > > > dwarf2out.c:5957 (dwarf2out_register_external_di 47M: 72.2% 71M 12 : 0.0% ggc > > > > > > > > and hashsets: > > > > ipa-devirt.c:3093 (possible_polymorphic_call_tar 15k: 0.9% 23k 8 : 0.0% heap > > > > ipa-devirt.c:1599 (add_type_duplicate) 412k: 22.2% 412k 4065 : 0.0% heap > > > > tree-ssa-threadbackward.c:40 (thread_jumps) 1432k: 77.0% 1433k 119k: 0.8% heap > > > > > > > > and vectors: > > > > tree-ssa-structalias.c:5783 (push_fields_onto_fi 8 847k: 0.3% 976k 475621: 0.8% 17k 24k > > > > > > Huh. It's an auto_vec<> > > > > Hmm, those maybe gets miscounted, i will check. > > > > > > > tree-ssa-pre.c:334 (alloc_expression_id) 48 1125k: 0.4% 1187k 198336: 0.3% 23k 34k > > > > tree-into-ssa.c:1787 (register_new_update_single 8 1196k: 0.5% 1264k 380385: 0.6% 24k 36k > > > > ggc-page.c:1264 (add_finalizer) 8 1232k: 0.5% 1848k 43: 0.0% 77k 81k > > > > tree-ssa-structalias.c:1609 (topo_visit) 8 1302k: 0.5% 1328k 892964: 1.4% 27k 33k > > > > graphds.c:254 (graphds_dfs) 4 1469k: 0.6% 1675k 2101780: 3.4% 30k 34k > > > > dominance.c:955 (get_dominated_to_depth) 8 2251k: 0.9% 2266k 685140: 1.1% 46k 50k > > > > tree-ssa-structalias.c:410 (new_var_info) 32 2264k: 0.9% 2341k 330758: 0.5% 47k 63k > > > > tree-ssa-structalias.c:3104 (process_constraint) 48 2376k: 0.9% 2606k 405451: 0.7% 49k 83k > > > > symtab.c:612 (create_reference) 8 3314k: 1.3% 4897k 75213: 0.1% 414k 612k > > > > vec.h:1734 (copy) 48 233M:90.5% 234M 6243163:10.1% 4982k 5003k > > Also I should annotate copy. > > Yeah, some missing annotations might cause issues. It will only let us see who copies the vectors ;) auto_vecs I think are special since we may manage to miscount the pre-allocated space. I will look into that. > > > > > > Well, we're building a DIE tree for the whole unit here so I'm not sure > > > what parts we can optimize. 
The structures may keep quite some stuff > > > on the tree side live through the decl -> DIE and block -> DIE maps > > > and the external_die_map used for LTO streaming (but if we lazily stream > > > bodies we do need to keep this map ... unless we add some > > > start/end-stream-body hooks and doing the map per function. But then > > > we build the DIEs lazily as well so the query of the map is lazy :/) > > > > Yep, not sure how much we could do here. Of course ggc_collect when > > invoked will do quite a lot of walking to discover relatively few tree > > references, but not sure if that can be solved by custom marking or so. > > In principle the late DIE creation code can remove entries from the > external_die_map map, but not sure how much that helps (might also > cause re-allocation of it if we shrink it). It might help quite a bit > for references to BLOCKs. Maybe you can try the following simple > patch ... > > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index ba93a6c3d81..350cc5d443c 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -5974,6 +5974,7 @@ maybe_create_die_with_external_ref (tree decl) > > const char *sym = desc->sym; > unsigned HOST_WIDE_INT off = desc->off; > + external_die_map->remove (decl); > > in_lto_p = false; > dw_die_ref die = (TREE_CODE (decl) == BLOCK I will give it a try. Thanks! I think shrinking hashtables is not much of a concern here: it happens lazily either at ggc_collect (which is desirable) or when the hashtable is walked (which is amortized by the walk) Honza > > > > > Honza > > > > > > Richard. > > > > > > -- > > > Richard Biener <rguenther@suse.de> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > > > Germany; GF: Felix Imend > > > > -- > Richard Biener <rguenther@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imend
> > > > However main problem is > > > > cfg.c:202 (connect_src) 5745k: 0.2% 271M: 1.9% 1754k: 0.0% 1132k: 0.2% 7026k > > > > cfg.c:212 (connect_dest) 6307k: 0.2% 281M: 2.0% 10129k: 0.2% 2490k: 0.5% 7172k > > > > varasm.c:3359 (build_constant_desc) 7387k: 0.2% 0 : 0.0% 0 : 0.0% 0 : 0.0% 51k > > > > emit-rtl.c:486 (gen_raw_REG) 7799k: 0.2% 215M: 1.5% 96 : 0.0% 0 : 0.0% 9502k > > > > dwarf2cfi.c:2341 (add_cfis_to_fde) 8027k: 0.2% 0 : 0.0% 4906k: 0.1% 1405k: 0.3% 78k > > > > emit-rtl.c:4074 (make_jump_insn_raw) 8239k: 0.2% 93M: 0.7% 0 : 0.0% 0 : 0.0% 1442k > > > > tree-ssanames.c:308 (make_ssa_name_fn) 9130k: 0.2% 456M: 3.3% 0 : 0.0% 0 : 0.0% 6622k > > > > gimple.c:1808 (gimple_copy) 9508k: 0.3% 524M: 3.7% 8609k: 0.2% 2972k: 0.6% 7135k > > > > tree-inline.c:4879 (expand_call_inline) 9590k: 0.3% 21M: 0.2% 0 : 0.0% 0 : 0.0% 328k > > > > dwarf2cfi.c:418 (new_cfi) 10M: 0.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 444k > > > > cfg.c:266 (unchecked_make_edge) 10M: 0.3% 60M: 0.4% 355M: 6.8% 0 : 0.0% 9083k > > I think it is a bug to have a function body at the end of compilation - will > > try to work out the reason for that. 
> > > > tree.c:1642 (wide_int_to_tree_1) 10M: 0.3% 2313k: 0.0% 0 : 0.0% 0 : 0.0% 548k > > > > stringpool.c:41 (stringpool_ggc_alloc) 10M: 0.3% 7055k: 0.0% 0 : 0.0% 2270k: 0.5% 588k > > > > stringpool.c:63 (alloc_node) 10M: 0.3% 12M: 0.1% 0 : 0.0% 0 : 0.0% 588k > > > > tree-phinodes.c:119 (allocate_phi_node) 11M: 0.3% 153M: 1.1% 0 : 0.0% 3539k: 0.7% 340k > > > > cgraph.c:289 (create_empty) 12M: 0.3% 0 : 0.0% 109M: 2.1% 0 : 0.0% 371k > > > > cfg.c:127 (alloc_block) 14M: 0.4% 705M: 5.0% 0 : 0.0% 0 : 0.0% 7086k > > > > tree-streamer-in.c:558 (streamer_read_tree_bitfi 22M: 0.6% 13k: 0.0% 0 : 0.0% 22k: 0.0% 64k > > > > tree-inline.c:834 (remap_block) 28M: 0.8% 159M: 1.1% 0 : 0.0% 0 : 0.0% 2009k > > > > stringpool.c:79 (ggc_alloc_string) 28M: 0.8% 5619k: 0.0% 0 : 0.0% 6658k: 1.4% 1785k > > > > dwarf2out.c:11727 (add_ranges_num) 32M: 0.9% 0 : 0.0% 32M: 0.6% 144 : 0.0% 20 > > > > tree-inline.c:5942 (copy_decl_to_var) 39M: 1.1% 51M: 0.4% 0 : 0.0% 0 : 0.0% 646k > > > > tree-inline.c:5994 (copy_decl_no_change) 78M: 2.1% 270M: 1.9% 0 : 0.0% 0 : 0.0% 2497k > > > > function.c:4438 (reorder_blocks_1) 96M: 2.6% 101M: 0.7% 0 : 0.0% 0 : 0.0% 2109k > > > > hash-table.h:802 (expand) 142M: 3.9% 18M: 0.1% 198M: 3.8% 32M: 6.9% 38k > > > > dwarf2out.c:10086 (new_loc_list) 219M: 6.0% 11M: 0.1% 0 : 0.0% 0 : 0.0% 2955k > > > > tree-streamer-in.c:637 (streamer_alloc_tree) 379M: 10.3% 426M: 3.0% 0 : 0.0% 4201k: 0.9% 9828k > > > > dwarf2out.c:5702 (new_die_raw) 434M: 11.8% 0 : 0.0% 0 : 0.0% 0 : 0.0% 5556k > > > > dwarf2out.c:1383 (new_loc_descr) 519M: 14.1% 12M: 0.1% 2880 : 0.0% 0 : 0.0% 6812k > > > > dwarf2out.c:4420 (add_dwarf_attr) 640M: 17.4% 0 : 0.0% 94M: 1.8% 4584k: 1.0% 3877k > > > > toplev.c:906 (realloc_for_line_map) 768M: 20.8% 0 : 0.0% 767M: 14.6% 255M: 54.4% 33 > > > > -------------------------------------------------------------------------------------------------------------------------------------------- > > > > GGC memory Leak Garbage Freed Overhead Times > > > > 
-------------------------------------------------------------------------------------------------------------------------------------------- > > > > Total 3689M:100.0% 14039M:100.0% 5254M:100.0% 470M:100.0% 391M > > > > -------------------------------------------------------------------------------------------------------------------------------------------- > > > > > > > > Clearly some function bodies leak - I will try to figure out what. But > > > > main problem is debug info. > > > > I guess debug info for whole cc1plus is large, but it would be nice if > > > > it was not in the garbage collector, for example :) > > > > > > Well, we're building a DIE tree for the whole unit here so I'm not sure > > > what parts we can optimize. The structures may keep quite some stuff > > > on the tree side live through the decl -> DIE and block -> DIE maps > > > and the external_die_map used for LTO streaming (but if we lazily stream > > > bodies we do need to keep this map ... unless we add some > > > start/end-stream-body hooks and doing the map per function. But then > > > we build the DIEs lazily as well so the query of the map is lazy :/) > > > > Yep, not sure how much we could do here. Of course ggc_collect when > > invoked will do quite a lot of walking to discover relatively few tree > > references, but not sure if that can be solved by custom marking or so. > > In principle the late DIE creation code can remove entries from the > external_die_map map, but not sure how much that helps (might also > cause re-allocation of it if we shrink it). It might help quite a bit > for references to BLOCKs. Maybe you can try the following simple > patch ... 
> > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index ba93a6c3d81..350cc5d443c 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -5974,6 +5974,7 @@ maybe_create_die_with_external_ref (tree decl) > > const char *sym = desc->sym; > unsigned HOST_WIDE_INT off = desc->off; > + external_die_map->remove (decl); > > in_lto_p = false; > dw_die_ref die = (TREE_CODE (decl) == BLOCK Updated stats are: ipa-devirt.c:1950 (get_odr_type) 385k: 0.0% 0 : 0.0% 0 : 0.0% 0 : 0.0% 7044 emit-rtl.c:4117 (make_note_raw) 396k: 0.0% 986M: 6.8% 0 : 0.0% 0 : 0.0% 17M lto-cgraph.c:1983 (input_node_opt_summary) 524k: 0.0% 18M: 0.1% 313k: 0.0% 1012k: 0.2% 124k tree-inline.c:4883 (expand_call_inline) 526k: 0.0% 30M: 0.2% 0 : 0.0% 0 : 0.0% 329k gimple.c:1822 (gimple_copy) 527k: 0.0% 536M: 3.7% 8631k: 0.2% 2997k: 0.6% 7174k emit-rtl.c:2703 (gen_label_rtx) 532k: 0.0% 76M: 0.5% 0 : 0.0% 0 : 0.0% 1232k ipa-modref-tree.h:154 (insert_access) 592k: 0.0% 0 : 0.0% 4052k: 0.1% 7192 : 0.0% 26k cfg.c:202 (connect_src) 617k: 0.0% 277M: 1.9% 1755k: 0.0% 1133k: 0.2% 7053k tree-ssanames.c:308 (make_ssa_name_fn) 627k: 0.0% 466M: 3.2% 0 : 0.0% 0 : 0.0% 6642k tree.c:7887 (build_pointer_type_for_mode) 635k: 0.0% 1094k: 0.0% 0 : 0.0% 0 : 0.0% 10k cgraph.c:1989 (rtl_info) 661k: 0.0% 0 : 0.0% 0 : 0.0% 0 : 0.0% 27k cfg.c:212 (connect_dest) 698k: 0.0% 287M: 2.0% 10181k: 0.2% 2490k: 0.5% 7200k symbol-summary.h:108 (allocate_new) 736k: 0.0% 0 : 0.0% 8663k: 0.2% 0 : 0.0% 391k varpool.c:137 (create_empty) 746k: 0.0% 0 : 0.0% 6257k: 0.1% 0 : 0.0% 54k varasm.c:1513 (make_decl_rtl) 834k: 0.0% 866k: 0.0% 0 : 0.0% 0 : 0.0% 70k emit-rtl.c:4074 (make_jump_insn_raw) 913k: 0.0% 100M: 0.7% 0 : 0.0% 0 : 0.0% 1448k tree-phinodes.c:119 (allocate_phi_node) 943k: 0.0% 164M: 1.1% 0 : 0.0% 3563k: 0.7% 343k emit-rtl.c:386 (set_mem_attrs) 982k: 0.0% 171M: 1.2% 0 : 0.0% 0 : 0.0% 4413k tree.c:1311 (build_new_int_cst) 1080k: 0.0% 838k: 0.0% 66M: 1.3% 0 : 0.0% 2188k langhooks.c:664 (build_builtin_function) 1125k: 0.0% 137k: 0.0% 0 : 
0.0% 170k: 0.0% 4367 emit-rtl.c:486 (gen_raw_REG) 1158k: 0.0% 221M: 1.5% 96 : 0.0% 0 : 0.0% 9517k cfg.c:266 (unchecked_make_edge) 1179k: 0.0% 69M: 0.5% 356M: 6.8% 0 : 0.0% 9119k varasm.c:3350 (build_constant_desc) 1232k: 0.0% 0 : 0.0% 0 : 0.0% 0 : 0.0% 51k varasm.c:3397 (build_constant_desc) 1232k: 0.0% 0 : 0.0% 0 : 0.0% 0 : 0.0% 51k tree.c:1497 (cache_wide_int_in_type_cache) 1342k: 0.0% 44k: 0.0% 0 : 0.0% 3184 : 0.0% 18k cfg.c:127 (alloc_block) 1597k: 0.0% 720M: 5.0% 0 : 0.0% 0 : 0.0% 7113k tree-inline.c:837 (remap_block) 1738k: 0.1% 187M: 1.3% 0 : 0.0% 0 : 0.0% 2016k dwarf2out.c:15872 (mem_loc_descriptor) 2048k: 0.1% 0 : 0.0% 1531k: 0.0% 512 : 0.0% 10 emit-rtl.c:856 (gen_rtx_MEM) 2138k: 0.1% 297M: 2.1% 0 : 0.0% 0 : 0.0% 12M symtab.c:596 (create_reference) 2486k: 0.1% 0 : 0.0% 44M: 0.8% 341k: 0.1% 192k tree-inline.c:5038 (expand_call_inline) 2687k: 0.1% 0 : 0.0% 2434k: 0.0% 15k: 0.0% 6432 dwarf2out.c:1028 (dwarf2out_alloc_current_fde) 3084k: 0.1% 0 : 0.0% 0 : 0.0% 0 : 0.0% 27k ipa-prop.c:5276 (read_ipcp_transformation_info) 3549k: 0.1% 34k: 0.0% 0 : 0.0% 737k: 0.1% 6508 alias.c:1200 (record_alias_subset) 4712k: 0.1% 0 : 0.0% 3096 : 0.0% 36k: 0.0% 4679 tree.c:2264 (build_string) 5163k: 0.2% 1782k: 0.0% 0 : 0.0% 652k: 0.1% 115k function.c:4438 (reorder_blocks_1) 5470k: 0.2% 193M: 1.3% 0 : 0.0% 0 : 0.0% 2121k varasm.c:3359 (build_constant_desc) 7393k: 0.2% 0 : 0.0% 0 : 0.0% 0 : 0.0% 51k dwarf2cfi.c:2341 (add_cfis_to_fde) 8078k: 0.2% 0 : 0.0% 4933k: 0.1% 1417k: 0.3% 78k dwarf2cfi.c:418 (new_cfi) 10M: 0.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 447k stringpool.c:63 (alloc_node) 10M: 0.3% 12M: 0.1% 0 : 0.0% 0 : 0.0% 591k tree.c:1642 (wide_int_to_tree_1) 10M: 0.3% 2375k: 0.0% 0 : 0.0% 0 : 0.0% 549k stringpool.c:41 (stringpool_ggc_alloc) 10M: 0.3% 7328k: 0.0% 0 : 0.0% 2279k: 0.5% 591k cgraph.c:290 (create_empty) 11M: 0.3% 0 : 0.0% 96M: 1.8% 0 : 0.0% 372k tree-inline.c:5946 (copy_decl_to_var) 16M: 0.5% 74M: 0.5% 0 : 0.0% 0 : 0.0% 647k tree-streamer-in.c:558 (streamer_read_tree_bitfi 
22M: 0.7% 13k: 0.0% 0 : 0.0% 22k: 0.0% 64k stringpool.c:79 (ggc_alloc_string) 27M: 0.8% 7321k: 0.0% 0 : 0.0% 6640k: 1.3% 1784k dwarf2out.c:11728 (add_ranges_num) 32M: 1.0% 0 : 0.0% 32M: 0.6% 144 : 0.0% 20 tree-inline.c:5998 (copy_decl_no_change) 34M: 1.0% 315M: 2.2% 0 : 0.0% 0 : 0.0% 2504k hash-table.h:802 (expand) 142M: 4.3% 10M: 0.1% 185M: 3.5% 32M: 6.6% 29k dwarf2out.c:10087 (new_loc_list) 199M: 6.0% 9350k: 0.1% 0 : 0.0% 0 : 0.0% 2666k tree-streamer-in.c:637 (streamer_alloc_tree) 315M: 9.5% 491M: 3.4% 0 : 0.0% 4243k: 0.8% 9820k dwarf2out.c:5702 (new_die_raw) 412M: 12.4% 0 : 0.0% 0 : 0.0% 0 : 0.0% 5285k dwarf2out.c:1383 (new_loc_descr) 480M: 14.4% 9653k: 0.1% 2880 : 0.0% 0 : 0.0% 6265k dwarf2out.c:4420 (add_dwarf_attr) 750M: 22.5% 0 : 0.0% 94M: 1.8% 13M: 2.7% 3891k toplev.c:906 (realloc_for_line_map) 768M: 23.0% 0 : 0.0% 767M: 14.6% 255M: 52.3% 33 -------------------------------------------------------------------------------------------------------------------------------------------- GGC memory Leak Garbage Freed Overhead Times -------------------------------------------------------------------------------------------------------------------------------------------- Total 3332M:100.0% 14432M:100.0% 5267M:100.0% 489M:100.0% 389M -------------------------------------------------------------------------------------------------------------------------------------------- So it seems there is a reduction from 3.6G to 3.3G Honza
diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 9480935ff84..35a0182b847 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -3872,7 +3872,7 @@ cgraph_node::function_or_virtual_thunk_symbol } /* When doing LTO, read cgraph_node's body from disk if it is not already - present. */ + present. Also perform any necessary clone materializations. */ bool cgraph_node::get_untransformed_body (void) @@ -3882,6 +3882,17 @@ cgraph_node::get_untransformed_body (void) size_t len; tree decl = this->decl; + /* See if there is clone to be materialized. + (inline clones does not need materialization, but we can be seeing + an inline clone of real clone). */ + cgraph_node *p = this; + for (cgraph_node *c = clone_of; c; c = c->clone_of) + { + if (c->decl != decl) + p->materialize_clone (); + p = c; + } + /* Check if body is already there. Either we have gimple body or the function is thunk and in that case we set DECL_ARGUMENTS. */ if (DECL_ARGUMENTS (decl) || gimple_has_body_p (decl)) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index c953a1b6711..d3279410c2e 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1152,6 +1152,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node apply them. */ bool get_body (void); + void materialize_clone (void); + /* Release memory used to represent body of function. Use this only for functions that are released before being translated to target code (i.e. RTL). Functions that are compiled to RTL and beyond @@ -2286,13 +2288,6 @@ public: functions inserted into callgraph already at construction time. */ void process_new_functions (void); - /* Once all functions from compilation unit are in memory, produce all clones - and update all calls. We might also do this on demand if we don't want to - bring all functions to memory prior compilation, but current WHOPR - implementation does that and it is bit easier to keep everything right - in this order. */ - void materialize_all_clones (void); - /* Register a symbol NODE. 
*/ inline void register_symbol (symtab_node *node); diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index f920dcb4c29..07a51a58aef 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -1083,114 +1083,57 @@ void cgraph_node::remove_from_clone_tree () /* Given virtual clone, turn it into actual clone. */ -static void -cgraph_materialize_clone (cgraph_node *node) -{ - bitmap_obstack_initialize (NULL); - node->former_clone_of = node->clone_of->decl; - if (node->clone_of->former_clone_of) - node->former_clone_of = node->clone_of->former_clone_of; - /* Copy the OLD_VERSION_NODE function tree to the new version. */ - tree_function_versioning (node->clone_of->decl, node->decl, - node->clone.tree_map, node->clone.param_adjustments, - true, NULL, NULL); - if (symtab->dump_file) - { - dump_function_to_file (node->clone_of->decl, symtab->dump_file, - dump_flags); - dump_function_to_file (node->decl, symtab->dump_file, dump_flags); - } - - cgraph_node *clone_of = node->clone_of; - /* Function is no longer clone. */ - node->remove_from_clone_tree (); - if (!clone_of->analyzed && !clone_of->clones) - { - clone_of->release_body (); - clone_of->remove_callees (); - clone_of->remove_all_references (); - } - bitmap_obstack_release (NULL); -} - -/* Once all functions from compilation unit are in memory, produce all clones - and update all calls. We might also do this on demand if we don't want to - bring all functions to memory prior compilation, but current WHOPR - implementation does that and it is a bit easier to keep everything right in - this order. 
*/ - void -symbol_table::materialize_all_clones (void) +cgraph_node::materialize_clone () { - cgraph_node *node; - bool stabilized = false; - - + clone_of->get_untransformed_body (); + former_clone_of = clone_of->decl; + if (clone_of->former_clone_of) + former_clone_of = clone_of->former_clone_of; if (symtab->dump_file) - fprintf (symtab->dump_file, "Materializing clones\n"); - - cgraph_node::checking_verify_cgraph_nodes (); - - /* We can also do topological order, but number of iterations should be - bounded by number of IPA passes since single IPA pass is probably not - going to create clones of clones it created itself. */ - while (!stabilized) { - stabilized = true; - FOR_EACH_FUNCTION (node) + fprintf (symtab->dump_file, "cloning %s to %s\n", + clone_of->dump_name (), + dump_name ()); + if (clone.tree_map) { - if (node->clone_of && node->decl != node->clone_of->decl - && !gimple_has_body_p (node->decl)) + fprintf (symtab->dump_file, " replace map:"); + for (unsigned int i = 0; + i < vec_safe_length (clone.tree_map); + i++) { - if (!node->clone_of->clone_of) - node->clone_of->get_untransformed_body (); - if (gimple_has_body_p (node->clone_of->decl)) - { - if (symtab->dump_file) - { - fprintf (symtab->dump_file, "cloning %s to %s\n", - node->clone_of->dump_name (), - node->dump_name ()); - if (node->clone.tree_map) - { - unsigned int i; - fprintf (symtab->dump_file, " replace map:"); - for (i = 0; - i < vec_safe_length (node->clone.tree_map); - i++) - { - ipa_replace_map *replace_info; - replace_info = (*node->clone.tree_map)[i]; - fprintf (symtab->dump_file, "%s %i -> ", - i ? 
"," : "", replace_info->parm_num); - print_generic_expr (symtab->dump_file, - replace_info->new_tree); - } - fprintf (symtab->dump_file, "\n"); - } - if (node->clone.param_adjustments) - node->clone.param_adjustments->dump (symtab->dump_file); - } - cgraph_materialize_clone (node); - stabilized = false; - } + ipa_replace_map *replace_info; + replace_info = (*clone.tree_map)[i]; + fprintf (symtab->dump_file, "%s %i -> ", + i ? "," : "", replace_info->parm_num); + print_generic_expr (symtab->dump_file, + replace_info->new_tree); } + fprintf (symtab->dump_file, "\n"); } + if (clone.param_adjustments) + clone.param_adjustments->dump (symtab->dump_file); } - FOR_EACH_FUNCTION (node) - if (!node->analyzed && node->callees) - { - node->remove_callees (); - node->remove_all_references (); - } - else - node->clear_stmts_in_references (); + /* Copy the OLD_VERSION_NODE function tree to the new version. */ + tree_function_versioning (clone_of->decl, decl, + clone.tree_map, clone.param_adjustments, + true, NULL, NULL); if (symtab->dump_file) - fprintf (symtab->dump_file, "Materialization Call site updates done.\n"); - - cgraph_node::checking_verify_cgraph_nodes (); + { + dump_function_to_file (clone_of->decl, symtab->dump_file, + dump_flags); + dump_function_to_file (decl, symtab->dump_file, dump_flags); + } - symtab->remove_unreachable_nodes (symtab->dump_file); + cgraph_node *this_clone_of = clone_of; + /* Function is no longer clone. 
     */
+  remove_from_clone_tree ();
+  if (!this_clone_of->analyzed && !this_clone_of->clones)
+    {
+      this_clone_of->release_body ();
+      this_clone_of->remove_callees ();
+      this_clone_of->remove_all_references ();
+    }
 }
 
 #include "gt-cgraphclones.h"
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 05713c28cf0..1e2262789dd 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1601,6 +1601,7 @@ mark_functions_to_output (void)
   FOR_EACH_FUNCTION (node)
     {
       tree decl = node->decl;
+      node->clear_stmts_in_references ();
 
       gcc_assert (!node->process || node->same_comdat_group);
       if (node->process)
@@ -2274,6 +2275,9 @@ cgraph_node::expand (void)
   announce_function (decl);
   process = 0;
   gcc_assert (lowered);
+
+  /* Initialize the default bitmap obstack.  */
+  bitmap_obstack_initialize (NULL);
   get_untransformed_body ();
 
   /* Generate RTL for the body of DECL.  */
@@ -2282,9 +2286,6 @@ cgraph_node::expand (void)
 
   gcc_assert (symtab->global_info_ready);
 
-  /* Initialize the default bitmap obstack.  */
-  bitmap_obstack_initialize (NULL);
-
   /* Initialize the RTL code for the function.  */
   saved_loc = input_location;
   input_location = DECL_SOURCE_LOCATION (decl);
@@ -2298,7 +2299,8 @@ cgraph_node::expand (void)
   bitmap_obstack_initialize (&reg_obstack); /* FIXME, only at RTL generation*/
 
   update_ssa (TODO_update_ssa_only_virtuals);
-  execute_all_ipa_transforms (false);
+  if (ipa_transforms_to_apply.exists ())
+    execute_all_ipa_transforms (false);
 
   /* Perform all tree transforms and optimizations.  */
diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index af2c2856aaa..f419df04961 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -644,16 +644,16 @@ save_inline_function_body (struct cgraph_node *node)
   tree_function_versioning (node->decl, first_clone->decl,
			     NULL, NULL, true, NULL, NULL);
 
-  /* The function will be short lived and removed after we inline all the clones,
-     but make it internal so we won't confuse ourself.  */
+  /* The function will be short lived and removed after we inline all the
+     clones, but make it internal so we won't confuse ourself.  */
   DECL_EXTERNAL (first_clone->decl) = 0;
   TREE_PUBLIC (first_clone->decl) = 0;
   DECL_COMDAT (first_clone->decl) = 0;
   first_clone->ipa_transforms_to_apply.release ();
 
   /* When doing recursive inlining, the clone may become unnecessary.
-     This is possible i.e. in the case when the recursive function is proved to be
-     non-throwing and the recursion happens only in the EH landing pad.
+     This is possible i.e. in the case when the recursive function is proved to
+     be non-throwing and the recursion happens only in the EH landing pad.
      We cannot remove the clone until we are done with saving the body.
      Remove it now.  */
   if (!first_clone->callers)
@@ -696,6 +696,14 @@ inline_transform (struct cgraph_node *node)
   if (cfun->after_inlining)
     return 0;
 
+  cgraph_node *next_clone;
+  for (cgraph_node *n = node->clones; n; n = next_clone)
+    {
+      next_clone = n->next_sibling_clone;
+      if (n->decl != node->decl)
+	n->materialize_clone ();
+    }
+
   /* We might need the body of this function so that we can expand
      it inline somewhere else.  */
   if (preserve_function_body_p (node))
diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 5fc0de56556..438f4bd5a68 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -783,6 +783,13 @@ ipa_param_adjustments::modify_call (gcall *stmt,
     {
       vec<tree, va_gc> **debug_args = NULL;
       unsigned i = 0;
+      cgraph_node *callee_node = cgraph_node::get (callee_decl);
+
+      /* FIXME: we don't seem to be able to insert debug args before clone
+	 is materialized.  Materializing them early leads to extra memory
+	 use.  */
+      if (callee_node->clone_of)
+	callee_node->get_untransformed_body ();
       for (tree old_parm = DECL_ARGUMENTS (old_decl);
	    old_parm && i < old_nargs && ((int) i) < m_always_copy_start;
	    old_parm = DECL_CHAIN (old_parm), i++)
diff --git a/gcc/ipa.c b/gcc/ipa.c
index 288b58cf73d..ab7256d857f 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -1386,43 +1386,3 @@ make_pass_ipa_single_use (gcc::context *ctxt)
   return new pass_ipa_single_use (ctxt);
 }
 
-/* Materialize all clones.  */
-
-namespace {
-
-const pass_data pass_data_materialize_all_clones =
-{
-  SIMPLE_IPA_PASS, /* type */
-  "materialize-all-clones", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
-  TV_IPA_OPT, /* tv_id */
-  0, /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  0, /* todo_flags_finish */
-};
-
-class pass_materialize_all_clones : public simple_ipa_opt_pass
-{
-public:
-  pass_materialize_all_clones (gcc::context *ctxt)
-    : simple_ipa_opt_pass (pass_data_materialize_all_clones, ctxt)
-  {}
-
-  /* opt_pass methods: */
-  virtual unsigned int execute (function *)
-    {
-      symtab->materialize_all_clones ();
-      return 0;
-    }
-
-}; // class pass_materialize_all_clones
-
-} // anon namespace
-
-simple_ipa_opt_pass *
-make_pass_materialize_all_clones (gcc::context *ctxt)
-{
-  return new pass_materialize_all_clones (ctxt);
-}
diff --git a/gcc/passes.c b/gcc/passes.c
index 6ff31ec37d7..1942b7cd1c3 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2271,6 +2271,14 @@ execute_all_ipa_transforms (bool do_not_collect)
     return;
   node = cgraph_node::get (current_function_decl);
 
+  cgraph_node *next_clone;
+  for (cgraph_node *n = node->clones; n; n = next_clone)
+    {
+      next_clone = n->next_sibling_clone;
+      if (n->decl != node->decl)
+	n->materialize_clone ();
+    }
+
   if (node->ipa_transforms_to_apply.exists ())
     {
       unsigned int i;
diff --git a/gcc/passes.def b/gcc/passes.def
index f865bdc19ac..cf15d8eafca 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -172,7 +172,6 @@ along with GCC; see the file COPYING3.  If not see
      passes are executed after partitioning and thus see just parts of the
      compiled unit.  */
   INSERT_PASSES_AFTER (all_late_ipa_passes)
-  NEXT_PASS (pass_materialize_all_clones);
   NEXT_PASS (pass_ipa_pta);
   NEXT_PASS (pass_omp_simd_clone);
   TERMINATE_PASS_LIST (all_late_ipa_passes)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 62e5b696cab..1e8badfe4be 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -519,8 +519,6 @@ extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_single_use (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_comdats (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_modref (gcc::context *ctxt);
-extern simple_ipa_opt_pass *make_pass_materialize_all_clones (gcc::context *
-							       ctxt);
 extern gimple_opt_pass *make_pass_cleanup_cfg_post_optimizing (gcc::context
							        *ctxt);