
Materialize clones on demand

Message ID 20201022094820.GB97578@kam.mff.cuni.cz
State New
Series Materialize clones on demand

Commit Message

Jan Hubicka Oct. 22, 2020, 9:48 a.m. UTC
Hi,
this patch removes the pass that materializes all clones; materialization
is now done on demand.  The motivation is to reduce the lifetime of function
bodies in ltrans, which should noticeably reduce memory use for highly
parallel compilations of large programs (as Martin does) or with
partitioning reduced/disabled. For cc1 with one partition the memory use
seems to go down from 4GB to circa 1.5GB (observed via top, so this is not
particularly accurate).

This should also make get_body do the right thing at WPA time (still
not a good idea for a production patch).  I did not test this path.

Martin (Jambor), Jakub, there is one FIXME in ipa-param-manipulation.
We seem to ICE when we redirect a call before the callee is materialized
(this should be possible to trigger on mainline with recursive
callgraphs too, but it definitely triggers on several testcases in the C
testsuite if the get_untransformed_body call is disabled). It would be nice
to fix this, but I am not quite sure how the debug info adjustments
work here.

Bootstrapped/regtested x86_64-linux and also lto-bootstrapped with
release checking.  I plan to commit it after a bit more testing.

Honza

gcc/ChangeLog:

2020-10-22  Jan Hubicka  <hubicka@ucw.cz>

	* cgraph.c (cgraph_node::get_untransformed_body): Perform lazy
	clone materialization.
	* cgraph.h (cgraph_node::materialize_clone): Declare.
	(symbol_table::materialize_all_clones): Remove.
	* cgraphclones.c (cgraph_materialize_clone): Turn to ...
	(cgraph_node::materialize_clone): .. this one; move here
	dumping from symbol_table::materialize_all_clones.
	(symbol_table::materialize_all_clones): Remove.
	* cgraphunit.c (mark_functions_to_output): Clear stmt references.
	(cgraph_node::expand): Initialize bitmaps early;
	do not call execute_all_ipa_transforms if there are no transforms.
	* ipa-inline-transform.c (save_inline_function_body): Fix formatting.
	(inline_transform): Materialize all clones before function is modified.
	* ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
	Materialize clone if needed.
	* ipa.c (class pass_materialize_all_clones): Remove.
	(make_pass_materialize_all_clones): Remove.
	* passes.c (execute_all_ipa_transforms): Materialize all clones.
	* passes.def: Remove pass_materialize_all_clones.
	* tree-pass.h (make_pass_materialize_all_clones): Remove.

Comments

Martin Jambor Oct. 23, 2020, 11:21 a.m. UTC | #1
Hi,

On Thu, Oct 22 2020, Jan Hubicka wrote:
> Hi,
> this patch removes the pass to materialize all clones and instead this
> is now done on demand.  The motivation is to reduce lifetime of function
> bodies in ltrans that should noticeably reduce memory use for highly
> parallel compilations of large programs (like Martin does) or with
> partitioning reduced/disabled. For cc1 with one partition the memory use
> seems to go down from 4gb to cca 1.5gb (seeing from top, so this is not
> particularly accurate).
>

Nice.

> This should also make get_body to do the right thing at WPA time (still
> not good idea for production patch).  I did not test this path.
>
> Martin (Jambor), Jakub, there is one FIXME in ipa-param-manipulation.
> We seem to ICE when we redirect to a call before callee is materialized
> (this should be possible to trigger on mainline with recursive
> callgraphs too, but it definitly triggers on several testcases in c
> testsuite if the get_untransformed_body is disabled). It would be nice
> to fix this, but I am not quite sure how the debug info adjustments here
> works.

Well, the debug mappings are all based on PARM_DECLs.  Unfortunately, I
cannot think of any quick fix now, though we might want to sit down and
try to revise the mechanism, also because of the debug info issues
described in PR 95343 and PR 93385.  I'll keep this in mind and in my notes.

I have one question regarding the patch itself:

> Bootstrapped/regtested x86_64-linux and also lto-bootstrapped with
> release checking.  I plan to commit it after bit more testing.
>
> Honza
>
> gcc/ChangeLog:
>
> 2020-10-22  Jan Hubicka  <hubicka@ucw.cz>
>
> 	* cgraph.c (cgraph_node::get_untransformed_body): Perform lazy
> 	clone materialization.
> 	* cgraph.h (cgraph_node::materialize_clone): Declare.
> 	(symbol_table::materialize_all_clones): Remove.
> 	* cgraphclones.c (cgraph_materialize_clone): Turn to ...
> 	(cgraph_node::materialize_clone): .. this one; move here
> 	dumping from symbol_table::materialize_all_clones.
> 	(symbol_table::materialize_all_clones): Remove.
> 	* cgraphunit.c (mark_functions_to_output): Clear stmt references.
> 	(cgraph_node::expand): Initialize bitmaps early;
> 	do not call execute_all_ipa_transforms if there are no transforms.
> 	* ipa-inline-transform.c (save_inline_function_body): Fix formating.
> 	(inline_transform): Materialize all clones before function is modified.
> 	* ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
> 	Materialize clone if needed.
> 	* ipa.c (class pass_materialize_all_clones): Remove.
> 	(make_pass_materialize_all_clones): Remove.
> 	* passes.c (execute_all_ipa_transforms): Materialize all clones.
> 	* passes.def: Remove pass_materialize_all_clones.
> 	* tree-pass.h (make_pass_materialize_all_clones): Remove.
>

[...]

> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index 05713c28cf0..1e2262789dd 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -2298,7 +2299,8 @@ cgraph_node::expand (void)
>    bitmap_obstack_initialize (&reg_obstack); /* FIXME, only at RTL generation*/
>  
>    update_ssa (TODO_update_ssa_only_virtuals);
> -  execute_all_ipa_transforms (false);
> +  if (ipa_transforms_to_apply.exists ())
> +    execute_all_ipa_transforms (false);
>  

Can some function not have ipa_inline among the transforms_to_apply?

Martin
Jan Hubicka Oct. 23, 2020, 11:26 a.m. UTC | #2
> > Bootstrapped/regtested x86_64-linux and also lto-bootstrapped with
> > release checking.  I plan to commit it after bit more testing.
> >
> > Honza
> >
> > gcc/ChangeLog:
> >
> > 2020-10-22  Jan Hubicka  <hubicka@ucw.cz>
> >
> > 	* cgraph.c (cgraph_node::get_untransformed_body): Perform lazy
> > 	clone materialization.
> > 	* cgraph.h (cgraph_node::materialize_clone): Declare.
> > 	(symbol_table::materialize_all_clones): Remove.
> > 	* cgraphclones.c (cgraph_materialize_clone): Turn to ...
> > 	(cgraph_node::materialize_clone): .. this one; move here
> > 	dumping from symbol_table::materialize_all_clones.
> > 	(symbol_table::materialize_all_clones): Remove.
> > 	* cgraphunit.c (mark_functions_to_output): Clear stmt references.
> > 	(cgraph_node::expand): Initialize bitmaps early;
> > 	do not call execute_all_ipa_transforms if there are no transforms.
> > 	* ipa-inline-transform.c (save_inline_function_body): Fix formating.
> > 	(inline_transform): Materialize all clones before function is modified.
> > 	* ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
> > 	Materialize clone if needed.
> > 	* ipa.c (class pass_materialize_all_clones): Remove.
> > 	(make_pass_materialize_all_clones): Remove.
> > 	* passes.c (execute_all_ipa_transforms): Materialize all clones.
> > 	* passes.def: Remove pass_materialize_all_clones.
> > 	* tree-pass.h (make_pass_materialize_all_clones): Remove.
> >
> 
> [...]
> 
> > diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> > index 05713c28cf0..1e2262789dd 100644
> > --- a/gcc/cgraphunit.c
> > +++ b/gcc/cgraphunit.c
> > @@ -2298,7 +2299,8 @@ cgraph_node::expand (void)
> >    bitmap_obstack_initialize (&reg_obstack); /* FIXME, only at RTL generation*/
> >  
> >    update_ssa (TODO_update_ssa_only_virtuals);
> > -  execute_all_ipa_transforms (false);
> > +  if (ipa_transforms_to_apply.exists ())
> > +    execute_all_ipa_transforms (false);
> >  
> 
> Can some function not have ipa_inline among the transforms_to_apply?

This is for the case of repeated execution.  If you do get_body earlier,
the transforms are already applied.

Honza
> 
> Martin
Jan Hubicka Oct. 23, 2020, 7:27 p.m. UTC | #3
> Hi,
> 
> On Thu, Oct 22 2020, Jan Hubicka wrote:
> > Hi,
> > this patch removes the pass to materialize all clones and instead this
> > is now done on demand.  The motivation is to reduce lifetime of function
> > bodies in ltrans that should noticeably reduce memory use for highly
> > parallel compilations of large programs (like Martin does) or with
> > partitioning reduced/disabled. For cc1 with one partition the memory use
> > seems to go down from 4gb to cca 1.5gb (seeing from top, so this is not
> > particularly accurate).
> >
> 
> Nice.

Sadly this is only true w/o debug info.  I collected memory usage stats
at the end of the ltrans stage and they are as follows:

 - after streaming in global stream: 126M GGC and 41M heap
 - after streaming symbol table:     373M GGC and 92M heap
 - after streaming in summaries:     394M GGC and 92M heap
   (only large summary seems to be the ipa-cp transformation summary)
 - then compilation starts and memory goes slowly up to 3527M at the end
   of compilation

The following accounts for more than 1% GGC:

Time variable                                   usr           sys          wall           GGC
 ipa inlining heuristics            :   6.99 (  0%)   4.62 (  1%)  11.17 (  1%)   241M (  1%)
 ipa lto gimple in                  :  50.04 (  3%)  29.72 (  7%)  80.22 (  4%)  3129M ( 14%)
 ipa lto decl in                    :   0.79 (  0%)   0.36 (  0%)   1.15 (  0%)   135M (  1%)
 ipa lto cgraph I/O                 :   0.95 (  0%)   0.20 (  0%)   1.15 (  0%)   269M (  1%)
 cfg cleanup                        :  25.83 (  2%)   2.52 (  1%)  28.15 (  1%)   154M (  1%)
 df reg dead/unused notes           :  24.08 (  2%)   2.09 (  1%)  26.77 (  1%)   180M (  1%)
 alias analysis                     :  16.94 (  1%)   1.05 (  0%)  17.71 (  1%)   383M (  2%)
 integration                        :  45.76 (  3%)  44.30 ( 11%)  88.99 (  5%)  2328M ( 10%)
 tree VRP                           :  41.38 (  3%)  15.67 (  4%)  57.71 (  3%)   560M (  2%)
 tree SSA rewrite                   :   6.71 (  0%)   2.17 (  1%)   8.96 (  0%)   194M (  1%)
 tree SSA incremental               :  26.99 (  2%)   8.23 (  2%)  34.42 (  2%)   144M (  1%)
 tree operand scan                  :  65.34 (  4%)  61.50 ( 15%) 127.02 (  7%)   886M (  4%)
 dominator optimization             :  41.53 (  3%)  13.56 (  3%)  55.78 (  3%)   407M (  2%)
 tree split crit edges              :   1.08 (  0%)   0.65 (  0%)   1.63 (  0%)   127M (  1%)
 tree PRE                           :  34.30 (  2%)  14.52 (  4%)  49.08 (  3%)   337M (  1%)
 tree code sinking                  :   2.92 (  0%)   0.58 (  0%)   3.51 (  0%)   122M (  1%)
 tree iv optimization               :   6.71 (  0%)   1.19 (  0%)   8.46 (  0%)   133M (  1%)
 expand                             :  45.56 (  3%)   8.24 (  2%)  55.02 (  3%)  1980M (  9%)
 forward prop                       :  11.89 (  1%)   1.39 (  0%)  12.59 (  1%)   130M (  1%)
 dead store elim2                   :  10.03 (  1%)   0.70 (  0%)  11.23 (  1%)   138M (  1%)
 loop init                          :  11.96 (  1%)   4.95 (  1%)  17.11 (  1%)   378M (  2%)
 CPROP                              :  22.63 (  2%)   2.78 (  1%)  25.19 (  1%)   359M (  2%)
 combiner                           :  41.39 (  3%)   2.57 (  1%)  43.30 (  2%)   558M (  2%)
 reload CSE regs                    :  22.38 (  2%)   1.25 (  0%)  23.06 (  1%)   186M (  1%)
 final                              :  32.33 (  2%)   4.28 (  1%)  36.75 (  2%)  1105M (  5%)
 symout                             :  49.04 (  3%)   2.23 (  1%)  52.33 (  3%)  2517M ( 11%)
 var-tracking emit                  :  33.26 (  2%)   1.02 (  0%)  34.35 (  2%)   582M (  3%)
 rest of compilation                :  38.05 (  3%)  15.61 (  4%)  52.42 (  3%)   114M (  1%)
 TOTAL                              :1486.02        408.79       1899.96        22512M

We seem to leak some hashtables:
dwarf2out.c:28850 (dwarf2out_init)                      31M: 23.8%       47M       19 :  0.0%       ggc
cselib.c:3137 (cselib_init)                             34M: 25.9%       34M     1514k: 17.3%      heap
tree-scalar-evolution.c:2984 (scev_initialize)          37M: 27.6%       50M      228k:  2.6%       ggc

and hashmaps:
ipa-reference.c:1133 (ipa_reference_read_optimiz      2047k:  3.0%     3071k        9 :  0.0%      heap
tree-ssa.c:60 (redirect_edge_var_map_add)             4125k:  6.1%     4126k     8190 :  0.1%      heap
alias.c:1200 (record_alias_subset)                    4510k:  6.6%     4510k     4546 :  0.0%       ggc
ipa-prop.h:986 (ipcp_transformation_t)                8191k: 12.0%       11M       16 :  0.0%       ggc
dwarf2out.c:5957 (dwarf2out_register_external_di        47M: 72.2%       71M       12 :  0.0%       ggc

and hashsets:
ipa-devirt.c:3093 (possible_polymorphic_call_tar        15k:  0.9%       23k        8 :  0.0%      heap
ipa-devirt.c:1599 (add_type_duplicate)                 412k: 22.2%      412k     4065 :  0.0%      heap
tree-ssa-threadbackward.c:40 (thread_jumps)           1432k: 77.0%     1433k      119k:  0.8%      heap

and vectors:
tree-ssa-structalias.c:5783 (push_fields_onto_fi          8       847k: 0.3%      976k    475621: 0.8%        17k        24k
tree-ssa-pre.c:334 (alloc_expression_id)                 48      1125k: 0.4%     1187k    198336: 0.3%        23k        34k
tree-into-ssa.c:1787 (register_new_update_single          8      1196k: 0.5%     1264k    380385: 0.6%        24k        36k
ggc-page.c:1264 (add_finalizer)                           8      1232k: 0.5%     1848k        43: 0.0%        77k        81k
tree-ssa-structalias.c:1609 (topo_visit)                  8      1302k: 0.5%     1328k    892964: 1.4%        27k        33k
graphds.c:254 (graphds_dfs)                               4      1469k: 0.6%     1675k   2101780: 3.4%        30k        34k
dominance.c:955 (get_dominated_to_depth)                  8      2251k: 0.9%     2266k    685140: 1.1%        46k        50k
tree-ssa-structalias.c:410 (new_var_info)                32      2264k: 0.9%     2341k    330758: 0.5%        47k        63k
tree-ssa-structalias.c:3104 (process_constraint)         48      2376k: 0.9%     2606k    405451: 0.7%        49k        83k
symtab.c:612 (create_reference)                           8      3314k: 1.3%     4897k     75213: 0.1%       414k       612k
vec.h:1734 (copy)                                        48       233M:90.5%      234M   6243163:10.1%      4982k      5003k

However, the main problem is
cfg.c:202 (connect_src)                               5745k:  0.2%      271M:  1.9%     1754k:  0.0%     1132k:  0.2%     7026k
cfg.c:212 (connect_dest)                              6307k:  0.2%      281M:  2.0%    10129k:  0.2%     2490k:  0.5%     7172k
varasm.c:3359 (build_constant_desc)                   7387k:  0.2%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
emit-rtl.c:486 (gen_raw_REG)                          7799k:  0.2%      215M:  1.5%       96 :  0.0%        0 :  0.0%     9502k
dwarf2cfi.c:2341 (add_cfis_to_fde)                    8027k:  0.2%        0 :  0.0%     4906k:  0.1%     1405k:  0.3%       78k
emit-rtl.c:4074 (make_jump_insn_raw)                  8239k:  0.2%       93M:  0.7%        0 :  0.0%        0 :  0.0%     1442k
tree-ssanames.c:308 (make_ssa_name_fn)                9130k:  0.2%      456M:  3.3%        0 :  0.0%        0 :  0.0%     6622k
gimple.c:1808 (gimple_copy)                           9508k:  0.3%      524M:  3.7%     8609k:  0.2%     2972k:  0.6%     7135k
tree-inline.c:4879 (expand_call_inline)               9590k:  0.3%       21M:  0.2%        0 :  0.0%        0 :  0.0%      328k
dwarf2cfi.c:418 (new_cfi)                               10M:  0.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%      444k
cfg.c:266 (unchecked_make_edge)                         10M:  0.3%       60M:  0.4%      355M:  6.8%        0 :  0.0%     9083k
tree.c:1642 (wide_int_to_tree_1)                        10M:  0.3%     2313k:  0.0%        0 :  0.0%        0 :  0.0%      548k
stringpool.c:41 (stringpool_ggc_alloc)                  10M:  0.3%     7055k:  0.0%        0 :  0.0%     2270k:  0.5%      588k
stringpool.c:63 (alloc_node)                            10M:  0.3%       12M:  0.1%        0 :  0.0%        0 :  0.0%      588k
tree-phinodes.c:119 (allocate_phi_node)                 11M:  0.3%      153M:  1.1%        0 :  0.0%     3539k:  0.7%      340k
cgraph.c:289 (create_empty)                             12M:  0.3%        0 :  0.0%      109M:  2.1%        0 :  0.0%      371k
cfg.c:127 (alloc_block)                                 14M:  0.4%      705M:  5.0%        0 :  0.0%        0 :  0.0%     7086k
tree-streamer-in.c:558 (streamer_read_tree_bitfi        22M:  0.6%       13k:  0.0%        0 :  0.0%       22k:  0.0%       64k
tree-inline.c:834 (remap_block)                         28M:  0.8%      159M:  1.1%        0 :  0.0%        0 :  0.0%     2009k
stringpool.c:79 (ggc_alloc_string)                      28M:  0.8%     5619k:  0.0%        0 :  0.0%     6658k:  1.4%     1785k
dwarf2out.c:11727 (add_ranges_num)                      32M:  0.9%        0 :  0.0%       32M:  0.6%      144 :  0.0%       20 
tree-inline.c:5942 (copy_decl_to_var)                   39M:  1.1%       51M:  0.4%        0 :  0.0%        0 :  0.0%      646k
tree-inline.c:5994 (copy_decl_no_change)                78M:  2.1%      270M:  1.9%        0 :  0.0%        0 :  0.0%     2497k
function.c:4438 (reorder_blocks_1)                      96M:  2.6%      101M:  0.7%        0 :  0.0%        0 :  0.0%     2109k
hash-table.h:802 (expand)                              142M:  3.9%       18M:  0.1%      198M:  3.8%       32M:  6.9%       38k
dwarf2out.c:10086 (new_loc_list)                       219M:  6.0%       11M:  0.1%        0 :  0.0%        0 :  0.0%     2955k
tree-streamer-in.c:637 (streamer_alloc_tree)           379M: 10.3%      426M:  3.0%        0 :  0.0%     4201k:  0.9%     9828k
dwarf2out.c:5702 (new_die_raw)                         434M: 11.8%        0 :  0.0%        0 :  0.0%        0 :  0.0%     5556k
dwarf2out.c:1383 (new_loc_descr)                       519M: 14.1%       12M:  0.1%     2880 :  0.0%        0 :  0.0%     6812k
dwarf2out.c:4420 (add_dwarf_attr)                      640M: 17.4%        0 :  0.0%       94M:  1.8%     4584k:  1.0%     3877k
toplev.c:906 (realloc_for_line_map)                    768M: 20.8%        0 :  0.0%      767M: 14.6%      255M: 54.4%       33 
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                              Leak          Garbage            Freed        Overhead            Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                 3689M:100.0%    14039M:100.0%     5254M:100.0%      470M:100.0%      391M
--------------------------------------------------------------------------------------------------------------------------------------------

Clearly some function bodies leak - I will try to figure out which. But
the main problem is debug info.
I guess debug info for the whole of cc1plus is large, but it would be nice
if it was not in the garbage collector, for example :)

Honza
Richard Biener Oct. 26, 2020, 7:41 a.m. UTC | #4
On Fri, 23 Oct 2020, Jan Hubicka wrote:

> > Hi,
> > 
> > On Thu, Oct 22 2020, Jan Hubicka wrote:
> > > Hi,
> > > this patch removes the pass to materialize all clones and instead this
> > > is now done on demand.  The motivation is to reduce lifetime of function
> > > bodies in ltrans that should noticeably reduce memory use for highly
> > > parallel compilations of large programs (like Martin does) or with
> > > partitioning reduced/disabled. For cc1 with one partition the memory use
> > > seems to go down from 4gb to cca 1.5gb (seeing from top, so this is not
> > > particularly accurate).
> > >
> > 
> > Nice.
> 
> Sadly this is only true w/o debug info.  I collected memory usage stats
> at the end of the ltrans stage and it is as folloes
> 
>  - after streaming in global stream: 126M GGC and 41M heap
>  - after streaming symbol table:     373M GGC and 92M heap
>  - after stremaing in summaries:     394M GGC and 92M heap 
>    (only large summary seems to be ipa-cp transformation summary)
>  - then compilation starts and memory goes slowly up to 3527M at the end
>    of compilation
> 
> The following accounts for more than 1% GGC:
> 
> Time variable                                   usr           sys          wall           GGC
>  ipa inlining heuristics            :   6.99 (  0%)   4.62 (  1%)  11.17 (  1%)   241M (  1%)
>  ipa lto gimple in                  :  50.04 (  3%)  29.72 (  7%)  80.22 (  4%)  3129M ( 14%)
>  ipa lto decl in                    :   0.79 (  0%)   0.36 (  0%)   1.15 (  0%)   135M (  1%)
>  ipa lto cgraph I/O                 :   0.95 (  0%)   0.20 (  0%)   1.15 (  0%)   269M (  1%)
>  cfg cleanup                        :  25.83 (  2%)   2.52 (  1%)  28.15 (  1%)   154M (  1%)
>  df reg dead/unused notes           :  24.08 (  2%)   2.09 (  1%)  26.77 (  1%)   180M (  1%)
>  alias analysis                     :  16.94 (  1%)   1.05 (  0%)  17.71 (  1%)   383M (  2%)
>  integration                        :  45.76 (  3%)  44.30 ( 11%)  88.99 (  5%)  2328M ( 10%)
>  tree VRP                           :  41.38 (  3%)  15.67 (  4%)  57.71 (  3%)   560M (  2%)
>  tree SSA rewrite                   :   6.71 (  0%)   2.17 (  1%)   8.96 (  0%)   194M (  1%)
>  tree SSA incremental               :  26.99 (  2%)   8.23 (  2%)  34.42 (  2%)   144M (  1%)
>  tree operand scan                  :  65.34 (  4%)  61.50 ( 15%) 127.02 (  7%)   886M (  4%)
>  dominator optimization             :  41.53 (  3%)  13.56 (  3%)  55.78 (  3%)   407M (  2%)
>  tree split crit edges              :   1.08 (  0%)   0.65 (  0%)   1.63 (  0%)   127M (  1%)
>  tree PRE                           :  34.30 (  2%)  14.52 (  4%)  49.08 (  3%)   337M (  1%)
>  tree code sinking                  :   2.92 (  0%)   0.58 (  0%)   3.51 (  0%)   122M (  1%)
>  tree iv optimization               :   6.71 (  0%)   1.19 (  0%)   8.46 (  0%)   133M (  1%)
>  expand                             :  45.56 (  3%)   8.24 (  2%)  55.02 (  3%)  1980M (  9%)
>  forward prop                       :  11.89 (  1%)   1.39 (  0%)  12.59 (  1%)   130M (  1%)
>  dead store elim2                   :  10.03 (  1%)   0.70 (  0%)  11.23 (  1%)   138M (  1%)
>  loop init                          :  11.96 (  1%)   4.95 (  1%)  17.11 (  1%)   378M (  2%)
>  CPROP                              :  22.63 (  2%)   2.78 (  1%)  25.19 (  1%)   359M (  2%)
>  combiner                           :  41.39 (  3%)   2.57 (  1%)  43.30 (  2%)   558M (  2%)
>  reload CSE regs                    :  22.38 (  2%)   1.25 (  0%)  23.06 (  1%)   186M (  1%)
>  final                              :  32.33 (  2%)   4.28 (  1%)  36.75 (  2%)  1105M (  5%)
>  symout                             :  49.04 (  3%)   2.23 (  1%)  52.33 (  3%)  2517M ( 11%)
>  var-tracking emit                  :  33.26 (  2%)   1.02 (  0%)  34.35 (  2%)   582M (  3%)
>  rest of compilation                :  38.05 (  3%)  15.61 (  4%)  52.42 (  3%)   114M (  1%)
>  TOTAL                              :1486.02        408.79       1899.96        22512M
> 
> We seem to leak some hashtables:
> dwarf2out.c:28850 (dwarf2out_init)                      31M: 23.8%       47M       19 :  0.0%       ggc

that one likely keeps quite some memory live...

> cselib.c:3137 (cselib_init)                             34M: 25.9%       34M     1514k: 17.3%      heap
> tree-scalar-evolution.c:2984 (scev_initialize)          37M: 27.6%       50M      228k:  2.6%       ggc

Hmm, so we do

  scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);

and

  scalar_evolution_info->empty ();
  scalar_evolution_info = NULL;

to reclaim.  ->empty () will IIRC allocate at least 7 elements, which we
then eventually should reclaim during a GC walk - I guess the hashtable
statistics do not really handle GC-reclaimed portions?

If there's a friendlier way of releasing a GC allocated hash-tab
we can switch to that.  Note that in principle the hash-table doesn't
need to be GC allocated but it needs to be walked since it refers to
trees that might not be referenced in other ways.

> and hashmaps:
> ipa-reference.c:1133 (ipa_reference_read_optimiz      2047k:  3.0%     3071k        9 :  0.0%      heap
> tree-ssa.c:60 (redirect_edge_var_map_add)             4125k:  6.1%     4126k     8190 :  0.1%      heap

Similar to SCEV, probably mis-accounting?

> alias.c:1200 (record_alias_subset)                    4510k:  6.6%     4510k     4546 :  0.0%       ggc
> ipa-prop.h:986 (ipcp_transformation_t)                8191k: 12.0%       11M       16 :  0.0%       ggc
> dwarf2out.c:5957 (dwarf2out_register_external_di        47M: 72.2%       71M       12 :  0.0%       ggc
> 
> and hashsets:
> ipa-devirt.c:3093 (possible_polymorphic_call_tar        15k:  0.9%       23k        8 :  0.0%      heap
> ipa-devirt.c:1599 (add_type_duplicate)                 412k: 22.2%      412k     4065 :  0.0%      heap
> tree-ssa-threadbackward.c:40 (thread_jumps)           1432k: 77.0%     1433k      119k:  0.8%      heap
> 
> and vectors:
> tree-ssa-structalias.c:5783 (push_fields_onto_fi          8       847k: 0.3%      976k    475621: 0.8%        17k        24k

Huh.  It's an auto_vec<>

> tree-ssa-pre.c:334 (alloc_expression_id)                 48      1125k: 0.4%     1187k    198336: 0.3%        23k        34k
> tree-into-ssa.c:1787 (register_new_update_single          8      1196k: 0.5%     1264k    380385: 0.6%        24k        36k
> ggc-page.c:1264 (add_finalizer)                           8      1232k: 0.5%     1848k        43: 0.0%        77k        81k
> tree-ssa-structalias.c:1609 (topo_visit)                  8      1302k: 0.5%     1328k    892964: 1.4%        27k        33k
> graphds.c:254 (graphds_dfs)                               4      1469k: 0.6%     1675k   2101780: 3.4%        30k        34k
> dominance.c:955 (get_dominated_to_depth)                  8      2251k: 0.9%     2266k    685140: 1.1%        46k        50k
> tree-ssa-structalias.c:410 (new_var_info)                32      2264k: 0.9%     2341k    330758: 0.5%        47k        63k
> tree-ssa-structalias.c:3104 (process_constraint)         48      2376k: 0.9%     2606k    405451: 0.7%        49k        83k
> symtab.c:612 (create_reference)                           8      3314k: 1.3%     4897k     75213: 0.1%       414k       612k
> vec.h:1734 (copy)                                        48       233M:90.5%      234M   6243163:10.1%      4982k      5003k

Those all look OK to me, not sure why we even think there's a leak?

> However main problem is
> cfg.c:202 (connect_src)                               5745k:  0.2%      271M:  1.9%     1754k:  0.0%     1132k:  0.2%     7026k
> cfg.c:212 (connect_dest)                              6307k:  0.2%      281M:  2.0%    10129k:  0.2%     2490k:  0.5%     7172k
> varasm.c:3359 (build_constant_desc)                   7387k:  0.2%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
> emit-rtl.c:486 (gen_raw_REG)                          7799k:  0.2%      215M:  1.5%       96 :  0.0%        0 :  0.0%     9502k
> dwarf2cfi.c:2341 (add_cfis_to_fde)                    8027k:  0.2%        0 :  0.0%     4906k:  0.1%     1405k:  0.3%       78k
> emit-rtl.c:4074 (make_jump_insn_raw)                  8239k:  0.2%       93M:  0.7%        0 :  0.0%        0 :  0.0%     1442k
> tree-ssanames.c:308 (make_ssa_name_fn)                9130k:  0.2%      456M:  3.3%        0 :  0.0%        0 :  0.0%     6622k
> gimple.c:1808 (gimple_copy)                           9508k:  0.3%      524M:  3.7%     8609k:  0.2%     2972k:  0.6%     7135k
> tree-inline.c:4879 (expand_call_inline)               9590k:  0.3%       21M:  0.2%        0 :  0.0%        0 :  0.0%      328k
> dwarf2cfi.c:418 (new_cfi)                               10M:  0.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%      444k
> cfg.c:266 (unchecked_make_edge)                         10M:  0.3%       60M:  0.4%      355M:  6.8%        0 :  0.0%     9083k
> tree.c:1642 (wide_int_to_tree_1)                        10M:  0.3%     2313k:  0.0%        0 :  0.0%        0 :  0.0%      548k
> stringpool.c:41 (stringpool_ggc_alloc)                  10M:  0.3%     7055k:  0.0%        0 :  0.0%     2270k:  0.5%      588k
> stringpool.c:63 (alloc_node)                            10M:  0.3%       12M:  0.1%        0 :  0.0%        0 :  0.0%      588k
> tree-phinodes.c:119 (allocate_phi_node)                 11M:  0.3%      153M:  1.1%        0 :  0.0%     3539k:  0.7%      340k
> cgraph.c:289 (create_empty)                             12M:  0.3%        0 :  0.0%      109M:  2.1%        0 :  0.0%      371k
> cfg.c:127 (alloc_block)                                 14M:  0.4%      705M:  5.0%        0 :  0.0%        0 :  0.0%     7086k
> tree-streamer-in.c:558 (streamer_read_tree_bitfi        22M:  0.6%       13k:  0.0%        0 :  0.0%       22k:  0.0%       64k
> tree-inline.c:834 (remap_block)                         28M:  0.8%      159M:  1.1%        0 :  0.0%        0 :  0.0%     2009k
> stringpool.c:79 (ggc_alloc_string)                      28M:  0.8%     5619k:  0.0%        0 :  0.0%     6658k:  1.4%     1785k
> dwarf2out.c:11727 (add_ranges_num)                      32M:  0.9%        0 :  0.0%       32M:  0.6%      144 :  0.0%       20 
> tree-inline.c:5942 (copy_decl_to_var)                   39M:  1.1%       51M:  0.4%        0 :  0.0%        0 :  0.0%      646k
> tree-inline.c:5994 (copy_decl_no_change)                78M:  2.1%      270M:  1.9%        0 :  0.0%        0 :  0.0%     2497k
> function.c:4438 (reorder_blocks_1)                      96M:  2.6%      101M:  0.7%        0 :  0.0%        0 :  0.0%     2109k
> hash-table.h:802 (expand)                              142M:  3.9%       18M:  0.1%      198M:  3.8%       32M:  6.9%       38k
> dwarf2out.c:10086 (new_loc_list)                       219M:  6.0%       11M:  0.1%        0 :  0.0%        0 :  0.0%     2955k
> tree-streamer-in.c:637 (streamer_alloc_tree)           379M: 10.3%      426M:  3.0%        0 :  0.0%     4201k:  0.9%     9828k
> dwarf2out.c:5702 (new_die_raw)                         434M: 11.8%        0 :  0.0%        0 :  0.0%        0 :  0.0%     5556k
> dwarf2out.c:1383 (new_loc_descr)                       519M: 14.1%       12M:  0.1%     2880 :  0.0%        0 :  0.0%     6812k
> dwarf2out.c:4420 (add_dwarf_attr)                      640M: 17.4%        0 :  0.0%       94M:  1.8%     4584k:  1.0%     3877k
> toplev.c:906 (realloc_for_line_map)                    768M: 20.8%        0 :  0.0%      767M: 14.6%      255M: 54.4%       33 
> --------------------------------------------------------------------------------------------------------------------------------------------
> GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> --------------------------------------------------------------------------------------------------------------------------------------------
> Total                                                 3689M:100.0%    14039M:100.0%     5254M:100.0%      470M:100.0%      391M
> --------------------------------------------------------------------------------------------------------------------------------------------
> 
> Clearly some function bodies leak - I will try to figure out what. But
> main problem is debug info.
> I guess debug info for whole cc1plus is large, but it would be nice if
> it was not in the garbage collector, for example :)

Well, we're building a DIE tree for the whole unit here so I'm not sure
what parts we can optimize.  The structures may keep quite some stuff
on the tree side live through the decl -> DIE and block -> DIE maps
and the external_die_map used for LTO streaming (but if we lazily stream
bodies we do need to keep this map ... unless we add some
start/end-stream-body hooks and do the map per function.  But then
we build the DIEs lazily as well so the query of the map is lazy :/)

Richard.
Jan Hubicka Oct. 26, 2020, 9:48 a.m. UTC | #5
> > We seem to leak some hashtables:
> > dwarf2out.c:28850 (dwarf2out_init)                      31M: 23.8%       47M       19 :  0.0%       ggc
> 
> that one likely keeps quite some memory live...

Yep, having in-memory dwarf2out for the whole of cc1plus eats a lot of memory
quite naturally.
> 
> > cselib.c:3137 (cselib_init)                             34M: 25.9%       34M     1514k: 17.3%      heap
> > tree-scalar-evolution.c:2984 (scev_initialize)          37M: 27.6%       50M      228k:  2.6%       ggc
> 
> Hmm, so we do
> 
>   scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);
> 
> and
> 
>   scalar_evolution_info->empty ();
>   scalar_evolution_info = NULL;
> 
> to reclaim.  ->empty () will IIRC at least allocate 7 elements which we
> then eventually should reclaim during a GC walk - I guess the hashtable
> statistics do not really handle GC reclaimed portions?
> 
> If there's a friendlier way of releasing a GC allocated hash-tab
> we can switch to that.  Note that in principle the hash-table doesn't
> need to be GC allocated but it needs to be walked since it refers to
> trees that might not be referenced in other ways.

The hashtable has a destructor that does ggc_free, so I think ggc_delete is
the right way to free it.
> 
> > and hashmaps:
> > ipa-reference.c:1133 (ipa_reference_read_optimiz      2047k:  3.0%     3071k        9 :  0.0%      heap
> > tree-ssa.c:60 (redirect_edge_var_map_add)             4125k:  6.1%     4126k     8190 :  0.1%      heap
> 
> Similar as SCEV, probably mis-accounting?
> 
> > alias.c:1200 (record_alias_subset)                    4510k:  6.6%     4510k     4546 :  0.0%       ggc
> > ipa-prop.h:986 (ipcp_transformation_t)                8191k: 12.0%       11M       16 :  0.0%       ggc
> > dwarf2out.c:5957 (dwarf2out_register_external_di        47M: 72.2%       71M       12 :  0.0%       ggc
> > 
> > and hashsets:
> > ipa-devirt.c:3093 (possible_polymorphic_call_tar        15k:  0.9%       23k        8 :  0.0%      heap
> > ipa-devirt.c:1599 (add_type_duplicate)                 412k: 22.2%      412k     4065 :  0.0%      heap
> > tree-ssa-threadbackward.c:40 (thread_jumps)           1432k: 77.0%     1433k      119k:  0.8%      heap
> > 
> > and vectors:
> > tree-ssa-structalias.c:5783 (push_fields_onto_fi          8       847k: 0.3%      976k    475621: 0.8%        17k        24k
> 
> Huh.  It's an auto_vec<>

Hmm, those maybe get miscounted; I will check.
> 
> > tree-ssa-pre.c:334 (alloc_expression_id)                 48      1125k: 0.4%     1187k    198336: 0.3%        23k        34k
> > tree-into-ssa.c:1787 (register_new_update_single          8      1196k: 0.5%     1264k    380385: 0.6%        24k        36k
> > ggc-page.c:1264 (add_finalizer)                           8      1232k: 0.5%     1848k        43: 0.0%        77k        81k
> > tree-ssa-structalias.c:1609 (topo_visit)                  8      1302k: 0.5%     1328k    892964: 1.4%        27k        33k
> > graphds.c:254 (graphds_dfs)                               4      1469k: 0.6%     1675k   2101780: 3.4%        30k        34k
> > dominance.c:955 (get_dominated_to_depth)                  8      2251k: 0.9%     2266k    685140: 1.1%        46k        50k
> > tree-ssa-structalias.c:410 (new_var_info)                32      2264k: 0.9%     2341k    330758: 0.5%        47k        63k
> > tree-ssa-structalias.c:3104 (process_constraint)         48      2376k: 0.9%     2606k    405451: 0.7%        49k        83k
> > symtab.c:612 (create_reference)                           8      3314k: 1.3%     4897k     75213: 0.1%       414k       612k
> > vec.h:1734 (copy)                                        48       233M:90.5%      234M   6243163:10.1%      4982k      5003k

Also I should annotate copy.
> 
> Those all look OK to me, not sure why we even think there's a leak?

I think we do not need to hold references anymore (perhaps for aliases -
I will check).  Also all function bodies should be freed by now.
> 
> > However main problem is
> > cfg.c:202 (connect_src)                               5745k:  0.2%      271M:  1.9%     1754k:  0.0%     1132k:  0.2%     7026k
> > cfg.c:212 (connect_dest)                              6307k:  0.2%      281M:  2.0%    10129k:  0.2%     2490k:  0.5%     7172k
> > varasm.c:3359 (build_constant_desc)                   7387k:  0.2%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
> > emit-rtl.c:486 (gen_raw_REG)                          7799k:  0.2%      215M:  1.5%       96 :  0.0%        0 :  0.0%     9502k
> > dwarf2cfi.c:2341 (add_cfis_to_fde)                    8027k:  0.2%        0 :  0.0%     4906k:  0.1%     1405k:  0.3%       78k
> > emit-rtl.c:4074 (make_jump_insn_raw)                  8239k:  0.2%       93M:  0.7%        0 :  0.0%        0 :  0.0%     1442k
> > tree-ssanames.c:308 (make_ssa_name_fn)                9130k:  0.2%      456M:  3.3%        0 :  0.0%        0 :  0.0%     6622k
> > gimple.c:1808 (gimple_copy)                           9508k:  0.3%      524M:  3.7%     8609k:  0.2%     2972k:  0.6%     7135k
> > tree-inline.c:4879 (expand_call_inline)               9590k:  0.3%       21M:  0.2%        0 :  0.0%        0 :  0.0%      328k
> > dwarf2cfi.c:418 (new_cfi)                               10M:  0.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%      444k
> > cfg.c:266 (unchecked_make_edge)                         10M:  0.3%       60M:  0.4%      355M:  6.8%        0 :  0.0%     9083k
I think it is a bug to have function bodies around at the end of compilation -
I will try to work out the reason for that.
> > tree.c:1642 (wide_int_to_tree_1)                        10M:  0.3%     2313k:  0.0%        0 :  0.0%        0 :  0.0%      548k
> > stringpool.c:41 (stringpool_ggc_alloc)                  10M:  0.3%     7055k:  0.0%        0 :  0.0%     2270k:  0.5%      588k
> > stringpool.c:63 (alloc_node)                            10M:  0.3%       12M:  0.1%        0 :  0.0%        0 :  0.0%      588k
> > tree-phinodes.c:119 (allocate_phi_node)                 11M:  0.3%      153M:  1.1%        0 :  0.0%     3539k:  0.7%      340k
> > cgraph.c:289 (create_empty)                             12M:  0.3%        0 :  0.0%      109M:  2.1%        0 :  0.0%      371k
> > cfg.c:127 (alloc_block)                                 14M:  0.4%      705M:  5.0%        0 :  0.0%        0 :  0.0%     7086k
> > tree-streamer-in.c:558 (streamer_read_tree_bitfi        22M:  0.6%       13k:  0.0%        0 :  0.0%       22k:  0.0%       64k
> > tree-inline.c:834 (remap_block)                         28M:  0.8%      159M:  1.1%        0 :  0.0%        0 :  0.0%     2009k
> > stringpool.c:79 (ggc_alloc_string)                      28M:  0.8%     5619k:  0.0%        0 :  0.0%     6658k:  1.4%     1785k
> > dwarf2out.c:11727 (add_ranges_num)                      32M:  0.9%        0 :  0.0%       32M:  0.6%      144 :  0.0%       20 
> > tree-inline.c:5942 (copy_decl_to_var)                   39M:  1.1%       51M:  0.4%        0 :  0.0%        0 :  0.0%      646k
> > tree-inline.c:5994 (copy_decl_no_change)                78M:  2.1%      270M:  1.9%        0 :  0.0%        0 :  0.0%     2497k
> > function.c:4438 (reorder_blocks_1)                      96M:  2.6%      101M:  0.7%        0 :  0.0%        0 :  0.0%     2109k
> > hash-table.h:802 (expand)                              142M:  3.9%       18M:  0.1%      198M:  3.8%       32M:  6.9%       38k
> > dwarf2out.c:10086 (new_loc_list)                       219M:  6.0%       11M:  0.1%        0 :  0.0%        0 :  0.0%     2955k
> > tree-streamer-in.c:637 (streamer_alloc_tree)           379M: 10.3%      426M:  3.0%        0 :  0.0%     4201k:  0.9%     9828k
> > dwarf2out.c:5702 (new_die_raw)                         434M: 11.8%        0 :  0.0%        0 :  0.0%        0 :  0.0%     5556k
> > dwarf2out.c:1383 (new_loc_descr)                       519M: 14.1%       12M:  0.1%     2880 :  0.0%        0 :  0.0%     6812k
> > dwarf2out.c:4420 (add_dwarf_attr)                      640M: 17.4%        0 :  0.0%       94M:  1.8%     4584k:  1.0%     3877k
> > toplev.c:906 (realloc_for_line_map)                    768M: 20.8%        0 :  0.0%      767M: 14.6%      255M: 54.4%       33 
> > --------------------------------------------------------------------------------------------------------------------------------------------
> > GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> > --------------------------------------------------------------------------------------------------------------------------------------------
> > Total                                                 3689M:100.0%    14039M:100.0%     5254M:100.0%      470M:100.0%      391M
> > --------------------------------------------------------------------------------------------------------------------------------------------
> > 
> > Clearly some function bodies leak - I will try to figure out what. But
> > main problem is debug info.
> > I guess debug info for whole cc1plus is large, but it would be nice if
> > it was not in the garbage collector, for example :)
> 
> Well, we're building a DIE tree for the whole unit here so I'm not sure
> what parts we can optimize.  The structures may keep quite some stuff
> on the tree side live through the decl -> DIE and block -> DIE maps
> and the external_die_map used for LTO streaming (but if we lazily stream
> bodies we do need to keep this map ... unless we add some
> start/end-stream-body hooks and doing the map per function.  But then
> we build the DIEs lazily as well so the query of the map is lazy :/)

Yep, not sure how much we could do here.  Of course ggc_collect when
invoked will do quite a lot of walking to discover relatively few tree
references, but not sure if that can be solved by custom marking or so.

Honza
> 
> Richard.
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer
Richard Biener Oct. 26, 2020, 10:32 a.m. UTC | #6
On Mon, 26 Oct 2020, Jan Hubicka wrote:

> > > We seem to leak some hashtables:
> > > dwarf2out.c:28850 (dwarf2out_init)                      31M: 23.8%       47M       19 :  0.0%       ggc
> > 
> > that one likely keeps quite some memory live...
> 
> Yep, having in-memory dwaf2out for whole cc1plus eats a lot of memory
> quite naturally.

OTOH the late debug shouldn't be so big ...

> > 
> > > cselib.c:3137 (cselib_init)                             34M: 25.9%       34M     1514k: 17.3%      heap
> > > tree-scalar-evolution.c:2984 (scev_initialize)          37M: 27.6%       50M      228k:  2.6%       ggc
> > 
> > Hmm, so we do
> > 
> >   scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);
> > 
> > and
> > 
> >   scalar_evolution_info->empty ();
> >   scalar_evolution_info = NULL;
> > 
> > to reclaim.  ->empty () will IIRC at least allocate 7 elements which we
> > the eventually should reclaim during a GC walk - I guess the hashtable
> > statistics do not really handle GC reclaimed portions?
> > 
> > If there's a friendlier way of releasing a GC allocated hash-tab
> > we can switch to that.  Note that in principle the hash-table doesn't
> > need to be GC allocated but it needs to be walked since it refers to
> > trees that might not be referenced in other ways.
> 
> hashtable has destructor that does ggc_free, so i think ggc_delete is
> right way to free.

Can you try whether that helps?  As said, in the end it's probably
miscounting in the stats.

> > 
> > > and hashmaps:
> > > ipa-reference.c:1133 (ipa_reference_read_optimiz      2047k:  3.0%     3071k        9 :  0.0%      heap
> > > tree-ssa.c:60 (redirect_edge_var_map_add)             4125k:  6.1%     4126k     8190 :  0.1%      heap
> > 
> > Similar as SCEV, probably mis-accounting?
> > 
> > > alias.c:1200 (record_alias_subset)                    4510k:  6.6%     4510k     4546 :  0.0%       ggc
> > > ipa-prop.h:986 (ipcp_transformation_t)                8191k: 12.0%       11M       16 :  0.0%       ggc
> > > dwarf2out.c:5957 (dwarf2out_register_external_di        47M: 72.2%       71M       12 :  0.0%       ggc
> > > 
> > > and hashsets:
> > > ipa-devirt.c:3093 (possible_polymorphic_call_tar        15k:  0.9%       23k        8 :  0.0%      heap
> > > ipa-devirt.c:1599 (add_type_duplicate)                 412k: 22.2%      412k     4065 :  0.0%      heap
> > > tree-ssa-threadbackward.c:40 (thread_jumps)           1432k: 77.0%     1433k      119k:  0.8%      heap
> > > 
> > > and vectors:
> > > tree-ssa-structalias.c:5783 (push_fields_onto_fi          8       847k: 0.3%      976k    475621: 0.8%        17k        24k
> > 
> > Huh.  It's an auto_vec<>
> 
> Hmm, those maybe gets miscounted, i will check.
> > 
> > > tree-ssa-pre.c:334 (alloc_expression_id)                 48      1125k: 0.4%     1187k    198336: 0.3%        23k        34k
> > > tree-into-ssa.c:1787 (register_new_update_single          8      1196k: 0.5%     1264k    380385: 0.6%        24k        36k
> > > ggc-page.c:1264 (add_finalizer)                           8      1232k: 0.5%     1848k        43: 0.0%        77k        81k
> > > tree-ssa-structalias.c:1609 (topo_visit)                  8      1302k: 0.5%     1328k    892964: 1.4%        27k        33k
> > > graphds.c:254 (graphds_dfs)                               4      1469k: 0.6%     1675k   2101780: 3.4%        30k        34k
> > > dominance.c:955 (get_dominated_to_depth)                  8      2251k: 0.9%     2266k    685140: 1.1%        46k        50k
> > > tree-ssa-structalias.c:410 (new_var_info)                32      2264k: 0.9%     2341k    330758: 0.5%        47k        63k
> > > tree-ssa-structalias.c:3104 (process_constraint)         48      2376k: 0.9%     2606k    405451: 0.7%        49k        83k
> > > symtab.c:612 (create_reference)                           8      3314k: 1.3%     4897k     75213: 0.1%       414k       612k
> > > vec.h:1734 (copy)                                        48       233M:90.5%      234M   6243163:10.1%      4982k      5003k
> 
> Also I should annotate copy.

Yeah, some missing annotations might cause issues.

> > 
> > Those all look OK to me, not sure why we even think there's a leak?
> 
> I think we do not need to hold references anymore (perhaps for aliases -
> i will check).  Also all function bodies should be freed by now.
> > 
> > > However main problem is
> > > cfg.c:202 (connect_src)                               5745k:  0.2%      271M:  1.9%     1754k:  0.0%     1132k:  0.2%     7026k
> > > cfg.c:212 (connect_dest)                              6307k:  0.2%      281M:  2.0%    10129k:  0.2%     2490k:  0.5%     7172k
> > > varasm.c:3359 (build_constant_desc)                   7387k:  0.2%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
> > > emit-rtl.c:486 (gen_raw_REG)                          7799k:  0.2%      215M:  1.5%       96 :  0.0%        0 :  0.0%     9502k
> > > dwarf2cfi.c:2341 (add_cfis_to_fde)                    8027k:  0.2%        0 :  0.0%     4906k:  0.1%     1405k:  0.3%       78k
> > > emit-rtl.c:4074 (make_jump_insn_raw)                  8239k:  0.2%       93M:  0.7%        0 :  0.0%        0 :  0.0%     1442k
> > > tree-ssanames.c:308 (make_ssa_name_fn)                9130k:  0.2%      456M:  3.3%        0 :  0.0%        0 :  0.0%     6622k
> > > gimple.c:1808 (gimple_copy)                           9508k:  0.3%      524M:  3.7%     8609k:  0.2%     2972k:  0.6%     7135k
> > > tree-inline.c:4879 (expand_call_inline)               9590k:  0.3%       21M:  0.2%        0 :  0.0%        0 :  0.0%      328k
> > > dwarf2cfi.c:418 (new_cfi)                               10M:  0.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%      444k
> > > cfg.c:266 (unchecked_make_edge)                         10M:  0.3%       60M:  0.4%      355M:  6.8%        0 :  0.0%     9083k
> I think it is bug to have fuction body at the end of compilation - will
> try to work out reason for that.
> > > tree.c:1642 (wide_int_to_tree_1)                        10M:  0.3%     2313k:  0.0%        0 :  0.0%        0 :  0.0%      548k
> > > stringpool.c:41 (stringpool_ggc_alloc)                  10M:  0.3%     7055k:  0.0%        0 :  0.0%     2270k:  0.5%      588k
> > > stringpool.c:63 (alloc_node)                            10M:  0.3%       12M:  0.1%        0 :  0.0%        0 :  0.0%      588k
> > > tree-phinodes.c:119 (allocate_phi_node)                 11M:  0.3%      153M:  1.1%        0 :  0.0%     3539k:  0.7%      340k
> > > cgraph.c:289 (create_empty)                             12M:  0.3%        0 :  0.0%      109M:  2.1%        0 :  0.0%      371k
> > > cfg.c:127 (alloc_block)                                 14M:  0.4%      705M:  5.0%        0 :  0.0%        0 :  0.0%     7086k
> > > tree-streamer-in.c:558 (streamer_read_tree_bitfi        22M:  0.6%       13k:  0.0%        0 :  0.0%       22k:  0.0%       64k
> > > tree-inline.c:834 (remap_block)                         28M:  0.8%      159M:  1.1%        0 :  0.0%        0 :  0.0%     2009k
> > > stringpool.c:79 (ggc_alloc_string)                      28M:  0.8%     5619k:  0.0%        0 :  0.0%     6658k:  1.4%     1785k
> > > dwarf2out.c:11727 (add_ranges_num)                      32M:  0.9%        0 :  0.0%       32M:  0.6%      144 :  0.0%       20 
> > > tree-inline.c:5942 (copy_decl_to_var)                   39M:  1.1%       51M:  0.4%        0 :  0.0%        0 :  0.0%      646k
> > > tree-inline.c:5994 (copy_decl_no_change)                78M:  2.1%      270M:  1.9%        0 :  0.0%        0 :  0.0%     2497k
> > > function.c:4438 (reorder_blocks_1)                      96M:  2.6%      101M:  0.7%        0 :  0.0%        0 :  0.0%     2109k
> > > hash-table.h:802 (expand)                              142M:  3.9%       18M:  0.1%      198M:  3.8%       32M:  6.9%       38k
> > > dwarf2out.c:10086 (new_loc_list)                       219M:  6.0%       11M:  0.1%        0 :  0.0%        0 :  0.0%     2955k
> > > tree-streamer-in.c:637 (streamer_alloc_tree)           379M: 10.3%      426M:  3.0%        0 :  0.0%     4201k:  0.9%     9828k
> > > dwarf2out.c:5702 (new_die_raw)                         434M: 11.8%        0 :  0.0%        0 :  0.0%        0 :  0.0%     5556k
> > > dwarf2out.c:1383 (new_loc_descr)                       519M: 14.1%       12M:  0.1%     2880 :  0.0%        0 :  0.0%     6812k
> > > dwarf2out.c:4420 (add_dwarf_attr)                      640M: 17.4%        0 :  0.0%       94M:  1.8%     4584k:  1.0%     3877k
> > > toplev.c:906 (realloc_for_line_map)                    768M: 20.8%        0 :  0.0%      767M: 14.6%      255M: 54.4%       33 
> > > --------------------------------------------------------------------------------------------------------------------------------------------
> > > GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> > > --------------------------------------------------------------------------------------------------------------------------------------------
> > > Total                                                 3689M:100.0%    14039M:100.0%     5254M:100.0%      470M:100.0%      391M
> > > --------------------------------------------------------------------------------------------------------------------------------------------
> > > 
> > > Clearly some function bodies leak - I will try to figure out what. But
> > > main problem is debug info.
> > > I guess debug info for whole cc1plus is large, but it would be nice if
> > > it was not in the garbage collector, for example :)
> > 
> > Well, we're building a DIE tree for the whole unit here so I'm not sure
> > what parts we can optimize.  The structures may keep quite some stuff
> > on the tree side live through the decl -> DIE and block -> DIE maps
> > and the external_die_map used for LTO streaming (but if we lazily stream
> > bodies we do need to keep this map ... unless we add some
> > start/end-stream-body hooks and doing the map per function.  But then
> > we build the DIEs lazily as well so the query of the map is lazy :/)
> 
> Yep, not sure how much we could do here.  Of course ggc_collect when
> invoked will do quite a lot of walking to discover relatively few tree
> references, but not sure if that can be solved by custom marking or so.

In principle the late DIE creation code can remove entries from the
external_die_map map, but not sure how much that helps (might also
cause re-allocation of it if we shrink it).  It might help quite a bit
for references to BLOCKs.  Maybe you can try the following simple
patch ...

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index ba93a6c3d81..350cc5d443c 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -5974,6 +5974,7 @@ maybe_create_die_with_external_ref (tree decl)
 
   const char *sym = desc->sym;
   unsigned HOST_WIDE_INT off = desc->off;
+  external_die_map->remove (decl);
 
   in_lto_p = false;
   dw_die_ref die = (TREE_CODE (decl) == BLOCK



> Hona
> > 
> > Richard.
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer
>
Jan Hubicka Oct. 26, 2020, 10:35 a.m. UTC | #7
> > > 
> > > > cselib.c:3137 (cselib_init)                             34M: 25.9%       34M     1514k: 17.3%      heap
> > > > tree-scalar-evolution.c:2984 (scev_initialize)          37M: 27.6%       50M      228k:  2.6%       ggc
> > > 
> > > Hmm, so we do
> > > 
> > >   scalar_evolution_info = hash_table<scev_info_hasher>::create_ggc (100);
> > > 
> > > and
> > > 
> > >   scalar_evolution_info->empty ();
> > >   scalar_evolution_info = NULL;
> > > 
> > > to reclaim.  ->empty () will IIRC at least allocate 7 elements which we
> > > the eventually should reclaim during a GC walk - I guess the hashtable
> > > statistics do not really handle GC reclaimed portions?
> > > 
> > > If there's a friendlier way of releasing a GC allocated hash-tab
> > > we can switch to that.  Note that in principle the hash-table doesn't
> > > need to be GC allocated but it needs to be walked since it refers to
> > > trees that might not be referenced in other ways.
> > 
> > hashtable has destructor that does ggc_free, so i think ggc_delete is
> > right way to free.
> 
> Can you try if that helps?  As said, in the end it's probably
> miscountings in the stats.

I do not think we are miscounting here.  empty () really allocates a small
hashtable and leaves it allocated.
It should be ggc_delete.  I will test it.
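A minimal, untested sketch of that change (the exact context in scev_finalize
is assumed from the code quoted above):

```diff
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ scev_finalize @@
-  scalar_evolution_info->empty ();
+  ggc_delete (scalar_evolution_info);
   scalar_evolution_info = NULL;
```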
> 
> > > 
> > > > and hashmaps:
> > > > ipa-reference.c:1133 (ipa_reference_read_optimiz      2047k:  3.0%     3071k        9 :  0.0%      heap
> > > > tree-ssa.c:60 (redirect_edge_var_map_add)             4125k:  6.1%     4126k     8190 :  0.1%      heap
> > > 
> > > Similar as SCEV, probably mis-accounting?
> > > 
> > > > alias.c:1200 (record_alias_subset)                    4510k:  6.6%     4510k     4546 :  0.0%       ggc
> > > > ipa-prop.h:986 (ipcp_transformation_t)                8191k: 12.0%       11M       16 :  0.0%       ggc
> > > > dwarf2out.c:5957 (dwarf2out_register_external_di        47M: 72.2%       71M       12 :  0.0%       ggc
> > > > 
> > > > and hashsets:
> > > > ipa-devirt.c:3093 (possible_polymorphic_call_tar        15k:  0.9%       23k        8 :  0.0%      heap
> > > > ipa-devirt.c:1599 (add_type_duplicate)                 412k: 22.2%      412k     4065 :  0.0%      heap
> > > > tree-ssa-threadbackward.c:40 (thread_jumps)           1432k: 77.0%     1433k      119k:  0.8%      heap
> > > > 
> > > > and vectors:
> > > > tree-ssa-structalias.c:5783 (push_fields_onto_fi          8       847k: 0.3%      976k    475621: 0.8%        17k        24k
> > > 
> > > Huh.  It's an auto_vec<>
> > 
> > Hmm, those maybe gets miscounted, i will check.
> > > 
> > > > tree-ssa-pre.c:334 (alloc_expression_id)                 48      1125k: 0.4%     1187k    198336: 0.3%        23k        34k
> > > > tree-into-ssa.c:1787 (register_new_update_single          8      1196k: 0.5%     1264k    380385: 0.6%        24k        36k
> > > > ggc-page.c:1264 (add_finalizer)                           8      1232k: 0.5%     1848k        43: 0.0%        77k        81k
> > > > tree-ssa-structalias.c:1609 (topo_visit)                  8      1302k: 0.5%     1328k    892964: 1.4%        27k        33k
> > > > graphds.c:254 (graphds_dfs)                               4      1469k: 0.6%     1675k   2101780: 3.4%        30k        34k
> > > > dominance.c:955 (get_dominated_to_depth)                  8      2251k: 0.9%     2266k    685140: 1.1%        46k        50k
> > > > tree-ssa-structalias.c:410 (new_var_info)                32      2264k: 0.9%     2341k    330758: 0.5%        47k        63k
> > > > tree-ssa-structalias.c:3104 (process_constraint)         48      2376k: 0.9%     2606k    405451: 0.7%        49k        83k
> > > > symtab.c:612 (create_reference)                           8      3314k: 1.3%     4897k     75213: 0.1%       414k       612k
> > > > vec.h:1734 (copy)                                        48       233M:90.5%      234M   6243163:10.1%      4982k      5003k
> > 
> > Also I should annotate copy.
> 
> Yeah, some missing annotations might cause issues.

It will only let us see who copies the vectors ;)

auto_vecs I think are special since we may manage to miscount the
pre-allocated space.  I will look into that.
> > > 
> > > Well, we're building a DIE tree for the whole unit here so I'm not sure
> > > what parts we can optimize.  The structures may keep quite some stuff
> > > on the tree side live through the decl -> DIE and block -> DIE maps
> > > and the external_die_map used for LTO streaming (but if we lazily stream
> > > bodies we do need to keep this map ... unless we add some
> > > start/end-stream-body hooks and doing the map per function.  But then
> > > we build the DIEs lazily as well so the query of the map is lazy :/)
> > 
> > Yep, not sure how much we could do here.  Of course ggc_collect when
> > invoked will do quite a lot of walking to discover relatively few tree
> > references, but not sure if that can be solved by custom marking or so.
> 
> In principle the late DIE creation code can remove entries from the
> external_die_map map, but not sure how much that helps (might also
> cause re-allocation of it if we shrink it).  It might help quite a bit
> for references to BLOCKs.  Maybe you can try the following simple
> patch ...
> 
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index ba93a6c3d81..350cc5d443c 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -5974,6 +5974,7 @@ maybe_create_die_with_external_ref (tree decl)
>  
>    const char *sym = desc->sym;
>    unsigned HOST_WIDE_INT off = desc->off;
> +  external_die_map->remove (decl);
>  
>    in_lto_p = false;
>    dw_die_ref die = (TREE_CODE (decl) == BLOCK

I will give it a try.  Thanks!
I think shrinking hashtables is not much of a concern here: it happens
lazily either at ggc_collect (that is desirable) or when the hashtable is
walked (which is amortized by the walk).

Honza
> 
> 
> 
> > Hona
> > > 
> > > Richard.
> > > 
> > > -- 
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer
Jan Hubicka Oct. 28, 2020, 3:51 p.m. UTC | #8
> > > > However main problem is
> > > > cfg.c:202 (connect_src)                               5745k:  0.2%      271M:  1.9%     1754k:  0.0%     1132k:  0.2%     7026k
> > > > cfg.c:212 (connect_dest)                              6307k:  0.2%      281M:  2.0%    10129k:  0.2%     2490k:  0.5%     7172k
> > > > varasm.c:3359 (build_constant_desc)                   7387k:  0.2%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
> > > > emit-rtl.c:486 (gen_raw_REG)                          7799k:  0.2%      215M:  1.5%       96 :  0.0%        0 :  0.0%     9502k
> > > > dwarf2cfi.c:2341 (add_cfis_to_fde)                    8027k:  0.2%        0 :  0.0%     4906k:  0.1%     1405k:  0.3%       78k
> > > > emit-rtl.c:4074 (make_jump_insn_raw)                  8239k:  0.2%       93M:  0.7%        0 :  0.0%        0 :  0.0%     1442k
> > > > tree-ssanames.c:308 (make_ssa_name_fn)                9130k:  0.2%      456M:  3.3%        0 :  0.0%        0 :  0.0%     6622k
> > > > gimple.c:1808 (gimple_copy)                           9508k:  0.3%      524M:  3.7%     8609k:  0.2%     2972k:  0.6%     7135k
> > > > tree-inline.c:4879 (expand_call_inline)               9590k:  0.3%       21M:  0.2%        0 :  0.0%        0 :  0.0%      328k
> > > > dwarf2cfi.c:418 (new_cfi)                               10M:  0.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%      444k
> > > > cfg.c:266 (unchecked_make_edge)                         10M:  0.3%       60M:  0.4%      355M:  6.8%        0 :  0.0%     9083k
> > I think it is bug to have fuction body at the end of compilation - will
> > try to work out reason for that.
> > > > tree.c:1642 (wide_int_to_tree_1)                        10M:  0.3%     2313k:  0.0%        0 :  0.0%        0 :  0.0%      548k
> > > > stringpool.c:41 (stringpool_ggc_alloc)                  10M:  0.3%     7055k:  0.0%        0 :  0.0%     2270k:  0.5%      588k
> > > > stringpool.c:63 (alloc_node)                            10M:  0.3%       12M:  0.1%        0 :  0.0%        0 :  0.0%      588k
> > > > tree-phinodes.c:119 (allocate_phi_node)                 11M:  0.3%      153M:  1.1%        0 :  0.0%     3539k:  0.7%      340k
> > > > cgraph.c:289 (create_empty)                             12M:  0.3%        0 :  0.0%      109M:  2.1%        0 :  0.0%      371k
> > > > cfg.c:127 (alloc_block)                                 14M:  0.4%      705M:  5.0%        0 :  0.0%        0 :  0.0%     7086k
> > > > tree-streamer-in.c:558 (streamer_read_tree_bitfi        22M:  0.6%       13k:  0.0%        0 :  0.0%       22k:  0.0%       64k
> > > > tree-inline.c:834 (remap_block)                         28M:  0.8%      159M:  1.1%        0 :  0.0%        0 :  0.0%     2009k
> > > > stringpool.c:79 (ggc_alloc_string)                      28M:  0.8%     5619k:  0.0%        0 :  0.0%     6658k:  1.4%     1785k
> > > > dwarf2out.c:11727 (add_ranges_num)                      32M:  0.9%        0 :  0.0%       32M:  0.6%      144 :  0.0%       20 
> > > > tree-inline.c:5942 (copy_decl_to_var)                   39M:  1.1%       51M:  0.4%        0 :  0.0%        0 :  0.0%      646k
> > > > tree-inline.c:5994 (copy_decl_no_change)                78M:  2.1%      270M:  1.9%        0 :  0.0%        0 :  0.0%     2497k
> > > > function.c:4438 (reorder_blocks_1)                      96M:  2.6%      101M:  0.7%        0 :  0.0%        0 :  0.0%     2109k
> > > > hash-table.h:802 (expand)                              142M:  3.9%       18M:  0.1%      198M:  3.8%       32M:  6.9%       38k
> > > > dwarf2out.c:10086 (new_loc_list)                       219M:  6.0%       11M:  0.1%        0 :  0.0%        0 :  0.0%     2955k
> > > > tree-streamer-in.c:637 (streamer_alloc_tree)           379M: 10.3%      426M:  3.0%        0 :  0.0%     4201k:  0.9%     9828k
> > > > dwarf2out.c:5702 (new_die_raw)                         434M: 11.8%        0 :  0.0%        0 :  0.0%        0 :  0.0%     5556k
> > > > dwarf2out.c:1383 (new_loc_descr)                       519M: 14.1%       12M:  0.1%     2880 :  0.0%        0 :  0.0%     6812k
> > > > dwarf2out.c:4420 (add_dwarf_attr)                      640M: 17.4%        0 :  0.0%       94M:  1.8%     4584k:  1.0%     3877k
> > > > toplev.c:906 (realloc_for_line_map)                    768M: 20.8%        0 :  0.0%      767M: 14.6%      255M: 54.4%       33 
> > > > --------------------------------------------------------------------------------------------------------------------------------------------
> > > > GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> > > > --------------------------------------------------------------------------------------------------------------------------------------------
> > > > Total                                                 3689M:100.0%    14039M:100.0%     5254M:100.0%      470M:100.0%      391M
> > > > --------------------------------------------------------------------------------------------------------------------------------------------
> > > > 
> > > > Clearly some function bodies leak - I will try to figure out what.  But
> > > > the main problem is debug info.
> > > > I guess debug info for the whole of cc1plus is large, but it would be
> > > > nice if it was not in the garbage collector, for example :)
> > > 
> > > Well, we're building a DIE tree for the whole unit here so I'm not sure
> > > what parts we can optimize.  The structures may keep quite some stuff
> > > on the tree side live through the decl -> DIE and block -> DIE maps
> > > and the external_die_map used for LTO streaming (but if we lazily stream
> > > bodies we do need to keep this map ... unless we add some
> > > start/end-stream-body hooks and do the map per function.  But then
> > > we build the DIEs lazily as well so the query of the map is lazy :/)
> > 
> > Yep, not sure how much we could do here.  Of course ggc_collect when
> > invoked will do quite a lot of walking to discover relatively few tree
> > references, but not sure if that can be solved by custom marking or so.
> 
> In principle the late DIE creation code can remove entries from the
> external_die_map map, but not sure how much that helps (might also
> cause re-allocation of it if we shrink it).  It might help quite a bit
> for references to BLOCKs.  Maybe you can try the following simple
> patch ...
> 
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index ba93a6c3d81..350cc5d443c 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -5974,6 +5974,7 @@ maybe_create_die_with_external_ref (tree decl)
>  
>    const char *sym = desc->sym;
>    unsigned HOST_WIDE_INT off = desc->off;
> +  external_die_map->remove (decl);
>  
>    in_lto_p = false;
>    dw_die_ref die = (TREE_CODE (decl) == BLOCK

Updated stats are:

ipa-devirt.c:1950 (get_odr_type)                       385k:  0.0%        0 :  0.0%        0 :  0.0%        0 :  0.0%     7044 
emit-rtl.c:4117 (make_note_raw)                        396k:  0.0%      986M:  6.8%        0 :  0.0%        0 :  0.0%       17M
lto-cgraph.c:1983 (input_node_opt_summary)             524k:  0.0%       18M:  0.1%      313k:  0.0%     1012k:  0.2%      124k
tree-inline.c:4883 (expand_call_inline)                526k:  0.0%       30M:  0.2%        0 :  0.0%        0 :  0.0%      329k
gimple.c:1822 (gimple_copy)                            527k:  0.0%      536M:  3.7%     8631k:  0.2%     2997k:  0.6%     7174k
emit-rtl.c:2703 (gen_label_rtx)                        532k:  0.0%       76M:  0.5%        0 :  0.0%        0 :  0.0%     1232k
ipa-modref-tree.h:154 (insert_access)                  592k:  0.0%        0 :  0.0%     4052k:  0.1%     7192 :  0.0%       26k
cfg.c:202 (connect_src)                                617k:  0.0%      277M:  1.9%     1755k:  0.0%     1133k:  0.2%     7053k
tree-ssanames.c:308 (make_ssa_name_fn)                 627k:  0.0%      466M:  3.2%        0 :  0.0%        0 :  0.0%     6642k
tree.c:7887 (build_pointer_type_for_mode)              635k:  0.0%     1094k:  0.0%        0 :  0.0%        0 :  0.0%       10k
cgraph.c:1989 (rtl_info)                               661k:  0.0%        0 :  0.0%        0 :  0.0%        0 :  0.0%       27k
cfg.c:212 (connect_dest)                               698k:  0.0%      287M:  2.0%    10181k:  0.2%     2490k:  0.5%     7200k
symbol-summary.h:108 (allocate_new)                    736k:  0.0%        0 :  0.0%     8663k:  0.2%        0 :  0.0%      391k
varpool.c:137 (create_empty)                           746k:  0.0%        0 :  0.0%     6257k:  0.1%        0 :  0.0%       54k
varasm.c:1513 (make_decl_rtl)                          834k:  0.0%      866k:  0.0%        0 :  0.0%        0 :  0.0%       70k
emit-rtl.c:4074 (make_jump_insn_raw)                   913k:  0.0%      100M:  0.7%        0 :  0.0%        0 :  0.0%     1448k
tree-phinodes.c:119 (allocate_phi_node)                943k:  0.0%      164M:  1.1%        0 :  0.0%     3563k:  0.7%      343k
emit-rtl.c:386 (set_mem_attrs)                         982k:  0.0%      171M:  1.2%        0 :  0.0%        0 :  0.0%     4413k
tree.c:1311 (build_new_int_cst)                       1080k:  0.0%      838k:  0.0%       66M:  1.3%        0 :  0.0%     2188k
langhooks.c:664 (build_builtin_function)              1125k:  0.0%      137k:  0.0%        0 :  0.0%      170k:  0.0%     4367 
emit-rtl.c:486 (gen_raw_REG)                          1158k:  0.0%      221M:  1.5%       96 :  0.0%        0 :  0.0%     9517k
cfg.c:266 (unchecked_make_edge)                       1179k:  0.0%       69M:  0.5%      356M:  6.8%        0 :  0.0%     9119k
varasm.c:3350 (build_constant_desc)                   1232k:  0.0%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
varasm.c:3397 (build_constant_desc)                   1232k:  0.0%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
tree.c:1497 (cache_wide_int_in_type_cache)            1342k:  0.0%       44k:  0.0%        0 :  0.0%     3184 :  0.0%       18k
cfg.c:127 (alloc_block)                               1597k:  0.0%      720M:  5.0%        0 :  0.0%        0 :  0.0%     7113k
tree-inline.c:837 (remap_block)                       1738k:  0.1%      187M:  1.3%        0 :  0.0%        0 :  0.0%     2016k
dwarf2out.c:15872 (mem_loc_descriptor)                2048k:  0.1%        0 :  0.0%     1531k:  0.0%      512 :  0.0%       10 
emit-rtl.c:856 (gen_rtx_MEM)                          2138k:  0.1%      297M:  2.1%        0 :  0.0%        0 :  0.0%       12M
symtab.c:596 (create_reference)                       2486k:  0.1%        0 :  0.0%       44M:  0.8%      341k:  0.1%      192k
tree-inline.c:5038 (expand_call_inline)               2687k:  0.1%        0 :  0.0%     2434k:  0.0%       15k:  0.0%     6432 
dwarf2out.c:1028 (dwarf2out_alloc_current_fde)        3084k:  0.1%        0 :  0.0%        0 :  0.0%        0 :  0.0%       27k
ipa-prop.c:5276 (read_ipcp_transformation_info)       3549k:  0.1%       34k:  0.0%        0 :  0.0%      737k:  0.1%     6508 
alias.c:1200 (record_alias_subset)                    4712k:  0.1%        0 :  0.0%     3096 :  0.0%       36k:  0.0%     4679 
tree.c:2264 (build_string)                            5163k:  0.2%     1782k:  0.0%        0 :  0.0%      652k:  0.1%      115k
function.c:4438 (reorder_blocks_1)                    5470k:  0.2%      193M:  1.3%        0 :  0.0%        0 :  0.0%     2121k
varasm.c:3359 (build_constant_desc)                   7393k:  0.2%        0 :  0.0%        0 :  0.0%        0 :  0.0%       51k
dwarf2cfi.c:2341 (add_cfis_to_fde)                    8078k:  0.2%        0 :  0.0%     4933k:  0.1%     1417k:  0.3%       78k
dwarf2cfi.c:418 (new_cfi)                               10M:  0.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%      447k
stringpool.c:63 (alloc_node)                            10M:  0.3%       12M:  0.1%        0 :  0.0%        0 :  0.0%      591k
tree.c:1642 (wide_int_to_tree_1)                        10M:  0.3%     2375k:  0.0%        0 :  0.0%        0 :  0.0%      549k
stringpool.c:41 (stringpool_ggc_alloc)                  10M:  0.3%     7328k:  0.0%        0 :  0.0%     2279k:  0.5%      591k
cgraph.c:290 (create_empty)                             11M:  0.3%        0 :  0.0%       96M:  1.8%        0 :  0.0%      372k
tree-inline.c:5946 (copy_decl_to_var)                   16M:  0.5%       74M:  0.5%        0 :  0.0%        0 :  0.0%      647k
tree-streamer-in.c:558 (streamer_read_tree_bitfi        22M:  0.7%       13k:  0.0%        0 :  0.0%       22k:  0.0%       64k
stringpool.c:79 (ggc_alloc_string)                      27M:  0.8%     7321k:  0.0%        0 :  0.0%     6640k:  1.3%     1784k
dwarf2out.c:11728 (add_ranges_num)                      32M:  1.0%        0 :  0.0%       32M:  0.6%      144 :  0.0%       20 
tree-inline.c:5998 (copy_decl_no_change)                34M:  1.0%      315M:  2.2%        0 :  0.0%        0 :  0.0%     2504k
hash-table.h:802 (expand)                              142M:  4.3%       10M:  0.1%      185M:  3.5%       32M:  6.6%       29k
dwarf2out.c:10087 (new_loc_list)                       199M:  6.0%     9350k:  0.1%        0 :  0.0%        0 :  0.0%     2666k
tree-streamer-in.c:637 (streamer_alloc_tree)           315M:  9.5%      491M:  3.4%        0 :  0.0%     4243k:  0.8%     9820k
dwarf2out.c:5702 (new_die_raw)                         412M: 12.4%        0 :  0.0%        0 :  0.0%        0 :  0.0%     5285k
dwarf2out.c:1383 (new_loc_descr)                       480M: 14.4%     9653k:  0.1%     2880 :  0.0%        0 :  0.0%     6265k
dwarf2out.c:4420 (add_dwarf_attr)                      750M: 22.5%        0 :  0.0%       94M:  1.8%       13M:  2.7%     3891k
toplev.c:906 (realloc_for_line_map)                    768M: 23.0%        0 :  0.0%      767M: 14.6%      255M: 52.3%       33 
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                              Leak          Garbage            Freed        Overhead            Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                 3332M:100.0%    14432M:100.0%     5267M:100.0%      489M:100.0%      389M
--------------------------------------------------------------------------------------------------------------------------------------------

So it seems there is a reduction from 3.6G to 3.3G.

Honza

Patch

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 9480935ff84..35a0182b847 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -3872,7 +3872,7 @@  cgraph_node::function_or_virtual_thunk_symbol
 }
 
 /* When doing LTO, read cgraph_node's body from disk if it is not already
-   present.  */
+   present.  Also perform any necessary clone materializations.  */
 
 bool
 cgraph_node::get_untransformed_body (void)
@@ -3882,6 +3882,17 @@  cgraph_node::get_untransformed_body (void)
   size_t len;
   tree decl = this->decl;
 
+  /* See if there is a clone to be materialized.
+     (Inline clones do not need materialization, but we can be seeing
+      an inline clone of a real clone.)  */
+  cgraph_node *p = this;
+  for (cgraph_node *c = clone_of; c; c = c->clone_of)
+    {
+      if (c->decl != decl)
+	p->materialize_clone ();
+      p = c;
+    }
+
   /* Check if body is already there.  Either we have gimple body or
      the function is thunk and in that case we set DECL_ARGUMENTS.  */
   if (DECL_ARGUMENTS (decl) || gimple_has_body_p (decl))
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index c953a1b6711..d3279410c2e 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1152,6 +1152,8 @@  struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
      apply them.  */
   bool get_body (void);
 
+  void materialize_clone (void);
+
   /* Release memory used to represent body of function.
      Use this only for functions that are released before being translated to
      target code (i.e. RTL).  Functions that are compiled to RTL and beyond
@@ -2286,13 +2288,6 @@  public:
      functions inserted into callgraph already at construction time.  */
   void process_new_functions (void);
 
-  /* Once all functions from compilation unit are in memory, produce all clones
-     and update all calls.  We might also do this on demand if we don't want to
-     bring all functions to memory prior compilation, but current WHOPR
-     implementation does that and it is bit easier to keep everything right
-     in this order.  */
-  void materialize_all_clones (void);
-
   /* Register a symbol NODE.  */
   inline void register_symbol (symtab_node *node);
 
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index f920dcb4c29..07a51a58aef 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -1083,114 +1083,57 @@  void cgraph_node::remove_from_clone_tree ()
 
 /* Given virtual clone, turn it into actual clone.  */
 
-static void
-cgraph_materialize_clone (cgraph_node *node)
-{
-  bitmap_obstack_initialize (NULL);
-  node->former_clone_of = node->clone_of->decl;
-  if (node->clone_of->former_clone_of)
-    node->former_clone_of = node->clone_of->former_clone_of;
-  /* Copy the OLD_VERSION_NODE function tree to the new version.  */
-  tree_function_versioning (node->clone_of->decl, node->decl,
-			    node->clone.tree_map, node->clone.param_adjustments,
-			    true, NULL, NULL);
-  if (symtab->dump_file)
-    {
-      dump_function_to_file (node->clone_of->decl, symtab->dump_file,
-			     dump_flags);
-      dump_function_to_file (node->decl, symtab->dump_file, dump_flags);
-    }
-
-  cgraph_node *clone_of = node->clone_of;
-  /* Function is no longer clone.  */
-  node->remove_from_clone_tree ();
-  if (!clone_of->analyzed && !clone_of->clones)
-    {
-      clone_of->release_body ();
-      clone_of->remove_callees ();
-      clone_of->remove_all_references ();
-    }
-  bitmap_obstack_release (NULL);
-}
-
-/* Once all functions from compilation unit are in memory, produce all clones
-   and update all calls.  We might also do this on demand if we don't want to
-   bring all functions to memory prior compilation, but current WHOPR
-   implementation does that and it is a bit easier to keep everything right in
-   this order.  */
-
 void
-symbol_table::materialize_all_clones (void)
+cgraph_node::materialize_clone ()
 {
-  cgraph_node *node;
-  bool stabilized = false;
-  
-
+  clone_of->get_untransformed_body ();
+  former_clone_of = clone_of->decl;
+  if (clone_of->former_clone_of)
+    former_clone_of = clone_of->former_clone_of;
   if (symtab->dump_file)
-    fprintf (symtab->dump_file, "Materializing clones\n");
-
-  cgraph_node::checking_verify_cgraph_nodes ();
-
-  /* We can also do topological order, but number of iterations should be
-     bounded by number of IPA passes since single IPA pass is probably not
-     going to create clones of clones it created itself.  */
-  while (!stabilized)
     {
-      stabilized = true;
-      FOR_EACH_FUNCTION (node)
+      fprintf (symtab->dump_file, "cloning %s to %s\n",
+	       clone_of->dump_name (),
+	       dump_name ());
+      if (clone.tree_map)
         {
-	  if (node->clone_of && node->decl != node->clone_of->decl
-	      && !gimple_has_body_p (node->decl))
+	  fprintf (symtab->dump_file, "    replace map:");
+	  for (unsigned int i = 0;
+	       i < vec_safe_length (clone.tree_map);
+	       i++)
 	    {
-	      if (!node->clone_of->clone_of)
-		node->clone_of->get_untransformed_body ();
-	      if (gimple_has_body_p (node->clone_of->decl))
-	        {
-		  if (symtab->dump_file)
-		    {
-		      fprintf (symtab->dump_file, "cloning %s to %s\n",
-			       node->clone_of->dump_name (),
-			       node->dump_name ());
-		      if (node->clone.tree_map)
-		        {
-			  unsigned int i;
-			  fprintf (symtab->dump_file, "    replace map:");
-			  for (i = 0;
-			       i < vec_safe_length (node->clone.tree_map);
-			       i++)
-			    {
-			      ipa_replace_map *replace_info;
-			      replace_info = (*node->clone.tree_map)[i];
-			      fprintf (symtab->dump_file, "%s %i -> ",
-				       i ? "," : "", replace_info->parm_num);
-			      print_generic_expr (symtab->dump_file,
-						  replace_info->new_tree);
-			    }
-			  fprintf (symtab->dump_file, "\n");
-			}
-		      if (node->clone.param_adjustments)
-			node->clone.param_adjustments->dump (symtab->dump_file);
-		    }
-		  cgraph_materialize_clone (node);
-		  stabilized = false;
-	        }
+	      ipa_replace_map *replace_info;
+	      replace_info = (*clone.tree_map)[i];
+	      fprintf (symtab->dump_file, "%s %i -> ",
+		       i ? "," : "", replace_info->parm_num);
+	      print_generic_expr (symtab->dump_file,
+				  replace_info->new_tree);
 	    }
+	  fprintf (symtab->dump_file, "\n");
 	}
+      if (clone.param_adjustments)
+	clone.param_adjustments->dump (symtab->dump_file);
     }
-  FOR_EACH_FUNCTION (node)
-    if (!node->analyzed && node->callees)
-      {
-	node->remove_callees ();
-	node->remove_all_references ();
-      }
-    else
-      node->clear_stmts_in_references ();
+  /* Copy the OLD_VERSION_NODE function tree to the new version.  */
+  tree_function_versioning (clone_of->decl, decl,
+			    clone.tree_map, clone.param_adjustments,
+			    true, NULL, NULL);
   if (symtab->dump_file)
-    fprintf (symtab->dump_file, "Materialization Call site updates done.\n");
-
-  cgraph_node::checking_verify_cgraph_nodes ();
+    {
+      dump_function_to_file (clone_of->decl, symtab->dump_file,
+			     dump_flags);
+      dump_function_to_file (decl, symtab->dump_file, dump_flags);
+    }
 
-  symtab->remove_unreachable_nodes (symtab->dump_file);
+  cgraph_node *this_clone_of = clone_of;
+  /* Function is no longer clone.  */
+  remove_from_clone_tree ();
+  if (!this_clone_of->analyzed && !this_clone_of->clones)
+    {
+      this_clone_of->release_body ();
+      this_clone_of->remove_callees ();
+      this_clone_of->remove_all_references ();
+    }
 }
 
 #include "gt-cgraphclones.h"
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 05713c28cf0..1e2262789dd 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1601,6 +1601,7 @@  mark_functions_to_output (void)
   FOR_EACH_FUNCTION (node)
     {
       tree decl = node->decl;
+      node->clear_stmts_in_references ();
 
       gcc_assert (!node->process || node->same_comdat_group);
       if (node->process)
@@ -2274,6 +2275,9 @@  cgraph_node::expand (void)
   announce_function (decl);
   process = 0;
   gcc_assert (lowered);
+
+  /* Initialize the default bitmap obstack.  */
+  bitmap_obstack_initialize (NULL);
   get_untransformed_body ();
 
   /* Generate RTL for the body of DECL.  */
@@ -2282,9 +2286,6 @@  cgraph_node::expand (void)
 
   gcc_assert (symtab->global_info_ready);
 
-  /* Initialize the default bitmap obstack.  */
-  bitmap_obstack_initialize (NULL);
-
   /* Initialize the RTL code for the function.  */
   saved_loc = input_location;
   input_location = DECL_SOURCE_LOCATION (decl);
@@ -2298,7 +2299,8 @@  cgraph_node::expand (void)
   bitmap_obstack_initialize (&reg_obstack); /* FIXME, only at RTL generation*/
 
   update_ssa (TODO_update_ssa_only_virtuals);
-  execute_all_ipa_transforms (false);
+  if (ipa_transforms_to_apply.exists ())
+    execute_all_ipa_transforms (false);
 
   /* Perform all tree transforms and optimizations.  */
 
diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index af2c2856aaa..f419df04961 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -644,16 +644,16 @@  save_inline_function_body (struct cgraph_node *node)
   tree_function_versioning (node->decl, first_clone->decl,
 			    NULL, NULL, true, NULL, NULL);
 
-  /* The function will be short lived and removed after we inline all the clones,
-     but make it internal so we won't confuse ourself.  */
+  /* The function will be short lived and removed after we inline all the
+     clones, but make it internal so we won't confuse ourself.  */
   DECL_EXTERNAL (first_clone->decl) = 0;
   TREE_PUBLIC (first_clone->decl) = 0;
   DECL_COMDAT (first_clone->decl) = 0;
   first_clone->ipa_transforms_to_apply.release ();
 
   /* When doing recursive inlining, the clone may become unnecessary.
-     This is possible i.e. in the case when the recursive function is proved to be
-     non-throwing and the recursion happens only in the EH landing pad.
+     This is possible i.e. in the case when the recursive function is proved to
+     be non-throwing and the recursion happens only in the EH landing pad.
      We cannot remove the clone until we are done with saving the body.
      Remove it now.  */
   if (!first_clone->callers)
@@ -696,6 +696,14 @@  inline_transform (struct cgraph_node *node)
   if (cfun->after_inlining)
     return 0;
 
+  cgraph_node *next_clone;
+  for (cgraph_node *n = node->clones; n; n = next_clone)
+    {
+      next_clone = n->next_sibling_clone;
+      if (n->decl != node->decl)
+	n->materialize_clone ();
+    }
+
   /* We might need the body of this function so that we can expand
      it inline somewhere else.  */
   if (preserve_function_body_p (node))
diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 5fc0de56556..438f4bd5a68 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -783,6 +783,13 @@  ipa_param_adjustments::modify_call (gcall *stmt,
     {
       vec<tree, va_gc> **debug_args = NULL;
       unsigned i = 0;
+      cgraph_node *callee_node = cgraph_node::get (callee_decl);
+
+      /* FIXME: we don't seem to be able to insert debug args before the
+	 clone is materialized.  Materializing clones early leads to extra
+	 memory use.  */
+      if (callee_node->clone_of)
+	callee_node->get_untransformed_body ();
       for (tree old_parm = DECL_ARGUMENTS (old_decl);
 	   old_parm && i < old_nargs && ((int) i) < m_always_copy_start;
 	   old_parm = DECL_CHAIN (old_parm), i++)
diff --git a/gcc/ipa.c b/gcc/ipa.c
index 288b58cf73d..ab7256d857f 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -1386,43 +1386,3 @@  make_pass_ipa_single_use (gcc::context *ctxt)
   return new pass_ipa_single_use (ctxt);
 }
 
-/* Materialize all clones.  */
-
-namespace {
-
-const pass_data pass_data_materialize_all_clones =
-{
-  SIMPLE_IPA_PASS, /* type */
-  "materialize-all-clones", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
-  TV_IPA_OPT, /* tv_id */
-  0, /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  0, /* todo_flags_finish */
-};
-
-class pass_materialize_all_clones : public simple_ipa_opt_pass
-{
-public:
-  pass_materialize_all_clones (gcc::context *ctxt)
-    : simple_ipa_opt_pass (pass_data_materialize_all_clones, ctxt)
-  {}
-
-  /* opt_pass methods: */
-  virtual unsigned int execute (function *)
-    {
-      symtab->materialize_all_clones ();
-      return 0;
-    }
-
-}; // class pass_materialize_all_clones
-
-} // anon namespace
-
-simple_ipa_opt_pass *
-make_pass_materialize_all_clones (gcc::context *ctxt)
-{
-  return new pass_materialize_all_clones (ctxt);
-}
diff --git a/gcc/passes.c b/gcc/passes.c
index 6ff31ec37d7..1942b7cd1c3 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2271,6 +2271,14 @@  execute_all_ipa_transforms (bool do_not_collect)
     return;
   node = cgraph_node::get (current_function_decl);
 
+  cgraph_node *next_clone;
+  for (cgraph_node *n = node->clones; n; n = next_clone)
+    {
+      next_clone = n->next_sibling_clone;
+      if (n->decl != node->decl)
+	n->materialize_clone ();
+    }
+
   if (node->ipa_transforms_to_apply.exists ())
     {
       unsigned int i;
diff --git a/gcc/passes.def b/gcc/passes.def
index f865bdc19ac..cf15d8eafca 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -172,7 +172,6 @@  along with GCC; see the file COPYING3.  If not see
      passes are executed after partitioning and thus see just parts of the
      compiled unit.  */
   INSERT_PASSES_AFTER (all_late_ipa_passes)
-  NEXT_PASS (pass_materialize_all_clones);
   NEXT_PASS (pass_ipa_pta);
   NEXT_PASS (pass_omp_simd_clone);
   TERMINATE_PASS_LIST (all_late_ipa_passes)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 62e5b696cab..1e8badfe4be 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -519,8 +519,6 @@  extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_single_use (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_comdats (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_modref (gcc::context *ctxt);
-extern simple_ipa_opt_pass *make_pass_materialize_all_clones (gcc::context *
-							      ctxt);
 
 extern gimple_opt_pass *make_pass_cleanup_cfg_post_optimizing (gcc::context
 							       *ctxt);