Message ID | 20191105103642.y7obi2q65lqykecv@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
Series | Add object allocators to symbol and call summaries | expand |
On 11/5/19 11:36 AM, Jan Hubicka wrote: > Hi, > this patch adds object allocators to manage IPA summaries. This reduces > malloc overhead and fragmentation. I now get peak memory use 7.5GB instead > of 10GB for firefox WPA because reduced fragmentation leads to less COWs after > forks. That sounds promising. > Additional bonus is that we now have statistics gathered by mem-reports > which makes my life easier, too. What's currently bad with the detailed memory statistics? I updated the code that one should see the allocation for the underlying hash_map and vec? > (though memory stats are far from ideal - we need to pass location info around > bit more). Please rename allocator to m_allocator. Martin
On 11/5/19 11:45 AM, Martin Liška wrote:
> Please rename allocator to m_allocator.
You were faster and installed that patch.
Thus I'm sending the adjustment.
Martin
> On 11/5/19 11:36 AM, Jan Hubicka wrote: > > Hi, > > this patch adds object allocators to manage IPA summaries. This reduces > > malloc overhead and fragmentation. I now get peak memory use 7.5GB instead > > of 10GB for firefox WPA because reduced fragmentation leads to less COWs after > > forks. > > That sounds promising. > > > Additional bonus is that we now have statistics gathered by mem-reports > > which makes my life easier, too. > > What's currently bad with the detailed memory statistics? I updated the > code that one should see the allocation for the underlying hash_map and > vec? I currently get: -------------------------------------------------------------------------------------------------------------------------------------------- Pool name Allocation pool Pools Leak Peak Times Elt size -------------------------------------------------------------------------------------------------------------------------------------------- tree_scc lto/lto-common.c:2709 (read_cgraph_and_symbols) 1 0 : 0.0% 99M 3169k: 43.7% 32 IPA histogram ipa-profile.c:77 (__static_initialization_and_de 1 16 : 0.0% 16 1 : 0.0% 16 IPA-PROP ref descriptions ipa-prop.c:170 (__static_initialization_and_dest 1 226k: 0.3% 226k 9670 : 0.1% 24 function summary ipa-fnsummary.c:557 (ipa_fn_summary_alloc) 1 6145k: 7.0% 6257k 391k: 5.4% 16 function summary ipa-pure-const.c:136 (__base_ctor ) 1 6863k: 7.9% 9449k 590k: 8.1% 16 edge predicates ipa-fnsummary.c:93 (__static_initialization_and_ 1 8327k: 9.5% 8385k 209k: 2.9% 40 call summary ipa-sra.c:436 (__base_ctor ) 1 18M: 21.3% 21M 1393k: 19.2% 16 call summary ipa-fnsummary.h:276 (__base_ctor ) 1 46M: 54.0% 46M 1483k: 20.5% 32 -------------------------------------------------------------------------------------------------------------------------------------------- Pool name Allocation pool Pools Leak Peak Times Elt size -------------------------------------------------------------------------------------------------------------------------------------------- Total 9 85M -------------------------------------------------------------------------------------------------------------------------------------------- This is quite readable, though we may give them different names and update constructors. Not a big deal IMO. For GGC statistics I see: varpool.c:137 (create_empty) 7924k: 0.4% 0 : 0.0% 3214k: 0.2% 0 : 0.0% 87k cgraph.c:939 (cgraph_allocate_init_indirect_info 8566k: 0.4% 0 : 0.0% 1395k: 0.1% 0 : 0.0% 113k alias.c:1170 (record_alias_subset) 12M: 0.6% 0 : 0.0% 12k: 0.0% 99k: 0.1% 12k ipa-sra.c:2717 (isra_read_node_info) 12M: 0.6% 0 : 0.0% 4179k: 0.2% 21k: 0.0% 376k toplev.c:904 (realloc_for_line_map) 16M: 0.8% 0 : 0.0% 15M: 0.9% 144 : 0.0% 12 ipa-prop.c:278 (ipa_alloc_node_params) 16M: 0.8% 266k: 0.4% 0 : 0.0% 22k: 0.0% 366k symbol-summary.h:555 (allocate_new) 18M: 0.9% 0 : 0.0% 119k: 0.0% 0 : 0.0% 1171k ^^^ here we should point the caller of get_create ipa-fnsummary.c:3877 (inline_read_section) 28M: 1.4% 0 : 0.0% 552k: 0.0% 392k: 0.3% 261k lto-section-in.c:388 (lto_new_in_decl_state) 29M: 1.4% 0 : 0.0% 11M: 0.7% 0 : 0.0% 587k symtab.c:582 (create_reference) 35M: 1.7% 0 : 0.0% 50M: 2.9% 1199k: 0.9% 541k symbol-summary.h:64 (allocate_new) 46M: 2.2% 0 : 0.0% 2445k: 0.1% 0 : 0.0% 1168k ^^^ same here. stringpool.c:63 (alloc_node) 47M: 2.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1217k ipa-prop.c:4480 (ipa_read_edge_info) 51M: 2.4% 0 : 0.0% 260k: 0.0% 404k: 0.3% 531k hash-table.h:801 (expand) 81M: 3.9% 0 : 0.0% 80M: 4.7% 88k: 0.1% 3349 ^^^ some of memory comes here which ought to be accounted to caller of expand. stringpool.c:41 (stringpool_ggc_alloc) 92M: 4.4% 0 : 0.0% 0 : 0.0% 6600k: 5.2% 1217k cgraph.h:2712 (allocate_cgraph_symbol) 148M: 7.1% 0 : 0.0% 115M: 6.7% 0 : 0.0% 767k cgraph.c:851 (create_edge) 149M: 7.1% 0 : 0.0% 27M: 1.6% 0 : 0.0% 1743k ipa-fnsummary.c:3936 (inline_read_section) 174M: 8.3% 0 : 0.0% 4190k: 0.2% 12M: 10.2% 391k lto/lto-common.c:204 (lto_read_in_decl_state) 200M: 9.6% 0 : 0.0% 65M: 3.8% 19M: 15.5% 1731k ipa-prop.c:4478 (ipa_read_edge_info) 210M: 10.0% 0 : 0.0% 1361k: 0.1% 17M: 14.4% 1171k tree-streamer-in.c:631 (streamer_alloc_tree) 647M: 30.8% 55M: 84.5% 1267M: 73.4% 64M: 52.1% 13M -------------------------------------------------------------------------------------------------------------------------------------------- GGC memory Leak Garbage Freed Overhead Times -------------------------------------------------------------------------------------------------------------------------------------------- Total 2100M:100.0% 65M:100.0% 1726M:100.0% 124M:100.0% 29M -------------------------------------------------------------------------------------------------------------------------------------------- One very odd thing is that at the end of WPA of firefox I see: hash-table.h:801 (expand) 100M: 2.9% 2088 : 0.0% 193M: 6.4% 90k: 0.0% 3379 tree-ssa-operands.c:265 (ssa_operand_alloc) 104M: 3.0% 0 : 0.0% 39M: 1.3% 0 : 0.0% 105k stringpool.c:41 (stringpool_ggc_alloc) 106M: 3.1% 0 : 0.0% 0 : 0.0% 7652k: 2.4% 1362k ipa-fnsummary.c:3936 (inline_read_section) 174M: 5.1% 0 : 0.0% 4190k: 0.1% 12M: 4.0% 391k ^^^ those are size_tale vectors that ought to be freed. lto/lto-common.c:204 (lto_read_in_decl_state) 200M: 5.8% 0 : 0.0% 65M: 2.2% 19M: 6.1% 1731k ipa-prop.c:4478 (ipa_read_edge_info) 210M: 6.1% 0 : 0.0% 1361k: 0.0% 17M: 5.7% 1171k ^^^ those are jumptables that ought to be freed too. cgraph.c:851 (create_edge) 285M: 8.3% 0 : 0.0% 33M: 1.1% 0 : 0.0% 3141k cgraph.h:2712 (allocate_cgraph_symbol) 417M: 12.1% 0 : 0.0% 121M: 4.0% 0 : 0.0% 1567k tree-streamer-in.c:631 (streamer_alloc_tree) 758M: 22.0% 96M: 23.0% 1267M: 41.7% 64M: 20.6% 15M -------------------------------------------------------------------------------------------------------------------------------------------- GGC memory Leak Garbage Freed Overhead Times -------------------------------------------------------------------------------------------------------------------------------------------- Total 3453M:100.0% 418M:100.0% 3039M:100.0% 313M:100.0% 49M -------------------------------------------------------------------------------------------------------------------------------------------- I am not sure where the problem is - it is GGC memory and we release those summaries after inlining so there should not be any pointers to them. At worst it should account to garbage, so it may be also some accounting bug. I suppose first thing to try is to breakpoint in the ggc walker of these and see if it shows up in the final ggc. Honza
> On 11/5/19 11:45 AM, Martin Liška wrote: > > Please rename allocator to m_allocator. > > You were faster and installed that patch. > > Thus I'm sending the adjustment. > > Martin > From 6edd5d8c4afb0451aaaf05ba857435219b31814d Mon Sep 17 00:00:00 2001 > From: Martin Liska <mliska@suse.cz> > Date: Tue, 5 Nov 2019 11:50:32 +0100 > Subject: [PATCH] Update coding style in symbol-summary.h. > > gcc/ChangeLog: > > 2019-11-05 Martin Liska <mliska@suse.cz> > > * symbol-summary.h: Rename allocator to m_allocator and > add comment. This is OK, thanks! Honza > --- > gcc/symbol-summary.h | 20 ++++++++++++-------- > 1 file changed, 12 insertions(+), 8 deletions(-) > > diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h > index d663cbb90fd..8aedcfe9143 100644 > --- a/gcc/symbol-summary.h > +++ b/gcc/symbol-summary.h > @@ -31,7 +31,7 @@ public: > function_summary_base (symbol_table *symtab CXX_MEM_STAT_INFO): > m_symtab (symtab), > m_insertion_enabled (true), > - allocator ("function summary" PASS_MEM_STAT) > + m_allocator ("function summary" PASS_MEM_STAT) > {} > > /* Basic implementation of insert operation. */ > @@ -62,7 +62,7 @@ protected: > /* Call gcc_internal_because we do not want to call finalizer for > a type T. We call dtor explicitly. */ > return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T () > - : allocator.allocate () ; > + : m_allocator.allocate () ; > } > > /* Release an item that is stored within map. */ > @@ -74,7 +74,7 @@ protected: > ggc_free (item); > } > else > - allocator.remove (item); > + m_allocator.remove (item); > } > > /* Unregister all call-graph hooks. */ > @@ -95,7 +95,9 @@ protected: > private: > /* Return true when the summary uses GGC memory for allocation. */ > virtual bool is_ggc () = 0; > - object_allocator<T> allocator; > + > + /* Object allocator for heap allocation. */ > + object_allocator<T> m_allocator; > }; > > template <typename T> > @@ -537,7 +539,7 @@ public: > call_summary_base (symbol_table *symtab CXX_MEM_STAT_INFO): > m_symtab (symtab), > m_initialize_when_cloning (false), > - allocator ("call summary" PASS_MEM_STAT) > + m_allocator ("call summary" PASS_MEM_STAT) > {} > > /* Basic implementation of removal operation. */ > @@ -553,7 +555,7 @@ protected: > /* Call gcc_internal_because we do not want to call finalizer for > a type T. We call dtor explicitly. */ > return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T () > - : allocator.allocate (); > + : m_allocator.allocate (); > } > > /* Release an item that is stored within map. */ > @@ -565,7 +567,7 @@ protected: > ggc_free (item); > } > else > - allocator.remove (item); > + m_allocator.remove (item); > } > > /* Unregister all call-graph hooks. */ > @@ -584,7 +586,9 @@ protected: > private: > /* Return true when the summary uses GGC memory for allocation. */ > virtual bool is_ggc () = 0; > - object_allocator<T> allocator; > + > + /* Object allocator for heap allocation. */ > + object_allocator<T> m_allocator; > }; > > template <typename T> > -- > 2.23.0 >
On 11/5/19 12:01 PM, Jan Hubicka wrote: >> On 11/5/19 11:36 AM, Jan Hubicka wrote: >>> Hi, >>> this patch adds object allocators to manage IPA summaries. This reduces >>> malloc overhead and fragmentation. I now get peak memory use 7.5GB instead >>> of 10GB for firefox WPA because reduced fragmentation leads to less COWs after >>> forks. >> >> That sounds promising. >> >>> Additional bonus is that we now have statistics gathered by mem-reports >>> which makes my life easier, too. >> >> What's currently bad with the detailed memory statistics? I updated the >> code that one should see the allocation for the underlying hash_map and >> vec? > > I currently get: > > -------------------------------------------------------------------------------------------------------------------------------------------- > Pool name Allocation pool Pools Leak Peak Times Elt size > -------------------------------------------------------------------------------------------------------------------------------------------- > tree_scc lto/lto-common.c:2709 (read_cgraph_and_symbols) 1 0 : 0.0% 99M 3169k: 43.7% 32 > IPA histogram ipa-profile.c:77 (__static_initialization_and_de 1 16 : 0.0% 16 1 : 0.0% 16 > IPA-PROP ref descriptions ipa-prop.c:170 (__static_initialization_and_dest 1 226k: 0.3% 226k 9670 : 0.1% 24 > function summary ipa-fnsummary.c:557 (ipa_fn_summary_alloc) 1 6145k: 7.0% 6257k 391k: 5.4% 16 > function summary ipa-pure-const.c:136 (__base_ctor ) 1 6863k: 7.9% 9449k 590k: 8.1% 16 > edge predicates ipa-fnsummary.c:93 (__static_initialization_and_ 1 8327k: 9.5% 8385k 209k: 2.9% 40 > call summary ipa-sra.c:436 (__base_ctor ) 1 18M: 21.3% 21M 1393k: 19.2% 16 > call summary ipa-fnsummary.h:276 (__base_ctor ) 1 46M: 54.0% 46M 1483k: 20.5% 32 > -------------------------------------------------------------------------------------------------------------------------------------------- > Pool name Allocation pool Pools Leak Peak Times Elt size > -------------------------------------------------------------------------------------------------------------------------------------------- > Total 9 85M > -------------------------------------------------------------------------------------------------------------------------------------------- > > This is quite readable, though we may give them different names and > update constructors. Not a big deal IMO. > > For GGC statistics I see: > > varpool.c:137 (create_empty) 7924k: 0.4% 0 : 0.0% 3214k: 0.2% 0 : 0.0% 87k > cgraph.c:939 (cgraph_allocate_init_indirect_info 8566k: 0.4% 0 : 0.0% 1395k: 0.1% 0 : 0.0% 113k > alias.c:1170 (record_alias_subset) 12M: 0.6% 0 : 0.0% 12k: 0.0% 99k: 0.1% 12k > ipa-sra.c:2717 (isra_read_node_info) 12M: 0.6% 0 : 0.0% 4179k: 0.2% 21k: 0.0% 376k > toplev.c:904 (realloc_for_line_map) 16M: 0.8% 0 : 0.0% 15M: 0.9% 144 : 0.0% 12 > ipa-prop.c:278 (ipa_alloc_node_params) 16M: 0.8% 266k: 0.4% 0 : 0.0% 22k: 0.0% 366k > symbol-summary.h:555 (allocate_new) 18M: 0.9% 0 : 0.0% 119k: 0.0% 0 : 0.0% 1171k > ^^^ here we should point the caller of get_create > > ipa-fnsummary.c:3877 (inline_read_section) 28M: 1.4% 0 : 0.0% 552k: 0.0% 392k: 0.3% 261k > lto-section-in.c:388 (lto_new_in_decl_state) 29M: 1.4% 0 : 0.0% 11M: 0.7% 0 : 0.0% 587k > symtab.c:582 (create_reference) 35M: 1.7% 0 : 0.0% 50M: 2.9% 1199k: 0.9% 541k > symbol-summary.h:64 (allocate_new) 46M: 2.2% 0 : 0.0% 2445k: 0.1% 0 : 0.0% 1168k > ^^^ same here. > > stringpool.c:63 (alloc_node) 47M: 2.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1217k > ipa-prop.c:4480 (ipa_read_edge_info) 51M: 2.4% 0 : 0.0% 260k: 0.0% 404k: 0.3% 531k > hash-table.h:801 (expand) 81M: 3.9% 0 : 0.0% 80M: 4.7% 88k: 0.1% 3349 > ^^^ some of memory comes here which ought to be accounted to caller of > expand. Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description for each created symbol/call_summary and register manually every allocation to such descriptor. > stringpool.c:41 (stringpool_ggc_alloc) 92M: 4.4% 0 : 0.0% 0 : 0.0% 6600k: 5.2% 1217k > cgraph.h:2712 (allocate_cgraph_symbol) 148M: 7.1% 0 : 0.0% 115M: 6.7% 0 : 0.0% 767k > cgraph.c:851 (create_edge) 149M: 7.1% 0 : 0.0% 27M: 1.6% 0 : 0.0% 1743k > ipa-fnsummary.c:3936 (inline_read_section) 174M: 8.3% 0 : 0.0% 4190k: 0.2% 12M: 10.2% 391k > lto/lto-common.c:204 (lto_read_in_decl_state) 200M: 9.6% 0 : 0.0% 65M: 3.8% 19M: 15.5% 1731k > ipa-prop.c:4478 (ipa_read_edge_info) 210M: 10.0% 0 : 0.0% 1361k: 0.1% 17M: 14.4% 1171k > tree-streamer-in.c:631 (streamer_alloc_tree) 647M: 30.8% 55M: 84.5% 1267M: 73.4% 64M: 52.1% 13M > -------------------------------------------------------------------------------------------------------------------------------------------- > GGC memory Leak Garbage Freed Overhead Times > -------------------------------------------------------------------------------------------------------------------------------------------- > Total 2100M:100.0% 65M:100.0% 1726M:100.0% 124M:100.0% 29M > -------------------------------------------------------------------------------------------------------------------------------------------- > > One very odd thing is that at the end of WPA of firefox I see: > > hash-table.h:801 (expand) 100M: 2.9% 2088 : 0.0% 193M: 6.4% 90k: 0.0% 3379 > tree-ssa-operands.c:265 (ssa_operand_alloc) 104M: 3.0% 0 : 0.0% 39M: 1.3% 0 : 0.0% 105k > stringpool.c:41 (stringpool_ggc_alloc) 106M: 3.1% 0 : 0.0% 0 : 0.0% 7652k: 2.4% 1362k > ipa-fnsummary.c:3936 (inline_read_section) 174M: 5.1% 0 : 0.0% 4190k: 0.1% 12M: 4.0% 391k > ^^^ those are size_tale vectors that ought to be freed. > > lto/lto-common.c:204 (lto_read_in_decl_state) 200M: 5.8% 0 : 0.0% 65M: 2.2% 19M: 6.1% 1731k > ipa-prop.c:4478 (ipa_read_edge_info) 210M: 6.1% 0 : 0.0% 1361k: 0.0% 17M: 5.7% 1171k > ^^^ those are jumptables that ought to be freed too. I verified this and I can confirm that class GTY((for_user)) ipa_edge_args { public: /* Default constructor. */ ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL) {} /* Destructor. */ ~ipa_edge_args () { vec_free (jump_functions); vec_free (polymorphic_call_contexts); } is called which then calls vec_free. I traced that down and m_reverse_object_map does not contain the pointer. So some minor issue in allocation tracing. But I'm pretty sure the memory is released. Martin > > cgraph.c:851 (create_edge) 285M: 8.3% 0 : 0.0% 33M: 1.1% 0 : 0.0% 3141k > cgraph.h:2712 (allocate_cgraph_symbol) 417M: 12.1% 0 : 0.0% 121M: 4.0% 0 : 0.0% 1567k > tree-streamer-in.c:631 (streamer_alloc_tree) 758M: 22.0% 96M: 23.0% 1267M: 41.7% 64M: 20.6% 15M > -------------------------------------------------------------------------------------------------------------------------------------------- > GGC memory Leak Garbage Freed Overhead Times > -------------------------------------------------------------------------------------------------------------------------------------------- > Total 3453M:100.0% 418M:100.0% 3039M:100.0% 313M:100.0% 49M > -------------------------------------------------------------------------------------------------------------------------------------------- > > I am not sure where the problem is - it is GGC memory and we release > those summaries after inlining so there should not be any pointers to > them. At worst it should account to garbage, so it may be also some > accounting bug. > > I suppose first thing to try is to breakpoint in the ggc walker of these > and see if it shows up in the final ggc. > > Honza >
> > > > stringpool.c:63 (alloc_node) 47M: 2.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1217k > > ipa-prop.c:4480 (ipa_read_edge_info) 51M: 2.4% 0 : 0.0% 260k: 0.0% 404k: 0.3% 531k > > hash-table.h:801 (expand) 81M: 3.9% 0 : 0.0% 80M: 4.7% 88k: 0.1% 3349 > > ^^^ some of memory comes here which ought to be accounted to caller of > > expand. > > Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description > for each created symbol/call_summary and register manually every allocation to such descriptor. Or just pass memory stats from caller of expand and transitively pass it from caller of summary. This will get us the line info of get_create call that is IMO OK. > > lto/lto-common.c:204 (lto_read_in_decl_state) 200M: 5.8% 0 : 0.0% 65M: 2.2% 19M: 6.1% 1731k > > ipa-prop.c:4478 (ipa_read_edge_info) 210M: 6.1% 0 : 0.0% 1361k: 0.0% 17M: 5.7% 1171k > > ^^^ those are jumptables that ought to be freed too. > > I verified this and I can confirm that > > class GTY((for_user)) ipa_edge_args > { > public: > > /* Default constructor. */ > ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL) > {} > > /* Destructor. */ > ~ipa_edge_args () > { > vec_free (jump_functions); > vec_free (polymorphic_call_contexts); > } > > is called which then calls vec_free. I traced that down and m_reverse_object_map does not > contain the pointer. So some minor issue in allocation tracing. But I'm pretty sure the memory > is released. Well, it is not very minor given that it is third largest allocation. It would be nice to track this down so we can trust the summaries again. It is very strange that overhead gets registered but never unregistered - both freeing or garbage collecting should do that. I will try to see what happens. Honza > > Martin > > > > > cgraph.c:851 (create_edge) 285M: 8.3% 0 : 0.0% 33M: 1.1% 0 : 0.0% 3141k > > cgraph.h:2712 (allocate_cgraph_symbol) 417M: 12.1% 0 : 0.0% 121M: 4.0% 0 : 0.0% 1567k > > tree-streamer-in.c:631 (streamer_alloc_tree) 758M: 22.0% 96M: 23.0% 1267M: 41.7% 64M: 20.6% 15M > > -------------------------------------------------------------------------------------------------------------------------------------------- > > GGC memory Leak Garbage Freed Overhead Times > > -------------------------------------------------------------------------------------------------------------------------------------------- > > Total 3453M:100.0% 418M:100.0% 3039M:100.0% 313M:100.0% 49M > > -------------------------------------------------------------------------------------------------------------------------------------------- > > > > I am not sure where the problem is - it is GGC memory and we release > > those summaries after inlining so there should not be any pointers to > > them. At worst it should account to garbage, so it may be also some > > accounting bug. > > > > I suppose first thing to try is to breakpoint in the ggc walker of these > > and see if it shows up in the final ggc. > > > > Honza > > >
On 11/5/19 3:48 PM, Jan Hubicka wrote: >>> >>> stringpool.c:63 (alloc_node) 47M: 2.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1217k >>> ipa-prop.c:4480 (ipa_read_edge_info) 51M: 2.4% 0 : 0.0% 260k: 0.0% 404k: 0.3% 531k >>> hash-table.h:801 (expand) 81M: 3.9% 0 : 0.0% 80M: 4.7% 88k: 0.1% 3349 >>> ^^^ some of memory comes here which ought to be accounted to caller of >>> expand. >> >> Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description >> for each created symbol/call_summary and register manually every allocation to such descriptor. > > Or just pass memory stats from caller of expand and transitively pass it > from caller of summary. This will get us the line info of get_create > call that is IMO OK. The issue with this approach is that you will spread a summary allocation along all the ::get_create places. Which is not ideal. >>> lto/lto-common.c:204 (lto_read_in_decl_state) 200M: 5.8% 0 : 0.0% 65M: 2.2% 19M: 6.1% 1731k >>> ipa-prop.c:4478 (ipa_read_edge_info) 210M: 6.1% 0 : 0.0% 1361k: 0.0% 17M: 5.7% 1171k >>> ^^^ those are jumptables that ought to be freed too. >> >> I verified this and I can confirm that >> >> class GTY((for_user)) ipa_edge_args >> { >> public: >> >> /* Default constructor. */ >> ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL) >> {} >> >> /* Destructor. */ >> ~ipa_edge_args () >> { >> vec_free (jump_functions); >> vec_free (polymorphic_call_contexts); >> } >> >> is called which then calls vec_free. I traced that down and m_reverse_object_map does not >> contain the pointer. So some minor issue in allocation tracing. But I'm pretty sure the memory >> is released. > > Well, it is not very minor given that it is third largest allocation. It > would be nice to track this down so we can trust the summaries again. > It is very strange that overhead gets registered but never unregistered > - both freeing or garbage collecting should do that. > > I will try to see what happens. Try to take a look, or we can debug that on Thursday together? Martin > > Honza >> >> Martin >> >>> >>> cgraph.c:851 (create_edge) 285M: 8.3% 0 : 0.0% 33M: 1.1% 0 : 0.0% 3141k >>> cgraph.h:2712 (allocate_cgraph_symbol) 417M: 12.1% 0 : 0.0% 121M: 4.0% 0 : 0.0% 1567k >>> tree-streamer-in.c:631 (streamer_alloc_tree) 758M: 22.0% 96M: 23.0% 1267M: 41.7% 64M: 20.6% 15M >>> -------------------------------------------------------------------------------------------------------------------------------------------- >>> GGC memory Leak Garbage Freed Overhead Times >>> -------------------------------------------------------------------------------------------------------------------------------------------- >>> Total 3453M:100.0% 418M:100.0% 3039M:100.0% 313M:100.0% 49M >>> -------------------------------------------------------------------------------------------------------------------------------------------- >>> >>> I am not sure where the problem is - it is GGC memory and we release >>> those summaries after inlining so there should not be any pointers to >>> them. At worst it should account to garbage, so it may be also some >>> accounting bug. >>> >>> I suppose first thing to try is to breakpoint in the ggc walker of these >>> and see if it shows up in the final ggc. >>> >>> Honza >>> >>
> On 11/5/19 3:48 PM, Jan Hubicka wrote: > > > > > > > > stringpool.c:63 (alloc_node) 47M: 2.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1217k > > > > ipa-prop.c:4480 (ipa_read_edge_info) 51M: 2.4% 0 : 0.0% 260k: 0.0% 404k: 0.3% 531k > > > > hash-table.h:801 (expand) 81M: 3.9% 0 : 0.0% 80M: 4.7% 88k: 0.1% 3349 > > > > ^^^ some of memory comes here which ought to be accounted to caller of > > > > expand. > > > > > > Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description > > > for each created symbol/call_summary and register manually every allocation to such descriptor. > > > > Or just pass memory stats from caller of expand and transitively pass it > > from caller of summary. This will get us the line info of get_create > > call that is IMO OK. > > The issue with this approach is that you will spread a summary allocation > along all the ::get_create places. Which is not ideal. We get it with other allocations, too. Not ideal, but better. Even better solutions are welcome :) > > Try to take a look, or we can debug that on Thursday together? > Martin Found it. It turns out that ggc_prune_ovehread_list is bogus. It walks all active allocations objects and looks if they was collected accoutnig their collection and then throws away all allocations (including those not colelcted) and those gets no longer accounted later. So we basically misaccount everything that survives ggc_collect. No wonder that it makes me to hunt ghosts 8-O Also the last memory report was sorted by garbage not leak for reason - for normal compilation we care about garbage produces primarily because those triggers ggc collects and makes compiler slow. BTW I like how advanced C++ gets back to lisp :) With the fix I get following stats by end of firefox WPA cfg.c:127 (alloc_block) 32M: 1.2% 12M: 2.6% 0 : 0.0% 0 : 0.0% 446k symtab.c:582 (create_reference) 42M: 1.6% 0 : 0.0% 65M: 1.7% 1329k: 0.4% 840k gimple-streamer-in.c:101 (input_gimple_stmt) 49M: 1.9% 17M: 3.5% 0 : 0.0% 375k: 0.1% 747k tree-ssanames.c:308 (make_ssa_name_fn) 51M: 2.0% 16M: 3.4% 0 : 0.0% 0 : 0.0% 973k ipa-cp.c:5157 (ipcp_store_vr_results) 51M: 2.0% 1243k: 0.2% 0 : 0.0% 9561k: 3.0% 146k stringpool.c:63 (alloc_node) 53M: 2.0% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1362k ipa-prop.c:3988 (duplicate) 63M: 2.4% 1115k: 0.2% 0 : 0.0% 10M: 3.2% 264k toplev.c:904 (realloc_for_line_map) 72M: 2.8% 0 : 0.0% 71M: 1.9% 15M: 5.1% 27 tree-ssanames.c:83 (init_ssanames) 96M: 3.7% 121M: 24.4% 44M: 1.2% 87M: 27.8% 174k tree-ssa-operands.c:265 (ssa_operand_alloc) 104M: 4.0% 0 : 0.0% 39M: 1.0% 0 : 0.0% 105k stringpool.c:41 (stringpool_ggc_alloc) 106M: 4.1% 0 : 0.0% 0 : 0.0% 7652k: 2.4% 1362k lto/lto-common.c:204 (lto_read_in_decl_state) 160M: 6.2% 0 : 0.0% 105M: 2.8% 19M: 6.1% 1731k cgraph.c:851 (create_edge) 248M: 9.5% 0 : 0.0% 70M: 1.9% 0 : 0.0% 3141k cgraph.h:2712 (allocate_cgraph_symbol) 383M: 14.7% 0 : 0.0% 155M: 4.1% 0 : 0.0% 1567k tree-streamer-in.c:631 (streamer_alloc_tree) 718M: 27.5% 136M: 27.5% 1267M: 33.3% 64M: 20.6% 15M -------------------------------------------------------------------------------------------------------------------------------------------- GGC memory Leak Garbage Freed Overhead Times -------------------------------------------------------------------------------------------------------------------------------------------- Total 2609M:100.0% 497M:100.0% 3804M:100.0% 313M:100.0% 49M -------------------------------------------------------------------------------------------------------------------------------------------- This looks more realistic. ssa_operands and init_ssanames shows that we read really a lot of bodies into memory. I also wonder if we realy want to compute virutal ssa form for them when we only want to compare them. After reading and symbol table merging I get: cgraph.h:2712 (allocate_cgraph_symbol) 148M: 7.1% 0 : 0.0% 115M: 6.7% 0 : 0.0% 767k So it seems that about half of callgrpah nodes are inline clones, so working on reducing clone overhead (in addition to re-visiting tree merging once again) seems to be most meaningful right now. OK if patch passes testing? * ggc-common.c (ggc_prune_overhead_list): Do not throw surviving memory allocations away. * mem-stats.h (mem_alloc_description<T>::release_object_overhead): do not silently ignore invalid release requests. Index: ggc-common.c =================================================================== --- ggc-common.c (revision 277796) +++ ggc-common.c (working copy) @@ -1003,10 +1003,10 @@ ggc_prune_overhead_list (void) for (; it != ggc_mem_desc.m_reverse_object_map->end (); ++it) if (!ggc_marked_p ((*it).first)) - (*it).second.first->m_collected += (*it).second.second; - - delete ggc_mem_desc.m_reverse_object_map; - ggc_mem_desc.m_reverse_object_map = new map_t (13, false, false, false); + { + (*it).second.first->m_collected += (*it).second.second; + ggc_mem_desc.m_reverse_object_map->remove ((*it).first); + } } /* Return memory used by heap in kb, 0 if this info is not available. */ Index: mem-stats.h =================================================================== --- mem-stats.h (revision 277796) +++ mem-stats.h (working copy) @@ -535,11 +535,8 @@ inline void mem_alloc_description<T>::release_object_overhead (void *ptr) { std::pair <T *, size_t> *entry = m_reverse_object_map->get (ptr); - if (entry) - { - entry->first->release_overhead (entry->second); - m_reverse_object_map->remove (ptr); - } + entry->first->release_overhead (entry->second); + m_reverse_object_map->remove (ptr); } /* Unregister a memory allocation descriptor registered with
On Tue, Nov 5, 2019 at 6:53 PM Jan Hubicka <hubicka@ucw.cz> wrote: > > > On 11/5/19 3:48 PM, Jan Hubicka wrote: > > > > > > > > > > stringpool.c:63 (alloc_node) 47M: 2.3% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1217k > > > > > ipa-prop.c:4480 (ipa_read_edge_info) 51M: 2.4% 0 : 0.0% 260k: 0.0% 404k: 0.3% 531k > > > > > hash-table.h:801 (expand) 81M: 3.9% 0 : 0.0% 80M: 4.7% 88k: 0.1% 3349 > > > > > ^^^ some of memory comes here which ought to be accounted to caller of > > > > > expand. > > > > > > > > Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description > > > > for each created symbol/call_summary and register manually every allocation to such descriptor. > > > > > > Or just pass memory stats from caller of expand and transitively pass it > > > from caller of summary. This will get us the line info of get_create > > > call that is IMO OK. > > > > The issue with this approach is that you will spread a summary allocation > > along all the ::get_create places. Which is not ideal. > > We get it with other allocations, too. Not ideal, but better. > Even better solutions are welcome :) > > > > Try to take a look, or we can debug that on Thursday together? > > Martin > > Found it. It turns out that ggc_prune_ovehread_list is bogus. It walks > all active allocations objects and looks if they was collected accoutnig > their collection and then throws away all allocations (including those > not colelcted) and those gets no longer accounted later. So we > basically misaccount everything that survives ggc_collect. > > No wonder that it makes me to hunt ghosts 8-O > > Also the last memory report was sorted by garbage not leak for reason - > for normal compilation we care about garbage produces primarily because > those triggers ggc collects and makes compiler slow. > > BTW I like how advanced C++ gets back to lisp :) > > With the fix I get following stats by end of firefox WPA > > cfg.c:127 (alloc_block) 32M: 1.2% 12M: 2.6% 0 : 0.0% 0 : 0.0% 446k > symtab.c:582 (create_reference) 42M: 1.6% 0 : 0.0% 65M: 1.7% 1329k: 0.4% 840k > gimple-streamer-in.c:101 (input_gimple_stmt) 49M: 1.9% 17M: 3.5% 0 : 0.0% 375k: 0.1% 747k > tree-ssanames.c:308 (make_ssa_name_fn) 51M: 2.0% 16M: 3.4% 0 : 0.0% 0 : 0.0% 973k > ipa-cp.c:5157 (ipcp_store_vr_results) 51M: 2.0% 1243k: 0.2% 0 : 0.0% 9561k: 3.0% 146k > stringpool.c:63 (alloc_node) 53M: 2.0% 0 : 0.0% 0 : 0.0% 0 : 0.0% 1362k > ipa-prop.c:3988 (duplicate) 63M: 2.4% 1115k: 0.2% 0 : 0.0% 10M: 3.2% 264k > toplev.c:904 (realloc_for_line_map) 72M: 2.8% 0 : 0.0% 71M: 1.9% 15M: 5.1% 27 > tree-ssanames.c:83 (init_ssanames) 96M: 3.7% 121M: 24.4% 44M: 1.2% 87M: 27.8% 174k > tree-ssa-operands.c:265 (ssa_operand_alloc) 104M: 4.0% 0 : 0.0% 39M: 1.0% 0 : 0.0% 105k > stringpool.c:41 (stringpool_ggc_alloc) 106M: 4.1% 0 : 0.0% 0 : 0.0% 7652k: 2.4% 1362k > lto/lto-common.c:204 (lto_read_in_decl_state) 160M: 6.2% 0 : 0.0% 105M: 2.8% 19M: 6.1% 1731k > cgraph.c:851 (create_edge) 248M: 9.5% 0 : 0.0% 70M: 1.9% 0 : 0.0% 3141k > cgraph.h:2712 (allocate_cgraph_symbol) 383M: 14.7% 0 : 0.0% 155M: 4.1% 0 : 0.0% 1567k > tree-streamer-in.c:631 (streamer_alloc_tree) 718M: 27.5% 136M: 27.5% 1267M: 33.3% 64M: 20.6% 15M > -------------------------------------------------------------------------------------------------------------------------------------------- > GGC memory Leak Garbage Freed Overhead Times > -------------------------------------------------------------------------------------------------------------------------------------------- > Total 2609M:100.0% 497M:100.0% 3804M:100.0% 313M:100.0% 49M > -------------------------------------------------------------------------------------------------------------------------------------------- > > This looks more realistic. ssa_operands and init_ssanames shows that we > read really a lot of bodies into memory. I also wonder if we realy want > to compute virutal ssa form for them when we only want to compare them. > > After reading and symbol table merging I get: > > cgraph.h:2712 (allocate_cgraph_symbol) 148M: 7.1% 0 : 0.0% 115M: 6.7% 0 : 0.0% 767k > > So it seems that about half of callgrpah nodes are inline clones, so > working on reducing clone overhead (in addition to re-visiting tree > merging once again) seems to be most meaningful right now. > > OK if patch passes testing? OK. > * ggc-common.c (ggc_prune_overhead_list): Do not throw surviving > memory allocations away. > * mem-stats.h (mem_alloc_description<T>::release_object_overhead): > do not silently ignore invalid release requests. > Index: ggc-common.c > =================================================================== > --- ggc-common.c (revision 277796) > +++ ggc-common.c (working copy) > @@ -1003,10 +1003,10 @@ ggc_prune_overhead_list (void) > > for (; it != ggc_mem_desc.m_reverse_object_map->end (); ++it) > if (!ggc_marked_p ((*it).first)) > - (*it).second.first->m_collected += (*it).second.second; > - > - delete ggc_mem_desc.m_reverse_object_map; > - ggc_mem_desc.m_reverse_object_map = new map_t (13, false, false, false); > + { > + (*it).second.first->m_collected += (*it).second.second; > + ggc_mem_desc.m_reverse_object_map->remove ((*it).first); > + } > } > > /* Return memory used by heap in kb, 0 if this info is not available. */ > Index: mem-stats.h > =================================================================== > --- mem-stats.h (revision 277796) > +++ mem-stats.h (working copy) > @@ -535,11 +535,8 @@ inline void > mem_alloc_description<T>::release_object_overhead (void *ptr) > { > std::pair <T *, size_t> *entry = m_reverse_object_map->get (ptr); > - if (entry) > - { > - entry->first->release_overhead (entry->second); > - m_reverse_object_map->remove (ptr); > - } > + entry->first->release_overhead (entry->second); > + m_reverse_object_map->remove (ptr); > } > > /* Unregister a memory allocation descriptor registered with
On 11/5/19 6:53 PM, Jan Hubicka wrote: > Found it. It turns out that ggc_prune_ovehread_list is bogus. It walks > all active allocations objects and looks if they was collected accoutnig > their collection and then throws away all allocations (including those > not colelcted) and those gets no longer accounted later. So we > basically misaccount everything that survives ggc_collect. I've just read the patch and it's correct. It was really a bogus. Thanks for it, Martin
Index: hsa-brig.c =================================================================== --- hsa-brig.c (revision 277796) +++ hsa-brig.c (working copy) @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. #include "cgraph.h" #include "dumpfile.h" #include "print-tree.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "hsa-common.h" #include "gomp-constants.h" Index: hsa-dump.c =================================================================== --- hsa-dump.c (revision 277796) +++ hsa-dump.c (working copy) @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. #include "gimple-pretty-print.h" #include "cgraph.h" #include "print-tree.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "hsa-common.h" Index: hsa-gen.c =================================================================== --- hsa-gen.c (revision 277796) +++ hsa-gen.c (working copy) @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. #include "ssa-iterators.h" #include "cgraph.h" #include "print-tree.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "hsa-common.h" #include "cfghooks.h" Index: hsa-regalloc.c =================================================================== --- hsa-regalloc.c (revision 277796) +++ hsa-regalloc.c (working copy) @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. #include "cgraph.h" #include "print-tree.h" #include "cfghooks.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "hsa-common.h" Index: ipa-hsa.c =================================================================== --- ipa-hsa.c (revision 277796) +++ ipa-hsa.c (working copy) @@ -40,6 +40,7 @@ along with GCC; see the file COPYING3. #include "stringpool.h" #include "cgraph.h" #include "print-tree.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "hsa-common.h" Index: ipa-predicate.c =================================================================== --- ipa-predicate.c (revision 277796) +++ ipa-predicate.c (working copy) @@ -25,8 +25,8 @@ along with GCC; see the file COPYING3. #include "tree.h" #include "cgraph.h" #include "tree-vrp.h" -#include "symbol-summary.h" #include "alloc-pool.h" +#include "symbol-summary.h" #include "ipa-prop.h" #include "ipa-fnsummary.h" #include "real.h" Index: ipa-reference.c =================================================================== --- ipa-reference.c (revision 277796) +++ ipa-reference.c (working copy) @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. #include "calls.h" #include "ipa-utils.h" #include "ipa-reference.h" +#include "alloc-pool.h" #include "symbol-summary.h" /* The static variables defined within the compilation unit that are Index: ipa-sra.c =================================================================== --- ipa-sra.c (revision 277796) +++ ipa-sra.c (working copy) @@ -75,6 +75,7 @@ along with GCC; see the file COPYING3. #include "gimple-walk.h" #include "tree-dfa.h" #include "tree-sra.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "params.h" #include "dbgcnt.h" Index: omp-expand.c =================================================================== --- omp-expand.c (revision 277796) +++ omp-expand.c (working copy) @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. #include "omp-general.h" #include "omp-offload.h" #include "tree-cfgcleanup.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "gomp-constants.h" #include "gimple-pretty-print.h" Index: omp-general.c =================================================================== --- omp-general.c (revision 277796) +++ omp-general.c (working copy) @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3. #include "attribs.h" #include "gimplify.h" #include "cgraph.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "hsa-common.h" #include "tree-pass.h" Index: omp-low.c =================================================================== --- omp-low.c (revision 277796) +++ omp-low.c (working copy) @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3. #include "omp-low.h" #include "omp-grid.h" #include "gimple-low.h" +#include "alloc-pool.h" #include "symbol-summary.h" #include "tree-nested.h" #include "context.h" Index: symbol-summary.h =================================================================== --- symbol-summary.h (revision 277796) +++ symbol-summary.h (working copy) @@ -28,8 +28,10 @@ class function_summary_base { public: /* Default construction takes SYMTAB as an argument. */ - function_summary_base (symbol_table *symtab): m_symtab (symtab), - m_insertion_enabled (true) + function_summary_base (symbol_table *symtab CXX_MEM_STAT_INFO): + m_symtab (symtab), + m_insertion_enabled (true), + allocator ("function summary" PASS_MEM_STAT) {} /* Basic implementation of insert operation. */ @@ -59,7 +61,8 @@ protected: { /* Call gcc_internal_because we do not want to call finalizer for a type T. We call dtor explicitly. */ - return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T () : new T () ; + return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T () + : allocator.allocate () ; } /* Release an item that is stored within map. */ @@ -71,7 +74,7 @@ protected: ggc_free (item); } else - delete item; + allocator.remove (item); } /* Unregister all call-graph hooks. */ @@ -92,6 +95,7 @@ protected: private: /* Return true when the summary uses GGC memory for allocation. */ virtual bool is_ggc () = 0; + object_allocator<T> allocator; }; template <typename T> @@ -215,9 +219,8 @@ private: template <typename T> function_summary<T *>::function_summary (symbol_table *symtab, bool ggc MEM_STAT_DECL): - function_summary_base<T> (symtab), m_ggc (ggc), m_map (13, ggc, true, - GATHER_STATISTICS - PASS_MEM_STAT) + function_summary_base<T> (symtab PASS_MEM_STAT), m_ggc (ggc), + m_map (13, ggc, true, GATHER_STATISTICS PASS_MEM_STAT) { this->m_symtab_insertion_hook = this->m_symtab->add_cgraph_insertion_hook (function_summary::symtab_insertion, @@ -411,7 +414,7 @@ private: template <typename T, typename V> fast_function_summary<T *, V>::fast_function_summary (symbol_table *symtab MEM_STAT_DECL): - function_summary_base<T> (symtab), m_vector (NULL) + function_summary_base<T> (symtab PASS_MEM_STAT), m_vector (NULL) { vec_alloc (m_vector, 13 PASS_MEM_STAT); this->m_symtab_insertion_hook @@ -531,8 +534,10 @@ class call_summary_base { public: /* Default construction takes SYMTAB as an argument. */ - call_summary_base (symbol_table *symtab): m_symtab (symtab), - m_initialize_when_cloning (false) + call_summary_base (symbol_table *symtab CXX_MEM_STAT_INFO): + m_symtab (symtab), + m_initialize_when_cloning (false), + allocator ("call summary" PASS_MEM_STAT) {} /* Basic implementation of removal operation. */ @@ -547,7 +552,8 @@ protected: { /* Call gcc_internal_because we do not want to call finalizer for a type T. We call dtor explicitly. */ - return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T () : new T () ; + return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T () + : allocator.allocate (); } /* Release an item that is stored within map. */ @@ -559,7 +565,7 @@ protected: ggc_free (item); } else - delete item; + allocator.remove (item); } /* Unregister all call-graph hooks. */ @@ -578,6 +584,7 @@ protected: private: /* Return true when the summary uses GGC memory for allocation. */ virtual bool is_ggc () = 0; + object_allocator<T> allocator; }; template <typename T> @@ -607,9 +614,8 @@ public: /* Default construction takes SYMTAB as an argument. */ call_summary (symbol_table *symtab, bool ggc = false CXX_MEM_STAT_INFO) - : call_summary_base<T> (symtab), m_ggc (ggc), m_map (13, ggc, true, - GATHER_STATISTICS - PASS_MEM_STAT) + : call_summary_base<T> (symtab PASS_MEM_STAT), m_ggc (ggc), + m_map (13, ggc, true, GATHER_STATISTICS PASS_MEM_STAT) { this->m_symtab_removal_hook = this->m_symtab->add_edge_removal_hook (call_summary::symtab_removal, @@ -775,7 +781,7 @@ class GTY((user)) fast_call_summary <T * public: /* Default construction takes SYMTAB as an argument. */ fast_call_summary (symbol_table *symtab CXX_MEM_STAT_INFO) - : call_summary_base<T> (symtab), m_vector (NULL) + : call_summary_base<T> (symtab PASS_MEM_STAT), m_vector (NULL) { vec_alloc (m_vector, 13 PASS_MEM_STAT); this->m_symtab_removal_hook