Add object allocators to symbol and call summaries
diff mbox series

Message ID 20191105103642.y7obi2q65lqykecv@kam.mff.cuni.cz
State New
Headers show
Series
  • Add object allocators to symbol and call summaries
Related show

Commit Message

Jan Hubicka Nov. 5, 2019, 10:36 a.m. UTC
Hi,
this patch adds object allocators to manage IPA summaries. This reduces
malloc overhead and fragmentation.  I now get peak memory use 7.5GB instead
of 10GB for firefox WPA because reduced fragmentation leads to less COWs after
forks.  Additional bonus is that we now have statistics gathered by mem-reports
which makes my life easier, too.
(though memory stats are far from ideal - we need to pass location info around
bit more).

Bootstrapped/regtested x86_64-linux, will commit it shortly.

	* hsa-brig.c: Include alloc-pool.h
	* hsa-dump.c: Likewise.
	* hsa-gen.c: Likewise.
	* hse-regalloc.c: Likewise.
	* ipa-hsa.c: Likewise.
	* ipa-predicate.c: Likewise.
	* ipa-reference.c: Likewise.
	* ipa-sra.c: Likewise.
	* omp-expand.c: Likewise.
	* omp-general.c: Likewise.
	* omp-low.c: Likewise.
	* sumbol-summary.h (function_summary_base): Add allocator.
	(function_summary<T *>::function_summary): Update construction.
	(fast_function_summary<T *, V>::fast_function_summary): Likewise.
	(call_summary_base): Add allcator.
	(call_summary<T *>::call_summary): Update construction.
	(fast_call_summary<T *, V>::fast_call_summary): Likewise.

Comments

Martin Liška Nov. 5, 2019, 10:45 a.m. UTC | #1
On 11/5/19 11:36 AM, Jan Hubicka wrote:
> Hi,
> this patch adds object allocators to manage IPA summaries. This reduces
> malloc overhead and fragmentation.  I now get peak memory use 7.5GB instead
> of 10GB for firefox WPA because reduced fragmentation leads to less COWs after
> forks.

That sounds promising.

> Additional bonus is that we now have statistics gathered by mem-reports
> which makes my life easier, too.

What's currently bad with the detailed memory statistics? I updated the
code that one should see the allocation for the underlying hash_map and
vec?

> (though memory stats are far from ideal - we need to pass location info around
> bit more).

Please rename allocator to m_allocator.

Martin
Martin Liška Nov. 5, 2019, 10:51 a.m. UTC | #2
On 11/5/19 11:45 AM, Martin Liška wrote:
> Please rename allocator to m_allocator.

You were faster and installed that patch.

Thus I'm sending the adjustment.

Martin
Jan Hubicka Nov. 5, 2019, 11:01 a.m. UTC | #3
> On 11/5/19 11:36 AM, Jan Hubicka wrote:
> > Hi,
> > this patch adds object allocators to manage IPA summaries. This reduces
> > malloc overhead and fragmentation.  I now get peak memory use 7.5GB instead
> > of 10GB for firefox WPA because reduced fragmentation leads to less COWs after
> > forks.
> 
> That sounds promising.
> 
> > Additional bonus is that we now have statistics gathered by mem-reports
> > which makes my life easier, too.
> 
> What's currently bad with the detailed memory statistics? I updated the
> code that one should see the allocation for the underlying hash_map and
> vec?

I currently get:

--------------------------------------------------------------------------------------------------------------------------------------------
Pool name                       Allocation pool                                   Pools       Leak            Peak            Times    Elt size
--------------------------------------------------------------------------------------------------------------------------------------------
tree_scc                        lto/lto-common.c:2709 (read_cgraph_and_symbols)      1         0 :  0.0%       99M     3169k: 43.7%          32
IPA histogram                   ipa-profile.c:77 (__static_initialization_and_de     1        16 :  0.0%       16         1 :  0.0%          16
IPA-PROP ref descriptions       ipa-prop.c:170 (__static_initialization_and_dest     1       226k:  0.3%      226k     9670 :  0.1%          24
function summary                ipa-fnsummary.c:557 (ipa_fn_summary_alloc)           1      6145k:  7.0%     6257k      391k:  5.4%          16
function summary                ipa-pure-const.c:136 (__base_ctor )                  1      6863k:  7.9%     9449k      590k:  8.1%          16
edge predicates                 ipa-fnsummary.c:93 (__static_initialization_and_     1      8327k:  9.5%     8385k      209k:  2.9%          40
call summary                    ipa-sra.c:436 (__base_ctor )                         1        18M: 21.3%       21M     1393k: 19.2%          16
call summary                    ipa-fnsummary.h:276 (__base_ctor )                   1        46M: 54.0%       46M     1483k: 20.5%          32
--------------------------------------------------------------------------------------------------------------------------------------------
Pool name                       Allocation pool                                   Pools       Leak            Peak            Times    Elt size
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                 9         85M
--------------------------------------------------------------------------------------------------------------------------------------------

This is quite readable, though we may give them different names and
update constructors. Not a big deal IMO.

For GGC statistics I see:

varpool.c:137 (create_empty)                          7924k:  0.4%        0 :  0.0%     3214k:  0.2%        0 :  0.0%       87k
cgraph.c:939 (cgraph_allocate_init_indirect_info      8566k:  0.4%        0 :  0.0%     1395k:  0.1%        0 :  0.0%      113k
alias.c:1170 (record_alias_subset)                      12M:  0.6%        0 :  0.0%       12k:  0.0%       99k:  0.1%       12k
ipa-sra.c:2717 (isra_read_node_info)                    12M:  0.6%        0 :  0.0%     4179k:  0.2%       21k:  0.0%      376k
toplev.c:904 (realloc_for_line_map)                     16M:  0.8%        0 :  0.0%       15M:  0.9%      144 :  0.0%       12 
ipa-prop.c:278 (ipa_alloc_node_params)                  16M:  0.8%      266k:  0.4%        0 :  0.0%       22k:  0.0%      366k
symbol-summary.h:555 (allocate_new)                     18M:  0.9%        0 :  0.0%      119k:  0.0%        0 :  0.0%     1171k
 ^^^ here we should point the caller of get_create

ipa-fnsummary.c:3877 (inline_read_section)              28M:  1.4%        0 :  0.0%      552k:  0.0%      392k:  0.3%      261k
lto-section-in.c:388 (lto_new_in_decl_state)            29M:  1.4%        0 :  0.0%       11M:  0.7%        0 :  0.0%      587k
symtab.c:582 (create_reference)                         35M:  1.7%        0 :  0.0%       50M:  2.9%     1199k:  0.9%      541k
symbol-summary.h:64 (allocate_new)                      46M:  2.2%        0 :  0.0%     2445k:  0.1%        0 :  0.0%     1168k
 ^^^ same here.

stringpool.c:63 (alloc_node)                            47M:  2.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1217k
ipa-prop.c:4480 (ipa_read_edge_info)                    51M:  2.4%        0 :  0.0%      260k:  0.0%      404k:  0.3%      531k
hash-table.h:801 (expand)                               81M:  3.9%        0 :  0.0%       80M:  4.7%       88k:  0.1%     3349 
 ^^^ some of memory comes here which ought to be accounted to caller of
 expand.
stringpool.c:41 (stringpool_ggc_alloc)                  92M:  4.4%        0 :  0.0%        0 :  0.0%     6600k:  5.2%     1217k
cgraph.h:2712 (allocate_cgraph_symbol)                 148M:  7.1%        0 :  0.0%      115M:  6.7%        0 :  0.0%      767k
cgraph.c:851 (create_edge)                             149M:  7.1%        0 :  0.0%       27M:  1.6%        0 :  0.0%     1743k
ipa-fnsummary.c:3936 (inline_read_section)             174M:  8.3%        0 :  0.0%     4190k:  0.2%       12M: 10.2%      391k
lto/lto-common.c:204 (lto_read_in_decl_state)          200M:  9.6%        0 :  0.0%       65M:  3.8%       19M: 15.5%     1731k
ipa-prop.c:4478 (ipa_read_edge_info)                   210M: 10.0%        0 :  0.0%     1361k:  0.1%       17M: 14.4%     1171k
tree-streamer-in.c:631 (streamer_alloc_tree)           647M: 30.8%       55M: 84.5%     1267M: 73.4%       64M: 52.1%       13M
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                              Leak          Garbage            Freed        Overhead            Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                 2100M:100.0%       65M:100.0%     1726M:100.0%      124M:100.0%       29M
--------------------------------------------------------------------------------------------------------------------------------------------

One very odd thing is that at the end of WPA of firefox I see:

hash-table.h:801 (expand)                              100M:  2.9%     2088 :  0.0%      193M:  6.4%       90k:  0.0%     3379 
tree-ssa-operands.c:265 (ssa_operand_alloc)            104M:  3.0%        0 :  0.0%       39M:  1.3%        0 :  0.0%      105k
stringpool.c:41 (stringpool_ggc_alloc)                 106M:  3.1%        0 :  0.0%        0 :  0.0%     7652k:  2.4%     1362k
ipa-fnsummary.c:3936 (inline_read_section)             174M:  5.1%        0 :  0.0%     4190k:  0.1%       12M:  4.0%      391k
  ^^^ those are size_tale vectors that ought to be freed.  

lto/lto-common.c:204 (lto_read_in_decl_state)          200M:  5.8%        0 :  0.0%       65M:  2.2%       19M:  6.1%     1731k
ipa-prop.c:4478 (ipa_read_edge_info)                   210M:  6.1%        0 :  0.0%     1361k:  0.0%       17M:  5.7%     1171k
  ^^^ those are jumptables that ought to be freed too.

cgraph.c:851 (create_edge)                             285M:  8.3%        0 :  0.0%       33M:  1.1%        0 :  0.0%     3141k
cgraph.h:2712 (allocate_cgraph_symbol)                 417M: 12.1%        0 :  0.0%      121M:  4.0%        0 :  0.0%     1567k
tree-streamer-in.c:631 (streamer_alloc_tree)           758M: 22.0%       96M: 23.0%     1267M: 41.7%       64M: 20.6%       15M
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                              Leak          Garbage            Freed        Overhead            Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                 3453M:100.0%      418M:100.0%     3039M:100.0%      313M:100.0%       49M
--------------------------------------------------------------------------------------------------------------------------------------------

I am not sure where the problem is - it is GGC memory and we release
those summaries after inlining so there should not be any pointers to
them. At worst it should account to garbage, so it may be also some
accounting bug.

I suppose first thing to try is to breakpoint in the ggc walker of these
and see if it shows up in the final ggc.

Honza
Jan Hubicka Nov. 5, 2019, 11:02 a.m. UTC | #4
> On 11/5/19 11:45 AM, Martin Liška wrote:
> > Please rename allocator to m_allocator.
> 
> You were faster and installed that patch.
> 
> Thus I'm sending the adjustment.
> 
> Martin

> From 6edd5d8c4afb0451aaaf05ba857435219b31814d Mon Sep 17 00:00:00 2001
> From: Martin Liska <mliska@suse.cz>
> Date: Tue, 5 Nov 2019 11:50:32 +0100
> Subject: [PATCH] Update coding style in symbol-summary.h.
> 
> gcc/ChangeLog:
> 
> 2019-11-05  Martin Liska  <mliska@suse.cz>
> 
> 	* symbol-summary.h: Rename allocator to m_allocator and
> 	add comment.

This is OK, thanks!
Honza
> ---
>  gcc/symbol-summary.h | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
> index d663cbb90fd..8aedcfe9143 100644
> --- a/gcc/symbol-summary.h
> +++ b/gcc/symbol-summary.h
> @@ -31,7 +31,7 @@ public:
>    function_summary_base (symbol_table *symtab CXX_MEM_STAT_INFO):
>    m_symtab (symtab),
>    m_insertion_enabled (true),
> -  allocator ("function summary" PASS_MEM_STAT)
> +  m_allocator ("function summary" PASS_MEM_STAT)
>    {}
>  
>    /* Basic implementation of insert operation.  */
> @@ -62,7 +62,7 @@ protected:
>      /* Call gcc_internal_because we do not want to call finalizer for
>         a type T.  We call dtor explicitly.  */
>      return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T ()
> -		     : allocator.allocate () ;
> +		     : m_allocator.allocate () ;
>    }
>  
>    /* Release an item that is stored within map.  */
> @@ -74,7 +74,7 @@ protected:
>  	ggc_free (item);
>        }
>      else
> -      allocator.remove (item);
> +      m_allocator.remove (item);
>    }
>  
>    /* Unregister all call-graph hooks.  */
> @@ -95,7 +95,9 @@ protected:
>  private:
>    /* Return true when the summary uses GGC memory for allocation.  */
>    virtual bool is_ggc () = 0;
> -  object_allocator<T> allocator;
> +
> +  /* Object allocator for heap allocation.  */
> +  object_allocator<T> m_allocator;
>  };
>  
>  template <typename T>
> @@ -537,7 +539,7 @@ public:
>    call_summary_base (symbol_table *symtab CXX_MEM_STAT_INFO):
>    m_symtab (symtab),
>    m_initialize_when_cloning (false),
> -  allocator ("call summary" PASS_MEM_STAT)
> +  m_allocator ("call summary" PASS_MEM_STAT)
>    {}
>  
>    /* Basic implementation of removal operation.  */
> @@ -553,7 +555,7 @@ protected:
>      /* Call gcc_internal_because we do not want to call finalizer for
>         a type T.  We call dtor explicitly.  */
>      return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T ()
> -		     : allocator.allocate ();
> +		     : m_allocator.allocate ();
>    }
>  
>    /* Release an item that is stored within map.  */
> @@ -565,7 +567,7 @@ protected:
>  	ggc_free (item);
>        }
>      else
> -      allocator.remove (item);
> +      m_allocator.remove (item);
>    }
>  
>    /* Unregister all call-graph hooks.  */
> @@ -584,7 +586,9 @@ protected:
>  private:
>    /* Return true when the summary uses GGC memory for allocation.  */
>    virtual bool is_ggc () = 0;
> -  object_allocator<T> allocator;
> +
> +  /* Object allocator for heap allocation.  */
> +  object_allocator<T> m_allocator;
>  };
>  
>  template <typename T>
> -- 
> 2.23.0
>
Martin Liška Nov. 5, 2019, 12:05 p.m. UTC | #5
On 11/5/19 12:01 PM, Jan Hubicka wrote:
>> On 11/5/19 11:36 AM, Jan Hubicka wrote:
>>> Hi,
>>> this patch adds object allocators to manage IPA summaries. This reduces
>>> malloc overhead and fragmentation.  I now get peak memory use 7.5GB instead
>>> of 10GB for firefox WPA because reduced fragmentation leads to less COWs after
>>> forks.
>>
>> That sounds promising.
>>
>>> Additional bonus is that we now have statistics gathered by mem-reports
>>> which makes my life easier, too.
>>
>> What's currently bad with the detailed memory statistics? I updated the
>> code that one should see the allocation for the underlying hash_map and
>> vec?
> 
> I currently get:
> 
> --------------------------------------------------------------------------------------------------------------------------------------------
> Pool name                       Allocation pool                                   Pools       Leak            Peak            Times    Elt size
> --------------------------------------------------------------------------------------------------------------------------------------------
> tree_scc                        lto/lto-common.c:2709 (read_cgraph_and_symbols)      1         0 :  0.0%       99M     3169k: 43.7%          32
> IPA histogram                   ipa-profile.c:77 (__static_initialization_and_de     1        16 :  0.0%       16         1 :  0.0%          16
> IPA-PROP ref descriptions       ipa-prop.c:170 (__static_initialization_and_dest     1       226k:  0.3%      226k     9670 :  0.1%          24
> function summary                ipa-fnsummary.c:557 (ipa_fn_summary_alloc)           1      6145k:  7.0%     6257k      391k:  5.4%          16
> function summary                ipa-pure-const.c:136 (__base_ctor )                  1      6863k:  7.9%     9449k      590k:  8.1%          16
> edge predicates                 ipa-fnsummary.c:93 (__static_initialization_and_     1      8327k:  9.5%     8385k      209k:  2.9%          40
> call summary                    ipa-sra.c:436 (__base_ctor )                         1        18M: 21.3%       21M     1393k: 19.2%          16
> call summary                    ipa-fnsummary.h:276 (__base_ctor )                   1        46M: 54.0%       46M     1483k: 20.5%          32
> --------------------------------------------------------------------------------------------------------------------------------------------
> Pool name                       Allocation pool                                   Pools       Leak            Peak            Times    Elt size
> --------------------------------------------------------------------------------------------------------------------------------------------
> Total                                                                                 9         85M
> --------------------------------------------------------------------------------------------------------------------------------------------
> 
> This is quite readable, though we may give them different names and
> update constructors. Not a big deal IMO.
> 
> For GGC statistics I see:
> 
> varpool.c:137 (create_empty)                          7924k:  0.4%        0 :  0.0%     3214k:  0.2%        0 :  0.0%       87k
> cgraph.c:939 (cgraph_allocate_init_indirect_info      8566k:  0.4%        0 :  0.0%     1395k:  0.1%        0 :  0.0%      113k
> alias.c:1170 (record_alias_subset)                      12M:  0.6%        0 :  0.0%       12k:  0.0%       99k:  0.1%       12k
> ipa-sra.c:2717 (isra_read_node_info)                    12M:  0.6%        0 :  0.0%     4179k:  0.2%       21k:  0.0%      376k
> toplev.c:904 (realloc_for_line_map)                     16M:  0.8%        0 :  0.0%       15M:  0.9%      144 :  0.0%       12
> ipa-prop.c:278 (ipa_alloc_node_params)                  16M:  0.8%      266k:  0.4%        0 :  0.0%       22k:  0.0%      366k
> symbol-summary.h:555 (allocate_new)                     18M:  0.9%        0 :  0.0%      119k:  0.0%        0 :  0.0%     1171k
>   ^^^ here we should point the caller of get_create
> 
> ipa-fnsummary.c:3877 (inline_read_section)              28M:  1.4%        0 :  0.0%      552k:  0.0%      392k:  0.3%      261k
> lto-section-in.c:388 (lto_new_in_decl_state)            29M:  1.4%        0 :  0.0%       11M:  0.7%        0 :  0.0%      587k
> symtab.c:582 (create_reference)                         35M:  1.7%        0 :  0.0%       50M:  2.9%     1199k:  0.9%      541k
> symbol-summary.h:64 (allocate_new)                      46M:  2.2%        0 :  0.0%     2445k:  0.1%        0 :  0.0%     1168k
>   ^^^ same here.
> 
> stringpool.c:63 (alloc_node)                            47M:  2.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1217k
> ipa-prop.c:4480 (ipa_read_edge_info)                    51M:  2.4%        0 :  0.0%      260k:  0.0%      404k:  0.3%      531k
> hash-table.h:801 (expand)                               81M:  3.9%        0 :  0.0%       80M:  4.7%       88k:  0.1%     3349
>   ^^^ some of memory comes here which ought to be accounted to caller of
>   expand.

Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description
for each created symbol/call_summary and register manually every allocation to such descriptor.

> stringpool.c:41 (stringpool_ggc_alloc)                  92M:  4.4%        0 :  0.0%        0 :  0.0%     6600k:  5.2%     1217k
> cgraph.h:2712 (allocate_cgraph_symbol)                 148M:  7.1%        0 :  0.0%      115M:  6.7%        0 :  0.0%      767k
> cgraph.c:851 (create_edge)                             149M:  7.1%        0 :  0.0%       27M:  1.6%        0 :  0.0%     1743k
> ipa-fnsummary.c:3936 (inline_read_section)             174M:  8.3%        0 :  0.0%     4190k:  0.2%       12M: 10.2%      391k
> lto/lto-common.c:204 (lto_read_in_decl_state)          200M:  9.6%        0 :  0.0%       65M:  3.8%       19M: 15.5%     1731k
> ipa-prop.c:4478 (ipa_read_edge_info)                   210M: 10.0%        0 :  0.0%     1361k:  0.1%       17M: 14.4%     1171k
> tree-streamer-in.c:631 (streamer_alloc_tree)           647M: 30.8%       55M: 84.5%     1267M: 73.4%       64M: 52.1%       13M
> --------------------------------------------------------------------------------------------------------------------------------------------
> GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> --------------------------------------------------------------------------------------------------------------------------------------------
> Total                                                 2100M:100.0%       65M:100.0%     1726M:100.0%      124M:100.0%       29M
> --------------------------------------------------------------------------------------------------------------------------------------------
> 
> One very odd thing is that at the end of WPA of firefox I see:
> 
> hash-table.h:801 (expand)                              100M:  2.9%     2088 :  0.0%      193M:  6.4%       90k:  0.0%     3379
> tree-ssa-operands.c:265 (ssa_operand_alloc)            104M:  3.0%        0 :  0.0%       39M:  1.3%        0 :  0.0%      105k
> stringpool.c:41 (stringpool_ggc_alloc)                 106M:  3.1%        0 :  0.0%        0 :  0.0%     7652k:  2.4%     1362k
> ipa-fnsummary.c:3936 (inline_read_section)             174M:  5.1%        0 :  0.0%     4190k:  0.1%       12M:  4.0%      391k
>    ^^^ those are size_tale vectors that ought to be freed.
> 
> lto/lto-common.c:204 (lto_read_in_decl_state)          200M:  5.8%        0 :  0.0%       65M:  2.2%       19M:  6.1%     1731k
> ipa-prop.c:4478 (ipa_read_edge_info)                   210M:  6.1%        0 :  0.0%     1361k:  0.0%       17M:  5.7%     1171k
>    ^^^ those are jumptables that ought to be freed too.

I verified this and I can confirm that

class GTY((for_user)) ipa_edge_args
{
  public:

   /* Default constructor.  */
   ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL)
     {}

   /* Destructor.  */
   ~ipa_edge_args ()
     {
       vec_free (jump_functions);
       vec_free (polymorphic_call_contexts);
     }

is called which then calls vec_free. I traced that down and m_reverse_object_map does not
contain the pointer. So some minor issue in allocation tracing. But I'm pretty sure the memory
is released.

Martin

> 
> cgraph.c:851 (create_edge)                             285M:  8.3%        0 :  0.0%       33M:  1.1%        0 :  0.0%     3141k
> cgraph.h:2712 (allocate_cgraph_symbol)                 417M: 12.1%        0 :  0.0%      121M:  4.0%        0 :  0.0%     1567k
> tree-streamer-in.c:631 (streamer_alloc_tree)           758M: 22.0%       96M: 23.0%     1267M: 41.7%       64M: 20.6%       15M
> --------------------------------------------------------------------------------------------------------------------------------------------
> GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> --------------------------------------------------------------------------------------------------------------------------------------------
> Total                                                 3453M:100.0%      418M:100.0%     3039M:100.0%      313M:100.0%       49M
> --------------------------------------------------------------------------------------------------------------------------------------------
> 
> I am not sure where the problem is - it is GGC memory and we release
> those summaries after inlining so there should not be any pointers to
> them. At worst it should account to garbage, so it may be also some
> accounting bug.
> 
> I suppose first thing to try is to breakpoint in the ggc walker of these
> and see if it shows up in the final ggc.
> 
> Honza
>
Jan Hubicka Nov. 5, 2019, 2:48 p.m. UTC | #6
> > 
> > stringpool.c:63 (alloc_node)                            47M:  2.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1217k
> > ipa-prop.c:4480 (ipa_read_edge_info)                    51M:  2.4%        0 :  0.0%      260k:  0.0%      404k:  0.3%      531k
> > hash-table.h:801 (expand)                               81M:  3.9%        0 :  0.0%       80M:  4.7%       88k:  0.1%     3349
> >   ^^^ some of memory comes here which ought to be accounted to caller of
> >   expand.
> 
> Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description
> for each created symbol/call_summary and register manually every allocation to such descriptor.

Or just pass memory stats from caller of expand and transitively pass it
from caller of summary. This will get us the line info of get_create
call that is IMO OK.
> > lto/lto-common.c:204 (lto_read_in_decl_state)          200M:  5.8%        0 :  0.0%       65M:  2.2%       19M:  6.1%     1731k
> > ipa-prop.c:4478 (ipa_read_edge_info)                   210M:  6.1%        0 :  0.0%     1361k:  0.0%       17M:  5.7%     1171k
> >    ^^^ those are jumptables that ought to be freed too.
> 
> I verified this and I can confirm that
> 
> class GTY((for_user)) ipa_edge_args
> {
>  public:
> 
>   /* Default constructor.  */
>   ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL)
>     {}
> 
>   /* Destructor.  */
>   ~ipa_edge_args ()
>     {
>       vec_free (jump_functions);
>       vec_free (polymorphic_call_contexts);
>     }
> 
> is called which then calls vec_free. I traced that down and m_reverse_object_map does not
> contain the pointer. So some minor issue in allocation tracing. But I'm pretty sure the memory
> is released.

Well, it is not very minor given that it is third largest allocation. It
would be nice to track this down so we can trust the summaries again.
It is very strange that overhead gets registered but never unregistered
- both freeing or garbage collecting should do that.

I will try to see what happens.

Honza
> 
> Martin
> 
> > 
> > cgraph.c:851 (create_edge)                             285M:  8.3%        0 :  0.0%       33M:  1.1%        0 :  0.0%     3141k
> > cgraph.h:2712 (allocate_cgraph_symbol)                 417M: 12.1%        0 :  0.0%      121M:  4.0%        0 :  0.0%     1567k
> > tree-streamer-in.c:631 (streamer_alloc_tree)           758M: 22.0%       96M: 23.0%     1267M: 41.7%       64M: 20.6%       15M
> > --------------------------------------------------------------------------------------------------------------------------------------------
> > GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> > --------------------------------------------------------------------------------------------------------------------------------------------
> > Total                                                 3453M:100.0%      418M:100.0%     3039M:100.0%      313M:100.0%       49M
> > --------------------------------------------------------------------------------------------------------------------------------------------
> > 
> > I am not sure where the problem is - it is GGC memory and we release
> > those summaries after inlining so there should not be any pointers to
> > them. At worst it should account to garbage, so it may be also some
> > accounting bug.
> > 
> > I suppose first thing to try is to breakpoint in the ggc walker of these
> > and see if it shows up in the final ggc.
> > 
> > Honza
> > 
>
Martin Liška Nov. 5, 2019, 3:06 p.m. UTC | #7
On 11/5/19 3:48 PM, Jan Hubicka wrote:
>>>
>>> stringpool.c:63 (alloc_node)                            47M:  2.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1217k
>>> ipa-prop.c:4480 (ipa_read_edge_info)                    51M:  2.4%        0 :  0.0%      260k:  0.0%      404k:  0.3%      531k
>>> hash-table.h:801 (expand)                               81M:  3.9%        0 :  0.0%       80M:  4.7%       88k:  0.1%     3349
>>>    ^^^ some of memory comes here which ought to be accounted to caller of
>>>    expand.
>>
>> Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description
>> for each created symbol/call_summary and register manually every allocation to such descriptor.
> 
> Or just pass memory stats from caller of expand and transitively pass it
> from caller of summary. This will get us the line info of get_create
> call that is IMO OK.

The issue with this approach is that you will spread a summary allocation
along all the ::get_create places. Which is not ideal.

>>> lto/lto-common.c:204 (lto_read_in_decl_state)          200M:  5.8%        0 :  0.0%       65M:  2.2%       19M:  6.1%     1731k
>>> ipa-prop.c:4478 (ipa_read_edge_info)                   210M:  6.1%        0 :  0.0%     1361k:  0.0%       17M:  5.7%     1171k
>>>     ^^^ those are jumptables that ought to be freed too.
>>
>> I verified this and I can confirm that
>>
>> class GTY((for_user)) ipa_edge_args
>> {
>>   public:
>>
>>    /* Default constructor.  */
>>    ipa_edge_args () : jump_functions (NULL), polymorphic_call_contexts (NULL)
>>      {}
>>
>>    /* Destructor.  */
>>    ~ipa_edge_args ()
>>      {
>>        vec_free (jump_functions);
>>        vec_free (polymorphic_call_contexts);
>>      }
>>
>> is called which then calls vec_free. I traced that down and m_reverse_object_map does not
>> contain the pointer. So some minor issue in allocation tracing. But I'm pretty sure the memory
>> is released.
> 
> Well, it is not very minor given that it is third largest allocation. It
> would be nice to track this down so we can trust the summaries again.
> It is very strange that overhead gets registered but never unregistered
> - both freeing or garbage collecting should do that.
> 
> I will try to see what happens.

Try to take a look, or we can debug that on Thursday together?
Martin

> 
> Honza
>>
>> Martin
>>
>>>
>>> cgraph.c:851 (create_edge)                             285M:  8.3%        0 :  0.0%       33M:  1.1%        0 :  0.0%     3141k
>>> cgraph.h:2712 (allocate_cgraph_symbol)                 417M: 12.1%        0 :  0.0%      121M:  4.0%        0 :  0.0%     1567k
>>> tree-streamer-in.c:631 (streamer_alloc_tree)           758M: 22.0%       96M: 23.0%     1267M: 41.7%       64M: 20.6%       15M
>>> --------------------------------------------------------------------------------------------------------------------------------------------
>>> GGC memory                                              Leak          Garbage            Freed        Overhead            Times
>>> --------------------------------------------------------------------------------------------------------------------------------------------
>>> Total                                                 3453M:100.0%      418M:100.0%     3039M:100.0%      313M:100.0%       49M
>>> --------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>> I am not sure where the problem is - it is GGC memory and we release
>>> those summaries after inlining so there should not be any pointers to
>>> them. At worst it should account to garbage, so it may be also some
>>> accounting bug.
>>>
>>> I suppose first thing to try is to breakpoint in the ggc walker of these
>>> and see if it shows up in the final ggc.
>>>
>>> Honza
>>>
>>
Jan Hubicka Nov. 5, 2019, 5:53 p.m. UTC | #8
> On 11/5/19 3:48 PM, Jan Hubicka wrote:
> > > > 
> > > > stringpool.c:63 (alloc_node)                            47M:  2.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1217k
> > > > ipa-prop.c:4480 (ipa_read_edge_info)                    51M:  2.4%        0 :  0.0%      260k:  0.0%      404k:  0.3%      531k
> > > > hash-table.h:801 (expand)                               81M:  3.9%        0 :  0.0%       80M:  4.7%       88k:  0.1%     3349
> > > >    ^^^ some of memory comes here which ought to be accounted to caller of
> > > >    expand.
> > > 
> > > Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description
> > > for each created symbol/call_summary and register manually every allocation to such descriptor.
> > 
> > Or just pass memory stats from caller of expand and transitively pass it
> > from caller of summary. This will get us the line info of get_create
> > call that is IMO OK.
> 
> The issue with this approach is that you will spread a summary allocation
> along all the ::get_create places. Which is not ideal.

We get it with other allocations, too. Not ideal, but better.
Even better solutions are welcome :)
> 
> Try to take a look, or we can debug that on Thursday together?
> Martin

Found it.  It turns out that ggc_prune_ovehread_list is bogus.  It walks
all active allocations objects and looks if they was collected accoutnig
their collection and then throws away all allocations (including those
not colelcted) and those gets no longer accounted later.  So we
basically misaccount everything that survives ggc_collect.

No wonder that it makes me to hunt ghosts 8-O

Also the last memory report was sorted by garbage not leak for reason -
for normal compilation we care about garbage produces primarily because
those triggers ggc collects and makes compiler slow.

BTW I like how advanced C++ gets back to lisp :)

With the fix I get following stats by end of firefox WPA

cfg.c:127 (alloc_block)                                 32M:  1.2%       12M:  2.6%        0 :  0.0%        0 :  0.0%      446k
symtab.c:582 (create_reference)                         42M:  1.6%        0 :  0.0%       65M:  1.7%     1329k:  0.4%      840k
gimple-streamer-in.c:101 (input_gimple_stmt)            49M:  1.9%       17M:  3.5%        0 :  0.0%      375k:  0.1%      747k
tree-ssanames.c:308 (make_ssa_name_fn)                  51M:  2.0%       16M:  3.4%        0 :  0.0%        0 :  0.0%      973k
ipa-cp.c:5157 (ipcp_store_vr_results)                   51M:  2.0%     1243k:  0.2%        0 :  0.0%     9561k:  3.0%      146k
stringpool.c:63 (alloc_node)                            53M:  2.0%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1362k
ipa-prop.c:3988 (duplicate)                             63M:  2.4%     1115k:  0.2%        0 :  0.0%       10M:  3.2%      264k
toplev.c:904 (realloc_for_line_map)                     72M:  2.8%        0 :  0.0%       71M:  1.9%       15M:  5.1%       27 
tree-ssanames.c:83 (init_ssanames)                      96M:  3.7%      121M: 24.4%       44M:  1.2%       87M: 27.8%      174k
tree-ssa-operands.c:265 (ssa_operand_alloc)            104M:  4.0%        0 :  0.0%       39M:  1.0%        0 :  0.0%      105k
stringpool.c:41 (stringpool_ggc_alloc)                 106M:  4.1%        0 :  0.0%        0 :  0.0%     7652k:  2.4%     1362k
lto/lto-common.c:204 (lto_read_in_decl_state)          160M:  6.2%        0 :  0.0%      105M:  2.8%       19M:  6.1%     1731k
cgraph.c:851 (create_edge)                             248M:  9.5%        0 :  0.0%       70M:  1.9%        0 :  0.0%     3141k
cgraph.h:2712 (allocate_cgraph_symbol)                 383M: 14.7%        0 :  0.0%      155M:  4.1%        0 :  0.0%     1567k
tree-streamer-in.c:631 (streamer_alloc_tree)           718M: 27.5%      136M: 27.5%     1267M: 33.3%       64M: 20.6%       15M
--------------------------------------------------------------------------------------------------------------------------------------------
GGC memory                                              Leak          Garbage            Freed        Overhead            Times
--------------------------------------------------------------------------------------------------------------------------------------------
Total                                                 2609M:100.0%      497M:100.0%     3804M:100.0%      313M:100.0%       49M
--------------------------------------------------------------------------------------------------------------------------------------------

This looks more realistic. ssa_operands and init_ssanames shows that we
read really a lot of bodies into memory. I also wonder if we realy want
to compute virutal ssa form for them when we only want to compare them.

After reading and symbol table merging I get:

cgraph.h:2712 (allocate_cgraph_symbol)                 148M:  7.1%        0 :  0.0%      115M:  6.7%        0 :  0.0%      767k

So it seems that about half of callgrpah nodes are inline clones, so
working on reducing clone overhead (in addition to re-visiting tree
merging once again) seems to be most meaningful right now.

OK if patch passes testing?

	* ggc-common.c (ggc_prune_overhead_list): Do not throw surviving
	memory allocations away.
	* mem-stats.h (mem_alloc_description<T>::release_object_overhead):
	do not silently ignore invalid release requests.
Index: ggc-common.c
===================================================================
--- ggc-common.c	(revision 277796)
+++ ggc-common.c	(working copy)
@@ -1003,10 +1003,10 @@ ggc_prune_overhead_list (void)
 
   for (; it != ggc_mem_desc.m_reverse_object_map->end (); ++it)
     if (!ggc_marked_p ((*it).first))
-      (*it).second.first->m_collected += (*it).second.second;
-
-  delete ggc_mem_desc.m_reverse_object_map;
-  ggc_mem_desc.m_reverse_object_map = new map_t (13, false, false, false);
+      {
+        (*it).second.first->m_collected += (*it).second.second;
+	ggc_mem_desc.m_reverse_object_map->remove ((*it).first);
+      }
 }
 
 /* Return memory used by heap in kb, 0 if this info is not available.  */
Index: mem-stats.h
===================================================================
--- mem-stats.h	(revision 277796)
+++ mem-stats.h	(working copy)
@@ -535,11 +535,8 @@ inline void
 mem_alloc_description<T>::release_object_overhead (void *ptr)
 {
   std::pair <T *, size_t> *entry = m_reverse_object_map->get (ptr);
-  if (entry)
-    {
-      entry->first->release_overhead (entry->second);
-      m_reverse_object_map->remove (ptr);
-    }
+  entry->first->release_overhead (entry->second);
+  m_reverse_object_map->remove (ptr);
 }
 
 /* Unregister a memory allocation descriptor registered with
Richard Biener Nov. 6, 2019, 9:42 a.m. UTC | #9
On Tue, Nov 5, 2019 at 6:53 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On 11/5/19 3:48 PM, Jan Hubicka wrote:
> > > > >
> > > > > stringpool.c:63 (alloc_node)                            47M:  2.3%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1217k
> > > > > ipa-prop.c:4480 (ipa_read_edge_info)                    51M:  2.4%        0 :  0.0%      260k:  0.0%      404k:  0.3%      531k
> > > > > hash-table.h:801 (expand)                               81M:  3.9%        0 :  0.0%       80M:  4.7%       88k:  0.1%     3349
> > > > >    ^^^ some of memory comes here which ought to be accounted to caller of
> > > > >    expand.
> > > >
> > > > Yes, these all come from ggc_internal_alloc. Ideally we should register a mem_alloc_description
> > > > for each created symbol/call_summary and register manually every allocation to such descriptor.
> > >
> > > Or just pass memory stats from caller of expand and transitively pass it
> > > from caller of summary. This will get us the line info of get_create
> > > call that is IMO OK.
> >
> > The issue with this approach is that you will spread a summary allocation
> > along all the ::get_create places. Which is not ideal.
>
> We get it with other allocations, too. Not ideal, but better.
> Even better solutions are welcome :)
> >
> > Try to take a look, or we can debug that on Thursday together?
> > Martin
>
> Found it.  It turns out that ggc_prune_ovehread_list is bogus.  It walks
> all active allocations objects and looks if they was collected accoutnig
> their collection and then throws away all allocations (including those
> not colelcted) and those gets no longer accounted later.  So we
> basically misaccount everything that survives ggc_collect.
>
> No wonder that it makes me to hunt ghosts 8-O
>
> Also the last memory report was sorted by garbage not leak for reason -
> for normal compilation we care about garbage produces primarily because
> those triggers ggc collects and makes compiler slow.
>
> BTW I like how advanced C++ gets back to lisp :)
>
> With the fix I get following stats by end of firefox WPA
>
> cfg.c:127 (alloc_block)                                 32M:  1.2%       12M:  2.6%        0 :  0.0%        0 :  0.0%      446k
> symtab.c:582 (create_reference)                         42M:  1.6%        0 :  0.0%       65M:  1.7%     1329k:  0.4%      840k
> gimple-streamer-in.c:101 (input_gimple_stmt)            49M:  1.9%       17M:  3.5%        0 :  0.0%      375k:  0.1%      747k
> tree-ssanames.c:308 (make_ssa_name_fn)                  51M:  2.0%       16M:  3.4%        0 :  0.0%        0 :  0.0%      973k
> ipa-cp.c:5157 (ipcp_store_vr_results)                   51M:  2.0%     1243k:  0.2%        0 :  0.0%     9561k:  3.0%      146k
> stringpool.c:63 (alloc_node)                            53M:  2.0%        0 :  0.0%        0 :  0.0%        0 :  0.0%     1362k
> ipa-prop.c:3988 (duplicate)                             63M:  2.4%     1115k:  0.2%        0 :  0.0%       10M:  3.2%      264k
> toplev.c:904 (realloc_for_line_map)                     72M:  2.8%        0 :  0.0%       71M:  1.9%       15M:  5.1%       27
> tree-ssanames.c:83 (init_ssanames)                      96M:  3.7%      121M: 24.4%       44M:  1.2%       87M: 27.8%      174k
> tree-ssa-operands.c:265 (ssa_operand_alloc)            104M:  4.0%        0 :  0.0%       39M:  1.0%        0 :  0.0%      105k
> stringpool.c:41 (stringpool_ggc_alloc)                 106M:  4.1%        0 :  0.0%        0 :  0.0%     7652k:  2.4%     1362k
> lto/lto-common.c:204 (lto_read_in_decl_state)          160M:  6.2%        0 :  0.0%      105M:  2.8%       19M:  6.1%     1731k
> cgraph.c:851 (create_edge)                             248M:  9.5%        0 :  0.0%       70M:  1.9%        0 :  0.0%     3141k
> cgraph.h:2712 (allocate_cgraph_symbol)                 383M: 14.7%        0 :  0.0%      155M:  4.1%        0 :  0.0%     1567k
> tree-streamer-in.c:631 (streamer_alloc_tree)           718M: 27.5%      136M: 27.5%     1267M: 33.3%       64M: 20.6%       15M
> --------------------------------------------------------------------------------------------------------------------------------------------
> GGC memory                                              Leak          Garbage            Freed        Overhead            Times
> --------------------------------------------------------------------------------------------------------------------------------------------
> Total                                                 2609M:100.0%      497M:100.0%     3804M:100.0%      313M:100.0%       49M
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> This looks more realistic. ssa_operands and init_ssanames shows that we
> read really a lot of bodies into memory. I also wonder if we realy want
> to compute virutal ssa form for them when we only want to compare them.
>
> After reading and symbol table merging I get:
>
> cgraph.h:2712 (allocate_cgraph_symbol)                 148M:  7.1%        0 :  0.0%      115M:  6.7%        0 :  0.0%      767k
>
> So it seems that about half of callgrpah nodes are inline clones, so
> working on reducing clone overhead (in addition to re-visiting tree
> merging once again) seems to be most meaningful right now.
>
> OK if patch passes testing?

OK.

>         * ggc-common.c (ggc_prune_overhead_list): Do not throw surviving
>         memory allocations away.
>         * mem-stats.h (mem_alloc_description<T>::release_object_overhead):
>         do not silently ignore invalid release requests.
> Index: ggc-common.c
> ===================================================================
> --- ggc-common.c        (revision 277796)
> +++ ggc-common.c        (working copy)
> @@ -1003,10 +1003,10 @@ ggc_prune_overhead_list (void)
>
>    for (; it != ggc_mem_desc.m_reverse_object_map->end (); ++it)
>      if (!ggc_marked_p ((*it).first))
> -      (*it).second.first->m_collected += (*it).second.second;
> -
> -  delete ggc_mem_desc.m_reverse_object_map;
> -  ggc_mem_desc.m_reverse_object_map = new map_t (13, false, false, false);
> +      {
> +        (*it).second.first->m_collected += (*it).second.second;
> +       ggc_mem_desc.m_reverse_object_map->remove ((*it).first);
> +      }
>  }
>
>  /* Return memory used by heap in kb, 0 if this info is not available.  */
> Index: mem-stats.h
> ===================================================================
> --- mem-stats.h (revision 277796)
> +++ mem-stats.h (working copy)
> @@ -535,11 +535,8 @@ inline void
>  mem_alloc_description<T>::release_object_overhead (void *ptr)
>  {
>    std::pair <T *, size_t> *entry = m_reverse_object_map->get (ptr);
> -  if (entry)
> -    {
> -      entry->first->release_overhead (entry->second);
> -      m_reverse_object_map->remove (ptr);
> -    }
> +  entry->first->release_overhead (entry->second);
> +  m_reverse_object_map->remove (ptr);
>  }
>
>  /* Unregister a memory allocation descriptor registered with
Martin Liška Nov. 6, 2019, 3:21 p.m. UTC | #10
On 11/5/19 6:53 PM, Jan Hubicka wrote:
> Found it.  It turns out that ggc_prune_ovehread_list is bogus.  It walks
> all active allocations objects and looks if they was collected accoutnig
> their collection and then throws away all allocations (including those
> not colelcted) and those gets no longer accounted later.  So we
> basically misaccount everything that survives ggc_collect.

I've just read the patch and it's correct. It was really a bogus.

Thanks for it,
Martin

Patch
diff mbox series

Index: hsa-brig.c
===================================================================
--- hsa-brig.c	(revision 277796)
+++ hsa-brig.c	(working copy)
@@ -44,6 +44,7 @@  along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "dumpfile.h"
 #include "print-tree.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "hsa-common.h"
 #include "gomp-constants.h"
Index: hsa-dump.c
===================================================================
--- hsa-dump.c	(revision 277796)
+++ hsa-dump.c	(working copy)
@@ -33,6 +33,7 @@  along with GCC; see the file COPYING3.
 #include "gimple-pretty-print.h"
 #include "cgraph.h"
 #include "print-tree.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "hsa-common.h"
 
Index: hsa-gen.c
===================================================================
--- hsa-gen.c	(revision 277796)
+++ hsa-gen.c	(working copy)
@@ -48,6 +48,7 @@  along with GCC; see the file COPYING3.
 #include "ssa-iterators.h"
 #include "cgraph.h"
 #include "print-tree.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "hsa-common.h"
 #include "cfghooks.h"
Index: hsa-regalloc.c
===================================================================
--- hsa-regalloc.c	(revision 277796)
+++ hsa-regalloc.c	(working copy)
@@ -35,6 +35,7 @@  along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "print-tree.h"
 #include "cfghooks.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "hsa-common.h"
 
Index: ipa-hsa.c
===================================================================
--- ipa-hsa.c	(revision 277796)
+++ ipa-hsa.c	(working copy)
@@ -40,6 +40,7 @@  along with GCC; see the file COPYING3.
 #include "stringpool.h"
 #include "cgraph.h"
 #include "print-tree.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "hsa-common.h"
 
Index: ipa-predicate.c
===================================================================
--- ipa-predicate.c	(revision 277796)
+++ ipa-predicate.c	(working copy)
@@ -25,8 +25,8 @@  along with GCC; see the file COPYING3.
 #include "tree.h"
 #include "cgraph.h"
 #include "tree-vrp.h"
-#include "symbol-summary.h"
 #include "alloc-pool.h"
+#include "symbol-summary.h"
 #include "ipa-prop.h"
 #include "ipa-fnsummary.h"
 #include "real.h"
Index: ipa-reference.c
===================================================================
--- ipa-reference.c	(revision 277796)
+++ ipa-reference.c	(working copy)
@@ -48,6 +48,7 @@  along with GCC; see the file COPYING3.
 #include "calls.h"
 #include "ipa-utils.h"
 #include "ipa-reference.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 
 /* The static variables defined within the compilation unit that are
Index: ipa-sra.c
===================================================================
--- ipa-sra.c	(revision 277796)
+++ ipa-sra.c	(working copy)
@@ -75,6 +75,7 @@  along with GCC; see the file COPYING3.
 #include "gimple-walk.h"
 #include "tree-dfa.h"
 #include "tree-sra.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "params.h"
 #include "dbgcnt.h"
Index: omp-expand.c
===================================================================
--- omp-expand.c	(revision 277796)
+++ omp-expand.c	(working copy)
@@ -52,6 +52,7 @@  along with GCC; see the file COPYING3.
 #include "omp-general.h"
 #include "omp-offload.h"
 #include "tree-cfgcleanup.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "gomp-constants.h"
 #include "gimple-pretty-print.h"
Index: omp-general.c
===================================================================
--- omp-general.c	(revision 277796)
+++ omp-general.c	(working copy)
@@ -37,6 +37,7 @@  along with GCC; see the file COPYING3.
 #include "attribs.h"
 #include "gimplify.h"
 #include "cgraph.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "hsa-common.h"
 #include "tree-pass.h"
Index: omp-low.c
===================================================================
--- omp-low.c	(revision 277796)
+++ omp-low.c	(working copy)
@@ -52,6 +52,7 @@  along with GCC; see the file COPYING3.
 #include "omp-low.h"
 #include "omp-grid.h"
 #include "gimple-low.h"
+#include "alloc-pool.h"
 #include "symbol-summary.h"
 #include "tree-nested.h"
 #include "context.h"
Index: symbol-summary.h
===================================================================
--- symbol-summary.h	(revision 277796)
+++ symbol-summary.h	(working copy)
@@ -28,8 +28,10 @@  class function_summary_base
 {
 public:
   /* Default construction takes SYMTAB as an argument.  */
-  function_summary_base (symbol_table *symtab): m_symtab (symtab),
-  m_insertion_enabled (true)
+  function_summary_base (symbol_table *symtab CXX_MEM_STAT_INFO):
+  m_symtab (symtab),
+  m_insertion_enabled (true),
+  allocator ("function summary" PASS_MEM_STAT)
   {}
 
   /* Basic implementation of insert operation.  */
@@ -59,7 +61,8 @@  protected:
   {
     /* Call gcc_internal_because we do not want to call finalizer for
        a type T.  We call dtor explicitly.  */
-    return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T () : new T () ;
+    return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T ()
+		     : allocator.allocate () ;
   }
 
   /* Release an item that is stored within map.  */
@@ -71,7 +74,7 @@  protected:
 	ggc_free (item);
       }
     else
-      delete item;
+      allocator.remove (item);
   }
 
   /* Unregister all call-graph hooks.  */
@@ -92,6 +95,7 @@  protected:
 private:
   /* Return true when the summary uses GGC memory for allocation.  */
   virtual bool is_ggc () = 0;
+  object_allocator<T> allocator;
 };
 
 template <typename T>
@@ -215,9 +219,8 @@  private:
 template <typename T>
 function_summary<T *>::function_summary (symbol_table *symtab, bool ggc
 					 MEM_STAT_DECL):
-  function_summary_base<T> (symtab), m_ggc (ggc), m_map (13, ggc, true,
-							 GATHER_STATISTICS
-							 PASS_MEM_STAT)
+  function_summary_base<T> (symtab PASS_MEM_STAT), m_ggc (ggc),
+  m_map (13, ggc, true, GATHER_STATISTICS PASS_MEM_STAT)
 {
   this->m_symtab_insertion_hook
     = this->m_symtab->add_cgraph_insertion_hook (function_summary::symtab_insertion,
@@ -411,7 +414,7 @@  private:
 
 template <typename T, typename V>
 fast_function_summary<T *, V>::fast_function_summary (symbol_table *symtab MEM_STAT_DECL):
-  function_summary_base<T> (symtab), m_vector (NULL)
+  function_summary_base<T> (symtab PASS_MEM_STAT), m_vector (NULL)
 {
   vec_alloc (m_vector, 13 PASS_MEM_STAT);
   this->m_symtab_insertion_hook
@@ -531,8 +534,10 @@  class call_summary_base
 {
 public:
   /* Default construction takes SYMTAB as an argument.  */
-  call_summary_base (symbol_table *symtab): m_symtab (symtab),
-  m_initialize_when_cloning (false)
+  call_summary_base (symbol_table *symtab CXX_MEM_STAT_INFO):
+  m_symtab (symtab),
+  m_initialize_when_cloning (false),
+  allocator ("call summary" PASS_MEM_STAT)
   {}
 
   /* Basic implementation of removal operation.  */
@@ -547,7 +552,8 @@  protected:
   {
     /* Call gcc_internal_because we do not want to call finalizer for
        a type T.  We call dtor explicitly.  */
-    return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T () : new T () ;
+    return is_ggc () ? new (ggc_internal_alloc (sizeof (T))) T ()
+		     : allocator.allocate ();
   }
 
   /* Release an item that is stored within map.  */
@@ -559,7 +565,7 @@  protected:
 	ggc_free (item);
       }
     else
-      delete item;
+      allocator.remove (item);
   }
 
   /* Unregister all call-graph hooks.  */
@@ -578,6 +584,7 @@  protected:
 private:
   /* Return true when the summary uses GGC memory for allocation.  */
   virtual bool is_ggc () = 0;
+  object_allocator<T> allocator;
 };
 
 template <typename T>
@@ -607,9 +614,8 @@  public:
   /* Default construction takes SYMTAB as an argument.  */
   call_summary (symbol_table *symtab, bool ggc = false
 		CXX_MEM_STAT_INFO)
-  : call_summary_base<T> (symtab), m_ggc (ggc), m_map (13, ggc, true,
-						       GATHER_STATISTICS
-						       PASS_MEM_STAT)
+  : call_summary_base<T> (symtab PASS_MEM_STAT), m_ggc (ggc),
+    m_map (13, ggc, true, GATHER_STATISTICS PASS_MEM_STAT)
   {
     this->m_symtab_removal_hook
       = this->m_symtab->add_edge_removal_hook (call_summary::symtab_removal,
@@ -775,7 +781,7 @@  class GTY((user)) fast_call_summary <T *
 public:
   /* Default construction takes SYMTAB as an argument.  */
   fast_call_summary (symbol_table *symtab CXX_MEM_STAT_INFO)
-  : call_summary_base<T> (symtab), m_vector (NULL)
+  : call_summary_base<T> (symtab PASS_MEM_STAT), m_vector (NULL)
   {
     vec_alloc (m_vector, 13 PASS_MEM_STAT);
     this->m_symtab_removal_hook