Free memory_block pool
diff mbox series

Message ID 20191120150314.yqcngkw6adhipvkc@kam.mff.cuni.cz
State New
Headers show
Series
  • Free memory_block pool
Related show

Commit Message

Jan Hubicka Nov. 20, 2019, 3:03 p.m. UTC
Hi,
I have noticed that for Firefox around 1GB of peak memory use goes into
the fact that we never free memory_block_pool::freelist.

This patch adds memory_block_pool::trim which reduces freelist to a given
size.  It is called from ggc_collect which is a convenient place to return
heap allocations too and fully freeed prior forking in ggc_collect.

I originaly was freeing block directly in memory_block_pool::release
but that makes it non-leaf function which prevents optimization.
So I decided to go this way we get tiny bit better code
given that we already have ggc_collect that is conveninet place
to do such a bookeeping.

Bootstrapped/regtested x86_64-linux, tested on Firefox build, OK?

Honza

	* memory-block.h (memory_block_pool::freelist): New constant.
	(memory_block_pool::clear_free_list): Rename to ...
	(memory_block_pool::reduce_free_list): ... this.
	(memory_block_pool::trim): New function.
	(memory_block_pool::block_list): Add m_prev.
	(memory_block_pool::m_num_blocks): New field.
	(memory_block_pool::m_block_end): New field.
	(memory_block_pool::allocate): Maintain m_num_blocks and m_blocks_end.
	(memory_block_pool::release): Likewise.
	* memory-block.cc (memory_block_pool::memory_block_pool): Initialize
	new fields.
	(memory_block_pool::clear_free_list): Rename to ...
	(memory_block_pool::reduce_free_list): ... this one; free from end
	and add NUM parameter.
	(memory_block_pool::trim): New.
	* ggc-page.c (ggc_collect): Call memory_block_pool::trim.

	* lto.c: Call memory_block_pool::trim.

Comments

Martin Liška Nov. 20, 2019, 3:05 p.m. UTC | #1
On 11/20/19 4:03 PM, Jan Hubicka wrote:
> I have noticed that for Firefox around 1GB of peak memory use goes into
> the fact that we never free memory_block_pool::freelist.

Great, can you reduce the peak memory by the 1GB with the patch?

Thanks,
Martin
Jan Hubicka Nov. 20, 2019, 3:08 p.m. UTC | #2
> On 11/20/19 4:03 PM, Jan Hubicka wrote:
> > I have noticed that for Firefox around 1GB of peak memory use goes into
> > the fact that we never free memory_block_pool::freelist.
> 
> Great, can you reduce the peak memory by the 1GB with the patch?

About half of it, approx 0.5GB (8.3GB -> 7.8GB) judging from memory use
graphs.  I was checking and malloc returns to system everything over
128k (at least it claims so) and malloc_trim does not make much
difference.  I am not sure how the vmstat reacts to MADV_DONTNEED.

Honza
Richard Biener Nov. 21, 2019, 7:51 a.m. UTC | #3
On Wed, 20 Nov 2019, Jan Hubicka wrote:

> Hi,
> I have noticed that for Firefox around 1GB of peak memory use goes into
> the fact that we never free memory_block_pool::freelist.
> 
> This patch adds memory_block_pool::trim which reduces freelist to a given
> size.  It is called from ggc_collect which is a convenient place to return
> heap allocations too and fully freeed prior forking in ggc_collect.
> 
> I originaly was freeing block directly in memory_block_pool::release
> but that makes it non-leaf function which prevents optimization.
> So I decided to go this way we get tiny bit better code
> given that we already have ggc_collect that is conveninet place
> to do such a bookeeping.
> 
> Bootstrapped/regtested x86_64-linux, tested on Firefox build, OK?

Huh, I think given that trimming happens explicitely the whole
overhead of counting the number of freelist entries plus adding
a prev member to the block list is useless overhead.  In trim
just do

 block_list **tailp = &m_blocks;
 for (unsigned cnt = freelist_size; cnt != 0 && *tailp; --cnt)
   tailp = &(*tailp)->next;
 while (*tailp)
   free blocks starting from here

the list walk is O(constant) and the above would just mean adding
a single method rather than cahnges all over the place.

Richard.

> Honza
> 
> 	* memory-block.h (memory_block_pool::freelist): New constant.
> 	(memory_block_pool::clear_free_list): Rename to ...
> 	(memory_block_pool::reduce_free_list): ... this.
> 	(memory_block_pool::trim): New function.
> 	(memory_block_pool::block_list): Add m_prev.
> 	(memory_block_pool::m_num_blocks): New field.
> 	(memory_block_pool::m_block_end): New field.
> 	(memory_block_pool::allocate): Maintain m_num_blocks and m_blocks_end.
> 	(memory_block_pool::release): Likewise.
> 	* memory-block.cc (memory_block_pool::memory_block_pool): Initialize
> 	new fields.
> 	(memory_block_pool::clear_free_list): Rename to ...
> 	(memory_block_pool::reduce_free_list): ... this one; free from end
> 	and add NUM parameter.
> 	(memory_block_pool::trim): New.
> 	* ggc-page.c (ggc_collect): Call memory_block_pool::trim.
> 
> 	* lto.c: Call memory_block_pool::trim.
> Index: memory-block.h
> ===================================================================
> --- memory-block.h	(revision 278464)
> +++ memory-block.h	(working copy)
> @@ -28,12 +28,15 @@ class memory_block_pool
>  public:
>    /* Blocks have fixed size.  This is necessary for sharing.  */
>    static const size_t block_size = 64 * 1024;
> +  /* Number of blocks we keep in the freelists.  */
> +  static const size_t freelist_size = 1024 * 1024 / block_size;
>  
>    memory_block_pool ();
>  
>    static inline void *allocate () ATTRIBUTE_MALLOC;
>    static inline void release (void *);
> -  void clear_free_list ();
> +  static void trim (int nblocks = freelist_size);
> +  void reduce_free_list (int);
>  
>  private:
>    /* memory_block_pool singleton instance, defined in memory-block.cc.  */
> @@ -42,10 +45,13 @@ private:
>    struct block_list
>    {
>      block_list *m_next;
> +    block_list *m_prev;
>    };
>  
>    /* Free list.  */
>    block_list *m_blocks;
> +  block_list *m_blocks_end;
> +  int m_num_blocks;
>  };
>  
>  /* Allocate a single block.  Reuse a previously returned block, if possible.  */
> @@ -57,6 +63,9 @@ memory_block_pool::allocate ()
>  
>    void *result = instance.m_blocks;
>    instance.m_blocks = instance.m_blocks->m_next;
> +  instance.m_num_blocks--;
> +  if (!instance.m_blocks)
> +    instance.m_blocks_end = NULL;
>    VALGRIND_DISCARD (VALGRIND_MAKE_MEM_UNDEFINED (result, block_size));
>    return result;
>  }
> @@ -67,7 +76,12 @@ memory_block_pool::release (void *uncast
>  {
>    block_list *block = new (uncast_block) block_list;
>    block->m_next = instance.m_blocks;
> +  if (instance.m_blocks)
> +    instance.m_blocks->m_prev = block;
> +  else
> +    instance.m_blocks_end = block;
>    instance.m_blocks = block;
> +  instance.m_num_blocks++;
>  
>    VALGRIND_DISCARD (VALGRIND_MAKE_MEM_NOACCESS ((char *)uncast_block
>  						+ sizeof (block_list),
> Index: memory-block.cc
> ===================================================================
> --- memory-block.cc	(revision 278464)
> +++ memory-block.cc	(working copy)
> @@ -26,18 +27,27 @@ along with GCC; see the file COPYING3.
>  /* Global singleton-like instance.  */
>  memory_block_pool memory_block_pool::instance;
>  
> -memory_block_pool::memory_block_pool () : m_blocks (NULL) {}
> +/* Default constructor.  */
> +memory_block_pool::memory_block_pool ()
> + : m_blocks (NULL), m_blocks_end (NULL), m_num_blocks (0)
> +{
> +}
>  
> -/* Return all blocks from free list to the OS.  */
> +/* Reduce free list to NUM blocks.  */
>  void
> -memory_block_pool::clear_free_list ()
> +memory_block_pool::reduce_free_list (int num)
>  {
> -  while (m_blocks)
> +  gcc_checking_assert (num >= 0);
> +  while (m_num_blocks > num)
>      {
> -      block_list *next = m_blocks->m_next;
> -      XDELETEVEC (m_blocks);
> -      m_blocks = next;
> +      block_list *prev = m_blocks_end->m_prev;
> +      XDELETEVEC (m_blocks_end);
> +      m_blocks_end = prev;
> +      prev->m_next = NULL;
> +      m_num_blocks--;
>      }
> +  if (!m_num_blocks)
> +    m_blocks = m_blocks_end = 0;
>  }
>  
>  /* Allocate a chunk for obstack.  Use the pool if requested chunk size matches
> @@ -62,3 +72,10 @@ mempool_obstack_chunk_free (void *chunk)
>    else
>      XDELETEVEC (chunk);
>  }
> +
> +/* Return allocated memory back to malloc (and to system).  */
> +void
> +memory_block_pool::trim (int num)
> +{
> +  instance.reduce_free_list (num);
> +}
> Index: ggc-page.c
> ===================================================================
> --- ggc-page.c	(revision 278464)
> +++ ggc-page.c	(working copy)
> @@ -2186,6 +2186,9 @@ ggc_collect (void)
>    float allocated_last_gc =
>      MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024);
>  
> +  /* It is a also good time to get memory block pool into limits.  */
> +  memory_block_pool::trim ();
> +
>    float min_expand = allocated_last_gc * param_ggc_min_expand / 100;
>    if (G.allocated < allocated_last_gc + min_expand && !ggc_force_collect)
>      return;
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c	(revision 278464)
> +++ lto/lto.c	(working copy)
> @@ -387,6 +387,7 @@ lto_wpa_write_files (void)
>        temp_priority.safe_push (part->insns);
>        temp_filenames.safe_push (xstrdup (temp_filename));
>      }
> +  memory_block_pool::trim (0);
>  
>    for (int set = 0; set < MAX (lto_parallelism, 1); set++)
>      {
>
Jan Hubicka Nov. 21, 2019, 8:05 a.m. UTC | #4
> On Wed, 20 Nov 2019, Jan Hubicka wrote:
> 
> > Hi,
> > I have noticed that for Firefox around 1GB of peak memory use goes into
> > the fact that we never free memory_block_pool::freelist.
> > 
> > This patch adds memory_block_pool::trim which reduces freelist to a given
> > size.  It is called from ggc_collect which is a convenient place to return
> > heap allocations too and fully freeed prior forking in ggc_collect.
> > 
> > I originaly was freeing block directly in memory_block_pool::release
> > but that makes it non-leaf function which prevents optimization.
> > So I decided to go this way we get tiny bit better code
> > given that we already have ggc_collect that is conveninet place
> > to do such a bookeeping.
> > 
> > Bootstrapped/regtested x86_64-linux, tested on Firefox build, OK?
> 
> Huh, I think given that trimming happens explicitely the whole
> overhead of counting the number of freelist entries plus adding
> a prev member to the block list is useless overhead.  In trim
> just do
> 
>  block_list **tailp = &m_blocks;
>  for (unsigned cnt = freelist_size; cnt != 0 && *tailp; --cnt)
>    tailp = &(*tailp)->next;
>  while (*tailp)
>    free blocks starting from here
> 
> the list walk is O(constant) and the above would just mean adding
> a single method rather than cahnges all over the place.

Hmm, you are right - I changed the code bit way too many times and lost
track :))

Will implement your suggestion.
Honza
Jan Hubicka Nov. 21, 2019, 8:27 p.m. UTC | #5
Hi,
here is updated patch.  Bootstrapped/regtested x86_64-linux, OK?

Honza

	* ggc-page.c (ggc_collect): Call memory_block_pool::trim.
	* memory-block.cc (memory_block_pool::clear_free_list): Rename to ...
	(memory_block_pool::reduce_free_list): ... this one.
	(memory_block_pool::trim): New static function.
	* memory-block.h (memory_block_pool::freelist_size): New constant
	(memory_block_pool::clear_free_list): Rename to ...
	(memory_block_pool::reduce_free_list): ... this one.
	(memory_block_pool::trim): Declare.
	
	* lto.c (lto_wpa_write_files): Call memory_block_pool::trim.
Index: ggc-page.c
===================================================================
--- ggc-page.c	(revision 278570)
+++ ggc-page.c	(working copy)
@@ -2186,6 +2186,9 @@ ggc_collect (void)
   float allocated_last_gc =
     MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024);
 
+  /* It is also good time to get memory block pool into limits.  */
+  memory_block_pool::trim ();
+
   float min_expand = allocated_last_gc * param_ggc_min_expand / 100;
   if (G.allocated < allocated_last_gc + min_expand && !ggc_force_collect)
     return;
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 278570)
+++ lto/lto.c	(working copy)
@@ -387,6 +387,7 @@ lto_wpa_write_files (void)
       temp_priority.safe_push (part->insns);
       temp_filenames.safe_push (xstrdup (temp_filename));
     }
+  memory_block_pool::trim (0);
 
   for (int set = 0; set < MAX (lto_parallelism, 1); set++)
     {
Index: memory-block.cc
===================================================================
--- memory-block.cc	(revision 278570)
+++ memory-block.cc	(working copy)
@@ -28,15 +28,30 @@ memory_block_pool memory_block_pool::ins
 
 memory_block_pool::memory_block_pool () : m_blocks (NULL) {}
 
-/* Return all blocks from free list to the OS.  */
+/* Reduce free list to NUM blocks and return remaining to malloc.  */
 void
-memory_block_pool::clear_free_list ()
+memory_block_pool::reduce_free_list (int num)
 {
-  while (m_blocks)
+  block_list **blocks = &m_blocks;
+
+  /* First skip NUM blocks.  */
+
+  for (;num > 0 && *blocks; num--)
+    blocks = &(*blocks)->m_next;
+
+  if (!*blocks)
+    return;
+
+  /* And free the remainder of them.  */
+
+  block_list *to_free = *blocks;
+  *blocks = NULL;
+
+  while (to_free)
     {
-      block_list *next = m_blocks->m_next;
-      XDELETEVEC (m_blocks);
-      m_blocks = next;

+      block_list *next = to_free->m_next;
+      XDELETEVEC (to_free);
+      to_free = next;
     }
 }
 
@@ -62,3 +77,10 @@ mempool_obstack_chunk_free (void *chunk)
   else
     XDELETEVEC (chunk);
 }
+
+/* Return allocated memory back to malloc (and to system).  */
+void
+memory_block_pool::trim (int num)
+{
+  instance.reduce_free_list (num);
+}
Index: memory-block.h
===================================================================
--- memory-block.h	(revision 278570)
+++ memory-block.h	(working copy)
@@ -28,12 +28,15 @@ class memory_block_pool
 public:
   /* Blocks have fixed size.  This is necessary for sharing.  */
   static const size_t block_size = 64 * 1024;
+  /* Number of blocks we keep in the freelists.  */
+  static const size_t freelist_size = 1024 * 1024 / block_size;
 
   memory_block_pool ();
 
   static inline void *allocate () ATTRIBUTE_MALLOC;
   static inline void release (void *);
-  void clear_free_list ();
+  static void trim (int nblocks = freelist_size);
+  void reduce_free_list (int);
 
 private:
   /* memory_block_pool singleton instance, defined in memory-block.cc.  */
Richard Biener Nov. 22, 2019, 6:26 a.m. UTC | #6
On November 21, 2019 9:27:15 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>Hi,
>here is updated patch.  Bootstrapped/regtested x86_64-linux, OK?

Ok. 

Richard. 

>Honza
>
>	* ggc-page.c (ggc_collect): Call memory_block_pool::trim.
>	* memory-block.cc (memory_block_pool::clear_free_list): Rename to ...
>	(memory_block_pool::reduce_free_list): ... this one.
>	(memory_block_pool::trim): New static function.
>	* memory-block.h (memory_block_pool::freelist_size): New constant
>	(memory_block_pool::clear_free_list): Rename to ...
>	(memory_block_pool::reduce_free_list): ... this one.
>	(memory_block_pool::trim): Declare.
>	
>	* lto.c (lto_wpa_write_files): Call memory_block_pool::trim.
>Index: ggc-page.c
>===================================================================
>--- ggc-page.c	(revision 278570)
>+++ ggc-page.c	(working copy)
>@@ -2186,6 +2186,9 @@ ggc_collect (void)
>   float allocated_last_gc =
>     MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024);
> 
>+  /* It is also good time to get memory block pool into limits.  */
>+  memory_block_pool::trim ();
>+
>   float min_expand = allocated_last_gc * param_ggc_min_expand / 100;
>if (G.allocated < allocated_last_gc + min_expand && !ggc_force_collect)
>     return;
>Index: lto/lto.c
>===================================================================
>--- lto/lto.c	(revision 278570)
>+++ lto/lto.c	(working copy)
>@@ -387,6 +387,7 @@ lto_wpa_write_files (void)
>       temp_priority.safe_push (part->insns);
>       temp_filenames.safe_push (xstrdup (temp_filename));
>     }
>+  memory_block_pool::trim (0);
> 
>   for (int set = 0; set < MAX (lto_parallelism, 1); set++)
>     {
>Index: memory-block.cc
>===================================================================
>--- memory-block.cc	(revision 278570)
>+++ memory-block.cc	(working copy)
>@@ -28,15 +28,30 @@ memory_block_pool memory_block_pool::ins
> 
> memory_block_pool::memory_block_pool () : m_blocks (NULL) {}
> 
>-/* Return all blocks from free list to the OS.  */
>+/* Reduce free list to NUM blocks and return remaining to malloc.  */
> void
>-memory_block_pool::clear_free_list ()
>+memory_block_pool::reduce_free_list (int num)
> {
>-  while (m_blocks)
>+  block_list **blocks = &m_blocks;
>+
>+  /* First skip NUM blocks.  */
>+
>+  for (;num > 0 && *blocks; num--)
>+    blocks = &(*blocks)->m_next;
>+
>+  if (!*blocks)
>+    return;
>+
>+  /* And free the remainder of them.  */
>+
>+  block_list *to_free = *blocks;
>+  *blocks = NULL;
>+
>+  while (to_free)
>     {
>-      block_list *next = m_blocks->m_next;
>-      XDELETEVEC (m_blocks);
>-      m_blocks = next;
>
>+      block_list *next = to_free->m_next;
>+      XDELETEVEC (to_free);
>+      to_free = next;
>     }
> }
> 
>@@ -62,3 +77,10 @@ mempool_obstack_chunk_free (void *chunk)
>   else
>     XDELETEVEC (chunk);
> }
>+
>+/* Return allocated memory back to malloc (and to system).  */
>+void
>+memory_block_pool::trim (int num)
>+{
>+  instance.reduce_free_list (num);
>+}
>Index: memory-block.h
>===================================================================
>--- memory-block.h	(revision 278570)
>+++ memory-block.h	(working copy)
>@@ -28,12 +28,15 @@ class memory_block_pool
> public:
>   /* Blocks have fixed size.  This is necessary for sharing.  */
>   static const size_t block_size = 64 * 1024;
>+  /* Number of blocks we keep in the freelists.  */
>+  static const size_t freelist_size = 1024 * 1024 / block_size;
> 
>   memory_block_pool ();
> 
>   static inline void *allocate () ATTRIBUTE_MALLOC;
>   static inline void release (void *);
>-  void clear_free_list ();
>+  static void trim (int nblocks = freelist_size);
>+  void reduce_free_list (int);
> 
> private:
>/* memory_block_pool singleton instance, defined in memory-block.cc. 
>*/

Patch
diff mbox series

Index: memory-block.h
===================================================================
--- memory-block.h	(revision 278464)
+++ memory-block.h	(working copy)
@@ -28,12 +28,15 @@  class memory_block_pool
 public:
   /* Blocks have fixed size.  This is necessary for sharing.  */
   static const size_t block_size = 64 * 1024;
+  /* Number of blocks we keep in the freelists.  */
+  static const size_t freelist_size = 1024 * 1024 / block_size;
 
   memory_block_pool ();
 
   static inline void *allocate () ATTRIBUTE_MALLOC;
   static inline void release (void *);
-  void clear_free_list ();
+  static void trim (int nblocks = freelist_size);
+  void reduce_free_list (int);
 
 private:
   /* memory_block_pool singleton instance, defined in memory-block.cc.  */
@@ -42,10 +45,13 @@  private:
   struct block_list
   {
     block_list *m_next;
+    block_list *m_prev;
   };
 
   /* Free list.  */
   block_list *m_blocks;
+  block_list *m_blocks_end;
+  int m_num_blocks;
 };
 
 /* Allocate a single block.  Reuse a previously returned block, if possible.  */
@@ -57,6 +63,9 @@  memory_block_pool::allocate ()
 
   void *result = instance.m_blocks;
   instance.m_blocks = instance.m_blocks->m_next;
+  instance.m_num_blocks--;
+  if (!instance.m_blocks)
+    instance.m_blocks_end = NULL;
   VALGRIND_DISCARD (VALGRIND_MAKE_MEM_UNDEFINED (result, block_size));
   return result;
 }
@@ -67,7 +76,12 @@  memory_block_pool::release (void *uncast
 {
   block_list *block = new (uncast_block) block_list;
   block->m_next = instance.m_blocks;
+  if (instance.m_blocks)
+    instance.m_blocks->m_prev = block;
+  else
+    instance.m_blocks_end = block;
   instance.m_blocks = block;
+  instance.m_num_blocks++;
 
   VALGRIND_DISCARD (VALGRIND_MAKE_MEM_NOACCESS ((char *)uncast_block
 						+ sizeof (block_list),
Index: memory-block.cc
===================================================================
--- memory-block.cc	(revision 278464)
+++ memory-block.cc	(working copy)
@@ -26,18 +27,27 @@  along with GCC; see the file COPYING3.
 /* Global singleton-like instance.  */
 memory_block_pool memory_block_pool::instance;
 
-memory_block_pool::memory_block_pool () : m_blocks (NULL) {}
+/* Default constructor.  */
+memory_block_pool::memory_block_pool ()
+ : m_blocks (NULL), m_blocks_end (NULL), m_num_blocks (0)
+{
+}
 
-/* Return all blocks from free list to the OS.  */
+/* Reduce free list to NUM blocks.  */
 void
-memory_block_pool::clear_free_list ()
+memory_block_pool::reduce_free_list (int num)
 {
-  while (m_blocks)
+  gcc_checking_assert (num >= 0);
+  while (m_num_blocks > num)
     {
-      block_list *next = m_blocks->m_next;
-      XDELETEVEC (m_blocks);
-      m_blocks = next;
+      block_list *prev = m_blocks_end->m_prev;
+      XDELETEVEC (m_blocks_end);
+      m_blocks_end = prev;
+      prev->m_next = NULL;
+      m_num_blocks--;
     }
+  if (!m_num_blocks)
+    m_blocks = m_blocks_end = 0;
 }
 
 /* Allocate a chunk for obstack.  Use the pool if requested chunk size matches
@@ -62,3 +72,10 @@  mempool_obstack_chunk_free (void *chunk)
   else
     XDELETEVEC (chunk);
 }
+
+/* Return allocated memory back to malloc (and to system).  */
+void
+memory_block_pool::trim (int num)
+{
+  instance.reduce_free_list (num);
+}
Index: ggc-page.c
===================================================================
--- ggc-page.c	(revision 278464)
+++ ggc-page.c	(working copy)
@@ -2186,6 +2186,9 @@  ggc_collect (void)
   float allocated_last_gc =
     MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024);
 
+  /* It is a also good time to get memory block pool into limits.  */
+  memory_block_pool::trim ();
+
   float min_expand = allocated_last_gc * param_ggc_min_expand / 100;
   if (G.allocated < allocated_last_gc + min_expand && !ggc_force_collect)
     return;
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 278464)
+++ lto/lto.c	(working copy)
@@ -387,6 +387,7 @@  lto_wpa_write_files (void)
       temp_priority.safe_push (part->insns);
       temp_filenames.safe_push (xstrdup (temp_filename));
     }
+  memory_block_pool::trim (0);
 
   for (int set = 0; set < MAX (lto_parallelism, 1); set++)
     {