Message ID | 20191120150314.yqcngkw6adhipvkc@kam.mff.cuni.cz |
---|---|
State | New |
Headers | show |
Series | Free memory_block pool | expand |
On 11/20/19 4:03 PM, Jan Hubicka wrote: > I have noticed that for Firefox around 1GB of peak memory use goes into > the fact that we never free memory_block_pool::freelist. Great, can you reduce the peak memory by the 1GB with the patch? Thanks, Martin
> On 11/20/19 4:03 PM, Jan Hubicka wrote: > > I have noticed that for Firefox around 1GB of peak memory use goes into > > the fact that we never free memory_block_pool::freelist. > > Great, can you reduce the peak memory by the 1GB with the patch? About half of it, approx 0.5GB (8.3GB -> 7.8GB) judging from memory use graphs. I was checking and malloc returns to system everything over 128k (at least it claims so) and malloc_trim does not make much difference. I am not sure how the vmstat reacts to MADV_DONTNEED. Honza
On Wed, 20 Nov 2019, Jan Hubicka wrote: > Hi, > I have noticed that for Firefox around 1GB of peak memory use goes into > the fact that we never free memory_block_pool::freelist. > > This patch adds memory_block_pool::trim which reduces freelist to a given > size. It is called from ggc_collect which is a convenient place to return > heap allocations too and fully freeed prior forking in ggc_collect. > > I originaly was freeing block directly in memory_block_pool::release > but that makes it non-leaf function which prevents optimization. > So I decided to go this way we get tiny bit better code > given that we already have ggc_collect that is conveninet place > to do such a bookeeping. > > Bootstrapped/regtested x86_64-linux, tested on Firefox build, OK? Huh, I think given that trimming happens explicitely the whole overhead of counting the number of freelist entries plus adding a prev member to the block list is useless overhead. In trim just do block_list **tailp = &m_blocks; for (unsigned cnt = freelist_size; cnt != 0 && *tailp; --cnt) tailp = &(*tailp)->next; while (*tailp) free blocks starting from here the list walk is O(constant) and the above would just mean adding a single method rather than cahnges all over the place. Richard. > Honza > > * memory-block.h (memory_block_pool::freelist): New constant. > (memory_block_pool::clear_free_list): Rename to ... > (memory_block_pool::reduce_free_list): ... this. > (memory_block_pool::trim): New function. > (memory_block_pool::block_list): Add m_prev. > (memory_block_pool::m_num_blocks): New field. > (memory_block_pool::m_block_end): New field. > (memory_block_pool::allocate): Maintain m_num_blocks and m_blocks_end. > (memory_block_pool::release): Likewise. > * memory-block.cc (memory_block_pool::memory_block_pool): Initialize > new fields. > (memory_block_pool::clear_free_list): Rename to ... > (memory_block_pool::reduce_free_list): ... this one; free from end > and add NUM parameter. > (memory_block_pool::trim): New. > * ggc-page.c (ggc_collect): Call memory_block_pool::trim. > > * lto.c: Call memory_block_pool::trim. > Index: memory-block.h > =================================================================== > --- memory-block.h (revision 278464) > +++ memory-block.h (working copy) > @@ -28,12 +28,15 @@ class memory_block_pool > public: > /* Blocks have fixed size. This is necessary for sharing. */ > static const size_t block_size = 64 * 1024; > + /* Number of blocks we keep in the freelists. */ > + static const size_t freelist_size = 1024 * 1024 / block_size; > > memory_block_pool (); > > static inline void *allocate () ATTRIBUTE_MALLOC; > static inline void release (void *); > - void clear_free_list (); > + static void trim (int nblocks = freelist_size); > + void reduce_free_list (int); > > private: > /* memory_block_pool singleton instance, defined in memory-block.cc. */ > @@ -42,10 +45,13 @@ private: > struct block_list > { > block_list *m_next; > + block_list *m_prev; > }; > > /* Free list. */ > block_list *m_blocks; > + block_list *m_blocks_end; > + int m_num_blocks; > }; > > /* Allocate a single block. Reuse a previously returned block, if possible. */ > @@ -57,6 +63,9 @@ memory_block_pool::allocate () > > void *result = instance.m_blocks; > instance.m_blocks = instance.m_blocks->m_next; > + instance.m_num_blocks--; > + if (!instance.m_blocks) > + instance.m_blocks_end = NULL; > VALGRIND_DISCARD (VALGRIND_MAKE_MEM_UNDEFINED (result, block_size)); > return result; > } > @@ -67,7 +76,12 @@ memory_block_pool::release (void *uncast > { > block_list *block = new (uncast_block) block_list; > block->m_next = instance.m_blocks; > + if (instance.m_blocks) > + instance.m_blocks->m_prev = block; > + else > + instance.m_blocks_end = block; > instance.m_blocks = block; > + instance.m_num_blocks++; > > VALGRIND_DISCARD (VALGRIND_MAKE_MEM_NOACCESS ((char *)uncast_block > + sizeof (block_list), > Index: memory-block.cc > =================================================================== > --- memory-block.cc (revision 278464) > +++ memory-block.cc (working copy) > @@ -26,18 +27,27 @@ along with GCC; see the file COPYING3. > /* Global singleton-like instance. */ > memory_block_pool memory_block_pool::instance; > > -memory_block_pool::memory_block_pool () : m_blocks (NULL) {} > +/* Default constructor. */ > +memory_block_pool::memory_block_pool () > + : m_blocks (NULL), m_blocks_end (NULL), m_num_blocks (0) > +{ > +} > > -/* Return all blocks from free list to the OS. */ > +/* Reduce free list to NUM blocks. */ > void > -memory_block_pool::clear_free_list () > +memory_block_pool::reduce_free_list (int num) > { > - while (m_blocks) > + gcc_checking_assert (num >= 0); > + while (m_num_blocks > num) > { > - block_list *next = m_blocks->m_next; > - XDELETEVEC (m_blocks); > - m_blocks = next; > + block_list *prev = m_blocks_end->m_prev; > + XDELETEVEC (m_blocks_end); > + m_blocks_end = prev; > + prev->m_next = NULL; > + m_num_blocks--; > } > + if (!m_num_blocks) > + m_blocks = m_blocks_end = 0; > } > > /* Allocate a chunk for obstack. Use the pool if requested chunk size matches > @@ -62,3 +72,10 @@ mempool_obstack_chunk_free (void *chunk) > else > XDELETEVEC (chunk); > } > + > +/* Return allocated memory back to malloc (and to system). */ > +void > +memory_block_pool::trim (int num) > +{ > + instance.reduce_free_list (num); > +} > Index: ggc-page.c > =================================================================== > --- ggc-page.c (revision 278464) > +++ ggc-page.c (working copy) > @@ -2186,6 +2186,9 @@ ggc_collect (void) > float allocated_last_gc = > MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024); > > + /* It is a also good time to get memory block pool into limits. */ > + memory_block_pool::trim (); > + > float min_expand = allocated_last_gc * param_ggc_min_expand / 100; > if (G.allocated < allocated_last_gc + min_expand && !ggc_force_collect) > return; > Index: lto/lto.c > =================================================================== > --- lto/lto.c (revision 278464) > +++ lto/lto.c (working copy) > @@ -387,6 +387,7 @@ lto_wpa_write_files (void) > temp_priority.safe_push (part->insns); > temp_filenames.safe_push (xstrdup (temp_filename)); > } > + memory_block_pool::trim (0); > > for (int set = 0; set < MAX (lto_parallelism, 1); set++) > { >
> On Wed, 20 Nov 2019, Jan Hubicka wrote: > > > Hi, > > I have noticed that for Firefox around 1GB of peak memory use goes into > > the fact that we never free memory_block_pool::freelist. > > > > This patch adds memory_block_pool::trim which reduces freelist to a given > > size. It is called from ggc_collect which is a convenient place to return > > heap allocations too and fully freeed prior forking in ggc_collect. > > > > I originaly was freeing block directly in memory_block_pool::release > > but that makes it non-leaf function which prevents optimization. > > So I decided to go this way we get tiny bit better code > > given that we already have ggc_collect that is conveninet place > > to do such a bookeeping. > > > > Bootstrapped/regtested x86_64-linux, tested on Firefox build, OK? > > Huh, I think given that trimming happens explicitely the whole > overhead of counting the number of freelist entries plus adding > a prev member to the block list is useless overhead. In trim > just do > > block_list **tailp = &m_blocks; > for (unsigned cnt = freelist_size; cnt != 0 && *tailp; --cnt) > tailp = &(*tailp)->next; > while (*tailp) > free blocks starting from here > > the list walk is O(constant) and the above would just mean adding > a single method rather than cahnges all over the place. Hmm, you are right - I changed the code bit way too many times and lost track :)) Will implement your suggestion. Honza
Hi, here is updated patch. Bootstrapped/regtested x86_64-linux, OK? Honza * ggc-page.c (ggc_collect): Call memory_block_pool::trim. * memory-block.cc (memory_block_pool::clear_free_list): Rename to ... (memory_block_pool::reduce_free_list): ... this one. (memory_block_pool::trim): New static function. * memory-block.h (memory_block_pool::freelist_size): New constant (memory_block_pool::clear_free_list): Rename to ... (memory_block_pool::reduce_free_list): ... this one. (memory_block_pool::trim): Declare. * lto.c (lto_wpa_write_files): Call memory_block_pool::trim. Index: ggc-page.c =================================================================== --- ggc-page.c (revision 278570) +++ ggc-page.c (working copy) @@ -2186,6 +2186,9 @@ ggc_collect (void) float allocated_last_gc = MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024); + /* It is also good time to get memory block pool into limits. */ + memory_block_pool::trim (); + float min_expand = allocated_last_gc * param_ggc_min_expand / 100; if (G.allocated < allocated_last_gc + min_expand && !ggc_force_collect) return; Index: lto/lto.c =================================================================== --- lto/lto.c (revision 278570) +++ lto/lto.c (working copy) @@ -387,6 +387,7 @@ lto_wpa_write_files (void) temp_priority.safe_push (part->insns); temp_filenames.safe_push (xstrdup (temp_filename)); } + memory_block_pool::trim (0); for (int set = 0; set < MAX (lto_parallelism, 1); set++) { Index: memory-block.cc =================================================================== --- memory-block.cc (revision 278570) +++ memory-block.cc (working copy) @@ -28,15 +28,30 @@ memory_block_pool memory_block_pool::ins memory_block_pool::memory_block_pool () : m_blocks (NULL) {} -/* Return all blocks from free list to the OS. */ +/* Reduce free list to NUM blocks and return remaining to malloc. */ void -memory_block_pool::clear_free_list () +memory_block_pool::reduce_free_list (int num) { - while (m_blocks) + block_list **blocks = &m_blocks; + + /* First skip NUM blocks. */ + + for (;num > 0 && *blocks; num--) + blocks = &(*blocks)->m_next; + + if (!*blocks) + return; + + /* And free the remainder of them. */ + + block_list *to_free = *blocks; + *blocks = NULL; + + while (to_free) { - block_list *next = m_blocks->m_next; - XDELETEVEC (m_blocks); - m_blocks = next; + block_list *next = to_free->m_next; + XDELETEVEC (to_free); + to_free = next; } } @@ -62,3 +77,10 @@ mempool_obstack_chunk_free (void *chunk) else XDELETEVEC (chunk); } + +/* Return allocated memory back to malloc (and to system). */ +void +memory_block_pool::trim (int num) +{ + instance.reduce_free_list (num); +} Index: memory-block.h =================================================================== --- memory-block.h (revision 278570) +++ memory-block.h (working copy) @@ -28,12 +28,15 @@ class memory_block_pool public: /* Blocks have fixed size. This is necessary for sharing. */ static const size_t block_size = 64 * 1024; + /* Number of blocks we keep in the freelists. */ + static const size_t freelist_size = 1024 * 1024 / block_size; memory_block_pool (); static inline void *allocate () ATTRIBUTE_MALLOC; static inline void release (void *); - void clear_free_list (); + static void trim (int nblocks = freelist_size); + void reduce_free_list (int); private: /* memory_block_pool singleton instance, defined in memory-block.cc. */
On November 21, 2019 9:27:15 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote: >Hi, >here is updated patch. Bootstrapped/regtested x86_64-linux, OK? Ok. Richard. >Honza > > * ggc-page.c (ggc_collect): Call memory_block_pool::trim. > * memory-block.cc (memory_block_pool::clear_free_list): Rename to ... > (memory_block_pool::reduce_free_list): ... this one. > (memory_block_pool::trim): New static function. > * memory-block.h (memory_block_pool::freelist_size): New constant > (memory_block_pool::clear_free_list): Rename to ... > (memory_block_pool::reduce_free_list): ... this one. > (memory_block_pool::trim): Declare. > > * lto.c (lto_wpa_write_files): Call memory_block_pool::trim. >Index: ggc-page.c >=================================================================== >--- ggc-page.c (revision 278570) >+++ ggc-page.c (working copy) >@@ -2186,6 +2186,9 @@ ggc_collect (void) > float allocated_last_gc = > MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024); > >+ /* It is also good time to get memory block pool into limits. */ >+ memory_block_pool::trim (); >+ > float min_expand = allocated_last_gc * param_ggc_min_expand / 100; >if (G.allocated < allocated_last_gc + min_expand && !ggc_force_collect) > return; >Index: lto/lto.c >=================================================================== >--- lto/lto.c (revision 278570) >+++ lto/lto.c (working copy) >@@ -387,6 +387,7 @@ lto_wpa_write_files (void) > temp_priority.safe_push (part->insns); > temp_filenames.safe_push (xstrdup (temp_filename)); > } >+ memory_block_pool::trim (0); > > for (int set = 0; set < MAX (lto_parallelism, 1); set++) > { >Index: memory-block.cc >=================================================================== >--- memory-block.cc (revision 278570) >+++ memory-block.cc (working copy) >@@ -28,15 +28,30 @@ memory_block_pool memory_block_pool::ins > > memory_block_pool::memory_block_pool () : m_blocks (NULL) {} > >-/* Return all blocks from free list to the OS. */ >+/* Reduce free list to NUM blocks and return remaining to malloc. */ > void >-memory_block_pool::clear_free_list () >+memory_block_pool::reduce_free_list (int num) > { >- while (m_blocks) >+ block_list **blocks = &m_blocks; >+ >+ /* First skip NUM blocks. */ >+ >+ for (;num > 0 && *blocks; num--) >+ blocks = &(*blocks)->m_next; >+ >+ if (!*blocks) >+ return; >+ >+ /* And free the remainder of them. */ >+ >+ block_list *to_free = *blocks; >+ *blocks = NULL; >+ >+ while (to_free) > { >- block_list *next = m_blocks->m_next; >- XDELETEVEC (m_blocks); >- m_blocks = next; > >+ block_list *next = to_free->m_next; >+ XDELETEVEC (to_free); >+ to_free = next; > } > } > >@@ -62,3 +77,10 @@ mempool_obstack_chunk_free (void *chunk) > else > XDELETEVEC (chunk); > } >+ >+/* Return allocated memory back to malloc (and to system). */ >+void >+memory_block_pool::trim (int num) >+{ >+ instance.reduce_free_list (num); >+} >Index: memory-block.h >=================================================================== >--- memory-block.h (revision 278570) >+++ memory-block.h (working copy) >@@ -28,12 +28,15 @@ class memory_block_pool > public: > /* Blocks have fixed size. This is necessary for sharing. */ > static const size_t block_size = 64 * 1024; >+ /* Number of blocks we keep in the freelists. */ >+ static const size_t freelist_size = 1024 * 1024 / block_size; > > memory_block_pool (); > > static inline void *allocate () ATTRIBUTE_MALLOC; > static inline void release (void *); >- void clear_free_list (); >+ static void trim (int nblocks = freelist_size); >+ void reduce_free_list (int); > > private: >/* memory_block_pool singleton instance, defined in memory-block.cc. >*/
Index: memory-block.h =================================================================== --- memory-block.h (revision 278464) +++ memory-block.h (working copy) @@ -28,12 +28,15 @@ class memory_block_pool public: /* Blocks have fixed size. This is necessary for sharing. */ static const size_t block_size = 64 * 1024; + /* Number of blocks we keep in the freelists. */ + static const size_t freelist_size = 1024 * 1024 / block_size; memory_block_pool (); static inline void *allocate () ATTRIBUTE_MALLOC; static inline void release (void *); - void clear_free_list (); + static void trim (int nblocks = freelist_size); + void reduce_free_list (int); private: /* memory_block_pool singleton instance, defined in memory-block.cc. */ @@ -42,10 +45,13 @@ private: struct block_list { block_list *m_next; + block_list *m_prev; }; /* Free list. */ block_list *m_blocks; + block_list *m_blocks_end; + int m_num_blocks; }; /* Allocate a single block. Reuse a previously returned block, if possible. */ @@ -57,6 +63,9 @@ memory_block_pool::allocate () void *result = instance.m_blocks; instance.m_blocks = instance.m_blocks->m_next; + instance.m_num_blocks--; + if (!instance.m_blocks) + instance.m_blocks_end = NULL; VALGRIND_DISCARD (VALGRIND_MAKE_MEM_UNDEFINED (result, block_size)); return result; } @@ -67,7 +76,12 @@ memory_block_pool::release (void *uncast { block_list *block = new (uncast_block) block_list; block->m_next = instance.m_blocks; + if (instance.m_blocks) + instance.m_blocks->m_prev = block; + else + instance.m_blocks_end = block; instance.m_blocks = block; + instance.m_num_blocks++; VALGRIND_DISCARD (VALGRIND_MAKE_MEM_NOACCESS ((char *)uncast_block + sizeof (block_list), Index: memory-block.cc =================================================================== --- memory-block.cc (revision 278464) +++ memory-block.cc (working copy) @@ -26,18 +27,27 @@ along with GCC; see the file COPYING3. /* Global singleton-like instance. */ memory_block_pool memory_block_pool::instance; -memory_block_pool::memory_block_pool () : m_blocks (NULL) {} +/* Default constructor. */ +memory_block_pool::memory_block_pool () + : m_blocks (NULL), m_blocks_end (NULL), m_num_blocks (0) +{ +} -/* Return all blocks from free list to the OS. */ +/* Reduce free list to NUM blocks. */ void -memory_block_pool::clear_free_list () +memory_block_pool::reduce_free_list (int num) { - while (m_blocks) + gcc_checking_assert (num >= 0); + while (m_num_blocks > num) { - block_list *next = m_blocks->m_next; - XDELETEVEC (m_blocks); - m_blocks = next; + block_list *prev = m_blocks_end->m_prev; + XDELETEVEC (m_blocks_end); + m_blocks_end = prev; + prev->m_next = NULL; + m_num_blocks--; } + if (!m_num_blocks) + m_blocks = m_blocks_end = 0; } /* Allocate a chunk for obstack. Use the pool if requested chunk size matches @@ -62,3 +72,10 @@ mempool_obstack_chunk_free (void *chunk) else XDELETEVEC (chunk); } + +/* Return allocated memory back to malloc (and to system). */ +void +memory_block_pool::trim (int num) +{ + instance.reduce_free_list (num); +} Index: ggc-page.c =================================================================== --- ggc-page.c (revision 278464) +++ ggc-page.c (working copy) @@ -2186,6 +2186,9 @@ ggc_collect (void) float allocated_last_gc = MAX (G.allocated_last_gc, (size_t)param_ggc_min_heapsize * 1024); + /* It is a also good time to get memory block pool into limits. */ + memory_block_pool::trim (); + float min_expand = allocated_last_gc * param_ggc_min_expand / 100; if (G.allocated < allocated_last_gc + min_expand && !ggc_force_collect) return; Index: lto/lto.c =================================================================== --- lto/lto.c (revision 278464) +++ lto/lto.c (working copy) @@ -387,6 +387,7 @@ lto_wpa_write_files (void) temp_priority.safe_push (part->insns); temp_filenames.safe_push (xstrdup (temp_filename)); } + memory_block_pool::trim (0); for (int set = 0; set < MAX (lto_parallelism, 1); set++) {