Patchwork [2/2] Free large chunks in ggc

Submitter Andi Kleen
Date Oct. 19, 2011, 6:40 a.m.
Message ID <1319006408-6012-2-git-send-email-andi@firstfloor.org>
Permalink /patch/120562/
State New

Comments

Andi Kleen - Oct. 19, 2011, 6:40 a.m.
From: Andi Kleen <ak@linux.intel.com>

This implements the freeing back of large chunks in the ggc madvise path
that Richard Guenther asked for.  This way, on systems with limited
address space, malloc() and other allocators still have
a chance to get back some of the memory ggc freed.  The
fragmented pages are still just given back, but the address space
stays allocated.

I tried freeing only aligned 2MB areas to optimize for 2MB huge
pages, but the hit rate was quite low, so I switched to 1MB+
unaligned areas. The target size is a param now.

Passed bootstrap and testing on x86_64-linux.

gcc/:
2011-10-18  Andi Kleen  <ak@linux.intel.com>

	* ggc-page.c (release_pages): First free large contiguous
	chunks in the madvise path.
	* params.def (GGC_FREE_UNIT): Add.
	* doc/invoke.texi (ggc-free-unit): Add.
---
 gcc/doc/invoke.texi |    5 +++++
 gcc/ggc-page.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/params.def      |    5 +++++
 3 files changed, 58 insertions(+), 0 deletions(-)
Jakub Jelinek - Oct. 19, 2011, 8:34 a.m.
On Wed, Oct 19, 2011 at 08:40:08AM +0200, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> This implements the freeing back of large chunks in the ggc madvise path
> that Richard Guenther asked for.  This way, on systems with limited
> address space, malloc() and other allocators still have
> a chance to get back some of the memory ggc freed.  The
> fragmented pages are still just given back, but the address space
> stays allocated.
> 
> I tried freeing only aligned 2MB areas to optimize for 2MB huge
> pages, but the hit rate was quite low, so I switched to 1MB+
> unaligned areas. The target size is a param now.

If the size to free is smaller than the quirk size, then it has the very
undesirable effect that with using GC only you might run unnecessarily out
of virtual address space, because it allocates pages in 2MB chunks, but
if they are released in 1MB chunks, those released chunks will never be
usable again for GC.  Consider on 32-bit address space allocating 3GB
of GC memory, then freeing stuff in every odd 1MB chunk of pages, then
wanting to allocate through GC the 1.5GB back.

IMHO we should munmap immediately in release_pages the > G.pagesize pages,
those are not very likely to be reused anyway (and it had one in between
ggc_collect cycle to be reused anyway), and for the == G.pagesize
(the usual case, the only ones that are allocated in GGC_QUIRK_SIZE sets)
we should note which page was the first one in the GGC_QUIRK_SIZE chunk
and munmap exactly those 2MB starting at the first page only.

	Jakub
Andi Kleen - Oct. 19, 2011, 2:37 p.m.
> If the size to free is smaller than the quirk size, then it has the very
> undesirable effect that with using GC only you might run unnecessarily out
> of virtual address space, because it allocates pages in 2MB chunks, but
> if they are released in 1MB chunks, those released chunks will never be

1MB is just the minimum, it frees it in whatever it can find
(but only for a single GC cycle usually). So when enough continuous
memory is free it will be reused by GC.

I guess it would be possible to add a fallback to allocate a smaller
chunk if the large chunk fails, but unless someone actually comes
up with a test case I have doubts it is really needed.

> usable again for GC.  Consider on 32-bit address space allocating 3GB
> of GC memory, then freeing stuff in every odd 1MB chunk of pages, then
> wanting to allocate through GC the 1.5GB back.
> 
> IMHO we should munmap immediately in release_pages the > G.pagesize pages,

Then you get the fragmentation problem back in full force.

> those are not very likely to be reused anyway (and it had one in between
> ggc_collect cycle to be reused anyway), and for the == G.pagesize
> (the usual case, the only ones that are allocated in GGC_QUIRK_SIZE sets)
> we should note which page was the first one in the GGC_QUIRK_SIZE chunk
> and munmap exactly those 2MB starting at the first page only.

I tried this first with aligned 2MB chunks, but it doesn't trigger ever in a 
normal (non LTO) bootstrap.

-Andi
Jakub Jelinek - Oct. 19, 2011, 2:48 p.m.
On Wed, Oct 19, 2011 at 04:37:45PM +0200, Andi Kleen wrote:
> > If the size to free is smaller than the quirk size, then it has the very
> > undesirable effect that with using GC only you might run unnecessarily out
> > of virtual address space, because it allocates pages in 2MB chunks, but
> > if they are released in 1MB chunks, those released chunks will never be
> 
> 1MB is just the minimum, it frees it in whatever it can find
> (but only for a single GC cycle usually). So when enough continuous
> memory is free it will be reused by GC.
> 
> I guess it would be possible to add a fallback to allocate a smaller
> chunk if the large chunk fails, but unless someone actually comes
> up with a test case I have doubts it is really needed.
> 
> > usable again for GC.  Consider on 32-bit address space allocating 3GB
> > of GC memory, then freeing stuff in every odd 1MB chunk of pages, then
> > wanting to allocate through GC the 1.5GB back.
> > 
> > IMHO we should munmap immediately in release_pages the > G.pagesize pages,
> 
> Then you get the fragmentation problem back in full force.

Why?  For one, such allocations are very rare (you only get them when
a single GC allocation requests > page of memory, like perhaps a string
literal over 4KB or similar or function call with over 1000 arguments etc.).
And if they are unlikely to be reused, not munmapping them means wasting
more virtual address space than needed.

	Jakub
Jan Hubicka - Oct. 19, 2011, 3:55 p.m.
> Why?  For one, such allocations are very rare (you only get them when
> a single GC allocation requests > page of memory, like perhaps a string
> literal over 4KB or similar or function call with over 1000 arguments etc.).
> And if they are unlikely to be reused, not munmapping them means wasting
> more virtual address space than needed.

We produce quite a few large hashtables in GGC, and those will happily be
bigger than 4KB.

Honza

Patch

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4f55dbc..e622552 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8858,6 +8858,11 @@  very large effectively disables garbage collection.  Setting this
 parameter and @option{ggc-min-expand} to zero causes a full collection
 to occur at every opportunity.
 
+@item  ggc-free-unit
+
+Contiguous areas of OS pages to free back to the OS immediately.  The default
+is 256 pages, which is 1MB with 4K pages.
+
 @item max-reload-search-insns
 The maximum number of instruction reload should look backward for equivalent
 register.  Increasing values mean more aggressive optimization, making the
diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index ba88e3f..eb0eeef 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -972,6 +972,54 @@  release_pages (void)
   page_entry *p, *start_p;
   char *start;
   size_t len;
+  size_t mapped_len;
+  page_entry *next, *prev, *newprev;
+  size_t free_unit = PARAM_VALUE (GGC_FREE_UNIT) * G.pagesize;
+
+  /* First free larger contiguous areas to the OS.
+     This allows other allocators to grab these areas if needed.
+     This is only done on larger chunks to avoid fragmentation.
+     This does not always work because the free_pages list is only
+     sorted over a single GC cycle.  */
+
+  p = G.free_pages;
+  prev = NULL;
+  while (p)
+    {
+      start = p->page;
+      start_p = p;
+      len = 0;
+      mapped_len = 0;
+      newprev = prev;
+      while (p && p->page == start + len)
+        {
+          len += p->bytes;
+	  if (!p->discarded)
+	      mapped_len += p->bytes;
+	  newprev = p;
+          p = p->next;
+        }
+      if (len >= free_unit)
+        {
+          while (start_p != p)
+            {
+              next = start_p->next;
+              free (start_p);
+              start_p = next;
+            }
+          munmap (start, len);
+	  if (prev)
+	    prev->next = p;
+          else
+            G.free_pages = p;
+          G.bytes_mapped -= mapped_len;
+	  continue;
+        }
+      prev = newprev;
+    }
+
+  /* Now give back the fragmented pages to the OS, but keep the address
+     space to reuse it next time.  */
 
   for (p = G.free_pages; p; )
     {
diff --git a/gcc/params.def b/gcc/params.def
index 5e49c48..edbf0de 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -561,6 +561,11 @@  DEFPARAM(GGC_MIN_HEAPSIZE,
 #undef GGC_MIN_EXPAND_DEFAULT
 #undef GGC_MIN_HEAPSIZE_DEFAULT
 
+DEFPARAM(GGC_FREE_UNIT,
+	 "ggc-free-unit",
	 "Contiguous areas in OS pages to free back immediately",
+	 256, 0, 0)
+
 DEFPARAM(PARAM_MAX_RELOAD_SEARCH_INSNS,
 	 "max-reload-search-insns",
 	 "The maximum number of instructions to search backward when looking for equivalent reload",