
status of dj/malloc branch?

Message ID 20240401191925.M515362@dcvr
State New

Commit Message

Eric Wong April 1, 2024, 7:19 p.m. UTC
I'm interested in the tracing features described at
https://sourceware.org/glibc/wiki/MallocTracing
to test and validate memory fragmentation avoidance in a long-lived
single-threaded Perl C10K HTTP/IMAP/NNTP/POP3 daemon.

It appears stalled for years, however, and the current glibc
malloc doesn't have the trace + replay features.

I'm currently dogfooding the below patch on an old glibc (Debian
oldstable :x) on my "production" home server.  My theory is the
jemalloc idea of having fewer possible sizes is good for
avoiding fragmentation in long-lived processes.

This is because sizes for string processing are highly
variable and lifetimes are mixed for event-driven C10K servers
where some clients live only for a single request and others for
many.  Clients end up sharing allocations due to caching
and deduplication, so a short-lived client can end up allocating
something that lives a long time.  Perl does lazy loading
and internal caching+memoization all over the place, too.

The downside is 0-20% waste in initial fits, but I expect it to
get better fits over time...

Not a serious patch against Debian glibc 2.31-13+deb11u8:

Comments

Eric Wong April 6, 2024, 10:02 p.m. UTC | #1
DJ Delorie <dj@redhat.com> wrote:
> 
> If you're wondering about the branch itself, the harsh answer is that
> I'm not maintaining it.
> 
> If you're wondering about the goals therein, IIRC we discovered we had
> no good way to visualize and analyze the heap itself in order to
> understand what causes the problem we're trying to solve.  While these
> are solvable problems, they're big projects and never quite made it to
> the top of our priority lists.

Thanks for the response.

I started doing my own tracing[1] in mwrap-perl[2] and it's
crazy expensive (I/O and storage) to trace all the allocations
done by a busy Perl process.  I had to add compression (using
zstd) to slow the growth of disk usage; I hope I can get useful,
reproducible data before I run out of space.

[1] https://80x24.org/mwrap-perl/20240406214954.159627-1-e@80x24.org/
[2] https://80x24.org/mwrap-perl.git

Patch

diff --git a/malloc/malloc.c b/malloc/malloc.c
index f7cd29bc..6e0b066d 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3018,6 +3018,31 @@  tcache_thread_shutdown (void)
 
 #endif /* !USE_TCACHE  */
 
+static inline size_t
+size_class_pad (size_t bytes)
+{
+  if (bytes <= MAX_FAST_SIZE || bytes >= DEFAULT_MMAP_THRESHOLD_MAX)
+    return bytes;
+  /*
+   * Use jemalloc-inspired size classes for mid-size allocations to
+   * minimize fragmentation.  This means we pay a 0-20% overhead on
+   * the initial allocations to improve the likelihood of reuse.
+   */
+  size_t max = sizeof(void *) << 4;
+  size_t nxt;
+
+  do {
+    if (bytes <= max) {
+      size_t sc_bytes = ALIGN_UP (bytes, max >> 3);
+
+      return sc_bytes <= DEFAULT_MMAP_THRESHOLD_MAX ? sc_bytes : bytes;
+    }
+    nxt = max << 1;
+  } while (nxt > max && nxt < DEFAULT_MMAP_THRESHOLD_MAX && (max = nxt));
+
+  return bytes;
+}
+
 void *
 __libc_malloc (size_t bytes)
 {
@@ -3031,6 +3056,7 @@  __libc_malloc (size_t bytes)
     = atomic_forced_read (__malloc_hook);
   if (__builtin_expect (hook != NULL, 0))
     return (*hook)(bytes, RETURN_ADDRESS (0));
+  bytes = size_class_pad (bytes);
 #if USE_TCACHE
   /* int_free also calls request2size, be careful to not pad twice.  */
   size_t tbytes;
@@ -3150,6 +3176,8 @@  __libc_realloc (void *oldmem, size_t bytes)
   if (oldmem == 0)
     return __libc_malloc (bytes);
 
+  bytes = size_class_pad (bytes);
+
   /* chunk corresponding to oldmem */
   const mchunkptr oldp = mem2chunk (oldmem);
   /* its size */
@@ -3391,6 +3419,7 @@  __libc_calloc (size_t n, size_t elem_size)
       return memset (mem, 0, sz);
     }
 
+  sz = size_class_pad (sz);
   MAYBE_INIT_TCACHE ();
 
   if (SINGLE_THREAD_P)