diff mbox series

[v3] malloc: Remove bin scanning from memalign (bug 30723)

Message ID 87a5uxtwaq.fsf@oldenburg.str.redhat.com
State New
Headers show
Series [v3] malloc: Remove bin scanning from memalign (bug 30723) | expand

Commit Message

Florian Weimer Aug. 11, 2023, 3:48 p.m. UTC
On the test workload (mpv --cache=yes with VP9 video decoding), the
bin scanning has a very poor success rate (less than 2%).  The tcache
scanning has about 50% success rate, so keep that.

Update comments in malloc/tst-memalign-2 to indicate the purpose
of the tests.  Even with the scanning removed, the additional
merging opportunities since commit 542b1105852568c3ebc712225ae78b
("malloc: Enable merging of remainders in memalign (bug 30723)")
are sufficient to pass the existing large bins test.

---
v3: Keep test.
 malloc/malloc.c         | 127 ++----------------------------------------------
 malloc/tst-memalign-2.c |   7 ++-
 2 files changed, 10 insertions(+), 124 deletions(-)


base-commit: 542b1105852568c3ebc712225ae78b8c8ba31a78

Comments

Stefan Liebler Sept. 19, 2023, 2:32 p.m. UTC | #1
On 11.08.23 17:48, Florian Weimer via Libc-alpha wrote:
> On the test workload (mpv --cache=yes with VP9 video decoding), the
> bin scanning has a very poor success rate (less than 2%).  The tcache
> scanning has about 50% success rate, so keep that.
> 
> Update comments in malloc/tst-memalign-2 to indicate the purpose
> of the tests.  Even with the scanning removed, the additional
> merging opportunities since commit 542b1105852568c3ebc712225ae78b
> ("malloc: Enable merging of remainders in memalign (bug 30723)")
> are sufficient to pass the existing large bins test.
Hi,

on s390x, I've observed these test-fails starting with this commit:
FAIL: malloc/tst-memalign-2
FAIL: malloc/tst-memalign-2-malloc-hugetlb1

With this output:
error: tst-memalign-2.c:158: not true: count > LN / 2
error: 1 test failures

Note, that I only see the fails if /proc/sys/kernel/randomize_va_space
is set to 0. Then I get count=3; LN=8;

If randomize_va_space is set to 2, the test passes and I get count=5; LN=8;

Is s390x the only architecture where we get this failure?

Bye,
Stefan
Florian Weimer Sept. 19, 2023, 5:35 p.m. UTC | #2
* Stefan Liebler:

> On 11.08.23 17:48, Florian Weimer via Libc-alpha wrote:
>> On the test workload (mpv --cache=yes with VP9 video decoding), the
>> bin scanning has a very poor success rate (less than 2%).  The tcache
>> scanning has about 50% success rate, so keep that.
>> 
>> Update comments in malloc/tst-memalign-2 to indicate the purpose
>> of the tests.  Even with the scanning removed, the additional
>> merging opportunities since commit 542b1105852568c3ebc712225ae78b
>> ("malloc: Enable merging of remainders in memalign (bug 30723)")
>> are sufficient to pass the existing large bins test.
> Hi,
>
> on s390x, I've observed these test-fails starting with this commit:
> FAIL: malloc/tst-memalign-2
> FAIL: malloc/tst-memalign-2-malloc-hugetlb1
>
> With this output:
> error: tst-memalign-2.c:158: not true: count > LN / 2
> error: 1 test failures
>
> Note, that I only see the fails if /proc/sys/kernel/randomize_va_space
> is set to 0. Then I get count=3; LN=8;
>
> If randomize_va_space is set to 2, the test passes and I get count=5; LN=8;
>
> Is s390x the only architecture where we get this failure?

Does your kernel have a fix for the early sbrk failure?  I suspect what
happens is that we fall over to mmap from sbrk, making the heap
non-contiguous.

The sbrk failure should be visible under strace.

Thanks,
Florian
Stefan Liebler Sept. 20, 2023, 2:35 p.m. UTC | #3
On 19.09.23 19:35, Florian Weimer wrote:
> * Stefan Liebler:
> 
>> On 11.08.23 17:48, Florian Weimer via Libc-alpha wrote:
>>> On the test workload (mpv --cache=yes with VP9 video decoding), the
>>> bin scanning has a very poor success rate (less than 2%).  The tcache
>>> scanning has about 50% success rate, so keep that.
>>>
>>> Update comments in malloc/tst-memalign-2 to indicate the purpose
>>> of the tests.  Even with the scanning removed, the additional
>>> merging opportunities since commit 542b1105852568c3ebc712225ae78b
>>> ("malloc: Enable merging of remainders in memalign (bug 30723)")
>>> are sufficient to pass the existing large bins test.
>> Hi,
>>
>> on s390x, I've observed these test-fails starting with this commit:
>> FAIL: malloc/tst-memalign-2
>> FAIL: malloc/tst-memalign-2-malloc-hugetlb1
>>
>> With this output:
>> error: tst-memalign-2.c:158: not true: count > LN / 2
>> error: 1 test failures
>>
>> Note, that I only see the fails if /proc/sys/kernel/randomize_va_space
>> is set to 0. Then I get count=3; LN=8;
>>
>> If randomize_va_space is set to 2, the test passes and I get count=5; LN=8;
>>
>> Is s390x the only architecture where we get this failure?
> 
> Does your kernel have a fix for the early sbrk failure?  I suspect what
> happens is that we fall over to mmap from sbrk, making the heap
> non-contiguous.
> 
> The sbrk failure should be visible under strace.
> 
Yes, strace shows mmap with randomize_va_space=0:
brk(NULL)                               = 0x3fff7fb2000
brk(0x3fff7fd3000)                      = 0x3fff7fd3000
brk(0x3fff7ff7000)                      = 0x3fff7fd3000
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x3fff7b00000
write(1, "error: tst-memalign-2.c:158: not"..., 54error:
tst-memalign-2.c:158: not true: count > LN / 2
) = 54

Note that there is this previous mmap-event before set_tid_address-syscall:
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x3fff7ff7000

With with randomize_va_space=2, I see:
brk(NULL)                               = 0x2aa005e8000
brk(0x2aa00609000)                      = 0x2aa00609000
brk(0x2aa0062d000)                      = 0x2aa0062d000
brk(0x2aa00654000)                      = 0x2aa00654000
exit_group(0)                           = ?


I think you mean the kernel fixes as documented in the static PIE
confiugre-check:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/s390/s390-64/configure.ac;h=4657de0d3795f40388805d438178c63f9558e936;hb=HEAD#l85
Those commits went into linux 5.19 and I see the fails on kernels with 6.xyz



> Thanks,
> Florian
>
Florian Weimer Sept. 20, 2023, 3:31 p.m. UTC | #4
* Stefan Liebler:

>> Does your kernel have a fix for the early sbrk failure?  I suspect what
>> happens is that we fall over to mmap from sbrk, making the heap
>> non-contiguous.
>> 
>> The sbrk failure should be visible under strace.
>> 
> Yes, strace shows mmap with randomize_va_space=0:
> brk(NULL)                               = 0x3fff7fb2000
> brk(0x3fff7fd3000)                      = 0x3fff7fd3000
> brk(0x3fff7ff7000)                      = 0x3fff7fd3000
> mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0x3fff7b00000

Ahh, looks like we can get 4 KiB or maybe 8 KiB of heap from the kernel.
That's why we didn't crash before in early startup.  It looks like a
another ASLR layout issue, different from the problem that broke static
PIE.

Not sure what we can do about this.  Obviously, we'd prefer if the test
suite ran with a more production-like malloc configuration.

> write(1, "error: tst-memalign-2.c:158: not"..., 54error:
> tst-memalign-2.c:158: not true: count > LN / 2
> ) = 54
>
> Note that there is this previous mmap-event before set_tid_address-syscall:
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x3fff7ff7000

This is the ld.so minimal malloc, it doesn't use sbrk.

Thanks,
Florian
Stefan Liebler Sept. 21, 2023, 1:05 p.m. UTC | #5
On 20.09.23 17:31, Florian Weimer wrote:
> * Stefan Liebler:
> 
>>> Does your kernel have a fix for the early sbrk failure?  I suspect what
>>> happens is that we fall over to mmap from sbrk, making the heap
>>> non-contiguous.
>>>
>>> The sbrk failure should be visible under strace.
>>>
>> Yes, strace shows mmap with randomize_va_space=0:
>> brk(NULL)                               = 0x3fff7fb2000
>> brk(0x3fff7fd3000)                      = 0x3fff7fd3000
>> brk(0x3fff7ff7000)                      = 0x3fff7fd3000
>> mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>> 0) = 0x3fff7b00000
> 
> Ahh, looks like we can get 4 KiB or maybe 8 KiB of heap from the kernel.
> That's why we didn't crash before in early startup.  It looks like a
> another ASLR layout issue, different from the problem that broke static
> PIE.
> 
> Not sure what we can do about this.  Obviously, we'd prefer if the test
> suite ran with a more production-like malloc configuration.
> 
>> write(1, "error: tst-memalign-2.c:158: not"..., 54error:
>> tst-memalign-2.c:158: not true: count > LN / 2
>> ) = 54
>>
>> Note that there is this previous mmap-event before set_tid_address-syscall:
>> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
>> = 0x3fff7ff7000
> 
> This is the ld.so minimal malloc, it doesn't use sbrk.
> 
> Thanks,
> Florian
> 
Just as summary:

Without randomization (/proc/sys/kernel/randomize_va_space == 0):
3fff7c00000-3fff7e1d000 libc.so
3fff7f00000-3fff7f08000 tst-memalign-2
3fff7f80000-3fff7fb4000 ld.so
3fff7fb4000-3fff7fd5000 [heap]
3fff7ff7000-3fff7ffb000 ld.so-minimal-malloc
3fffffda000-3ffffffb000 [stack]
=> brk tries to extend heap and fails due to "ld.so-minimal-malloc".
Then mmap is used which leads to the test-fail.

If I remember correctly, the kernel-guys mentioned - while static-PIE
issue - that the heap is "always" located after the main-program. The
glibc test-cases are run by executing ld.so directly.

But if randomization (/proc/sys/kernel/randomize_va_space == 2) is
turned on:
2aa019d8000-2aa01a44000 [heap]
3ff80f00000-3ff8111d000 libc.so
3ff81280000-3ff81288000 tst-memalign-2
3ff81301000-3ff81334000 ld.so
3ff81377000-3ff8137b000 ld.so-minimal-malloc
3ffd6ada000-3ffd6afb000 [stack]

As comparison "cat /proc/self/maps" with randomization:
2aa22780000-2aa2278b000 /usr/bin/cat
2aa22f4e000-2aa22f6f000 [heap]
3ff8b000000-3ff8b1db000 libc.so
3ff8b380000-3ff8b3af000 ld.so
3ffdc6da000-3ffdc6fb000 [stack]
and without:
2aa00000000-2aa0000b000 /usr/bin/cat
2aa0000b000-2aa0002c000 [heap]
3fff7c00000-3fff7ddb000 libc.so
3fff7f80000-3fff7faf000 ld.so
3fffffda000-3ffffffb000 [stack]
3ffffffc000-3ffffffe000 [vvar]
3ffffffe000-40000000000 [vdso]

I assume this issue is s390-specific. I will talk to our kernel-guys. I
think the behavior when ld.so is called directly without randomization
should be the same as if activated and the heap should be located
somewhere at 0x2aa00000000
diff mbox series

Patch

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 948f9759af..9c2cab7a59 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -5082,7 +5082,6 @@  _int_memalign (mstate av, size_t alignment, size_t bytes)
   mchunkptr remainder;            /* spare room at end to split off */
   unsigned long remainder_size;   /* its size */
   INTERNAL_SIZE_T size;
-  mchunkptr victim;
 
   nb = checked_request2size (bytes);
   if (nb == 0)
@@ -5101,129 +5100,13 @@  _int_memalign (mstate av, size_t alignment, size_t bytes)
      we don't find anything in those bins, the common malloc code will
      scan starting at 2x.  */
 
-  /* This will be set if we found a candidate chunk.  */
-  victim = NULL;
+  /* Call malloc with worst case padding to hit alignment. */
+  m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
 
-  /* Fast bins are singly-linked, hard to remove a chunk from the middle
-     and unlikely to meet our alignment requirements.  We have not done
-     any experimentation with searching for aligned fastbins.  */
+  if (m == 0)
+    return 0;           /* propagate failure */
 
-  if (av != NULL)
-    {
-      int first_bin_index;
-      int first_largebin_index;
-      int last_bin_index;
-
-      if (in_smallbin_range (nb))
-	first_bin_index = smallbin_index (nb);
-      else
-	first_bin_index = largebin_index (nb);
-
-      if (in_smallbin_range (nb * 2))
-	last_bin_index = smallbin_index (nb * 2);
-      else
-	last_bin_index = largebin_index (nb * 2);
-
-      first_largebin_index = largebin_index (MIN_LARGE_SIZE);
-
-      int victim_index;                 /* its bin index */
-
-      for (victim_index = first_bin_index;
-	   victim_index < last_bin_index;
-	   victim_index ++)
-	{
-	  victim = NULL;
-
-	  if (victim_index < first_largebin_index)
-	    {
-	      /* Check small bins.  Small bin chunks are doubly-linked despite
-		 being the same size.  */
-
-	      mchunkptr fwd;                    /* misc temp for linking */
-	      mchunkptr bck;                    /* misc temp for linking */
-
-	      bck = bin_at (av, victim_index);
-	      fwd = bck->fd;
-	      while (fwd != bck)
-		{
-		  if (chunk_ok_for_memalign (fwd, alignment, nb) > 0)
-		    {
-		      victim = fwd;
-
-		      /* Unlink it */
-		      victim->fd->bk = victim->bk;
-		      victim->bk->fd = victim->fd;
-		      break;
-		    }
-
-		  fwd = fwd->fd;
-		}
-	    }
-	  else
-	    {
-	      /* Check large bins.  */
-	      mchunkptr fwd;                    /* misc temp for linking */
-	      mchunkptr bck;                    /* misc temp for linking */
-	      mchunkptr best = NULL;
-	      size_t best_size = 0;
-
-	      bck = bin_at (av, victim_index);
-	      fwd = bck->fd;
-
-	      while (fwd != bck)
-		{
-		  int extra;
-
-		  if (chunksize (fwd) < nb)
-		    break;
-		  extra = chunk_ok_for_memalign (fwd, alignment, nb);
-		  if (extra > 0
-		      && (extra <= best_size || best == NULL))
-		    {
-		      best = fwd;
-		      best_size = extra;
-		    }
-
-		  fwd = fwd->fd;
-		}
-	      victim = best;
-
-	      if (victim != NULL)
-		{
-		  unlink_chunk (av, victim);
-		  break;
-		}
-	    }
-
-	  if (victim != NULL)
-	    break;
-	}
-    }
-
-  /* Strategy: find a spot within that chunk that meets the alignment
-     request, and then possibly free the leading and trailing space.
-     This strategy is incredibly costly and can lead to external
-     fragmentation if header and footer chunks are unused.  */
-
-  if (victim != NULL)
-    {
-      p = victim;
-      m = chunk2mem (p);
-      set_inuse (p);
-      if (av != &main_arena)
-	set_non_main_arena (p);
-    }
-  else
-    {
-      /* Call malloc with worst case padding to hit alignment. */
-
-      m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
-
-      if (m == 0)
-	return 0;           /* propagate failure */
-
-      p = mem2chunk (m);
-    }
+  p = mem2chunk (m);
 
   if ((((unsigned long) (m)) % alignment) != 0)   /* misaligned */
     {
diff --git a/malloc/tst-memalign-2.c b/malloc/tst-memalign-2.c
index f229283dbf..ecd6fa249e 100644
--- a/malloc/tst-memalign-2.c
+++ b/malloc/tst-memalign-2.c
@@ -86,7 +86,8 @@  do_test (void)
       TEST_VERIFY (tcache_allocs[i].ptr1 == tcache_allocs[i].ptr2);
     }
 
-  /* Test for non-head tcache hits.  */
+  /* Test for non-head tcache hits.  This exercises the memalign
+     scanning code to find matching allocations.  */
   for (i = 0; i < array_length (ptr); ++ i)
     {
       if (i == 4)
@@ -113,7 +114,9 @@  do_test (void)
   free (p);
   TEST_VERIFY (count > 0);
 
-  /* Large bins test.  */
+  /* Large bins test.  This verifies that the over-allocated parts
+     that memalign releases for future allocations can be reused by
+     memalign itself at least in some cases.  */
 
   for (i = 0; i < LN; ++ i)
     {