Message ID | 87a5uxtwaq.fsf@oldenburg.str.redhat.com |
---|---|
State | New |
Headers | show |
Series | [v3] malloc: Remove bin scanning from memalign (bug 30723) | expand |
On 11.08.23 17:48, Florian Weimer via Libc-alpha wrote: > On the test workload (mpv --cache=yes with VP9 video decoding), the > bin scanning has a very poor success rate (less than 2%). The tcache > scanning has about 50% success rate, so keep that. > > Update comments in malloc/tst-memalign-2 to indicate the purpose > of the tests. Even with the scanning removed, the additional > merging opportunities since commit 542b1105852568c3ebc712225ae78b > ("malloc: Enable merging of remainders in memalign (bug 30723)") > are sufficient to pass the existing large bins test. Hi, on s390x, I've observed these test-fails starting with this commit: FAIL: malloc/tst-memalign-2 FAIL: malloc/tst-memalign-2-malloc-hugetlb1 With this output: error: tst-memalign-2.c:158: not true: count > LN / 2 error: 1 test failures Note, that I only see the fails if /proc/sys/kernel/randomize_va_space is set to 0. Then I get count=3; LN=8; If randomize_va_space is set to 2, the test passes and I get count=5; LN=8; Is s390x the only architecture where we get this failure? Bye, Stefan
* Stefan Liebler: > On 11.08.23 17:48, Florian Weimer via Libc-alpha wrote: >> On the test workload (mpv --cache=yes with VP9 video decoding), the >> bin scanning has a very poor success rate (less than 2%). The tcache >> scanning has about 50% success rate, so keep that. >> >> Update comments in malloc/tst-memalign-2 to indicate the purpose >> of the tests. Even with the scanning removed, the additional >> merging opportunities since commit 542b1105852568c3ebc712225ae78b >> ("malloc: Enable merging of remainders in memalign (bug 30723)") >> are sufficient to pass the existing large bins test. > Hi, > > on s390x, I've observed these test-fails starting with this commit: > FAIL: malloc/tst-memalign-2 > FAIL: malloc/tst-memalign-2-malloc-hugetlb1 > > With this output: > error: tst-memalign-2.c:158: not true: count > LN / 2 > error: 1 test failures > > Note, that I only see the fails if /proc/sys/kernel/randomize_va_space > is set to 0. Then I get count=3; LN=8; > > If randomize_va_space is set to 2, the test passes and I get count=5; LN=8; > > Is s390x the only architecture where we get this failure? Does your kernel have a fix for the early sbrk failure? I suspect what happens is that we fall over to mmap from sbrk, making the heap non-contiguous. The sbrk failure should be visible under strace. Thanks, Florian
On 19.09.23 19:35, Florian Weimer wrote: > * Stefan Liebler: > >> On 11.08.23 17:48, Florian Weimer via Libc-alpha wrote: >>> On the test workload (mpv --cache=yes with VP9 video decoding), the >>> bin scanning has a very poor success rate (less than 2%). The tcache >>> scanning has about 50% success rate, so keep that. >>> >>> Update comments in malloc/tst-memalign-2 to indicate the purpose >>> of the tests. Even with the scanning removed, the additional >>> merging opportunities since commit 542b1105852568c3ebc712225ae78b >>> ("malloc: Enable merging of remainders in memalign (bug 30723)") >>> are sufficient to pass the existing large bins test. >> Hi, >> >> on s390x, I've observed these test-fails starting with this commit: >> FAIL: malloc/tst-memalign-2 >> FAIL: malloc/tst-memalign-2-malloc-hugetlb1 >> >> With this output: >> error: tst-memalign-2.c:158: not true: count > LN / 2 >> error: 1 test failures >> >> Note, that I only see the fails if /proc/sys/kernel/randomize_va_space >> is set to 0. Then I get count=3; LN=8; >> >> If randomize_va_space is set to 2, the test passes and I get count=5; LN=8; >> >> Is s390x the only architecture where we get this failure? > > Does your kernel have a fix for the early sbrk failure? I suspect what > happens is that we fall over to mmap from sbrk, making the heap > non-contiguous. > > The sbrk failure should be visible under strace. > Yes, strace shows mmap with randomize_va_space=0: brk(NULL) = 0x3fff7fb2000 brk(0x3fff7fd3000) = 0x3fff7fd3000 brk(0x3fff7ff7000) = 0x3fff7fd3000 mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff7b00000 write(1, "error: tst-memalign-2.c:158: not"..., 54error: tst-memalign-2.c:158: not true: count > LN / 2 ) = 54 Note that there is this previous mmap-event before set_tid_address-syscall: mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff7ff7000 With with randomize_va_space=2, I see: brk(NULL) = 0x2aa005e8000 brk(0x2aa00609000) = 0x2aa00609000 brk(0x2aa0062d000) = 0x2aa0062d000 brk(0x2aa00654000) = 0x2aa00654000 exit_group(0) = ? I think you mean the kernel fixes as documented in the static PIE confiugre-check: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/s390/s390-64/configure.ac;h=4657de0d3795f40388805d438178c63f9558e936;hb=HEAD#l85 Those commits went into linux 5.19 and I see the fails on kernels with 6.xyz > Thanks, > Florian >
* Stefan Liebler: >> Does your kernel have a fix for the early sbrk failure? I suspect what >> happens is that we fall over to mmap from sbrk, making the heap >> non-contiguous. >> >> The sbrk failure should be visible under strace. >> > Yes, strace shows mmap with randomize_va_space=0: > brk(NULL) = 0x3fff7fb2000 > brk(0x3fff7fd3000) = 0x3fff7fd3000 > brk(0x3fff7ff7000) = 0x3fff7fd3000 > mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > 0) = 0x3fff7b00000 Ahh, looks like we can get 4 KiB or maybe 8 KiB of heap from the kernel. That's why we didn't crash before in early startup. It looks like a another ASLR layout issue, different from the problem that broke static PIE. Not sure what we can do about this. Obviously, we'd prefer if the test suite ran with a more production-like malloc configuration. > write(1, "error: tst-memalign-2.c:158: not"..., 54error: > tst-memalign-2.c:158: not true: count > LN / 2 > ) = 54 > > Note that there is this previous mmap-event before set_tid_address-syscall: > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) > = 0x3fff7ff7000 This is the ld.so minimal malloc, it doesn't use sbrk. Thanks, Florian
On 20.09.23 17:31, Florian Weimer wrote: > * Stefan Liebler: > >>> Does your kernel have a fix for the early sbrk failure? I suspect what >>> happens is that we fall over to mmap from sbrk, making the heap >>> non-contiguous. >>> >>> The sbrk failure should be visible under strace. >>> >> Yes, strace shows mmap with randomize_va_space=0: >> brk(NULL) = 0x3fff7fb2000 >> brk(0x3fff7fd3000) = 0x3fff7fd3000 >> brk(0x3fff7ff7000) = 0x3fff7fd3000 >> mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, >> 0) = 0x3fff7b00000 > > Ahh, looks like we can get 4 KiB or maybe 8 KiB of heap from the kernel. > That's why we didn't crash before in early startup. It looks like a > another ASLR layout issue, different from the problem that broke static > PIE. > > Not sure what we can do about this. Obviously, we'd prefer if the test > suite ran with a more production-like malloc configuration. > >> write(1, "error: tst-memalign-2.c:158: not"..., 54error: >> tst-memalign-2.c:158: not true: count > LN / 2 >> ) = 54 >> >> Note that there is this previous mmap-event before set_tid_address-syscall: >> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) >> = 0x3fff7ff7000 > > This is the ld.so minimal malloc, it doesn't use sbrk. > > Thanks, > Florian > Just as summary: Without randomization (/proc/sys/kernel/randomize_va_space == 0): 3fff7c00000-3fff7e1d000 libc.so 3fff7f00000-3fff7f08000 tst-memalign-2 3fff7f80000-3fff7fb4000 ld.so 3fff7fb4000-3fff7fd5000 [heap] 3fff7ff7000-3fff7ffb000 ld.so-minimal-malloc 3fffffda000-3ffffffb000 [stack] => brk tries to extend heap and fails due to "ld.so-minimal-malloc". Then mmap is used which leads to the test-fail. If I remember correctly, the kernel-guys mentioned - while static-PIE issue - that the heap is "always" located after the main-program. The glibc test-cases are run by executing ld.so directly. But if randomization (/proc/sys/kernel/randomize_va_space == 2) is turned on: 2aa019d8000-2aa01a44000 [heap] 3ff80f00000-3ff8111d000 libc.so 3ff81280000-3ff81288000 tst-memalign-2 3ff81301000-3ff81334000 ld.so 3ff81377000-3ff8137b000 ld.so-minimal-malloc 3ffd6ada000-3ffd6afb000 [stack] As comparison "cat /proc/self/maps" with randomization: 2aa22780000-2aa2278b000 /usr/bin/cat 2aa22f4e000-2aa22f6f000 [heap] 3ff8b000000-3ff8b1db000 libc.so 3ff8b380000-3ff8b3af000 ld.so 3ffdc6da000-3ffdc6fb000 [stack] and without: 2aa00000000-2aa0000b000 /usr/bin/cat 2aa0000b000-2aa0002c000 [heap] 3fff7c00000-3fff7ddb000 libc.so 3fff7f80000-3fff7faf000 ld.so 3fffffda000-3ffffffb000 [stack] 3ffffffc000-3ffffffe000 [vvar] 3ffffffe000-40000000000 [vdso] I assume this issue is s390-specific. I will talk to our kernel-guys. I think the behavior when ld.so is called directly without randomization should be the same as if activated and the heap should be located somewhere at 0x2aa00000000
diff --git a/malloc/malloc.c b/malloc/malloc.c index 948f9759af..9c2cab7a59 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -5082,7 +5082,6 @@ _int_memalign (mstate av, size_t alignment, size_t bytes) mchunkptr remainder; /* spare room at end to split off */ unsigned long remainder_size; /* its size */ INTERNAL_SIZE_T size; - mchunkptr victim; nb = checked_request2size (bytes); if (nb == 0) @@ -5101,129 +5100,13 @@ _int_memalign (mstate av, size_t alignment, size_t bytes) we don't find anything in those bins, the common malloc code will scan starting at 2x. */ - /* This will be set if we found a candidate chunk. */ - victim = NULL; + /* Call malloc with worst case padding to hit alignment. */ + m = (char *) (_int_malloc (av, nb + alignment + MINSIZE)); - /* Fast bins are singly-linked, hard to remove a chunk from the middle - and unlikely to meet our alignment requirements. We have not done - any experimentation with searching for aligned fastbins. */ + if (m == 0) + return 0; /* propagate failure */ - if (av != NULL) - { - int first_bin_index; - int first_largebin_index; - int last_bin_index; - - if (in_smallbin_range (nb)) - first_bin_index = smallbin_index (nb); - else - first_bin_index = largebin_index (nb); - - if (in_smallbin_range (nb * 2)) - last_bin_index = smallbin_index (nb * 2); - else - last_bin_index = largebin_index (nb * 2); - - first_largebin_index = largebin_index (MIN_LARGE_SIZE); - - int victim_index; /* its bin index */ - - for (victim_index = first_bin_index; - victim_index < last_bin_index; - victim_index ++) - { - victim = NULL; - - if (victim_index < first_largebin_index) - { - /* Check small bins. Small bin chunks are doubly-linked despite - being the same size. */ - - mchunkptr fwd; /* misc temp for linking */ - mchunkptr bck; /* misc temp for linking */ - - bck = bin_at (av, victim_index); - fwd = bck->fd; - while (fwd != bck) - { - if (chunk_ok_for_memalign (fwd, alignment, nb) > 0) - { - victim = fwd; - - /* Unlink it */ - victim->fd->bk = victim->bk; - victim->bk->fd = victim->fd; - break; - } - - fwd = fwd->fd; - } - } - else - { - /* Check large bins. */ - mchunkptr fwd; /* misc temp for linking */ - mchunkptr bck; /* misc temp for linking */ - mchunkptr best = NULL; - size_t best_size = 0; - - bck = bin_at (av, victim_index); - fwd = bck->fd; - - while (fwd != bck) - { - int extra; - - if (chunksize (fwd) < nb) - break; - extra = chunk_ok_for_memalign (fwd, alignment, nb); - if (extra > 0 - && (extra <= best_size || best == NULL)) - { - best = fwd; - best_size = extra; - } - - fwd = fwd->fd; - } - victim = best; - - if (victim != NULL) - { - unlink_chunk (av, victim); - break; - } - } - - if (victim != NULL) - break; - } - } - - /* Strategy: find a spot within that chunk that meets the alignment - request, and then possibly free the leading and trailing space. - This strategy is incredibly costly and can lead to external - fragmentation if header and footer chunks are unused. */ - - if (victim != NULL) - { - p = victim; - m = chunk2mem (p); - set_inuse (p); - if (av != &main_arena) - set_non_main_arena (p); - } - else - { - /* Call malloc with worst case padding to hit alignment. */ - - m = (char *) (_int_malloc (av, nb + alignment + MINSIZE)); - - if (m == 0) - return 0; /* propagate failure */ - - p = mem2chunk (m); - } + p = mem2chunk (m); if ((((unsigned long) (m)) % alignment) != 0) /* misaligned */ { diff --git a/malloc/tst-memalign-2.c b/malloc/tst-memalign-2.c index f229283dbf..ecd6fa249e 100644 --- a/malloc/tst-memalign-2.c +++ b/malloc/tst-memalign-2.c @@ -86,7 +86,8 @@ do_test (void) TEST_VERIFY (tcache_allocs[i].ptr1 == tcache_allocs[i].ptr2); } - /* Test for non-head tcache hits. */ + /* Test for non-head tcache hits. This exercises the memalign + scanning code to find matching allocations. */ for (i = 0; i < array_length (ptr); ++ i) { if (i == 4) @@ -113,7 +114,9 @@ do_test (void) free (p); TEST_VERIFY (count > 0); - /* Large bins test. */ + /* Large bins test. This verifies that the over-allocated parts + that memalign releases for future allocations can be reused by + memalign itself at least in some cases. */ for (i = 0; i < LN; ++ i) {