[v2] powerpc/mm: Fix growth direction for hugepages mmaps with slice

Message ID 20180109101810.2471D6C6CF@localhost.localdomain
State Superseded
Series
  • [v2] powerpc/mm: Fix growth direction for hugepages mmaps with slice

Commit Message

Christophe LEROY Jan. 9, 2018, 10:18 a.m.
An application running with libhugetlbfs fails to allocate
additional pages to the heap because the hugepage mapping is
unconditionally done as a topdown mapping:

mmap(0x10080000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x73e80000
[...]
mmap(0x74000000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d80000
munmap(0x73d80000, 1048576)             = 0
[...]
mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
munmap(0x73d00000, 1572864)             = 0
[...]
mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
munmap(0x73d00000, 1572864)             = 0
[...]

As one can see from the above strace log, mmap() allocates further
pages below the initial one.

This patch fixes it by taking the MAP_GROWSDOWN flag into account.

Fixes: d0f13e3c20b6f ("[POWERPC] Introduce address space "slices" ")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v2: Added missing include

 arch/powerpc/mm/hugetlbpage.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Aneesh Kumar K.V Jan. 16, 2018, 4:03 p.m. | #1
Christophe Leroy <christophe.leroy@c-s.fr> writes:

> An application running with libhugetlbfs fails to allocate
> additional pages to HEAP due to the hugemap being done
> inconditionally as topdown mapping:
>
> mmap(0x10080000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x73e80000
> [...]
> mmap(0x74000000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d80000
> munmap(0x73d80000, 1048576)             = 0
> [...]
> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
> munmap(0x73d00000, 1572864)             = 0
> [...]
> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
> munmap(0x73d00000, 1572864)             = 0
> [...]
>

Can you explain the failure details above? I am not sure I understand
what to read from the above output.

> As one can see from the above strace log, mmap() allocates further
> pages below the initial one.
>
> This patch fixes it by taking into account MAP_GROWSDOWN flag.

The rest of the kernel doesn't depend on that flag to select a topdown
search or not. So what is special about hugetlb? We select a bottomup
search only when legacy mmap is selected. Hugetlb on ppc64 has always
done a topdown search.

>
> Fixes: d0f13e3c20b6f ("[POWERPC] Introduce address space "slices" ")
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  v2: Added missing include
>
>  arch/powerpc/mm/hugetlbpage.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index 79e1378ee303..0eadf9f199de 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -19,6 +19,7 @@
>  #include <linux/moduleparam.h>
>  #include <linux/swap.h>
>  #include <linux/swapops.h>
> +#include <linux/mman.h>
>  #include <asm/pgtable.h>
>  #include <asm/pgalloc.h>
>  #include <asm/tlb.h>
> @@ -558,7 +559,8 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  		return radix__hugetlb_get_unmapped_area(file, addr, len,
>  						       pgoff, flags);
>  #endif
> -	return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
> +	return slice_get_unmapped_area(addr, len, flags, mmu_psize,
> +				       flags & MAP_GROWSDOWN);
>  }
>  #endif
>  
> -- 
> 2.13.3
Christophe LEROY Jan. 16, 2018, 4:48 p.m. | #2
Le 16/01/2018 à 17:03, Aneesh Kumar K.V a écrit :
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
> 
>> An application running with libhugetlbfs fails to allocate
>> additional pages to HEAP due to the hugemap being done
>> inconditionally as topdown mapping:
>>
>> mmap(0x10080000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x73e80000
>> [...]
>> mmap(0x74000000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d80000
>> munmap(0x73d80000, 1048576)             = 0
>> [...]
>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>> munmap(0x73d00000, 1572864)             = 0
>> [...]
>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>> munmap(0x73d00000, 1572864)             = 0
>> [...]
>>
> 
> Can you explain the failure details above. I am not sure I understand
> what to read from the above output.

libhugetlbfs first requests an area of 1.5 Mbytes at address 0x10080000,
but mmap() returns an area at address 0x73e80000.

Then libhugetlbfs requests an additional area on top of that, i.e. at 
address 0x74000000, to expand the heap.
But mmap() returns an area at address 0x73d80000, i.e. below the previous 
area.

This is not the behaviour of the generic (i.e. without mm_slices) 
hugepages code, and this is not what libhugetlbfs expects for expanding 
the heap.
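
The addresses in the strace log can be checked with a little arithmetic. The sketch below uses helper names made up for illustration; it computes where a mapping ends and where a mapping placed directly below it would start.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Address just past a mapping of `len` bytes starting at `base`. */
static uint32_t region_end(uint32_t base, uint32_t len)
{
	assert(len <= UINT32_MAX - base);	/* no wrap-around */
	return base + len;
}

/* Start of a `len`-byte mapping placed immediately below `base`. */
static uint32_t region_below(uint32_t base, uint32_t len)
{
	assert(len <= base);
	return base - len;
}
```

With the values from the log, region_end(0x73e80000, 1572864) is 0x74000000, the address libhugetlbfs asks for next. region_below(0x73e80000, 1048576) is 0x73d80000 and region_below(0x73e80000, 1572864) is 0x73d00000, which are the addresses mmap() actually returned.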

> 
>> As one can see from the above strace log, mmap() allocates further
>> pages below the initial one.
>>
>> This patch fixes it by taking into account MAP_GROWSDOWN flag.
> 
> Rest of the kernel don't depend on that flag to select a topdown search
> or not. So what is special with hugetlb? IF we select legacy mmap that
> is when we select a bottomup search. Hugetlb on ppc64 always did a
> topdown search.

The generic hugepage code does a bottomup search. The first page is 
allocated at address 0x30000000 and the following pages are allocated at 
the requested addresses, so libhugetlbfs has no issue expanding the heap 
when required.
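
To make the difference between the two search directions concrete, here is a toy model. It is not the kernel's implementation: it ignores address hints, slices and existing mappings, and simply carves areas from one free window. The structure, the function name and the window bounds are assumptions chosen to match the addresses seen in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, NOT the kernel's implementation: a single free window
 * [lo, hi) from which areas are carved either from the bottom or
 * from the top. All names are hypothetical.
 */
struct window {
	uint32_t lo;	/* lowest free address */
	uint32_t hi;	/* one past the highest free address */
};

/* Returns the start of the new area, or 0 if it does not fit. */
static uint32_t carve(struct window *w, uint32_t len, int topdown)
{
	assert(w->lo <= w->hi);
	if (len == 0 || len > w->hi - w->lo)
		return 0;
	if (topdown) {
		w->hi -= len;	/* the next area lands below this one */
		return w->hi;
	}
	w->lo += len;		/* the next area lands above this one */
	return w->lo - len;
}
```

In this model a bottomup search places each new area right above the previous one, which is what a heap growing upwards needs, while a topdown search places it right below.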

> 
>>
>> Fixes: d0f13e3c20b6f ("[POWERPC] Introduce address space "slices" ")
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>   v2: Added missing include
>>
>>   arch/powerpc/mm/hugetlbpage.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index 79e1378ee303..0eadf9f199de 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -19,6 +19,7 @@
>>   #include <linux/moduleparam.h>
>>   #include <linux/swap.h>
>>   #include <linux/swapops.h>
>> +#include <linux/mman.h>
>>   #include <asm/pgtable.h>
>>   #include <asm/pgalloc.h>
>>   #include <asm/tlb.h>
>> @@ -558,7 +559,8 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>>   		return radix__hugetlb_get_unmapped_area(file, addr, len,
>>   						       pgoff, flags);
>>   #endif
>> -	return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
>> +	return slice_get_unmapped_area(addr, len, flags, mmu_psize,
>> +				       flags & MAP_GROWSDOWN);
>>   }
>>   #endif
>>   
>> -- 
>> 2.13.3
Aneesh Kumar K.V Jan. 17, 2018, 3:19 a.m. | #3
On 01/16/2018 10:18 PM, Christophe LEROY wrote:
> 
> 
> Le 16/01/2018 à 17:03, Aneesh Kumar K.V a écrit :
>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>
>>> An application running with libhugetlbfs fails to allocate
>>> additional pages to HEAP due to the hugemap being done
>>> inconditionally as topdown mapping:
>>>
>>> mmap(0x10080000, 1572864, PROT_READ|PROT_WRITE, 
>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x73e80000
>>> [...]
>>> mmap(0x74000000, 1048576, PROT_READ|PROT_WRITE, 
>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d80000
>>> munmap(0x73d80000, 1048576)             = 0
>>> [...]
>>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, 
>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>>> munmap(0x73d00000, 1572864)             = 0
>>> [...]
>>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, 
>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>>> munmap(0x73d00000, 1572864)             = 0
>>> [...]
>>>
>>
>> Can you explain the failure details above. I am not sure I understand
>> what to read from the above output.
> 
> libhugetlbfs first requests an area of size 1.5Mbytes, at address 
> 0x10080000
> mmap() returns an area at address 0x73e80000
> 
> Then libhugetlbfs requests an additional area on top of that, ie at 
> address 0x74000000, to expand the heap.
> But mmap() returns an area at address 0x73d80000, ie under the previous 
> area.
> 


Can you share the test details? Why does it not fail on book3s64? We 
use topdown search with book3s64.

> This is not the behaviour when using the generic (ie without mm_slices) 
> hugepages code, and this is not what libhugetlbfs expects for expending 
> the heap.
> 
>

-aneesh
Christophe LEROY Jan. 17, 2018, 11:11 a.m. | #4
Le 17/01/2018 à 04:19, Aneesh Kumar K.V a écrit :
> 
> 
> On 01/16/2018 10:18 PM, Christophe LEROY wrote:
>>
>>
>> Le 16/01/2018 à 17:03, Aneesh Kumar K.V a écrit :
>>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>>
>>>> An application running with libhugetlbfs fails to allocate
>>>> additional pages to HEAP due to the hugemap being done
>>>> inconditionally as topdown mapping:
>>>>
>>>> mmap(0x10080000, 1572864, PROT_READ|PROT_WRITE, 
>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x73e80000
>>>> [...]
>>>> mmap(0x74000000, 1048576, PROT_READ|PROT_WRITE, 
>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d80000
>>>> munmap(0x73d80000, 1048576)             = 0
>>>> [...]
>>>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, 
>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>>>> munmap(0x73d00000, 1572864)             = 0
>>>> [...]
>>>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, 
>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>>>> munmap(0x73d00000, 1572864)             = 0
>>>> [...]
>>>>
>>>
>>> Can you explain the failure details above. I am not sure I understand
>>> what to read from the above output.
>>
>> libhugetlbfs first requests an area of size 1.5Mbytes, at address 
>> 0x10080000
>> mmap() returns an area at address 0x73e80000
>>
>> Then libhugetlbfs requests an additional area on top of that, ie at 
>> address 0x74000000, to expand the heap.
>> But mmap() returns an area at address 0x73d80000, ie under the 
>> previous area.
>>
> 
> 
> Can you share the test details?. Why does it not fail on book3s64? We 
> use topdown search with book3s64.

I don't know about book3s64, I only have 8xx.

Here is my test app:


#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

/* Dump the given /proc/<pid>/maps file to stderr. */
static void dump_maps(const char *filename)
{
	char buf[16384];
	ssize_t r;
	int fd;

	fd = open(filename, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return;
	}
	/* Keep one byte for the terminating NUL. */
	r = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (r < 0) {
		perror("read");
		return;
	}
	buf[r] = 0;
	fputs(buf, stderr);
	fputc('\n', stderr);
}

int main(void)
{
	char *p;
	char filename[32];
	int i;

	snprintf(filename, sizeof(filename), "/proc/%d/maps", getpid());

	/* Address space layout before any allocation. */
	dump_maps(filename);

	/* Three 1 Mbyte allocations, each large enough to grow the heap. */
	for (i = 0; i < 3; i++) {
		p = malloc(1024*1024);
		fprintf(stderr, "\nAllocated 1Mbytes at %p\n\n", (void *)p);
	}

	/* Address space layout after the allocations. */
	dump_maps(filename);

	exit(0);
}

It is linked with -lhugetlbfs (version 2.20).
My 8xx board is configured with 64 huge pages, default size 512k:

root@vgoip:~# cat /proc/meminfo
MemTotal:         123664 kB
MemFree:           58464 kB
MemAvailable:      67904 kB
Buffers:               0 kB
Cached:            14480 kB
SwapCached:            0 kB
Active:            11616 kB
Inactive:           7872 kB
Active(anon):       7584 kB
Inactive(anon):      240 kB
Active(file):       4032 kB
Inactive(file):     7632 kB
Unevictable:        2560 kB
Mlocked:            2560 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          7568 kB
Mapped:             7456 kB
Shmem:               736 kB
Slab:               7456 kB
SReclaimable:       3120 kB
SUnreclaim:         4336 kB
KernelStack:         568 kB
PageTables:         1024 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:       45440 kB
Committed_AS:      38880 kB
VmallocTotal:     866304 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:        512 kB


Without the patch, my test app gives the following results. As you can 
see, the second and third malloc() return addresses that are not in a 
hugepage. strace shows that libhugetlbfs falls back on regular mmap 
because the hugepage has not been allocated at the requested address.
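
The map, check and unmap sequence visible in the strace log can be sketched as follows. This is only an illustration of the pattern, not libhugetlbfs code; the function name and parameters are hypothetical. The 0x40000 flag in the log is most likely MAP_HUGETLB, which the sketch leaves to the caller through `extra_flags`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Sketch only, not libhugetlbfs code. Ask for `len` bytes at `want`
 * and keep the mapping only if the kernel placed it exactly there.
 * `extra_flags` would carry the hugepage flag for a hugepage mapping.
 */
static void *map_exactly_at(void *want, size_t len, int extra_flags)
{
	void *got;

	assert(len > 0);
	got = mmap(want, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
	if (got == MAP_FAILED)
		return NULL;
	if (got != want) {
		/* Hint not honoured: give the area back, caller falls back. */
		munmap(got, len);
		return NULL;
	}
	return got;
}
```

A caller that gets NULL back would then use some other way to obtain memory, which is the fallback to regular mmap described above.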

root@vgoip:~# HUGETLB_MORECORE=yes ./huge_malloc_test
00100000-00108000 r-xp 00000000 00:00 0          [vdso]
0fde4000-0fde8000 r-xp 00000000 00:0f 168        /lib/libdl-2.23.so
0fde8000-0fe00000 ---p 00004000 00:0f 168        /lib/libdl-2.23.so
0fe00000-0fe04000 r--p 0000c000 00:0f 168        /lib/libdl-2.23.so
0fe04000-0fe08000 rwxp 00010000 00:0f 168        /lib/libdl-2.23.so
0fe18000-0ff88000 r-xp 00000000 00:0f 191        /lib/libc-2.23.so
0ff88000-0ffa4000 ---p 00170000 00:0f 191        /lib/libc-2.23.so
0ffa4000-0ffa8000 r--p 0017c000 00:0f 191        /lib/libc-2.23.so
0ffa8000-0ffac000 rwxp 00180000 00:0f 191        /lib/libc-2.23.so
0ffac000-0ffb0000 rwxp 00000000 00:00 0
0ffc0000-0ffd4000 r-xp 00000000 00:0f 90         /lib/libhugetlbfs.so
0ffd4000-0ffe0000 ---p 00014000 00:0f 90         /lib/libhugetlbfs.so
0ffe0000-0ffe4000 rwxp 00010000 00:0f 90         /lib/libhugetlbfs.so
0ffe4000-0fff0000 rwxp 00000000 00:00 0
10000000-10004000 r-xp 00000000 00:0f 3076       /root/huge_malloc_test
10010000-10014000 rwxp 00000000 00:0f 3076       /root/huge_malloc_test
77940000-77964000 r-xp 00000000 00:0f 171        /lib/ld-2.23.so
7797c000-77980000 r--p 0002c000 00:0f 171        /lib/ld-2.23.so
77980000-77984000 rwxp 00030000 00:0f 171        /lib/ld-2.23.so
7fa58000-7fa7c000 rw-p 00000000 00:00 0          [stack]

libhugetlbfs: WARNING: Heap originates at 0x73e80000 instead of 0x10080000

Allocated 1Mbytes at 0x73e80008

libhugetlbfs: WARNING: New heap segment mapped at 0x73d80000 instead of 0x74000000

Allocated 1Mbytes at 0x777fc008

libhugetlbfs: WARNING: New heap segment mapped at 0x73d00000 instead of 0x74000000

Allocated 1Mbytes at 0x776b8008

00100000-00108000 r-xp 00000000 00:00 0          [vdso]
0fde4000-0fde8000 r-xp 00000000 00:0f 168        /lib/libdl-2.23.so
0fde8000-0fe00000 ---p 00004000 00:0f 168        /lib/libdl-2.23.so
0fe00000-0fe04000 r--p 0000c000 00:0f 168        /lib/libdl-2.23.so
0fe04000-0fe08000 rwxp 00010000 00:0f 168        /lib/libdl-2.23.so
0fe18000-0ff88000 r-xp 00000000 00:0f 191        /lib/libc-2.23.so
0ff88000-0ffa4000 ---p 00170000 00:0f 191        /lib/libc-2.23.so
0ffa4000-0ffa8000 r--p 0017c000 00:0f 191        /lib/libc-2.23.so
0ffa8000-0ffac000 rwxp 00180000 00:0f 191        /lib/libc-2.23.so
0ffac000-0ffb0000 rwxp 00000000 00:00 0
0ffc0000-0ffd4000 r-xp 00000000 00:0f 90         /lib/libhugetlbfs.so
0ffd4000-0ffe0000 ---p 00014000 00:0f 90         /lib/libhugetlbfs.so
0ffe0000-0ffe4000 rwxp 00010000 00:0f 90         /lib/libhugetlbfs.so
0ffe4000-0fff0000 rwxp 00000000 00:00 0
10000000-10004000 r-xp 00000000 00:0f 3076       /root/huge_malloc_test
10010000-10014000 rwxp 00000000 00:0f 3076       /root/huge_malloc_test
73e80000-74000000 rw-p 00000000 00:0b 98386      /anon_hugepage (deleted)
776b8000-77940000 rw-p 00000000 00:00 0
77940000-77964000 r-xp 00000000 00:0f 171        /lib/ld-2.23.so
7797c000-77980000 r--p 0002c000 00:0f 171        /lib/ld-2.23.so
77980000-77984000 rwxp 00030000 00:0f 171        /lib/ld-2.23.so
7fa58000-7fa7c000 rw-p 00000000 00:00 0          [stack]


With the patch applied, it works properly: each malloc() gets an address 
within the hugepage space.

root@vgoip:~# HUGETLB_MORECORE=yes ./huge_malloc_test
00100000-00108000 r-xp 00000000 00:00 0          [vdso]
0fde4000-0fde8000 r-xp 00000000 00:0f 168        /lib/libdl-2.23.so
0fde8000-0fe00000 ---p 00004000 00:0f 168        /lib/libdl-2.23.so
0fe00000-0fe04000 r--p 0000c000 00:0f 168        /lib/libdl-2.23.so
0fe04000-0fe08000 rwxp 00010000 00:0f 168        /lib/libdl-2.23.so
0fe18000-0ff88000 r-xp 00000000 00:0f 191        /lib/libc-2.23.so
0ff88000-0ffa4000 ---p 00170000 00:0f 191        /lib/libc-2.23.so
0ffa4000-0ffa8000 r--p 0017c000 00:0f 191        /lib/libc-2.23.so
0ffa8000-0ffac000 rwxp 00180000 00:0f 191        /lib/libc-2.23.so
0ffac000-0ffb0000 rwxp 00000000 00:00 0
0ffc0000-0ffd4000 r-xp 00000000 00:0f 90         /lib/libhugetlbfs.so
0ffd4000-0ffe0000 ---p 00014000 00:0f 90         /lib/libhugetlbfs.so
0ffe0000-0ffe4000 rwxp 00010000 00:0f 90         /lib/libhugetlbfs.so
0ffe4000-0fff0000 rwxp 00000000 00:00 0
10000000-10004000 r-xp 00000000 00:0f 3076       /root/huge_malloc_test
10010000-10014000 rwxp 00000000 00:0f 3076       /root/huge_malloc_test
77884000-778a8000 r-xp 00000000 00:0f 171        /lib/ld-2.23.so
778c0000-778c4000 r--p 0002c000 00:0f 171        /lib/ld-2.23.so
778c4000-778c8000 rwxp 00030000 00:0f 171        /lib/ld-2.23.so
7ff98000-7ffbc000 rw-p 00000000 00:00 0          [stack]

libhugetlbfs: WARNING: Heap originates at 0x30000000 instead of 0x10080000

Allocated 1Mbytes at 0x30000008


Allocated 1Mbytes at 0x30100010


Allocated 1Mbytes at 0x30200018

00100000-00108000 r-xp 00000000 00:00 0          [vdso]
0fde4000-0fde8000 r-xp 00000000 00:0f 168        /lib/libdl-2.23.so
0fde8000-0fe00000 ---p 00004000 00:0f 168        /lib/libdl-2.23.so
0fe00000-0fe04000 r--p 0000c000 00:0f 168        /lib/libdl-2.23.so
0fe04000-0fe08000 rwxp 00010000 00:0f 168        /lib/libdl-2.23.so
0fe18000-0ff88000 r-xp 00000000 00:0f 191        /lib/libc-2.23.so
0ff88000-0ffa4000 ---p 00170000 00:0f 191        /lib/libc-2.23.so
0ffa4000-0ffa8000 r--p 0017c000 00:0f 191        /lib/libc-2.23.so
0ffa8000-0ffac000 rwxp 00180000 00:0f 191        /lib/libc-2.23.so
0ffac000-0ffb0000 rwxp 00000000 00:00 0
0ffc0000-0ffd4000 r-xp 00000000 00:0f 90         /lib/libhugetlbfs.so
0ffd4000-0ffe0000 ---p 00014000 00:0f 90         /lib/libhugetlbfs.so
0ffe0000-0ffe4000 rwxp 00010000 00:0f 90         /lib/libhugetlbfs.so
0ffe4000-0fff0000 rwxp 00000000 00:00 0
10000000-10004000 r-xp 00000000 00:0f 3076       /root/huge_malloc_test
10010000-10014000 rwxp 00000000 00:0f 3076       /root/huge_malloc_test
30000000-30180000 rw-p 00000000 00:0b 7321       /anon_hugepage (deleted)
30180000-30280000 rw-p 00180000 00:0b 7322       /anon_hugepage (deleted)
30280000-30380000 rw-p 00280000 00:0b 7323       /anon_hugepage (deleted)
77884000-778a8000 r-xp 00000000 00:0f 171        /lib/ld-2.23.so
778c0000-778c4000 r--p 0002c000 00:0f 171        /lib/ld-2.23.so
778c4000-778c8000 rwxp 00030000 00:0f 171        /lib/ld-2.23.so
7ff98000-7ffbc000 rw-p 00000000 00:00 0          [stack]



Christophe
Aneesh Kumar K.V Jan. 19, 2018, 10:05 a.m. | #5
Did a reply instead of reply-all.

Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> writes:

> Christophe LEROY <christophe.leroy@c-s.fr> writes:
>
>> Le 17/01/2018 à 04:19, Aneesh Kumar K.V a écrit :
>>> 
>>> 
>>> On 01/16/2018 10:18 PM, Christophe LEROY wrote:
>>>>
>>>>
>>>> Le 16/01/2018 à 17:03, Aneesh Kumar K.V a écrit :
>>>>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>>>>
>>>>>> An application running with libhugetlbfs fails to allocate
>>>>>> additional pages to HEAP due to the hugemap being done
>>>>>> inconditionally as topdown mapping:
>>>>>>
>>>>>> mmap(0x10080000, 1572864, PROT_READ|PROT_WRITE, 
>>>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x73e80000
>>>>>> [...]
>>>>>> mmap(0x74000000, 1048576, PROT_READ|PROT_WRITE, 
>>>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d80000
>>>>>> munmap(0x73d80000, 1048576)             = 0
>>>>>> [...]
>>>>>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, 
>>>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>>>>>> munmap(0x73d00000, 1572864)             = 0
>>>>>> [...]
>>>>>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE, 
>>>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>>>>>> munmap(0x73d00000, 1572864)             = 0
>>>>>> [...]
>>>>>>
>>>>>
>>>>> Can you explain the failure details above. I am not sure I understand
>>>>> what to read from the above output.
>>>>
>>>> libhugetlbfs first requests an area of size 1.5Mbytes, at address 
>>>> 0x10080000
>>>> mmap() returns an area at address 0x73e80000
>>>>
>>>> Then libhugetlbfs requests an additional area on top of that, ie at 
>>>> address 0x74000000, to expand the heap.
>>>> But mmap() returns an area at address 0x73d80000, ie under the 
>>>> previous area.
>>>>
>>> 
>>> 
>>> Can you share the test details?. Why does it not fail on book3s64? We 
>>> use topdown search with book3s64.
>>
>> I don't know about book3s64, I only have 8xx.
>>
>> Here is my test app:
>>
>
> The test ran fine on ppc64.
>
> kvaneesh@ltctulc6a-p1:[~]$ HUGETLB_MORECORE=yes ./a.out 
> 10000000-10010000 r-xp 00000000 fc:00 9044312                            /home/kvaneesh/a.out
> 10010000-10020000 r--p 00000000 fc:00 9044312                            /home/kvaneesh/a.out
> 10020000-10030000 rw-p 00010000 fc:00 9044312                            /home/kvaneesh/a.out
> 7ffff7d60000-7ffff7f10000 r-xp 00000000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
> 7ffff7f10000-7ffff7f20000 r--p 001a0000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
> 7ffff7f20000-7ffff7f30000 rw-p 001b0000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
> 7ffff7f40000-7ffff7f60000 r-xp 00000000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
> 7ffff7f60000-7ffff7f70000 r--p 00010000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
> 7ffff7f70000-7ffff7f80000 rw-p 00020000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
> 7ffff7f80000-7ffff7fa0000 r-xp 00000000 00:00 0                          [vdso]
> 7ffff7fa0000-7ffff7fe0000 r-xp 00000000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
> 7ffff7fe0000-7ffff7ff0000 r--p 00030000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
> 7ffff7ff0000-7ffff8000000 rw-p 00040000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
> 7ffffffd0000-800000000000 rw-p 00000000 00:00 0                          [stack]
>
>
> Allocated 1Mbytes at 0x10000000010
>
>
> Allocated 1Mbytes at 0x10002000020
>
>
> Allocated 1Mbytes at 0x10004000030
>
> 10000000-10010000 r-xp 00000000 fc:00 9044312                            /home/kvaneesh/a.out
> 10010000-10020000 r--p 00000000 fc:00 9044312                            /home/kvaneesh/a.out
> 10020000-10030000 rw-p 00010000 fc:00 9044312                            /home/kvaneesh/a.out
> 10000000000-10003000000 rw-p 00000000 00:0d 1041435                      /anon_hugepage (deleted)
> 10003000000-10005000000 rw-p 03000000 00:0d 1041436                      /anon_hugepage (deleted)
> 10005000000-10007000000 rw-p 05000000 00:0d 1041437                      /anon_hugepage (deleted)
> 7ffff7d60000-7ffff7f10000 r-xp 00000000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
> 7ffff7f10000-7ffff7f20000 r--p 001a0000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
> 7ffff7f20000-7ffff7f30000 rw-p 001b0000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
> 7ffff7f40000-7ffff7f60000 r-xp 00000000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
> 7ffff7f60000-7ffff7f70000 r--p 00010000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
> 7ffff7f70000-7ffff7f80000 rw-p 00020000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
> 7ffff7f80000-7ffff7fa0000 r-xp 00000000 00:00 0                          [vdso]
> 7ffff7fa0000-7ffff7fe0000 r-xp 00000000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
> 7ffff7fe0000-7ffff7ff0000 r--p 00030000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
> 7ffff7ff0000-7ffff8000000 rw-p 00040000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
> 7ffffffd0000-800000000000 rw-p 00000000 00:00 0                          [stack]
>
>
>
> So I am definitely missing something. I understand that the generic hugetlb
> get unmapped area always searches bottom up and that 8xx used to depend on
> that callback. But on ppc64 the slice based get unmapped area has always done
> topdown and I am not sure whether we should change that. Moreover, I don't
> think MAP_GROWSDOWN is the right flag for selecting a topdown/bottom up search.
>
>
> Is it that libhugetlbfs does something specific for 32 bit? Another option
> is to add a hugetlb_get_unmapped_area for 8xx that does a bottom up search?
>
> If you are on ppc64 irc on freenode, we can discuss this there.
> -aneesh
Christophe LEROY Jan. 22, 2018, 8:22 a.m. | #6
Le 19/01/2018 à 11:05, Aneesh Kumar K.V a écrit :
> 
> Did a reply instead of reply-all.
> 
> Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> writes:
> 
>> Christophe LEROY <christophe.leroy@c-s.fr> writes:
>>
>>> Le 17/01/2018 à 04:19, Aneesh Kumar K.V a écrit :
>>>>
>>>>
>>>> On 01/16/2018 10:18 PM, Christophe LEROY wrote:
>>>>>
>>>>>
>>>>> Le 16/01/2018 à 17:03, Aneesh Kumar K.V a écrit :
>>>>>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>>>>>
>>>>>>> An application running with libhugetlbfs fails to allocate
>>>>>>> additional pages to HEAP due to the hugemap being done
>>>>>>> inconditionally as topdown mapping:
>>>>>>>
>>>>>>> mmap(0x10080000, 1572864, PROT_READ|PROT_WRITE,
>>>>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x73e80000
>>>>>>> [...]
>>>>>>> mmap(0x74000000, 1048576, PROT_READ|PROT_WRITE,
>>>>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d80000
>>>>>>> munmap(0x73d80000, 1048576)             = 0
>>>>>>> [...]
>>>>>>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE,
>>>>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>>>>>>> munmap(0x73d00000, 1572864)             = 0
>>>>>>> [...]
>>>>>>> mmap(0x74000000, 1572864, PROT_READ|PROT_WRITE,
>>>>>>> MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0x180000) = 0x73d00000
>>>>>>> munmap(0x73d00000, 1572864)             = 0
>>>>>>> [...]
>>>>>>>
>>>>>>
>>>>>> Can you explain the failure details above. I am not sure I understand
>>>>>> what to read from the above output.
>>>>>
>>>>> libhugetlbfs first requests an area of size 1.5Mbytes, at address
>>>>> 0x10080000
>>>>> mmap() returns an area at address 0x73e80000
>>>>>
>>>>> Then libhugetlbfs requests an additional area on top of that, ie at
>>>>> address 0x74000000, to expand the heap.
>>>>> But mmap() returns an area at address 0x73d80000, ie under the
>>>>> previous area.
>>>>>
>>>>
>>>>
>>>> Can you share the test details?. Why does it not fail on book3s64? We
>>>> use topdown search with book3s64.
>>>
>>> I don't know about book3s64, I only have 8xx.
>>>
>>> Here is my test app:
>>>
>>
>> The test ran fine on ppc64.
>>
>> kvaneesh@ltctulc6a-p1:[~]$ HUGETLB_MORECORE=yes ./a.out
>> 10000000-10010000 r-xp 00000000 fc:00 9044312                            /home/kvaneesh/a.out
>> 10010000-10020000 r--p 00000000 fc:00 9044312                            /home/kvaneesh/a.out
>> 10020000-10030000 rw-p 00010000 fc:00 9044312                            /home/kvaneesh/a.out
>> 7ffff7d60000-7ffff7f10000 r-xp 00000000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
>> 7ffff7f10000-7ffff7f20000 r--p 001a0000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
>> 7ffff7f20000-7ffff7f30000 rw-p 001b0000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
>> 7ffff7f40000-7ffff7f60000 r-xp 00000000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
>> 7ffff7f60000-7ffff7f70000 r--p 00010000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
>> 7ffff7f70000-7ffff7f80000 rw-p 00020000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
>> 7ffff7f80000-7ffff7fa0000 r-xp 00000000 00:00 0                          [vdso]
>> 7ffff7fa0000-7ffff7fe0000 r-xp 00000000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
>> 7ffff7fe0000-7ffff7ff0000 r--p 00030000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
>> 7ffff7ff0000-7ffff8000000 rw-p 00040000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
>> 7ffffffd0000-800000000000 rw-p 00000000 00:00 0                          [stack]
>>
>>
>> Allocated 1Mbytes at 0x10000000010
>>
>>
>> Allocated 1Mbytes at 0x10002000020
>>
>>
>> Allocated 1Mbytes at 0x10004000030
>>
>> 10000000-10010000 r-xp 00000000 fc:00 9044312                            /home/kvaneesh/a.out
>> 10010000-10020000 r--p 00000000 fc:00 9044312                            /home/kvaneesh/a.out
>> 10020000-10030000 rw-p 00010000 fc:00 9044312                            /home/kvaneesh/a.out
>> 10000000000-10003000000 rw-p 00000000 00:0d 1041435                      /anon_hugepage (deleted)
>> 10003000000-10005000000 rw-p 03000000 00:0d 1041436                      /anon_hugepage (deleted)
>> 10005000000-10007000000 rw-p 05000000 00:0d 1041437                      /anon_hugepage (deleted)
>> 7ffff7d60000-7ffff7f10000 r-xp 00000000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
>> 7ffff7f10000-7ffff7f20000 r--p 001a0000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
>> 7ffff7f20000-7ffff7f30000 rw-p 001b0000 fc:00 9250090                    /lib/powerpc64le-linux-gnu/libc-2.23.so
>> 7ffff7f40000-7ffff7f60000 r-xp 00000000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
>> 7ffff7f60000-7ffff7f70000 r--p 00010000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
>> 7ffff7f70000-7ffff7f80000 rw-p 00020000 fc:00 10754812                   /usr/lib/libhugetlbfs.so.0
>> 7ffff7f80000-7ffff7fa0000 r-xp 00000000 00:00 0                          [vdso]
>> 7ffff7fa0000-7ffff7fe0000 r-xp 00000000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
>> 7ffff7fe0000-7ffff7ff0000 r--p 00030000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
>> 7ffff7ff0000-7ffff8000000 rw-p 00040000 fc:00 9250107                    /lib/powerpc64le-linux-gnu/ld-2.23.so
>> 7ffffffd0000-800000000000 rw-p 00000000 00:00 0                          [stack]
>>
>>
>>
>> So i am definitely missing something. I understand that generic hugetlb
>> get unmapped area always search bottom up and 8xx used to depend on that
>> callback. But on ppc64 slice based get unmapped area always did topdown
>> and I am not sure whether we should change that. More over I don't think
>> MAP_GROWSDOWN is the right flag for selecting topdown/bottom up search.
>>
>>
>> Is it that libhugetlbfs does something specific for 32 bit? Other option
>> is to add huget_get_unmapped_area for 8xx that does bottom up search?

I think I identified the difference. In my run you have the following 
warning:

libhugetlbfs: WARNING: Heap originates at 0x73e80000 instead of 0x10080000

In your run, there is no such warning.

I tried running the test with HUGETLB_MORECORE_HEAPBASE=0x30000000, and 
it works without the patch:

root@vgoip:~# HUGETLB_MORECORE=yes HUGETLB_MORECORE_HEAPBASE=0x30000000 ./huge_malloc_test
00100000-00108000 r-xp 00000000 00:00 0          [vdso]
0fde4000-0fde8000 r-xp 00000000 00:0f 168        /lib/libdl-2.23.so
0fde8000-0fe00000 ---p 00004000 00:0f 168        /lib/libdl-2.23.so
0fe00000-0fe04000 r--p 0000c000 00:0f 168        /lib/libdl-2.23.so
0fe04000-0fe08000 rwxp 00010000 00:0f 168        /lib/libdl-2.23.so
0fe18000-0ff88000 r-xp 00000000 00:0f 191        /lib/libc-2.23.so
0ff88000-0ffa4000 ---p 00170000 00:0f 191        /lib/libc-2.23.so
0ffa4000-0ffa8000 r--p 0017c000 00:0f 191        /lib/libc-2.23.so
0ffa8000-0ffac000 rwxp 00180000 00:0f 191        /lib/libc-2.23.so
0ffac000-0ffb0000 rwxp 00000000 00:00 0
0ffc0000-0ffd4000 r-xp 00000000 00:0f 90         /lib/libhugetlbfs.so
0ffd4000-0ffe0000 ---p 00014000 00:0f 90         /lib/libhugetlbfs.so
0ffe0000-0ffe4000 rwxp 00010000 00:0f 90         /lib/libhugetlbfs.so
0ffe4000-0fff0000 rwxp 00000000 00:00 0
10000000-10004000 r-xp 00000000 00:0f 3076       /root/huge_malloc_test
10010000-10014000 rwxp 00000000 00:0f 3076       /root/huge_malloc_test
77ee0000-77f04000 r-xp 00000000 00:0f 171        /lib/ld-2.23.so
77f1c000-77f20000 r--p 0002c000 00:0f 171        /lib/ld-2.23.so
77f20000-77f24000 rwxp 00030000 00:0f 171        /lib/ld-2.23.so
7f830000-7f854000 rw-p 00000000 00:00 0          [stack]


Allocated 1Mbytes at 0x30000008


Allocated 1Mbytes at 0x30100010


Allocated 1Mbytes at 0x30200018

00100000-00108000 r-xp 00000000 00:00 0          [vdso]
0fde4000-0fde8000 r-xp 00000000 00:0f 168        /lib/libdl-2.23.so
0fde8000-0fe00000 ---p 00004000 00:0f 168        /lib/libdl-2.23.so
0fe00000-0fe04000 r--p 0000c000 00:0f 168        /lib/libdl-2.23.so
0fe04000-0fe08000 rwxp 00010000 00:0f 168        /lib/libdl-2.23.so
0fe18000-0ff88000 r-xp 00000000 00:0f 191        /lib/libc-2.23.so
0ff88000-0ffa4000 ---p 00170000 00:0f 191        /lib/libc-2.23.so
0ffa4000-0ffa8000 r--p 0017c000 00:0f 191        /lib/libc-2.23.so
0ffa8000-0ffac000 rwxp 00180000 00:0f 191        /lib/libc-2.23.so
0ffac000-0ffb0000 rwxp 00000000 00:00 0
0ffc0000-0ffd4000 r-xp 00000000 00:0f 90         /lib/libhugetlbfs.so
0ffd4000-0ffe0000 ---p 00014000 00:0f 90         /lib/libhugetlbfs.so
0ffe0000-0ffe4000 rwxp 00010000 00:0f 90         /lib/libhugetlbfs.so
0ffe4000-0fff0000 rwxp 00000000 00:00 0
10000000-10004000 r-xp 00000000 00:0f 3076       /root/huge_malloc_test
10010000-10014000 rwxp 00000000 00:0f 3076       /root/huge_malloc_test
30000000-30180000 rw-p 00000000 00:0b 7682       /anon_hugepage (deleted)
30180000-30280000 rw-p 00180000 00:0b 7683       /anon_hugepage (deleted)
30280000-30380000 rw-p 00280000 00:0b 7684       /anon_hugepage (deleted)
77ee0000-77f04000 r-xp 00000000 00:0f 171        /lib/ld-2.23.so
77f1c000-77f20000 r--p 0002c000 00:0f 171        /lib/ld-2.23.so
77f20000-77f24000 rwxp 00030000 00:0f 171        /lib/ld-2.23.so
7f830000-7f854000 rw-p 00000000 00:00 0          [stack]


On your side, could you try with HUGETLB_MORECORE_HEAPBASE=0x11000000 
and see what happens?

Christophe


>>
>> If you are on ppc64 irc on freenode we can discuss this there.
>> -aneesh

Patch

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 79e1378ee303..0eadf9f199de 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -19,6 +19,7 @@ 
 #include <linux/moduleparam.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/mman.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/tlb.h>
@@ -558,7 +559,8 @@  unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 		return radix__hugetlb_get_unmapped_area(file, addr, len,
 						       pgoff, flags);
 #endif
-	return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
+	return slice_get_unmapped_area(addr, len, flags, mmu_psize,
+				       flags & MAP_GROWSDOWN);
 }
 #endif
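
The change above replaces the constant 1 in the last argument of slice_get_unmapped_area() with `flags & MAP_GROWSDOWN`, so the search is topdown only when the caller sets that flag. The user-space sketch below illustrates the same flag test; the helper names are hypothetical and it is not kernel code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>

/*
 * User-space illustration only. The patch passes `flags & MAP_GROWSDOWN`
 * directly; this hypothetical helper normalises the same test to 0 or 1.
 */
static int wants_topdown(unsigned long flags)
{
	return (flags & MAP_GROWSDOWN) != 0;
}

/* Small self-check of the helper. */
static void check_wants_topdown(void)
{
	/* Flags as used in the strace log: no MAP_GROWSDOWN, so bottomup. */
	assert(wants_topdown(MAP_PRIVATE | MAP_ANONYMOUS) == 0);
	/* Only an explicit MAP_GROWSDOWN selects topdown. */
	assert(wants_topdown(MAP_PRIVATE | MAP_GROWSDOWN) == 1);
}
```

Since the mmap() calls in the strace log do not set MAP_GROWSDOWN, the patched code would request a bottomup search for them.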