powerpc/mm: Ensure Huge-page memory is free before allocation

Message ID 20190618044609.19997-1-vaibhav@linux.ibm.com (mailing list archive)
State Not Applicable
Series powerpc/mm: Ensure Huge-page memory is free before allocation

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (e610a466d16a086e321f0bd421e2fc75cff28605)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch warning total: 0 errors, 1 warnings, 0 checks, 11 lines checked

Commit Message

Vaibhav Jain June 18, 2019, 4:46 a.m. UTC
We recently discovered a bug where physical memory meant for
allocation of Huge-pages was inadvertently allocated by another component
during early boot. The behavior of memblock_reserve(), which won't
indicate whether an existing reserved block overlaps with the
requested reservation, makes such bugs hard to investigate.

Hence this patch proposes adding a memblock reservation check in
htab_dt_scan_hugepage_blocks() just before the call to memblock_reserve()
to ensure that the physical memory being reserved is not already
reserved by someone else. If it is, we panic the kernel to ensure that
the user of this huge-page memory doesn't accidentally stomp on memory
allocated to someone else.
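
The check relies on memblock_is_region_reserved(base, size) returning
true when the half-open range [base, base + size) intersects a region
already recorded in memblock.reserved. A minimal sketch of that overlap
test (illustrative only, not the kernel's exact implementation):

	/* Two half-open ranges overlap iff each one starts before the
	 * other ends. memblock_is_region_reserved() applies this test
	 * against every entry in memblock.reserved. */
	static bool region_overlaps(phys_addr_t r_start, phys_addr_t r_end,
				    phys_addr_t base, phys_addr_t size)
	{
		return r_start < (base + size) && base < r_end;
	}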

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/hash_utils.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Michael Ellerman June 18, 2019, 12:07 p.m. UTC | #1
Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> We recently discovered a bug where physical memory meant for
> allocation of Huge-pages was inadvertently allocated by another component
> during early boot.

Can you give me some more detail on what that was? You're seemingly the
only person who's ever hit this :)

> The behavior of memblock_reserve(), which won't
> indicate whether an existing reserved block overlaps with the
> requested reservation, makes such bugs hard to investigate.
>
> Hence this patch proposes adding a memblock reservation check in
> htab_dt_scan_hugepage_blocks() just before the call to memblock_reserve()
> to ensure that the physical memory being reserved is not already
> reserved by someone else. If it is, we panic the kernel to ensure that
> the user of this huge-page memory doesn't accidentally stomp on memory
> allocated to someone else.

Do we really need to panic? Can't we just leave the block alone and not
register it as huge page memory? With a big warning obviously.

cheers

> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 28ced26f2a00..a05be3adb8c9 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -516,6 +516,11 @@ static int __init htab_dt_scan_hugepage_blocks(unsigned long node,
>  	printk(KERN_INFO "Huge page(16GB) memory: "
>  			"addr = 0x%lX size = 0x%lX pages = %d\n",
>  			phys_addr, block_size, expected_pages);
> +
> +	/* Ensure no one else has reserved memory for huge pages before */
> +	BUG_ON(memblock_is_region_reserved(phys_addr,
> +					   block_size * expected_pages));
> +
>  	if (phys_addr + block_size * expected_pages <= memblock_end_of_DRAM()) {
>  		memblock_reserve(phys_addr, block_size * expected_pages);
>  		pseries_add_gpage(phys_addr, block_size, expected_pages);
> -- 
> 2.21.0
Aneesh Kumar K V June 18, 2019, 12:58 p.m. UTC | #2
On 6/18/19 5:37 PM, Michael Ellerman wrote:
> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
>> We recently discovered a bug where physical memory meant for
>> allocation of Huge-pages was inadvertently allocated by another component
>> during early boot.
> 
> Can you give me some more detail on what that was? You're seemingly the
> only person who's ever hit this :)
> 
>> The behavior of memblock_reserve(), which won't
>> indicate whether an existing reserved block overlaps with the
>> requested reservation, makes such bugs hard to investigate.
>>
>> Hence this patch proposes adding a memblock reservation check in
>> htab_dt_scan_hugepage_blocks() just before the call to memblock_reserve()
>> to ensure that the physical memory being reserved is not already
>> reserved by someone else. If it is, we panic the kernel to ensure that
>> the user of this huge-page memory doesn't accidentally stomp on memory
>> allocated to someone else.
> 
> Do we really need to panic? Can't we just leave the block alone and not
> register it as huge page memory? With a big warning obviously.
> 

I need to check this again with Paul. But IIRC we have issues w.r.t.
hash page table size if we use 16G pages as 64K mapped pages, i.e. the
hypervisor sizes the hash page table with the assumption that these
pfns will only be mapped as 16G pages. If we try to put 64K hash PTE
entries in for them we will not have enough space in the hash page table.

That is one of the reasons we never allowed freeing the hugetlb
reserved pool with 16G pages back to the buddy allocator.
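
To put rough numbers on that (a back-of-the-envelope sketch, not phyp's
actual HPT sizing formula): a 16G block mapped as a single 16G page
consumes one hash PTE, while the same block mapped as 64K pages needs
16G / 64K = 262144 of them.

	/* Back-of-the-envelope illustration of the HPT sizing mismatch
	 * (not phyp's actual formula): one HPTE per mapping. */
	#include <stdio.h>

	int main(void)
	{
		unsigned long sz_16g = 16UL << 30;	/* a 16G block */
		unsigned long sz_64k = 64UL << 10;	/* a 64K page  */

		/* 1 HPTE when mapped as a single 16G page, versus: */
		printf("HPTEs as 64K pages: %lu\n", sz_16g / sz_64k); /* 262144 */
		return 0;
	}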


Note: We should switch that BUG to panic. I guess using BUG that early
doesn't work.
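
Something along these lines, perhaps (a sketch, assuming panic() is
usable at this stage and keeping the same overlap check):

	/* Sketch: panic() prints a message and works earlier in boot than
	 * BUG_ON(), which needs the trap machinery to already be set up. */
	if (memblock_is_region_reserved(phys_addr, block_size * expected_pages))
		panic("Huge-page region 0x%lx (0x%lx bytes) overlaps an existing memblock reservation\n",
		      phys_addr, block_size * expected_pages);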

-aneesh
Vaibhav Jain June 18, 2019, 1:27 p.m. UTC | #3
Michael Ellerman <mpe@ellerman.id.au> writes:

> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
>> We recently discovered a bug where physical memory meant for
>> allocation of Huge-pages was inadvertently allocated by another component
>> during early boot.
>
> Can you give me some more detail on what that was? You're seemingly the
> only person who's ever hit this :)

The specific bug I investigated was in fadump, which was trying to
reserve a large chunk of physically contiguous memory from memblock and
inadvertently stomped on a memory region that was reserved for
allocation of 16G hugepages. This happened because the fadump
reservation happens much earlier, in prom_init, compared to the
hugepage reservation.

The bug manifested as a panic when trying to 'cat /proc/pagetypeinfo',
which dumps the buddy stats for each zone/migrate-type.

Incidentally, after reserving this memory, fadump would carve out a CMA
region that was then entered into the buddy allocator. This would cause
pagetypeinfo_showfree_print() to fail when it tried to iterate over the
free list of this CMA migrate type, as the corresponding memmap for
these pages was never initialized.
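
Roughly what that failing path does (a simplified sketch of
pagetypeinfo_showfree_print(), not the verbatim mm/vmstat.c code):

	/* For each order, walk the zone's free list for the given
	 * migratetype and count the entries. With an uninitialized memmap
	 * the list_head embedded in each struct page holds garbage, so the
	 * walk chases junk pointers and the kernel panics. */
	for (order = 0; order < MAX_ORDER; ++order) {
		struct free_area *area = &zone->free_area[order];
		struct list_head *curr;
		unsigned long freecount = 0;

		list_for_each(curr, &area->free_list[mtype])
			freecount++;
		seq_printf(m, "%6lu ", freecount);
	}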

>
>> The behavior of memblock_reserve(), which won't
>> indicate whether an existing reserved block overlaps with the
>> requested reservation, makes such bugs hard to investigate.
>>
>> Hence this patch proposes adding a memblock reservation check in
>> htab_dt_scan_hugepage_blocks() just before the call to memblock_reserve()
>> to ensure that the physical memory being reserved is not already
>> reserved by someone else. If it is, we panic the kernel to ensure that
>> the user of this huge-page memory doesn't accidentally stomp on memory
>> allocated to someone else.
>
> Do we really need to panic? Can't we just leave the block alone and not
> register it as huge page memory? With a big warning obviously.
Possibly yes, but Aneesh pointed out that this memory is supposed to be
backed only by 16G page mappings, due to a limitation in phyp on hash
page table size.

>
> cheers
>
>> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
>> index 28ced26f2a00..a05be3adb8c9 100644
>> --- a/arch/powerpc/mm/book3s64/hash_utils.c
>> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
>> @@ -516,6 +516,11 @@ static int __init htab_dt_scan_hugepage_blocks(unsigned long node,
>>  	printk(KERN_INFO "Huge page(16GB) memory: "
>>  			"addr = 0x%lX size = 0x%lX pages = %d\n",
>>  			phys_addr, block_size, expected_pages);
>> +
>> +	/* Ensure no one else has reserved memory for huge pages before */
>> +	BUG_ON(memblock_is_region_reserved(phys_addr,
>> +					   block_size * expected_pages));
>> +
>>  	if (phys_addr + block_size * expected_pages <= memblock_end_of_DRAM()) {
>>  		memblock_reserve(phys_addr, block_size * expected_pages);
>>  		pseries_add_gpage(phys_addr, block_size, expected_pages);
>> -- 
>> 2.21.0
>

Patch

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 28ced26f2a00..a05be3adb8c9 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -516,6 +516,11 @@ static int __init htab_dt_scan_hugepage_blocks(unsigned long node,
 	printk(KERN_INFO "Huge page(16GB) memory: "
 			"addr = 0x%lX size = 0x%lX pages = %d\n",
 			phys_addr, block_size, expected_pages);
+
+	/* Ensure no one else has reserved memory for huge pages before */
+	BUG_ON(memblock_is_region_reserved(phys_addr,
+					   block_size * expected_pages));
+
 	if (phys_addr + block_size * expected_pages <= memblock_end_of_DRAM()) {
 		memblock_reserve(phys_addr, block_size * expected_pages);
 		pseries_add_gpage(phys_addr, block_size, expected_pages);