
[v2,04/69] mm/hugetlb: Initialize gigantic bootmem hugepage struct pages earlier

Message ID 20260513130542.35604-5-songmuchun@bytedance.com
State Handled Elsewhere
Series mm: Generalize HVO for HugeTLB and device DAX

Commit Message

Muchun Song May 13, 2026, 1:04 p.m. UTC
Gigantic bootmem HugeTLB pages are currently initialized from hugetlb_init(),
but page_alloc_init_late() runs earlier and walks pageblocks to determine
zone contiguity.

If a bootmem HugeTLB region is marked noinit, set_zone_contiguous() can
observe still-uninitialized struct pages through __pageblock_pfn_to_page().
This may not trigger an immediate failure, but it can make
set_zone_contiguous() compute the wrong zone contiguity state. If extra
poisoned-page checks are added in this path, such as PF_POISONED_CHECK()
in page_zone_id(), it can also trigger an early boot panic.

Initialize gigantic bootmem HugeTLB struct pages from page_alloc_init_late(),
before zone contiguity is evaluated, so later page allocator setup only
sees valid struct page state. This also makes the initialization order
more natural, as struct pages should be initialized before later code
inspects them.

Fixes: fde1c4ecf916 ("mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/hugetlb.h | 5 +++++
 mm/hugetlb.c            | 3 +--
 mm/mm_init.c            | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

Comments

Oscar Salvador May 14, 2026, 8:05 a.m. UTC | #1
On Wed, May 13, 2026 at 09:04:32PM +0800, Muchun Song wrote:
> Gigantic bootmem HugeTLB pages are currently initialized from hugetlb_init(),
> but page_alloc_init_late() runs earlier and walks pageblocks to determine
> zone contiguity.
> 
> If a bootmem HugeTLB region is marked noinit, set_zone_contiguous() can
> observe still-uninitialized struct pages through __pageblock_pfn_to_page().
> This may not trigger an immediate failure, but it can make
> set_zone_contiguous() compute the wrong zone contiguity state. If extra
> poisoned-page checks are added in this path, such as PF_POISONED_CHECK()
> in page_zone_id(), it can also trigger an early boot panic.
> 
> Initialize gigantic bootmem HugeTLB struct pages from page_alloc_init_late(),
> before zone contiguity is evaluated, so later page allocator setup only
> sees valid struct page state. This also makes the initialization order
> more natural, as struct pages should be initialized before later code
> inspects them.
> 
> Fixes: fde1c4ecf916 ("mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Oscar Salvador <osalvador@suse.de>

but

> ---
>  include/linux/hugetlb.h | 5 +++++
>  mm/hugetlb.c            | 3 +--
>  mm/mm_init.c            | 1 +
>  3 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 93418625d3c5..52a2c30f866c 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -173,6 +173,7 @@ extern int movable_gigantic_pages __read_mostly;
>  extern int sysctl_hugetlb_shm_group __read_mostly;
>  extern struct list_head huge_boot_pages[MAX_NUMNODES];
>  
> +void hugetlb_struct_page_init(void);

we lost the hint that this only takes care of bootmem pages.
So I think hugetlb_bootmem_struct_page_init or something like that would make
it more clear?
Muchun Song May 14, 2026, 8:16 a.m. UTC | #2
> On May 14, 2026, at 16:05, Oscar Salvador <osalvador@suse.de> wrote:
> 
> On Wed, May 13, 2026 at 09:04:32PM +0800, Muchun Song wrote:
>> Gigantic bootmem HugeTLB pages are currently initialized from hugetlb_init(),
>> but page_alloc_init_late() runs earlier and walks pageblocks to determine
>> zone contiguity.
>> 
>> If a bootmem HugeTLB region is marked noinit, set_zone_contiguous() can
>> observe still-uninitialized struct pages through __pageblock_pfn_to_page().
>> This may not trigger an immediate failure, but it can make
>> set_zone_contiguous() compute the wrong zone contiguity state. If extra
>> poisoned-page checks are added in this path, such as PF_POISONED_CHECK()
>> in page_zone_id(), it can also trigger an early boot panic.
>> 
>> Initialize gigantic bootmem HugeTLB struct pages from page_alloc_init_late(),
>> before zone contiguity is evaluated, so later page allocator setup only
>> sees valid struct page state. This also makes the initialization order
>> more natural, as struct pages should be initialized before later code
>> inspects them.
>> 
>> Fixes: fde1c4ecf916 ("mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO")
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> 
> Acked-by: Oscar Salvador <osalvador@suse.de>
> 

Thanks.

> but
> 
>> ---
>> include/linux/hugetlb.h | 5 +++++
>> mm/hugetlb.c            | 3 +--
>> mm/mm_init.c            | 1 +
>> 3 files changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 93418625d3c5..52a2c30f866c 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -173,6 +173,7 @@ extern int movable_gigantic_pages __read_mostly;
>> extern int sysctl_hugetlb_shm_group __read_mostly;
>> extern struct list_head huge_boot_pages[MAX_NUMNODES];
>> 
>> +void hugetlb_struct_page_init(void);
> 
> we lost the hint that this only takes care of bootmem pages.

Right.

> So I think hugetlb_bootmem_struct_page_init or something like that would make
> it more clear?

Yes. Makes sense.

Thanks,
Muchun

> 
> 
> -- 
> Oscar Salvador
> SUSE Labs
Mike Rapoport May 22, 2026, 8:13 a.m. UTC | #3
On Wed, 13 May 2026 21:04:32 +0800, Muchun Song <songmuchun@bytedance.com> wrote:
> Gigantic bootmem HugeTLB pages are currently initialized from hugetlb_init(),
> but page_alloc_init_late() runs earlier and walks pageblocks to determine
> zone contiguity.
> 
> If a bootmem HugeTLB region is marked noinit, set_zone_contiguous() can
> observe still-uninitialized struct pages through __pageblock_pfn_to_page().
> This may not trigger an immediate failure, but it can make
> set_zone_contiguous() compute the wrong zone contiguity state. If extra
> poisoned-page checks are added in this path, such as PF_POISONED_CHECK()
> in page_zone_id(), it can also trigger an early boot panic.
> 
> [...]

Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Patch

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 93418625d3c5..52a2c30f866c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -173,6 +173,7 @@ extern int movable_gigantic_pages __read_mostly;
 extern int sysctl_hugetlb_shm_group __read_mostly;
 extern struct list_head huge_boot_pages[MAX_NUMNODES];
 
+void hugetlb_struct_page_init(void);
 void hugetlb_bootmem_alloc(void);
 extern nodemask_t hugetlb_bootmem_nodes;
 void hugetlb_bootmem_set_nodes(void);
@@ -1307,6 +1308,10 @@ static inline bool hugetlbfs_pagecache_present(
 static inline void hugetlb_bootmem_alloc(void)
 {
 }
+
+static inline void hugetlb_struct_page_init(void)
+{
+}
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d22683ab30a1..b4999653a156 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3370,7 +3370,7 @@ static void __init gather_bootmem_prealloc_parallel(unsigned long start,
 		gather_bootmem_prealloc_node(nid);
 }
 
-static void __init gather_bootmem_prealloc(void)
+void __init hugetlb_struct_page_init(void)
 {
 	struct padata_mt_job job = {
 		.thread_fn	= gather_bootmem_prealloc_parallel,
@@ -4163,7 +4163,6 @@ static int __init hugetlb_init(void)
 	}
 
 	hugetlb_init_hstates();
-	gather_bootmem_prealloc();
 	report_hugepages();
 
 	hugetlb_sysfs_init();
diff --git a/mm/mm_init.c b/mm/mm_init.c
index fde49f7bba6c..5a910cc5534c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2335,6 +2335,7 @@ void __init page_alloc_init_late(void)
 	/* Reinit limits that are based on free pages after the kernel is up */
 	files_maxfiles_init();
 #endif
+	hugetlb_struct_page_init();
 
 	/* Accounting of total+free memory is stable at this point. */
 	mem_init_print_info();