diff mbox

powerpc/mm/hugetlb: Filter out hugepage size not supported by page table layout

Message ID 1487499527-16166-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Aneesh Kumar K.V Feb. 19, 2017, 10:18 a.m. UTC
Without this if firmware reports 1MB page size support we will crash
trying to use 1MB as hugetlb page size.

echo 300 > /sys/kernel/mm/hugepages/hugepages-1024kB/nr_hugepages

kernel BUG at ./arch/powerpc/include/asm/hugetlb.h:19!
.....
....
[c0000000e2c27b30] c00000000029dae8 .hugetlb_fault+0x638/0xda0
[c0000000e2c27c30] c00000000026fb64 .handle_mm_fault+0x844/0x1d70
[c0000000e2c27d70] c00000000004805c .do_page_fault+0x3dc/0x7c0
[c0000000e2c27e30] c00000000000ac98 handle_page_fault+0x10/0x30

With fix, we don't enable 1MB as hugepage size.

bash-4.2# cd /sys/kernel/mm/hugepages/
bash-4.2# ls
hugepages-16384kB  hugepages-16777216kB

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hugetlbpage.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Benjamin Herrenschmidt Feb. 19, 2017, 9:05 p.m. UTC | #1
On Sun, 2017-02-19 at 15:48 +0530, Aneesh Kumar K.V wrote:
> +#ifdef CONFIG_PPC_BOOK3S_64
> +       /*
> +        * We need to make sure that for different page sizes reported by
> +        * firmware we only add hugetlb support for page sizes that can be
> +        * supported by linux page table layout.
> +        * For now we have
> +        * Radix: 2M
> +        * Hash: 16M and 16G
> +        */
> +       if (radix_enabled()) {
> +               if (mmu_psize != MMU_PAGE_2M)
> +                       return -EINVAL;
> +       } else {
> +               if (mmu_psize != MMU_PAGE_16M && mmu_psize != MMU_PAGE_16G)
> +                       return -EINVAL;
> +       }

Hash could support others... Same with radix and PUD level pages.

Why do we need that ? Won't FW provide separate properties for hash and
radix page sizes anyway ?

Ben.
Aneesh Kumar K.V Feb. 20, 2017, 3:32 a.m. UTC | #2
On Monday 20 February 2017 02:35 AM, Benjamin Herrenschmidt wrote:
> On Sun, 2017-02-19 at 15:48 +0530, Aneesh Kumar K.V wrote:
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +       /*
>> +        * We need to make sure that for different page sizes reported by
>> +        * firmware we only add hugetlb support for page sizes that can be
>> +        * supported by linux page table layout.
>> +        * For now we have
>> +        * Radix: 2M
>> +        * Hash: 16M and 16G
>> +        */
>> +       if (radix_enabled()) {
>> +               if (mmu_psize != MMU_PAGE_2M)
>> +                       return -EINVAL;
>> +       } else {
>> +               if (mmu_psize != MMU_PAGE_16M && mmu_psize != MMU_PAGE_16G)
>> +                       return -EINVAL;
>> +       }
> Hash could support others...

On book3s 64 ? I had the above within #ifdef.

> Same with radix and PUD level pages.

Yes, but gigantic hugepage is not yet supported. Once we add that we will
add MMU_PAGE_1G here.


>
> Why do we need that ? Won't FW provide separate properties for hash and
> radix page sizes anyway ?
>

To avoid crashes like the one reported in the commit message due to 
buggy firmware ? Also
It can serve as an easy way to understand what hugepage sizes are 
supported by different platforms.
I am yet to figure out what the FSL_BOOK3E and PPC_8xx #ifdef above that 
hunk is all about. Having
the supported hugepage size clearly verified against makes it easy ?

-aneesh
Benjamin Herrenschmidt Feb. 20, 2017, 3:35 a.m. UTC | #3
On Mon, 2017-02-20 at 09:02 +0530, Aneesh Kumar K.V wrote:
> To avoid crashes like the one reported in the commit message due to 
> buggy firmware ? 

I don't want Linux to make those assumptions. We should fix the FW.

Think of backward compat for example.

> Also
> It can serve as an easy way to understand what hugepage sizes are 
> supported by different platforms.
> I am yet to figure out what the FSL_BOOK3E and PPC_8xx #ifdef above
> that 
> hunk is all about. Having
> the supported hugepage size clearly verified against makes it easy ?
> 
> -aneesh
Aneesh Kumar K.V Feb. 20, 2017, 5:36 a.m. UTC | #4
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Mon, 2017-02-20 at 09:02 +0530, Aneesh Kumar K.V wrote:
>> To avoid crashes like the one reported in the commit message due to 
>> buggy firmware ? 
>
> I don't want Linux to make those assumptions. We should fix the FW.
>

I was not suggesting to not fix FW. The idea was two fold.

We cannot support different hugetlb page sizes. They need to
be supported at linux page table level. So a generic check like
is_power_of_2/4() may not be what we want. The second is to document
clearly what are the different page sizes supported by a platform.

-aneesh
diff mbox

Patch

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8c3389cbcd12..a4f33de4008e 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -753,6 +753,24 @@  static int __init add_huge_page_size(unsigned long long size)
 	if ((mmu_psize = shift_to_mmu_psize(shift)) < 0)
 		return -EINVAL;
 
+#ifdef CONFIG_PPC_BOOK3S_64
+	/*
+	 * We need to make sure that for different page sizes reported by
+	 * firmware we only add hugetlb support for page sizes that can be
+	 * supported by linux page table layout.
+	 * For now we have
+	 * Radix: 2M
+	 * Hash: 16M and 16G
+	 */
+	if (radix_enabled()) {
+		if (mmu_psize != MMU_PAGE_2M)
+			return -EINVAL;
+	} else {
+		if (mmu_psize != MMU_PAGE_16M && mmu_psize != MMU_PAGE_16G)
+			return -EINVAL;
+	}
+#endif
+
 	BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
 
 	/* Return if huge page size has already been setup */