Message ID | 20190514010543.29896-1-aneesh.kumar@linux.ibm.com |
---|---|
State | Superseded |
Series | powerpc/mm: Handle page table allocation failures |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (8150a153c013aa2dd1ffae43370b89ac1347a7fb) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | warning | total: 0 errors, 1 warnings, 0 checks, 32 lines checked |
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> This fix the below crash that arise due to not handling page table allocation
> failures while allocating hugetlb page table.
>
> [...]
>
> Note: I did add a recent commit for the Fixes tag. But in reality we never checked for page table
> allocation failure there. If we want to go to that old commit, then we may need.

If we never checked for failure in that path, is there some reason we've
only just noticed the crashes? Are we just testing under memory pressure
more effectively than we used to?

cheers
> On 14-May-2019, at 12:10 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> This fix the below crash that arise due to not handling page table allocation
>> failures while allocating hugetlb page table.
>>
>> [...]
>>
>> Note: I did add a recent commit for the Fixes tag. But in reality we never checked for page table
>> allocation failure there. If we want to go to that old commit, then we may need.
>
> If we never checked for failure in that path, is there some reason we've
> only just noticed the crashes? Are we just testing under memory pressure
> more effectively than we used to?
>

Actually, the reported crash seems to be due to commit 723f268f19
("powerpc/mm: cleanup ifdef mess in add_huge_page_size()").
Reverting this patch allows the test case to execute correctly without a crash.

Thanks
-Sachin
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index c5c9ff2d7afc..ae9d71da5219 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -130,6 +130,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 	} else {
 		pdshift = PUD_SHIFT;
 		pu = pud_alloc(mm, pg, addr);
+		if (!pu)
+			return NULL;
 		if (pshift == PUD_SHIFT)
 			return (pte_t *)pu;
 		else if (pshift > PMD_SHIFT) {
@@ -138,6 +140,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 		} else {
 			pdshift = PMD_SHIFT;
 			pm = pmd_alloc(mm, pu, addr);
+			if (!pm)
+				return NULL;
 			if (pshift == PMD_SHIFT)
 				/* 16MB hugepage */
 				return (pte_t *)pm;
@@ -154,12 +158,16 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 	} else {
 		pdshift = PUD_SHIFT;
 		pu = pud_alloc(mm, pg, addr);
+		if (!pu)
+			return NULL;
 		if (pshift >= PUD_SHIFT) {
 			ptl = pud_lockptr(mm, pu);
 			hpdp = (hugepd_t *)pu;
 		} else {
 			pdshift = PMD_SHIFT;
 			pm = pmd_alloc(mm, pu, addr);
+			if (!pm)
+				return NULL;
 			ptl = pmd_lockptr(mm, pm);
 			hpdp = (hugepd_t *)pm;
 		}
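The pattern the patch adds can be illustrated with a small userspace model: each level's allocator either returns the existing lower-level table or allocates a new one, and a NULL return is propagated to the caller instead of being used as the next table. This is a hypothetical sketch, not kernel code; `model_pud_alloc()`, `model_pmd_alloc()`, `model_huge_pte_alloc()` and the fail-on-Nth-call injector are illustrative names only, and the tables are never freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of a three-level table walk.
 * None of these names are kernel APIs; they only mirror the shape
 * of the pud_alloc()/pmd_alloc() calls in huge_pte_alloc(). */
#define ENTRIES 4

struct pmd_table { void *entry[ENTRIES]; };
struct pud_table { struct pmd_table *entry[ENTRIES]; };
struct pgd_table { struct pud_table *entry[ENTRIES]; };

/* Fault injection: fail the Nth allocation (1-based); 0 never fails. */
static int fail_on_call;
static int alloc_calls;

static void *model_alloc(size_t size)
{
	alloc_calls++;
	if (fail_on_call && alloc_calls == fail_on_call)
		return NULL;
	return calloc(1, size);
}

/* Return the existing lower-level table, or allocate one.
 * Returns NULL if the allocation fails. */
static struct pud_table *model_pud_alloc(struct pgd_table *pgd, unsigned int i)
{
	if (!pgd->entry[i])
		pgd->entry[i] = model_alloc(sizeof(struct pud_table));
	return pgd->entry[i];
}

static struct pmd_table *model_pmd_alloc(struct pud_table *pud, unsigned int j)
{
	if (!pud->entry[j])
		pud->entry[j] = model_alloc(sizeof(struct pmd_table));
	return pud->entry[j];
}

/* Shape of the patched walk: check each level before descending. */
static void **model_huge_pte_alloc(struct pgd_table *pgd,
				   unsigned int i, unsigned int j, unsigned int k)
{
	struct pud_table *pu;
	struct pmd_table *pm;

	assert(i < ENTRIES && j < ENTRIES && k < ENTRIES);

	pu = model_pud_alloc(pgd, i);
	if (!pu)
		return NULL;	/* without this, model_pmd_alloc() would dereference NULL */

	pm = model_pmd_alloc(pu, j);
	if (!pm)
		return NULL;

	return &pm->entry[k];
}
```

Failing either the first (PUD-level) or second (PMD-level) allocation makes the walk return NULL cleanly, while a repeated lookup in an already-populated subtree allocates nothing.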
This fixes the crash below, which arises from not handling page table allocation
failures while allocating hugetlb page tables.

BUG: Kernel NULL pointer dereference at 0x0000001c
Faulting instruction address: 0xc000000001d1e58c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries

CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: G W O 5.1.0-next-20190507-autotest #1
NIP: c000000001d1e58c LR: c000000001d1e54c CTR: 0000000000000000
REGS: c000000004937890 TRAP: 0300 Tainted: G W O (5.1.0-next-20190507-autotest)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22424822 XER: 00000000
CFAR: c00000000183e9e0 DAR: 000000000000001c DSISR: 40000000 IRQMASK: 0
GPR00: c000000001901a80 c000000004937b20 c000000003938700 0000000000000000
GPR04: 0000000000400cc0 000000000003efff 000000027966e000 c000000003ba8700
GPR08: c000000003ba8700 000000000d601125 c000000003ba8700 0000000080000000
GPR12: 0000000022424822 c00000001ecae280 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000018 c0000000039e2d30 c0000000039e2d28 c0000002762da460
GPR24: 000000000000001c 0000000000000000 0000000000000001 c000000001901a80
GPR28: 0000000000400cc0 0000000000000000 0000000000000000 0000000000400cc0
NIP [c000000001d1e58c] kmem_cache_alloc+0xbc/0x5a0
LR [c000000001d1e54c] kmem_cache_alloc+0x7c/0x5a0
Call Trace:
[c000000001c91150] __pud_alloc+0x160/0x200 (unreliable)
[c000000001901a80] huge_pte_alloc+0x580/0x950
[c000000001cf7910] hugetlb_fault+0x9a0/0x1250
[c000000001c94a80] handle_mm_fault+0x490/0x4a0
[c0000000018d529c] __do_page_fault+0x77c/0x1f00
[c0000000018d6a48] do_page_fault+0x28/0x50
[c00000000183b0d4] handle_page_fault+0x18/0x38

Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---

Note: I did add a recent commit for the Fixes tag, but in reality we never checked for page table
allocation failure there. If we want to go back to that old commit, then we may need:

Fixes: a4fe3ce7699b ("powerpc/mm: Allow more flexible layouts for hugepage pagetables")

 arch/powerpc/mm/hugetlbpage.c | 8 ++++++++
 1 file changed, 8 insertions(+)
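Returning NULL is a safe exit because the caller is expected to check for it: in kernels of this era, the generic `hugetlb_fault()` in mm/hugetlb.c returns `VM_FAULT_OOM` when `huge_pte_alloc()` returns NULL, so a failed table allocation becomes an out-of-memory fault rather than a NULL pointer dereference. The sketch below models only that caller-side contract; `MODEL_FAULT_*`, `MODEL_HPAGE_SIZE`, `simulate_oom` and the stub allocator are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fault results; the kernel's equivalents are
 * VM_FAULT_* codes such as VM_FAULT_OOM. */
enum model_fault { MODEL_FAULT_OK, MODEL_FAULT_OOM };

#define MODEL_HPAGE_SIZE (16UL << 20)	/* 16MB huge page, for illustration */

static int simulate_oom;		/* make the table allocation "fail" */
static unsigned long model_pte;		/* single stand-in page table entry */

/* Stand-in for huge_pte_alloc(): NULL means a table allocation failed. */
static unsigned long *model_huge_pte_alloc(unsigned long haddr)
{
	/* Callers pass a huge-page-aligned address. */
	assert((haddr & (MODEL_HPAGE_SIZE - 1)) == 0);
	return simulate_oom ? NULL : &model_pte;
}

/* Caller side: turn a NULL entry pointer into an OOM result
 * instead of dereferencing it. */
static enum model_fault model_hugetlb_fault(unsigned long address)
{
	unsigned long haddr = address & ~(MODEL_HPAGE_SIZE - 1);
	unsigned long *ptep = model_huge_pte_alloc(haddr);

	if (!ptep)
		return MODEL_FAULT_OOM;
	*ptep = haddr;	/* "install" the mapping */
	return MODEL_FAULT_OK;
}
```

The split keeps policy out of the walker: the allocator only reports failure, and the fault path decides what an allocation failure means for the faulting task.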