diff mbox series

powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE

Message ID aea33b4813a26bdb9378b5f273f00bd5d4abe240.1638857364.git.christophe.leroy@csgroup.eu (mailing list archive)
State Accepted
Headers show
Series powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu fail boot (ppc64le_guest_defconfig, pseries+p8+tcg, pseries+p9+tcg, qemu-system-ppc64, ppc64le-rootfs.... failed at step Run qemu-pseries+p8+tcg with fedora-34 build kernel.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 7 jobs.

Commit Message

Christophe Leroy Dec. 7, 2021, 6:10 a.m. UTC
Allthough kernel text is always mapped with BATs, we still have
inittext mapped with pages, so TLB miss handling is required
when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set.

The final solution should be to set a BAT that also maps inittext
but that BAT then needs to be cleared at end of init, and it will
require more changes to be able to do it properly.

As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big
deal so let's fix it simply for now to enable easy stable application.

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Fixes: 035b19a15a98 ("powerpc/32s: Always map kernel text and rodata with BATs")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/head_book3s_32.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Maxime Bizon Dec. 7, 2021, 10:34 a.m. UTC | #1
On Tue, 2021-12-07 at 06:10 +0000, Christophe Leroy wrote:

Hello,

With the patch applied and

CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_DEBUG_VM=y

I get tons of this during boot:

[    0.000000] Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear)
[    0.000000] Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x18/0x160
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442
[    0.000000] NIP:  80015ebc LR: 80016728 CTR: 800166e4
[    0.000000] REGS: 80751dd0 TRAP: 0700   Not tainted  (5.15.0+)
[    0.000000] MSR:  00021032 <ME,IR,DR,RI>  CR: 42228882  XER: 20000000
[    0.000000] 
[    0.000000] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8ffffe84 80751ea8 00000000 
[    0.000000] GPR08: 007a1591 00000001 007a1180 00000000 42224882 00000000 3ff9c608 3fffd79c 
[    0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 800166e4 807a2000 
[    0.000000] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 00000000 807a1000 007a1180 
[    0.000000] NIP [80015ebc] set_pte_at+0x18/0x160
[    0.000000] LR [80016728] set_page_attr+0x44/0xc0
[    0.000000] Call Trace:
[    0.000000] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable)
[    0.000000] [80751ea0] [00000000] 0x0
[    0.000000] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8
[    0.000000] [80751f00] [80016918] __kernel_map_pages+0x54/0x64
[    0.000000] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440
[    0.000000] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274
[    0.000000] [80751f90] [805c5e0c] mem_init+0x3c/0xd0
[    0.000000] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4
[    0.000000] [80751ff0] [000033f0] 0x33f0
[    0.000000] Instruction dump:
[    0.000000] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 9421ffe0 93e1001c 
[    0.000000] 83e60000 81250000 71290001 41820014 <0fe00000> 7c0802a6 93c10018 90010024
Christophe Leroy Dec. 7, 2021, 5:49 p.m. UTC | #2
Le 07/12/2021 à 11:34, Maxime Bizon a écrit :
> 
> On Tue, 2021-12-07 at 06:10 +0000, Christophe Leroy wrote:
> 
> Hello,
> 
> With the patch applied and
> 
> CONFIG_DEBUG_PAGEALLOC=y
> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
> CONFIG_DEBUG_VM=y
> 
> I get tons of this during boot:
> 
> [    0.000000] Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear)
> [    0.000000] Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x18/0x160
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442
> [    0.000000] NIP:  80015ebc LR: 80016728 CTR: 800166e4
> [    0.000000] REGS: 80751dd0 TRAP: 0700   Not tainted  (5.15.0+)
> [    0.000000] MSR:  00021032 <ME,IR,DR,RI>  CR: 42228882  XER: 20000000
> [    0.000000]
> [    0.000000] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8ffffe84 80751ea8 00000000
> [    0.000000] GPR08: 007a1591 00000001 007a1180 00000000 42224882 00000000 3ff9c608 3fffd79c
> [    0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 800166e4 807a2000
> [    0.000000] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 00000000 807a1000 007a1180
> [    0.000000] NIP [80015ebc] set_pte_at+0x18/0x160
> [    0.000000] LR [80016728] set_page_attr+0x44/0xc0
> [    0.000000] Call Trace:
> [    0.000000] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable)
> [    0.000000] [80751ea0] [00000000] 0x0
> [    0.000000] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8
> [    0.000000] [80751f00] [80016918] __kernel_map_pages+0x54/0x64
> [    0.000000] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440
> [    0.000000] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274
> [    0.000000] [80751f90] [805c5e0c] mem_init+0x3c/0xd0
> [    0.000000] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4
> [    0.000000] [80751ff0] [000033f0] 0x33f0
> [    0.000000] Instruction dump:
> [    0.000000] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 9421ffe0 93e1001c
> [    0.000000] 83e60000 81250000 71290001 41820014 <0fe00000> 7c0802a6 93c10018 90010024
> 
> 

That's unrelated to this patch.

The problem is linked to patch c988cfd38e48 ("powerpc/32: use 
set_memory_attr()"), which changed from using __set_pte_at() to using 
set_memory_attr() which uses set_pte_at().

set_pte_at() has additional checks and shall not be used to updating an 
existing PTE.

Wondering if I should just use __set_pte_at() instead like in the past, 
or do like commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against 
concurrent accesses") and use pte_update()

Michael, Aneesh, any suggestion ?

Thanks
Christophe
Michael Ellerman Dec. 9, 2021, 6:08 a.m. UTC | #3
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 07/12/2021 à 11:34, Maxime Bizon a écrit :
>> 
>> On Tue, 2021-12-07 at 06:10 +0000, Christophe Leroy wrote:
>> 
>> Hello,
>> 
>> With the patch applied and
>> 
>> CONFIG_DEBUG_PAGEALLOC=y
>> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
>> CONFIG_DEBUG_VM=y
>> 
>> I get tons of this during boot:
>> 
>> [    0.000000] Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear)
>> [    0.000000] Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x18/0x160
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442
>> [    0.000000] NIP:  80015ebc LR: 80016728 CTR: 800166e4
>> [    0.000000] REGS: 80751dd0 TRAP: 0700   Not tainted  (5.15.0+)
>> [    0.000000] MSR:  00021032 <ME,IR,DR,RI>  CR: 42228882  XER: 20000000
>> [    0.000000]
>> [    0.000000] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8ffffe84 80751ea8 00000000
>> [    0.000000] GPR08: 007a1591 00000001 007a1180 00000000 42224882 00000000 3ff9c608 3fffd79c
>> [    0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 800166e4 807a2000
>> [    0.000000] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 00000000 807a1000 007a1180
>> [    0.000000] NIP [80015ebc] set_pte_at+0x18/0x160
>> [    0.000000] LR [80016728] set_page_attr+0x44/0xc0
>> [    0.000000] Call Trace:
>> [    0.000000] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable)
>> [    0.000000] [80751ea0] [00000000] 0x0
>> [    0.000000] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8
>> [    0.000000] [80751f00] [80016918] __kernel_map_pages+0x54/0x64
>> [    0.000000] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440
>> [    0.000000] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274
>> [    0.000000] [80751f90] [805c5e0c] mem_init+0x3c/0xd0
>> [    0.000000] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4
>> [    0.000000] [80751ff0] [000033f0] 0x33f0
>> [    0.000000] Instruction dump:
>> [    0.000000] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 9421ffe0 93e1001c
>> [    0.000000] 83e60000 81250000 71290001 41820014 <0fe00000> 7c0802a6 93c10018 90010024
>> 
>> 
>
> That's unrelated to this patch.
>
> The problem is linked to patch c988cfd38e48 ("powerpc/32: use 
> set_memory_attr()"), which changed from using __set_pte_at() to using 
> set_memory_attr() which uses set_pte_at().
>
> set_pte_at() has additional checks and shall not be used to updating an 
> existing PTE.
>
> Wondering if I should just use __set_pte_at() instead like in the past, 
> or do like commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against 
> concurrent accesses") and use pte_update()
>
> Michael, Aneesh, any suggestion ?

The motivation for using pte_update() in that commit is that it does the
update atomically and also handles flushing the HPTE for 64-bit Hash.

But the books/32 version of pte_update() doesn't do that. In fact
there's some HPTE handling in __set_pte_at(), but then also a comment
saying it's handling in a subsequent flush_tlb_xxx().

So that doesn't really help make a decision :)

On the other hand, could you convert those set_memory_attr() calls to
change_memory_attr() and then eventually drop the former?

cheers
Christophe Leroy Dec. 9, 2021, 5:26 p.m. UTC | #4
Le 09/12/2021 à 07:08, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 07/12/2021 à 11:34, Maxime Bizon a écrit :
>>>
>>> On Tue, 2021-12-07 at 06:10 +0000, Christophe Leroy wrote:
>>>
>>> Hello,
>>>
>>> With the patch applied and
>>>
>>> CONFIG_DEBUG_PAGEALLOC=y
>>> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
>>> CONFIG_DEBUG_VM=y
>>>
>>> I get tons of this during boot:
>>>
>>> [    0.000000] Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear)
>>> [    0.000000] Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
>>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>>> [    0.000000] ------------[ cut here ]------------
>>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x18/0x160
>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442
>>> [    0.000000] NIP:  80015ebc LR: 80016728 CTR: 800166e4
>>> [    0.000000] REGS: 80751dd0 TRAP: 0700   Not tainted  (5.15.0+)
>>> [    0.000000] MSR:  00021032 <ME,IR,DR,RI>  CR: 42228882  XER: 20000000
>>> [    0.000000]
>>> [    0.000000] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8ffffe84 80751ea8 00000000
>>> [    0.000000] GPR08: 007a1591 00000001 007a1180 00000000 42224882 00000000 3ff9c608 3fffd79c
>>> [    0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 800166e4 807a2000
>>> [    0.000000] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 00000000 807a1000 007a1180
>>> [    0.000000] NIP [80015ebc] set_pte_at+0x18/0x160
>>> [    0.000000] LR [80016728] set_page_attr+0x44/0xc0
>>> [    0.000000] Call Trace:
>>> [    0.000000] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable)
>>> [    0.000000] [80751ea0] [00000000] 0x0
>>> [    0.000000] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8
>>> [    0.000000] [80751f00] [80016918] __kernel_map_pages+0x54/0x64
>>> [    0.000000] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440
>>> [    0.000000] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274
>>> [    0.000000] [80751f90] [805c5e0c] mem_init+0x3c/0xd0
>>> [    0.000000] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4
>>> [    0.000000] [80751ff0] [000033f0] 0x33f0
>>> [    0.000000] Instruction dump:
>>> [    0.000000] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 9421ffe0 93e1001c
>>> [    0.000000] 83e60000 81250000 71290001 41820014 <0fe00000> 7c0802a6 93c10018 90010024
>>>
>>>
>>
>> That's unrelated to this patch.
>>
>> The problem is linked to patch c988cfd38e48 ("powerpc/32: use
>> set_memory_attr()"), which changed from using __set_pte_at() to using
>> set_memory_attr() which uses set_pte_at().
>>
>> set_pte_at() has additional checks and shall not be used to updating an
>> existing PTE.
>>
>> Wondering if I should just use __set_pte_at() instead like in the past,
>> or do like commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against
>> concurrent accesses") and use pte_update()
>>
>> Michael, Aneesh, any suggestion ?
> 
> The motivation for using pte_update() in that commit is that it does the
> update atomically and also handles flushing the HPTE for 64-bit Hash.
> 
> But the books/32 version of pte_update() doesn't do that. In fact
> there's some HPTE handling in __set_pte_at(), but then also a comment
> saying it's handling in a subsequent flush_tlb_xxx().
> 
> So that doesn't really help make a decision :)
> 
> On the other hand, could you convert those set_memory_attr() calls to
> change_memory_attr() and then eventually drop the former?

Sure, that's probably the best.

Initially I had to implement that set_memory_attr() variant because 
change_memory_attr() was doing a pte_clear() that was "sawing off the 
branch we're sitting on". In extenso mark_rodata_ro() couldn't use 
change_memory_attr() to change the text section to read-only because 
mark_rodata_ro() is itself in the text section.

But now that change_memory_attr() is using pte_update() instead of going 
via a pte_clear(), it's possible to use it, so that's what I'll do.

Thanks for the idea.
Christophe
Christophe Leroy Jan. 19, 2022, 11:19 a.m. UTC | #5
Michael, ping.

Le 07/12/2021 à 07:10, Christophe Leroy a écrit :
> Allthough kernel text is always mapped with BATs, we still have
> inittext mapped with pages, so TLB miss handling is required
> when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set.
> 
> The final solution should be to set a BAT that also maps inittext
> but that BAT then needs to be cleared at end of init, and it will
> require more changes to be able to do it properly.
> 
> As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big
> deal so let's fix it simply for now to enable easy stable application.
> 
> Reported-by: Maxime Bizon <mbizon@freebox.fr>
> Fixes: 035b19a15a98 ("powerpc/32s: Always map kernel text and rodata with BATs")
> Cc: stable@vger.kernel.org
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>   arch/powerpc/kernel/head_book3s_32.S | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
> index 68e5c0a7e99d..2e2a8211b17b 100644
> --- a/arch/powerpc/kernel/head_book3s_32.S
> +++ b/arch/powerpc/kernel/head_book3s_32.S
> @@ -421,14 +421,14 @@ InstructionTLBMiss:
>    */
>   	/* Get PTE (linux-style) and check access */
>   	mfspr	r3,SPRN_IMISS
> -#ifdef CONFIG_MODULES
> +#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
>   	lis	r1, TASK_SIZE@h		/* check if kernel address */
>   	cmplw	0,r1,r3
>   #endif
>   	mfspr	r2, SPRN_SDR1
>   	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
>   	rlwinm	r2, r2, 28, 0xfffff000
> -#ifdef CONFIG_MODULES
> +#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
>   	bgt-	112f
>   	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
>   	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
Michael Ellerman Feb. 18, 2022, 2:23 a.m. UTC | #6
On Tue, 7 Dec 2021 06:10:05 +0000, Christophe Leroy wrote:
> Allthough kernel text is always mapped with BATs, we still have
> inittext mapped with pages, so TLB miss handling is required
> when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set.
> 
> The final solution should be to set a BAT that also maps inittext
> but that BAT then needs to be cleared at end of init, and it will
> require more changes to be able to do it properly.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
      https://git.kernel.org/powerpc/c/9bb162fa26ed76031ed0e7dbc77ccea0bf977758

cheers
diff mbox series

Patch

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 68e5c0a7e99d..2e2a8211b17b 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -421,14 +421,14 @@  InstructionTLBMiss:
  */
 	/* Get PTE (linux-style) and check access */
 	mfspr	r3,SPRN_IMISS
-#ifdef CONFIG_MODULES
+#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 #endif
 	mfspr	r2, SPRN_SDR1
 	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
-#ifdef CONFIG_MODULES
+#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
 	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC