diff mbox series

[v2,1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state

Message ID 20210901165418.1412891-1-npiggin@gmail.com (mailing list archive)
State Superseded
Series [v2,1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state

Commit Message

Nicholas Piggin Sept. 1, 2021, 4:54 p.m. UTC
If a system call is made with a transaction active, the kernel
immediately aborts it and returns. scv system calls disable irqs even
earlier in their interrupt handler, and tabort_syscall does not fix this
up.

This can result in irq soft-mask state being messed up on the next
kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
the kernel exit handlers, or possibly worse.

Fix this by having tabort_syscall set the irq soft-mask state back to
enabled (which requires MSR[EE] be disabled first).

Reported-by: Eirik Fuller <efuller@redhat.com>
Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

Tested the wrong kernel before sending v1 and missed a bug, sorry.

 arch/powerpc/kernel/interrupt_64.S | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
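The sequence of events the commit message describes can be sketched as a toy model (hypothetical helper names; the real fix is the assembly change in the patch):

```python
# Toy model of the irq soft-mask bookkeeping described above.
# IRQS_ENABLED/IRQS_DISABLED mirror the kernel flag names; the helper
# functions are illustrative only, not the real entry/exit code.
IRQS_ENABLED = 0x00
IRQS_DISABLED = 0x01

def scv_entry(soft_mask):
    # scv soft-disables irqs early in its interrupt handler
    return IRQS_DISABLED

def tabort_syscall_exit(soft_mask):
    # the fix: restore the enabled state before returning to userspace
    return IRQS_ENABLED

def next_kernel_entry_ok(soft_mask):
    # stand-in for BUG_ON(arch_irq_disabled_regs(regs)) in the exit path
    return soft_mask == IRQS_ENABLED

mask = scv_entry(IRQS_ENABLED)        # transaction active, scv made
mask = tabort_syscall_exit(mask)      # kernel aborts it and returns
print(next_kernel_entry_ok(mask))     # True with the fix applied
```

Without the restore step, `mask` would still read disabled at the next kernel entry, which is the crash the commit message describes.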

Comments

Christophe Leroy Sept. 1, 2021, 5:21 p.m. UTC | #1
On 01/09/2021 at 18:54, Nicholas Piggin wrote:
> If a system call is made with a transaction active, the kernel
> immediately aborts it and returns. scv system calls disable irqs even
> earlier in their interrupt handler, and tabort_syscall does not fix this
> up.
> 
> This can result in irq soft-mask state being messed up on the next
> kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
> the kernel exit handlers, or possibly worse.
> 
> Fix this by having tabort_syscall setting irq soft-mask back to enabled
> (which requires MSR[EE] be disabled first).
> 
> Reported-by: Eirik Fuller <efuller@redhat.com>
> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> 
> Tested the wrong kernel before sending v1 and missed a bug, sorry.
> 
>   arch/powerpc/kernel/interrupt_64.S | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
> index d4212d2ff0b5..9c31d65b4851 100644
> --- a/arch/powerpc/kernel/interrupt_64.S
> +++ b/arch/powerpc/kernel/interrupt_64.S
> @@ -428,16 +428,22 @@ RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, syscall_restart)
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   tabort_syscall:
>   _ASM_NOKPROBE_SYMBOL(tabort_syscall)
> -	/* Firstly we need to enable TM in the kernel */
> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
>   	mfmsr	r10
>   	li	r9, 1
>   	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
> +	andc	r10, r10, r9

Why not use 'rlwinm' to mask out MSR_EE ?

Something like

	rlwinm	r10, r10, 0, ~MSR_EE

>   	mtmsrd	r10, 0
>   
>   	/* tabort, this dooms the transaction, nothing else */
>   	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
>   	TABORT(R9)
>   
> +	/* scv has disabled irqs so must re-enable. sc just remains enabled */
> +	li	r9,IRQS_ENABLED
> +	stb	r9,PACAIRQSOFTMASK(r13)
> +
>   	/*
>   	 * Return directly to userspace. We have corrupted user register state,
>   	 * but userspace will never see that register state. Execution will
>
Nicholas Piggin Sept. 2, 2021, 3:33 a.m. UTC | #2
Excerpts from Christophe Leroy's message of September 2, 2021 3:21 am:
> 
> 
> On 01/09/2021 at 18:54, Nicholas Piggin wrote:
>> If a system call is made with a transaction active, the kernel
>> immediately aborts it and returns. scv system calls disable irqs even
>> earlier in their interrupt handler, and tabort_syscall does not fix this
>> up.
>> 
>> This can result in irq soft-mask state being messed up on the next
>> kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
>> the kernel exit handlers, or possibly worse.
>> 
>> Fix this by having tabort_syscall setting irq soft-mask back to enabled
>> (which requires MSR[EE] be disabled first).
>> 
>> Reported-by: Eirik Fuller <efuller@redhat.com>
>> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>> 
>> Tested the wrong kernel before sending v1 and missed a bug, sorry.
>> 
>>   arch/powerpc/kernel/interrupt_64.S | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
>> index d4212d2ff0b5..9c31d65b4851 100644
>> --- a/arch/powerpc/kernel/interrupt_64.S
>> +++ b/arch/powerpc/kernel/interrupt_64.S
>> @@ -428,16 +428,22 @@ RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, syscall_restart)
>>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>   tabort_syscall:
>>   _ASM_NOKPROBE_SYMBOL(tabort_syscall)
>> -	/* Firstly we need to enable TM in the kernel */
>> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
>>   	mfmsr	r10
>>   	li	r9, 1
>>   	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
>> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
>> +	andc	r10, r10, r9
> 
> Why not use 'rlwinm' to mask out MSR_EE ?
> 
> Something like
> 
> 	rlwinm	r10, r10, 0, ~MSR_EE

Mainly because I'm bad at powerpc assembly. Why do you think I'm trying 
to change as much as possible to C?

Actually there should really be no need for mfmsr either, I wanted to
rewrite the thing entirely as

	ld      r10,PACAKMSR(r13)
	LOAD_REG_IMMEDIATE(r9, MSR_TM)
	or	r10,r10,r9
	mtmsrd	r10

But I thought that's not a minimal bug fix.

Thanks,
Nick
> 
>>   	mtmsrd	r10, 0
>>   
>>   	/* tabort, this dooms the transaction, nothing else */
>>   	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
>>   	TABORT(R9)
>>   
>> +	/* scv has disabled irqs so must re-enable. sc just remains enabled */
>> +	li	r9,IRQS_ENABLED
>> +	stb	r9,PACAIRQSOFTMASK(r13)
>> +
>>   	/*
>>   	 * Return directly to userspace. We have corrupted user register state,
>>   	 * but userspace will never see that register state. Execution will
>> 
>
Segher Boessenkool Sept. 2, 2021, 9:52 p.m. UTC | #3
On Thu, Sep 02, 2021 at 01:33:10PM +1000, Nicholas Piggin wrote:
> Excerpts from Christophe Leroy's message of September 2, 2021 3:21 am:
> >> -	/* Firstly we need to enable TM in the kernel */
> >> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
> >>   	mfmsr	r10
> >>   	li	r9, 1
> >>   	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
> >> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
> >> +	andc	r10, r10, r9
> > 
> > Why not use 'rlwinm' to mask out MSR_EE ?
> > 
> > Something like
> > 
> > 	rlwinm	r10, r10, 0, ~MSR_EE
> 
> Mainly because I'm bad at powerpc assembly. Why do you think I'm trying 
> to change as much as possible to C?

The actual bit (bit 31, i.e. with value 1UL << 32) cannot be cleared
with rlwinm (only the low 32 bits can).  There are many ways to do it
using two insns of course.

> Actually there should really be no need for mfmsr either, I wanted to
> rewrite the thing entirely as
> 
> 	ld      r10,PACAKMSR(r13)
> 	LOAD_REG_IMMEDIATE(r9, MSR_TM)
> 	or	r10,r10,r9
> 	mtmsrd	r10
> 
> But I thought that's not a minimal bug fix.

That (LOAD_REG_IMMEDIATE+or) can be done with just two insns, not three
as written:
  li r0,1
  rldimi r10,r0,32,31
(you can write that last insns as
  rldimi r10,r0,MSR_TM_LG,MSR_TM_LG-1
if you insist :-) )

It isn't like a few integer computational insns will kill you here of
course, there are much more important cycles to shave off :-)


Segher
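Segher's two-insn replacement for LOAD_REG_IMMEDIATE+or can be checked with a quick simulation of the rldimi mask semantics (a simplified Python model of the ISA definition, not kernel code; the register value is made up):

```python
# Simplified Python model of rldimi (rotate left doubleword immediate
# then mask insert); the sample r10 value below is arbitrary.
MASK64 = (1 << 64) - 1

def rotl64(x, n):
    n &= 63
    return ((x << n) | (x >> ((64 - n) & 63))) & MASK64

def ibm_mask(mb, me):
    # 1-bits from IBM bit mb through me inclusive (bit 0 = MSB), wrapping
    bits, i = 0, mb
    while True:
        bits |= 1 << (63 - i)
        if i == me:
            break
        i = (i + 1) % 64
    return bits

def rldimi(ra, rs, sh, mb):
    # rldimi RA,RS,SH,MB: rotate RS left by SH, insert into RA under
    # MASK(MB, 63-SH); bits outside the mask keep RA's old value
    m = ibm_mask(mb, 63 - sh)
    return (rotl64(rs, sh) & m) | (ra & ~m & MASK64)

MSR_TM = 1 << 32                      # IBM bit 31

r10 = 0x8000000000006040              # arbitrary MSR-like value, TM clear
r0 = 1                                # li     r0,1
r10 = rldimi(r10, r0, 32, 31)         # rldimi r10,r0,32,31
assert r10 & MSR_TM                   # MSR_TM is now set
assert r10 & ~MSR_TM == 0x8000000000006040   # and nothing else changed
print(hex(r10))                       # 0x8000000100006040
```

The mask MASK(31, 63-32) covers only IBM bit 31, so the insert sets MSR_TM and leaves every other bit of r10 intact.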
Segher Boessenkool Sept. 2, 2021, 10:20 p.m. UTC | #4
On Thu, Sep 02, 2021 at 04:52:03PM -0500, Segher Boessenkool wrote:
> On Thu, Sep 02, 2021 at 01:33:10PM +1000, Nicholas Piggin wrote:
> > Excerpts from Christophe Leroy's message of September 2, 2021 3:21 am:
> > >> -	/* Firstly we need to enable TM in the kernel */
> > >> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
> > >>   	mfmsr	r10
> > >>   	li	r9, 1
> > >>   	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
> > >> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
> > >> +	andc	r10, r10, r9
> > > 
> > > Why not use 'rlwinm' to mask out MSR_EE ?
> > > 
> > > Something like
> > > 
> > > 	rlwinm	r10, r10, 0, ~MSR_EE
> > 
> > Mainly because I'm bad at powerpc assembly. Why do you think I'm trying 
> > to change as much as possible to C?
> 
> The actual bit (bit 31, i.e. with value 1UL << 32) cannot be cleared
> with rlwinm (only the low 32 bits can).  There are many ways to do it
> using two insns of course.

Wow I misread that, you want to clear MSR[EE] really, not MSR[TM].

You cannot use rlwinm and keep the high 32 bits of the target register
intact.  You either clear all to 0 or set them to a copy of the rotated
value in the low 32 bits.


Segher
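The point about the high 32 bits can be made concrete with a small Python model of rlwinm in 64-bit mode (a sketch of the ISA semantics; the MSR value used is arbitrary):

```python
# Simplified model of rlwinm in 64-bit mode: rotate the low 32 bits,
# duplicate into both halves, then AND with MASK(MB+32, ME+32).
def rotl32(x, n):
    x &= 0xFFFFFFFF
    n &= 31
    return ((x << n) | (x >> ((32 - n) & 31))) & 0xFFFFFFFF

def ibm_mask(mb, me):
    # 1-bits from IBM bit mb through me inclusive (bit 0 = MSB), wrapping
    bits, i = 0, mb
    while True:
        bits |= 1 << (63 - i)
        if i == me:
            break
        i = (i + 1) % 64
    return bits

def rlwinm(rs, sh, mb, me):
    r = rotl32(rs, sh)
    dup = (r << 32) | r
    return dup & ibm_mask(mb + 32, me + 32)

MSR_EE = 1 << 15                      # IBM bit 48
MSR_TM = 1 << 32                      # IBM bit 31

msr = MSR_TM | MSR_EE | 0x1000        # arbitrary value with TM and EE set
# rlwinm r10,r10,0,~MSR_EE assembles with the wrap-around mask MB=17,ME=15
out = rlwinm(msr, 0, 17, 15)
assert out & MSR_EE == 0              # EE is cleared, as intended...
assert out & MSR_TM == 0              # ...but TM is destroyed too:
assert out >> 32 == rotl32(msr, 0)    # high half is a copy of the low half
```

With the wrap-around mask the high half becomes a copy of the rotated low half, so the freshly set MSR[TM] (IBM bit 31) is lost, which is exactly why rlwinm cannot be used here.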
Christophe Leroy Sept. 3, 2021, 5:04 a.m. UTC | #5
On 03/09/2021 at 00:20, Segher Boessenkool wrote:
> On Thu, Sep 02, 2021 at 04:52:03PM -0500, Segher Boessenkool wrote:
>> On Thu, Sep 02, 2021 at 01:33:10PM +1000, Nicholas Piggin wrote:
>>> Excerpts from Christophe Leroy's message of September 2, 2021 3:21 am:
>>>>> -	/* Firstly we need to enable TM in the kernel */
>>>>> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
>>>>>    	mfmsr	r10
>>>>>    	li	r9, 1
>>>>>    	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
>>>>> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
>>>>> +	andc	r10, r10, r9
>>>>
>>>> Why not use 'rlwinm' to mask out MSR_EE ?
>>>>
>>>> Something like
>>>>
>>>> 	rlwinm	r10, r10, 0, ~MSR_EE
>>>
>>> Mainly because I'm bad at powerpc assembly. Why do you think I'm trying
>>> to change as much as possible to C?
>>
>> The actual bit (bit 31, i.e. with value 1UL << 32) cannot be cleared
>> with rlwinm (only the low 32 bits can).  There are many ways to do it
>> using two insns of course.
> 
> Wow I misread that, you want to clear MSR[EE] really, not MSR[TM].
> 
> You cannot use rlwinm and keep the high 32 bits of the target register
> intact.  You either clear all to 0 or set them to a copy of the rotated
> value in the low 32 bits.
> 

Oops, my mistake. When I tested it in C to see what was generated by GCC, I forgot the ~, so I got
rlwinm r3,r3,0,16,16 and didn't realise it was different from rlwinm r3,r3,0,~(1<<15).

By the way, it would be more explicit if objdump could display the mask instead of the mask
boundaries. Is there a way to do that?

Christophe
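Failing such an objdump feature, the mask a disassembled MB,ME pair denotes can be computed by hand; a hypothetical little helper (a sketch, not an objdump option), applied to the two forms from the message above:

```python
# Compute the 32-bit mask that a disassembled MB,ME pair denotes:
# 1-bits from IBM bit mb through me inclusive (bit 0 = MSB),
# wrapping around when mb > me.
def rlwinm_mask(mb, me):
    bits, i = 0, mb
    while True:
        bits |= 1 << (31 - i)
        if i == me:
            break
        i = (i + 1) % 32
    return bits

print(hex(rlwinm_mask(16, 16)))   # 0x8000, i.e. 1<<15
print(hex(rlwinm_mask(17, 15)))   # 0xffff7fff, i.e. ~(1<<15) as 32 bits
```

This makes the confusion above visible at a glance: MB=16,ME=16 selects only the EE bit, while the intended MB=17,ME=15 selects everything except it.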

Patch

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index d4212d2ff0b5..9c31d65b4851 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -428,16 +428,22 @@  RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, syscall_restart)
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 tabort_syscall:
 _ASM_NOKPROBE_SYMBOL(tabort_syscall)
-	/* Firstly we need to enable TM in the kernel */
+	/* We need to enable TM in the kernel, and disable EE (for scv) */
 	mfmsr	r10
 	li	r9, 1
 	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
+	LOAD_REG_IMMEDIATE(r9, MSR_EE)
+	andc	r10, r10, r9
 	mtmsrd	r10, 0
 
 	/* tabort, this dooms the transaction, nothing else */
 	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
 	TABORT(R9)
 
+	/* scv has disabled irqs so must re-enable. sc just remains enabled */
+	li	r9,IRQS_ENABLED
+	stb	r9,PACAIRQSOFTMASK(r13)
+
 	/*
 	 * Return directly to userspace. We have corrupted user register state,
 	 * but userspace will never see that register state. Execution will