diff mbox series

[RFC,1/2] powerpc/64: remove system call instruction emulation

Message ID 20220118055821.3065241-2-npiggin@gmail.com (mailing list archive)
State RFC
Headers show
Series 2nd attempt at PR KVM + SCV and syscall | expand

Commit Message

Nicholas Piggin Jan. 18, 2022, 5:58 a.m. UTC
emulate_step instruction emulation including sc instruction emulation
initially appeared in xmon. It then emulation code was then moved into
sstep.c where kprobes could use it too, and later hw_breakpoint and
uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there was any upstream instructions or this was used by out of tree
modules. So system call emulation is not required by the above users.

uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. This type of system call emulation is broken
and it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the the interrupt
return code takes the kernel->kernel restore path, which does not restore
everything as the system call interrupt handler would expect coming from
userspace. regs->iamr appears to get lost for example, because the
kernel->kernel return does not restore the user iamr. Accounting such as
irqflags tracing and CPU accounting does not get flipped back to user
mode as the system call handler expects, so those appear to enter the
kernel twice without returning to userspace.

These things may be individually fixable with various complication, but
it is a big complexity to support this just for uprobes.

uprobes emulates the instruction rather than stepping for performance
reasons. System calls instructions should not be a significant source
of uprobes and already expensive, so skipping emulation should not be
a significant problem.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/lib/sstep.c | 36 ------------------------------------
 1 file changed, 36 deletions(-)

Comments

Naveen N. Rao Jan. 19, 2022, 5:46 p.m. UTC | #1
Nicholas Piggin wrote:
> emulate_step instruction emulation including sc instruction emulation
> initially appeared in xmon. It then emulation code was then moved into
> sstep.c where kprobes could use it too, and later hw_breakpoint and
> uprobes started to use it.
> 
> Until uprobes, the only instruction emulation users were for kernel
> mode instructions.
> 
> - xmon only steps / breaks on kernel addresses.
> - kprobes is kernel only.
> - hw_breakpoint only emulates kernel instructions, single steps user.
> 
> At one point there was support for the kernel to execute sc
> instructions, although that is long removed and it's not clear whether
> there was any upstream instructions or this was used by out of tree
> modules. So system call emulation is not required by the above users.
> 
> uprobes uses emulate_step and it appears possible to emulate sc
> instruction in userspace. This type of system call emulation is broken
> and it's not clear it ever worked well.
> 
> The big complication is that userspace takes an interrupt to the kernel
> to emulate the instruction. The user->kernel interrupt sets up registers
> and interrupt stack frame expecting to return to userspace, then system
> call instruction emulation re-directs that stack frame to the kernel,
> early in the system call interrupt handler. This means the the interrupt
> return code takes the kernel->kernel restore path, which does not restore
> everything as the system call interrupt handler would expect coming from
> userspace. regs->iamr appears to get lost for example, because the
> kernel->kernel return does not restore the user iamr. Accounting such as
> irqflags tracing and CPU accounting does not get flipped back to user
> mode as the system call handler expects, so those appear to enter the
> kernel twice without returning to userspace.
> 
> These things may be individually fixable with various complication, but
> it is a big complexity to support this just for uprobes.
> 
> uprobes emulates the instruction rather than stepping for performance
> reasons. System calls instructions should not be a significant source
> of uprobes and already expensive, so skipping emulation should not be
> a significant problem.

I agree with that assessment, though I think we will need to disable 
probing on 'sc'/'scv' instructions. Per the ISA, section 6.5.15 on 
'Trace Interrupt', we can't single step a system call instruction:
    "Successful completion for an instruction means that the
    instruction caused no other interrupt and, if the thread
    is in Transactional state, did not cause the transaction
    to fail in such a way that the instruction did not com-
    plete (see Section 5.3.1 of Book II). Thus a Trace inter-
    rupt never occurs for a System Call or System Call
    Vectored instruction, or for a Trap instruction that traps,
    or for a dcbf that is executed in Transactional state.
    The instruction that causes a Trace interrupt is called
    the “traced instruction”."

I am not aware of any use case requiring probes on a system call 
instruction, so I think we can disable probing on system call 
instructions (patch further below).

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/lib/sstep.c | 36 ------------------------------------
>  1 file changed, 36 deletions(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index a94b0cd0bdc5..ee3bc45fb23b 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -15,9 +15,6 @@
>  #include <asm/cputable.h>
>  #include <asm/disassemble.h>
> 
> -extern char system_call_common[];
> -extern char system_call_vectored_emulate[];
> -
>  #ifdef CONFIG_PPC64
>  /* Bits in SRR1 that are copied from MSR */
>  #define MSR_MASK	0xffffffff87c0ffffUL
> @@ -3650,39 +3647,6 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
>  		goto instr_done;
> 
>  #ifdef CONFIG_PPC64
> -	case SYSCALL:	/* sc */
> -		/*
> -		 * N.B. this uses knowledge about how the syscall
> -		 * entry code works.  If that is changed, this will
> -		 * need to be changed also.
> -		 */
> -		if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
> -				cpu_has_feature(CPU_FTR_REAL_LE) &&
> -				regs->gpr[0] == 0x1ebe) {
> -			regs_set_return_msr(regs, regs->msr ^ MSR_LE);
> -			goto instr_done;
> -		}
> -		regs->gpr[9] = regs->gpr[13];
> -		regs->gpr[10] = MSR_KERNEL;
> -		regs->gpr[11] = regs->nip + 4;
> -		regs->gpr[12] = regs->msr & MSR_MASK;
> -		regs->gpr[13] = (unsigned long) get_paca();
> -		regs_set_return_ip(regs, (unsigned long) &system_call_common);
> -		regs_set_return_msr(regs, MSR_KERNEL);
> -		return 1;
> -
> -#ifdef CONFIG_PPC_BOOK3S_64
> -	case SYSCALL_VECTORED_0:	/* scv 0 */
> -		regs->gpr[9] = regs->gpr[13];
> -		regs->gpr[10] = MSR_KERNEL;
> -		regs->gpr[11] = regs->nip + 4;
> -		regs->gpr[12] = regs->msr & MSR_MASK;
> -		regs->gpr[13] = (unsigned long) get_paca();
> -		regs_set_return_ip(regs, (unsigned long) &system_call_vectored_emulate);
> -		regs_set_return_msr(regs, MSR_KERNEL);
> -		return 1;
> -#endif
> -

Given that we should not be probing syscall instructions, I think it is 
better to return -1 for these two, similar to the RFI below. With that 
change, for this patch:
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

>  	case RFI:
>  		return -1;
>  #endif


Thanks,
Naveen

--
[PATCH] powerpc/uprobes: Reject uprobe on a system call instruction

Per the ISA, a Trace interrupt is not generated for a system call
[vectored] instruction. Reject uprobes on such instructions as we are
not emulating a system call [vectored] instruction anymore.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/ppc-opcode.h | 1 +
 arch/powerpc/kernel/uprobes.c         | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index efad07081cc0e5..fedf843bcdddeb 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -411,6 +411,7 @@
 #define PPC_RAW_DCBFPS(a, b)		(0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
 #define PPC_RAW_DCBSTPS(a, b)		(0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
 #define PPC_RAW_SC()			(0x44000002)
+#define PPC_RAW_SCV()			(0x44000001)
 #define PPC_RAW_SYNC()			(0x7c0004ac)
 #define PPC_RAW_ISYNC()			(0x4c00012c)
 
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index c6975467d9ffdc..bedca31391d043 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,12 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
 	if (addr & 0x03)
 		return -EINVAL;
 
+	if (ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SC() ||
+	    ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SCV()) {
+		pr_info("Rejecting uprobe on system call instruction\n");
+		return -EINVAL;
+	}
+
 	if (cpu_has_feature(CPU_FTR_ARCH_31) &&
 	    ppc_inst_prefixed(ppc_inst_read(auprobe->insn)) &&
 	    (addr & 0x3f) == 60) {

base-commit: 863a7c25c334ed368b4508fccae92d6bb61e71a4
diff mbox series

Patch

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index a94b0cd0bdc5..ee3bc45fb23b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -15,9 +15,6 @@ 
 #include <asm/cputable.h>
 #include <asm/disassemble.h>
 
-extern char system_call_common[];
-extern char system_call_vectored_emulate[];
-
 #ifdef CONFIG_PPC64
 /* Bits in SRR1 that are copied from MSR */
 #define MSR_MASK	0xffffffff87c0ffffUL
@@ -3650,39 +3647,6 @@  int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
 		goto instr_done;
 
 #ifdef CONFIG_PPC64
-	case SYSCALL:	/* sc */
-		/*
-		 * N.B. this uses knowledge about how the syscall
-		 * entry code works.  If that is changed, this will
-		 * need to be changed also.
-		 */
-		if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
-				cpu_has_feature(CPU_FTR_REAL_LE) &&
-				regs->gpr[0] == 0x1ebe) {
-			regs_set_return_msr(regs, regs->msr ^ MSR_LE);
-			goto instr_done;
-		}
-		regs->gpr[9] = regs->gpr[13];
-		regs->gpr[10] = MSR_KERNEL;
-		regs->gpr[11] = regs->nip + 4;
-		regs->gpr[12] = regs->msr & MSR_MASK;
-		regs->gpr[13] = (unsigned long) get_paca();
-		regs_set_return_ip(regs, (unsigned long) &system_call_common);
-		regs_set_return_msr(regs, MSR_KERNEL);
-		return 1;
-
-#ifdef CONFIG_PPC_BOOK3S_64
-	case SYSCALL_VECTORED_0:	/* scv 0 */
-		regs->gpr[9] = regs->gpr[13];
-		regs->gpr[10] = MSR_KERNEL;
-		regs->gpr[11] = regs->nip + 4;
-		regs->gpr[12] = regs->msr & MSR_MASK;
-		regs->gpr[13] = (unsigned long) get_paca();
-		regs_set_return_ip(regs, (unsigned long) &system_call_vectored_emulate);
-		regs_set_return_msr(regs, MSR_KERNEL);
-		return 1;
-#endif
-
 	case RFI:
 		return -1;
 #endif