diff mbox series

powerpc/64: Set LR to a non-NULL value in task pt_regs on scv entry

Message ID 20240125114228.353257-1-naveen@kernel.org (mailing list archive)
State Superseded
Headers show
Series powerpc/64: Set LR to a non-NULL value in task pt_regs on scv entry | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.

Commit Message

Naveen N Rao Jan. 25, 2024, 11:42 a.m. UTC
Nysal reported that userspace backtraces are missing in offcputime bcc
tool. As an example:
    $ sudo ./bcc/tools/offcputime.py -uU
    Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.

    ^C
	write
	-                python (9107)
	    8

	write
	-                sudo (9105)
	    9

	mmap
	-                python (9107)
	    16

	clock_nanosleep
	-                multipathd (697)
	    3001604

The offcputime bcc tool attaches a bpf program to a kprobe on
finish_task_switch(), which is usually hit on a syscall from userspace.
With the switch to system call vectored, we zero out LR value in user
pt_regs on syscall entry. BPF uses perf callchain infrastructure for
capturing stack traces, and this stores LR as the second entry in the
stack trace. Since this is NULL, userspace unwinders assume that there
are no further entries resulting in a truncated userspace stack trace.

Rather than fixing all userspace unwinders to ignore/skip past the
second entry, store NIP as LR so that there continues to be a valid,
though duplicate entry.

With this change:
    $ sudo ./bcc/tools/offcputime.py -uU
    Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.

    ^C
	write
	write
	[unknown]
	[unknown]
	[unknown]
	[unknown]
	[unknown]
	PyObject_VectorcallMethod
	[unknown]
	[unknown]
	PyObject_CallOneArg
	PyFile_WriteObject
	PyFile_WriteString
	[unknown]
	[unknown]
	PyObject_Vectorcall
	_PyEval_EvalFrameDefault
	PyEval_EvalCode
	[unknown]
	[unknown]
	[unknown]
	_PyRun_SimpleFileObject
	_PyRun_AnyFileObject
	Py_RunMain
	[unknown]
	Py_BytesMain
	[unknown]
	__libc_start_main
	-                python (1293)
	    7

	write
	write
	[unknown]
	sudo_ev_loop_v1
	sudo_ev_dispatch_v1
	[unknown]
	[unknown]
	[unknown]
	[unknown]
	__libc_start_main
	-                sudo (1291)
	    7

	syscall
	syscall
	bpf_open_perf_buffer_opts
	[unknown]
	[unknown]
	[unknown]
	[unknown]
	_PyObject_MakeTpCall
	PyObject_Vectorcall
	_PyEval_EvalFrameDefault
	PyEval_EvalCode
	[unknown]
	[unknown]
	[unknown]
	_PyRun_SimpleFileObject
	_PyRun_AnyFileObject
	Py_RunMain
	[unknown]
	Py_BytesMain
	[unknown]
	__libc_start_main
	-                python (1293)
	    11

	clock_nanosleep
	clock_nanosleep
	nanosleep
	sleep
	[unknown]
	[unknown]
	__clone
	-                multipathd (698)
	    3001661

Reported-by: Nysal Jan K.A <nysal@linux.ibm.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
---
 arch/powerpc/kernel/interrupt_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 414e92af226ede4935509b0b5e041810c92e003f

Comments

Segher Boessenkool Jan. 25, 2024, 2:24 p.m. UTC | #1
Hi!

On Thu, Jan 25, 2024 at 05:12:28PM +0530, Naveen N Rao wrote:
> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
> index bd863702d812..5cf3758a19d3 100644
> --- a/arch/powerpc/kernel/interrupt_64.S
> +++ b/arch/powerpc/kernel/interrupt_64.S
> @@ -53,6 +53,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
>  	ld	r1,PACAKSAVE(r13)
>  	std	r10,0(r1)
>  	std	r11,_NIP(r1)
> +	std	r11,_LINK(r1)

Please add a comment here then, saying what the store is for?


Segher
Michael Ellerman Feb. 2, 2024, 2:02 a.m. UTC | #2
Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> On Thu, Jan 25, 2024 at 05:12:28PM +0530, Naveen N Rao wrote:
>> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
>> index bd863702d812..5cf3758a19d3 100644
>> --- a/arch/powerpc/kernel/interrupt_64.S
>> +++ b/arch/powerpc/kernel/interrupt_64.S
>> @@ -53,6 +53,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
>>  	ld	r1,PACAKSAVE(r13)
>>  	std	r10,0(r1)
>>  	std	r11,_NIP(r1)
>> +	std	r11,_LINK(r1)
>
> Please add a comment here then, saying what the store is for?

Yeah a comment would be good. 

Also the r11 value comes from LR, so it's not that we're storing the NIP
value into the LR slot, rather the value we store in NIP is from LR, see:

EXC_VIRT_BEGIN(system_call_vectored, 0x3000, 0x1000)
	/* SCV 0 */
	mr	r9,r13
	GET_PACA(r13)
	mflr	r11
        ...
	b	system_call_vectored_common

That's slightly pedantic, but I think it answers the question of why
it's OK to use the same value for NIP & LR, or why we don't have to do
mflr in system_call_vectored_common to get the actual LR value.

cheers
Naveen N Rao Feb. 2, 2024, 2:15 p.m. UTC | #3
On Fri, Feb 02, 2024 at 01:02:39PM +1100, Michael Ellerman wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > Hi!
> >
> > On Thu, Jan 25, 2024 at 05:12:28PM +0530, Naveen N Rao wrote:
> >> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
> >> index bd863702d812..5cf3758a19d3 100644
> >> --- a/arch/powerpc/kernel/interrupt_64.S
> >> +++ b/arch/powerpc/kernel/interrupt_64.S
> >> @@ -53,6 +53,7 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
> >>  	ld	r1,PACAKSAVE(r13)
> >>  	std	r10,0(r1)
> >>  	std	r11,_NIP(r1)
> >> +	std	r11,_LINK(r1)
> >
> > Please add a comment here then, saying what the store is for?
> 
> Yeah a comment would be good. 
> 
> Also the r11 value comes from LR, so it's not that we're storing the NIP
> value into the LR slot, rather the value we store in NIP is from LR, see:
> 
> EXC_VIRT_BEGIN(system_call_vectored, 0x3000, 0x1000)
> 	/* SCV 0 */
> 	mr	r9,r13
> 	GET_PACA(r13)
> 	mflr	r11
>         ...
> 	b	system_call_vectored_common
> 
> That's slightly pedantic, but I think it answers the question of why
> it's OK to use the same value for NIP & LR, or why we don't have to do
> mflr in system_call_vectored_common to get the actual LR value.

Thanks for clarifying that. I should have done a better job describing 
that in the commit log. I'll update that, add a comment here and send a 
v2.


- Naveen
diff mbox series

Patch

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index bd863702d812..5cf3758a19d3 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -53,6 +53,7 @@  _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
 	ld	r1,PACAKSAVE(r13)
 	std	r10,0(r1)
 	std	r11,_NIP(r1)
+	std	r11,_LINK(r1)
 	std	r12,_MSR(r1)
 	std	r0,GPR0(r1)
 	std	r10,GPR1(r1)
@@ -70,7 +71,6 @@  _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
 	std	r9,GPR13(r1)
 	SAVE_NVGPRS(r1)
 	std	r11,_XER(r1)
-	std	r11,_LINK(r1)
 	std	r11,_CTR(r1)
 
 	li	r11,\trapnr