diff mbox series

[RFC,1/1] powerpc/ftrace: Exclude real mode code from ftrace

Message ID ecbf7e8d6895a7d56b4eb60cbf7461bd463086b3.1520435958.git.naveen.n.rao@linux.vnet.ibm.com (mailing list archive)
State Superseded
Series Exclude real mode code from ftrace

Commit Message

Naveen N. Rao March 7, 2018, 4:46 p.m. UTC
We can't take a trap in most parts of real mode code. Instead of adding
the 'notrace' annotation to all C functions that can be invoked from
real mode, detect that we are in real mode on ftrace entry and return
early.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
This RFC only handles -mprofile-kernel to demonstrate the approach being
considered. We will need to handle the other ftrace entry variants if we
decide to continue down this path.

- Naveen


 arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Steven Rostedt March 7, 2018, 5:45 p.m. UTC | #1
On Wed,  7 Mar 2018 22:16:19 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> We can't take a trap in most parts of real mode code. Instead of adding
> the 'notrace' annotation to all C functions that can be invoked from
> real mode, detect that we are in real mode on ftrace entry and return
> back.
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
> This RFC only handles -mprofile-kernel to demonstrate the approach being 
> considered. We will need to handle other ftrace entry if we decide to 
> continue down this path.

I do prefer this trade off.


> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> index 3f3e81852422..ecc0e8e38ead 100644
> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> @@ -56,6 +56,21 @@ _GLOBAL(ftrace_caller)
>  
>  	/* Load special regs for save below */
>  	mfmsr   r8
> +
> +	/* Only proceed if we are not in real mode and can take interrupts */
> +	andi.	r9, r8, MSR_IR|MSR_DR|MSR_RI
> +	cmpdi	r9, MSR_IR|MSR_DR|MSR_RI
> +	beq	1f

OK, I assume this check and branch is negligible compared to the mfmsr
call?

-- Steve


> +	mflr	r8
> +	mtctr	r8
> +	REST_GPR(9, r1)
> +	REST_GPR(8, r1)
> +	addi	r1, r1, SWITCH_FRAME_SIZE
> +	ld	r0, LRSAVE(r1)
> +	mtlr	r0
> +	bctr
> +
> +1:
>  	mfctr   r9
>  	mfxer   r10
>  	mfcr	r11
Naveen N. Rao March 7, 2018, 6:37 p.m. UTC | #2
Hi Steve,

Steven Rostedt wrote:
> On Wed,  7 Mar 2018 22:16:19 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> We can't take a trap in most parts of real mode code. Instead of adding
>> the 'notrace' annotation to all C functions that can be invoked from
>> real mode, detect that we are in real mode on ftrace entry and return
>> back.
>> 
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> ---
>> This RFC only handles -mprofile-kernel to demonstrate the approach being 
>> considered. We will need to handle other ftrace entry if we decide to 
>> continue down this path.
> 
> I do prefer this trade off.

Great, thanks!

> 
> 
>> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> index 3f3e81852422..ecc0e8e38ead 100644
>> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
>> @@ -56,6 +56,21 @@ _GLOBAL(ftrace_caller)
>>  
>>  	/* Load special regs for save below */
>>  	mfmsr   r8
>> +
>> +	/* Only proceed if we are not in real mode and can take interrupts */
>> +	andi.	r9, r8, MSR_IR|MSR_DR|MSR_RI
>> +	cmpdi	r9, MSR_IR|MSR_DR|MSR_RI
>> +	beq	1f
> 
> OK, I assume this check and branch is negligible compared to the mfmsr
> call?

Yes, that's negligible.
Though, to be honest, I will have to introduce a 'mfmsr' for the older 
-pg variant. I still think that the improved reliability far outweighs 
the minor slowdown there.

- Naveen
Steven Rostedt March 7, 2018, 7:02 p.m. UTC | #3
On Thu, 08 Mar 2018 00:07:07 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> Yes, that's negligible.
> Though, to be honest, I will have to introduce a 'mfmsr' for the older 
> -pg variant. I still think that the improved reliability far outweighs 
> the minor slowdown there.

In that case, can you introduce a read_mostly variable that can be
tested before calling the mfmsr. Why punish normal ftrace tracing if
kvm is not enabled or running?

Both should probably have an #ifdef CONFIG_KVM encapsulating the code.

-- Steve
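
Steve's suggestion could look roughly like the following userspace model
(a sketch only: the flag name `kvm_realmode_possible` and the helpers are
invented here; in the kernel the flag would be a `__read_mostly` variable
guarded by `#ifdef CONFIG_KVM`, and the real check is the mfmsr sequence
in the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag; in-kernel this would be a __read_mostly variable
 * set by KVM while guests may execute real-mode host code. */
static bool kvm_realmode_possible; /* = false */

static int mfmsr_reads; /* counts emulated "expensive" mfmsr reads */

/* Stand-in for the mfmsr-based real-mode test from the patch. */
static bool in_real_mode(void)
{
	mfmsr_reads++;
	return false; /* pretend MSR_IR|MSR_DR|MSR_RI are all set */
}

/* ftrace entry sketch: consult the cheap flag first, and only pay
 * for the MSR read when KVM might actually be running. */
static bool ftrace_safe_to_proceed(void)
{
	if (!kvm_realmode_possible)
		return true;            /* fast path: no mfmsr at all */
	return !in_real_mode();         /* slow path: check the MSR */
}
```

With the flag clear, the fast path never touches the (emulated) MSR, which
is the point of Steve's suggestion: normal tracing pays nothing when KVM
is not in play.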
Michael Ellerman March 8, 2018, 3:03 a.m. UTC | #4
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:

> We can't take a trap in most parts of real mode code. Instead of adding
> the 'notrace' annotation to all C functions that can be invoked from
> real mode, detect that we are in real mode on ftrace entry and return
> back.
>
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
> This RFC only handles -mprofile-kernel to demonstrate the approach being 
> considered. We will need to handle other ftrace entry if we decide to 
> continue down this path.

Paul and I were talking about having a paca flag for this, ie.
paca->safe_to_ftrace (or whatever). I'm not sure if you've talked to
him and decided this is a better approach.

I guess I'm 50/50 on which is better, they both have pluses and minuses.

cheers

> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> index 3f3e81852422..ecc0e8e38ead 100644
> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
> @@ -56,6 +56,21 @@ _GLOBAL(ftrace_caller)
>  
>  	/* Load special regs for save below */
>  	mfmsr   r8
> +
> +	/* Only proceed if we are not in real mode and can take interrupts */
> +	andi.	r9, r8, MSR_IR|MSR_DR|MSR_RI
> +	cmpdi	r9, MSR_IR|MSR_DR|MSR_RI
> +	beq	1f
> +	mflr	r8
> +	mtctr	r8
> +	REST_GPR(9, r1)
> +	REST_GPR(8, r1)
> +	addi	r1, r1, SWITCH_FRAME_SIZE
> +	ld	r0, LRSAVE(r1)
> +	mtlr	r0
> +	bctr
> +
> +1:
>  	mfctr   r9
>  	mfxer   r10
>  	mfcr	r11
> -- 
> 2.16.1
Naveen N. Rao March 9, 2018, 8:15 a.m. UTC | #5
Michael Ellerman wrote:
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
> 
>> We can't take a trap in most parts of real mode code. Instead of adding
>> the 'notrace' annotation to all C functions that can be invoked from
>> real mode, detect that we are in real mode on ftrace entry and return
>> back.
>>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> ---
>> This RFC only handles -mprofile-kernel to demonstrate the approach being 
>> considered. We will need to handle other ftrace entry if we decide to 
>> continue down this path.
> 
> Paul and I were talking about having a paca flag for this, ie.
> paca->safe_to_ftrace (or whatever). I'm not sure if you've talked to
> him and decided this is a better approach.
> 
> I guess I'm 50/50 on which is better, they both have pluses and minuses.

Thanks, I hadn't spoken to Paul, but I now think that this is probably 
the better approach to take.

My earlier assumption was that we have other scenarios when we are in 
realmode (specifically with MSR_RI unset) where we won't be able to 
recover from a trap, during function tracing (*). I did a set of 
experiments yesterday to verify that, but I was not able to uncover any 
such scenarios with my brief testing. So, we seem to be functioning just 
fine while tracing realmode C code, except for KVM.

As such, rather than blacklisting all realmode code, I think it is 
better to be selective and just disable the tracer for KVM since we know 
we can't take a trap there. We will be able to use the same approach if 
we uncover additional scenarios where we can't use function tracing. I 
will look at implementing a paca field for this purpose.
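
The paca-field idea can be sketched as a toy userspace model (the field
name follows Michael's suggested paca->safe_to_ftrace; the struct and the
enter/exit hooks here are hypothetical, not the actual powerpc paca):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the powerpc paca (per-CPU data area); only the
 * proposed flag is modeled. */
struct paca_struct {
	bool safe_to_ftrace;
};

static struct paca_struct paca = { .safe_to_ftrace = true };

static int traced_calls;

/* ftrace entry sketch: bail out immediately when the flag is clear. */
static void ftrace_entry(void)
{
	if (!paca.safe_to_ftrace)
		return;		/* KVM real-mode section, early boot, kexec, ... */
	traced_calls++;		/* normal tracing path */
}

/* Hypothetical hooks flipping the flag around an unsafe region. */
static void kvm_realmode_enter(void) { paca.safe_to_ftrace = false; }
static void kvm_realmode_exit(void)  { paca.safe_to_ftrace = true; }
```

Calls made between the enter/exit hooks simply skip tracing; the same flag
could be cleared in any other path found to be unsafe.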

I also noticed that even with an unexpected timebase, we still seem to 
recover just fine with a simple change:

--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2629,8 +2629,8 @@ static noinline void
 rb_handle_timestamp(struct ring_buffer_per_cpu *cpu_buffer,
                    struct rb_event_info *info)
 {
-       WARN_ONCE(info->delta > (1ULL << 59),
-                 KERN_WARNING "Delta way too big! %llu ts=%llu write stamp = %llu\n%s",
+       if (info->delta > (1ULL << 59))
+               pr_warn_once("Delta way too big! %llu ts=%llu write stamp = %llu\n%s",
                  (unsigned long long)info->delta,
                  (unsigned long long)info->ts,
                  (unsigned long long)cpu_buffer->write_stamp,


This allowed the virtual machine to boot and we were able to trace the 
rest of KVM C code. I only just did a boot test, so I'm not sure if 
there are other scenarios where things can go wrong.

Steve,
Would you be willing to accept a patch like the above? Since we seem to 
handle the larger delta just fine, I think the above change should be 
acceptable.

I will still work on excluding KVM C code from being traced, but the 
advantage with the above patch is that we will be able to trace KVM C 
code with a small change if necessary.


- Naveen

---
(*) putting on my kprobe hat
Naveen N. Rao March 9, 2018, 8:17 a.m. UTC | #6
Steven Rostedt wrote:
> On Thu, 08 Mar 2018 00:07:07 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
>> Yes, that's negligible.
>> Though, to be honest, I will have to introduce a 'mfmsr' for the older 
>> -pg variant. I still think that the improved reliability far outweighs 
>> the minor slowdown there.
> 
> In that case, can you introduce a read_mostly variable that can be
> tested before calling the mfmsr. Why punish normal ftrace tracing if
> kvm is not enabled or running?
> 
> Both should probably have an #ifdef CONFIG_KVM encapsulating the code.

Agreed. My previous intent was to exclude all realmode code from ftrace, 
but I now think it is better to only exclude the KVM path. So, I will 
make it specific to KVM.

- Naveen
Michael Ellerman March 9, 2018, 10:27 a.m. UTC | #7
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:

> Michael Ellerman wrote:
>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>> 
>>> We can't take a trap in most parts of real mode code. Instead of adding
>>> the 'notrace' annotation to all C functions that can be invoked from
>>> real mode, detect that we are in real mode on ftrace entry and return
>>> back.
>>>
>>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>>> ---
>>> This RFC only handles -mprofile-kernel to demonstrate the approach being 
>>> considered. We will need to handle other ftrace entry if we decide to 
>>> continue down this path.
>> 
>> Paul and I were talking about having a paca flag for this, ie.
>> paca->safe_to_ftrace (or whatever). I'm not sure if you've talked to
>> him and decided this is a better approach.
>> 
>> I guess I'm 50/50 on which is better, they both have pluses and minuses.
>
> Thanks, I hadn't spoken to Paul, but I now think that this is probably 
> the better approach to take.

OK.

> My earlier assumption was that we have other scenarios when we are in 
> realmode (specifically with MSR_RI unset) where we won't be able to 
> recover from a trap, during function tracing (*). I did a set of 
> experiments yesterday to verify that, but I was not able to uncover any 
> such scenarios with my brief testing. So, we seem to be functioning just 
> fine while tracing realmode C code, except for KVM.

Hmm. If MSR_RI is clear then that should indicate that you can't recover
from an interrupt, typically because you'd lose state in SRR0/1. So I
would expect things to go badly in that case.

But that's sort of orthogonal to real mode. Real mode is different and
we do need to be careful, but a blanket ban on tracing in real mode
might be too broad.

The problem with KVM is that you're running in real mode (MMU off), in
the host kernel, but with the MMU context of the guest loaded. So if you
do anything that turns the MMU on, like a WARN_ON(), then all of a
sudden you're running guest kernel code.

> As such, rather than blacklisting all realmode code, I think it is 
> better to be selective and just disable the tracer for KVM since we know 
> we can't take a trap there. We will be able to use the same approach if 
> we uncover additional scenarios where we can't use function tracing. I 
> will look at implementing a paca field for this purpose.

Thanks. I think it could work well.

We could also use it during early boot, ie. the flag starts out false,
and in the kexec/kdump path as well.

cheers
Naveen N. Rao March 9, 2018, 12:05 p.m. UTC | #8
Michael Ellerman wrote:
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>> My earlier assumption was that we have other scenarios when we are in 
>> realmode (specifically with MSR_RI unset) where we won't be able to 
>> recover from a trap, during function tracing (*). I did a set of 
>> experiments yesterday to verify that, but I was not able to uncover 
>> any such scenarios with my brief testing. So, we seem to be 
>> functioning just fine while tracing realmode C code, except for KVM.
> 
> Hmm. If MSR_RI is clear then that should indicate that you can't recover
> from an interrupt, typically because you'd lose state in SRR0/1. So I
> would expect things to go badly in that case.

Yes, so it looks like we aren't calling into any C code (that isn't 
already annotated with 'notrace') with MSR_RI unset. At least, with the 
usual interrupt handling on powernv. I tested this by putting a 'trap' 
in the function tracer callback after the recursion test. This forces a 
trap for each function that we trace non-recursively.

There may be other paths where we do so, but it isn't as pervasive as I 
previously thought. So, we should be able to exclude those paths using 
the paca field, as and when we find them.

- Naveen

Patch

diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
index 3f3e81852422..ecc0e8e38ead 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
+++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S
@@ -56,6 +56,21 @@  _GLOBAL(ftrace_caller)
 
 	/* Load special regs for save below */
 	mfmsr   r8
+
+	/* Only proceed if we are not in real mode and can take interrupts */
+	andi.	r9, r8, MSR_IR|MSR_DR|MSR_RI
+	cmpdi	r9, MSR_IR|MSR_DR|MSR_RI
+	beq	1f
+	mflr	r8
+	mtctr	r8
+	REST_GPR(9, r1)
+	REST_GPR(8, r1)
+	addi	r1, r1, SWITCH_FRAME_SIZE
+	ld	r0, LRSAVE(r1)
+	mtlr	r0
+	bctr
+
+1:
 	mfctr   r9
 	mfxer   r10
 	mfcr	r11
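
The andi./cmpdi pair above can be modeled in C as a mask-and-compare. The
bit values below follow the usual ppc64 definitions (MSR_RI = 1UL<<1,
MSR_DR = 1UL<<4, MSR_IR = 1UL<<5, per arch/powerpc/include/asm/reg.h);
treat this as an illustrative sketch, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR bits as defined for ppc64 (see arch/powerpc/include/asm/reg.h). */
#define MSR_RI  (1UL << 1)   /* Recoverable Interrupt */
#define MSR_DR  (1UL << 4)   /* Data Relocation (MMU on for data) */
#define MSR_IR  (1UL << 5)   /* Instruction Relocation */

/* Mirrors the andi./cmpdi/beq sequence in ftrace_caller: tracing
 * proceeds only when translation is on for both instructions and
 * data AND interrupts are recoverable. */
static bool safe_to_trace(uint64_t msr)
{
	return (msr & (MSR_IR | MSR_DR | MSR_RI)) ==
	       (MSR_IR | MSR_DR | MSR_RI);
}
```

Any of the three bits being clear (real mode, or MSR_RI unset) makes the
check fail, which is exactly when the patch unwinds the stack frame and
returns to the caller without tracing.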