diff mbox series

powernv: Avoid calling trace tlbie in kexec path.

Message ID 151137178794.14123.18003389448303728066.stgit@jupiter.in.ibm.com (mailing list archive)
State Accepted
Commit a3961f824cdbe7eb431254dc7d8f6f6767f474aa
Headers show
Series powernv: Avoid calling trace tlbie in kexec path. | expand

Commit Message

Mahesh J Salgaonkar Nov. 22, 2017, 5:32 p.m. UTC
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Rebooting into a new kernel with kexec fails in trace_tlbie() which is
called from native_hpte_clear(). This happens if the running kernel has
CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
execute few RCU checks regardless of whether tracing is on or off.
We are already in the last phase of kexec sequence in real mode with
HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
causes kexec to fail.

Fix this by not calling trace_tlbie() from native_hpte_clear().

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/mm/hash_native_64.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Naveen N. Rao Nov. 22, 2017, 7:07 p.m. UTC | #1
Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
> called from native_hpte_clear(). This happens if the running kernel has
> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
> execute few RCU checks regardless of whether tracing is on or off.
> We are already in the last phase of kexec sequence in real mode with
> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
> causes kexec to fail.
> 
> Fix this by not calling trace_tlbie() from native_hpte_clear().
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  arch/powerpc/mm/hash_native_64.c |   15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
> index 3848af1..640cf56 100644
> --- a/arch/powerpc/mm/hash_native_64.c
> +++ b/arch/powerpc/mm/hash_native_64.c
> @@ -47,7 +47,8 @@
> 
>  DEFINE_RAW_SPINLOCK(native_tlbie_lock);
> 
> -static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
> +static inline unsigned long  ___tlbie(unsigned long vpn, int psize,
> +						int apsize, int ssize)
>  {
>  	unsigned long va;
>  	unsigned int penc;
> @@ -100,7 +101,15 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
>  			     : "memory");
>  		break;
>  	}
> -	trace_tlbie(0, 0, va, 0, 0, 0, 0);

Does it help if you use the _rcuidle variant instead, to turn off all 
rcu checks for tracing __tlbie()?
	trace_tlbie_rcuidle(0, 0, va, 0, 0, 0, 0);


- Naveen
Balbir Singh Nov. 22, 2017, 10:56 p.m. UTC | #2
On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar
<mahesh@linux.vnet.ibm.com> wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
> called from native_hpte_clear(). This happens if the running kernel has
> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
> execute few RCU checks regardless of whether tracing is on or off.
> We are already in the last phase of kexec sequence in real mode with
> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
> causes kexec to fail.
>

Effectively we can't enter the trace point code after we've set
HILE_BE.  Do we need
a fixes tag? Or is this a side-effect of a new generic change?

I think the right thing in the longer run might be to do a TRACE_EVENT_CONDITION
and have the condition do the right thing, but what you have for now is good.

Balbir Singh.
Mahesh J Salgaonkar Nov. 23, 2017, 9:38 a.m. UTC | #3
On 11/23/2017 12:37 AM, Naveen N. Rao wrote:
> Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>> called from native_hpte_clear(). This happens if the running kernel has
>> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
>> execute few RCU checks regardless of whether tracing is on or off.
>> We are already in the last phase of kexec sequence in real mode with
>> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
>> causes kexec to fail.
>>
>> Fix this by not calling trace_tlbie() from native_hpte_clear().
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
>> ---
>>  arch/powerpc/mm/hash_native_64.c |   15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/hash_native_64.c
>> b/arch/powerpc/mm/hash_native_64.c
>> index 3848af1..640cf56 100644
>> --- a/arch/powerpc/mm/hash_native_64.c
>> +++ b/arch/powerpc/mm/hash_native_64.c
>> @@ -47,7 +47,8 @@
>>
>>  DEFINE_RAW_SPINLOCK(native_tlbie_lock);
>>
>> -static inline void __tlbie(unsigned long vpn, int psize, int apsize,
>> int ssize)
>> +static inline unsigned long  ___tlbie(unsigned long vpn, int psize,
>> +                        int apsize, int ssize)
>>  {
>>      unsigned long va;
>>      unsigned int penc;
>> @@ -100,7 +101,15 @@ static inline void __tlbie(unsigned long vpn, int
>> psize, int apsize, int ssize)
>>                   : "memory");
>>          break;
>>      }
>> -    trace_tlbie(0, 0, va, 0, 0, 0, 0);
> 
> Does it help if you use the _rcuidle variant instead, to turn off all
> rcu checks for tracing __tlbie()?
>     trace_tlbie_rcuidle(0, 0, va, 0, 0, 0, 0);

It helps if tracepoint is not enabled. But with tracepoint enabled kexec
still fails. I think we should not have tracepoint in kexec path at all.
If someone enables it, kexec will definitely fail regardless of
CONFIG_LOCKDEP.

Thanks,
-Mahesh.
Naveen N. Rao Nov. 23, 2017, 10:02 a.m. UTC | #4
Mahesh Jagannath Salgaonkar wrote:
> On 11/23/2017 12:37 AM, Naveen N. Rao wrote:
>> Mahesh J Salgaonkar wrote:
>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>>
>>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>>> called from native_hpte_clear(). This happens if the running kernel has
>>> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
>>> execute few RCU checks regardless of whether tracing is on or off.
>>> We are already in the last phase of kexec sequence in real mode with
>>> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
>>> causes kexec to fail.
>>>
>>> Fix this by not calling trace_tlbie() from native_hpte_clear().
>>>
>>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
>>> ---
>>>  arch/powerpc/mm/hash_native_64.c |   15 ++++++++++++---
>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>

[snip]

>>> @@ -100,7 +101,15 @@ static inline void __tlbie(unsigned long vpn, 
>>> int
>>> psize, int apsize, int ssize)
>>>                   : "memory");
>>>          break;
>>>      }
>>> -    trace_tlbie(0, 0, va, 0, 0, 0, 0);
>> 
>> Does it help if you use the _rcuidle variant instead, to turn off all
>> rcu checks for tracing __tlbie()?
>>     trace_tlbie_rcuidle(0, 0, va, 0, 0, 0, 0);
> 
> It helps if tracepoint is not enabled. But with tracepoint enabled kexec
> still fails. I think we should not have tracepoint in kexec path at all.
> If someone enables it, kexec will definitely fail regardless of
> CONFIG_LOCKDEP.

Ok, thanks for confirming that other tracepoints don't interfere with 
kexec. As Balbir points out, moving to TRACE_EVENT_CONDITION() with a 
global in_kexec variable may be the other option, but is probably 
overkill for a single tracepoint.

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>


Thanks,
Naveen
Mahesh J Salgaonkar Nov. 23, 2017, 11:18 a.m. UTC | #5
On 11/23/2017 04:26 AM, Balbir Singh wrote:
> On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar
> <mahesh@linux.vnet.ibm.com> wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>> called from native_hpte_clear(). This happens if the running kernel has
>> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
>> execute few RCU checks regardless of whether tracing is on or off.
>> We are already in the last phase of kexec sequence in real mode with
>> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
>> causes kexec to fail.
>>
> 
> Effectively we can't enter the trace point code after we've set
> HILE_BE.  Do we need
> a fixes tag? Or is this a side-effect of a new generic change?

Yup. I missed it. Will resend the patch with fixes tag

Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")

> 
> I think the right thing in the longer run might be to do a TRACE_EVENT_CONDITION
> and have the condition do the right thing, but what you have for now is good.
> 
> Balbir Singh.
>
Michael Ellerman Nov. 23, 2017, 12:15 p.m. UTC | #6
Balbir Singh <bsingharora@gmail.com> writes:

> On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar
> <mahesh@linux.vnet.ibm.com> wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>> called from native_hpte_clear(). This happens if the running kernel has
>> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
>> execute few RCU checks regardless of whether tracing is on or off.
>> We are already in the last phase of kexec sequence in real mode with
>> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
>> causes kexec to fail.
>>
>
> Effectively we can't enter the trace point code after we've set
> HILE_BE.  Do we need
> a fixes tag? Or is this a side-effect of a new generic change?

Yes I added:

Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")
Cc: stable@vger.kernel.org # v4.13+

> I think the right thing in the longer run might be to do a TRACE_EVENT_CONDITION
> and have the condition do the right thing, but what you have for now is good.

No I think the right thing is to not call trace points from kexec code,
it's too fragile. TRACE_EVENT_CONDITION wouldn't have saved us from this
RCU breakage.

cheers
Naveen N. Rao Nov. 23, 2017, 1:11 p.m. UTC | #7
Michael Ellerman wrote:
> Balbir Singh <bsingharora@gmail.com> writes:
> 
>> On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar
>> <mahesh@linux.vnet.ibm.com> wrote:
>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>>
>>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>>> called from native_hpte_clear(). This happens if the running kernel has
>>> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
>>> execute few RCU checks regardless of whether tracing is on or off.
>>> We are already in the last phase of kexec sequence in real mode with
>>> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
>>> causes kexec to fail.
>>>
>>
>> Effectively we can't enter the trace point code after we've set
>> HILE_BE.  Do we need
>> a fixes tag? Or is this a side-effect of a new generic change?
> 
> Yes I added:
> 
> Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")
> Cc: stable@vger.kernel.org # v4.13+
> 
>> I think the right thing in the longer run might be to do a TRACE_EVENT_CONDITION
>> and have the condition do the right thing, but what you have for now is good.
> 
> No I think the right thing is to not call trace points from kexec code,
> it's too fragile. TRACE_EVENT_CONDITION wouldn't have saved us from this
> RCU breakage.

I agree on the fragile part, though it appears to me that a 
TRACE_EVENT_CONDITION() with a check for is_kexec (that needs to be 
added) will prevent breakage since both the LOCKDEP block as well as the 
tracepoint itself are guarded by the condition. So, none of the rcu code 
should be executed as long as we set is_kexec at the right time.  
However, since there are all of 1 tracepoint(s) affecting kexec, it is 
probably not worth the effort at the moment.

- Naveen
Balbir Singh Nov. 24, 2017, 4:13 a.m. UTC | #8
On Fri, Nov 24, 2017 at 12:11 AM, Naveen N. Rao
<naveen.n.rao@linux.vnet.ibm.com> wrote:
> Michael Ellerman wrote:
>>
>> Balbir Singh <bsingharora@gmail.com> writes:
>>
>>> On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar
>>> <mahesh@linux.vnet.ibm.com> wrote:
>>>>
>>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>>>
>>>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>>>> called from native_hpte_clear(). This happens if the running kernel has
>>>> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
>>>> execute few RCU checks regardless of whether tracing is on or off.
>>>> We are already in the last phase of kexec sequence in real mode with
>>>> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
>>>> causes kexec to fail.
>>>>
>>>
>>> Effectively we can't enter the trace point code after we've set
>>> HILE_BE.  Do we need
>>> a fixes tag? Or is this a side-effect of a new generic change?
>>
>>
>> Yes I added:
>>
>> Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")
>> Cc: stable@vger.kernel.org # v4.13+
>>
>>> I think the right thing in the longer run might be to do a
>>> TRACE_EVENT_CONDITION
>>> and have the condition do the right thing, but what you have for now is
>>> good.
>>
>>
>> No I think the right thing is to not call trace points from kexec code,
>> it's too fragile. TRACE_EVENT_CONDITION wouldn't have saved us from this
>> RCU breakage.
>
>
> I agree on the fragile part, though it appears to me that a
> TRACE_EVENT_CONDITION() with a check for is_kexec (that needs to be added)
> will prevent breakage since both the LOCKDEP block as well as the tracepoint
> itself are guarded by the condition. So, none of the rcu code should be
> executed as long as we set is_kexec at the right time.  However, since there
> are all of 1 tracepoint(s) affecting kexec, it is probably not worth the
> effort at the moment.
>

+1, I am good with this change for now. I agree that we should not
call trace points
from kexec code, the tracepoint was for other paths, but we should
definitely avoid
this path.

Mahesh, is this path specific to hash or do we have similar issues in radix?

Balbir Singh.
Michael Ellerman Nov. 24, 2017, 6:10 a.m. UTC | #9
Balbir Singh <bsingharora@gmail.com> writes:
> On Fri, Nov 24, 2017 at 12:11 AM, Naveen N. Rao
> <naveen.n.rao@linux.vnet.ibm.com> wrote:
>> Michael Ellerman wrote:
>>> Balbir Singh <bsingharora@gmail.com> writes:
...
>>>
>>>> I think the right thing in the longer run might be to do a
>>>> TRACE_EVENT_CONDITION
>>>> and have the condition do the right thing, but what you have for now is
>>>> good.
>>>
>>>
>>> No I think the right thing is to not call trace points from kexec code,
>>> it's too fragile. TRACE_EVENT_CONDITION wouldn't have saved us from this
>>> RCU breakage.
>>
>> I agree on the fragile part, though it appears to me that a
>> TRACE_EVENT_CONDITION() with a check for is_kexec (that needs to be added)
>> will prevent breakage since both the LOCKDEP block as well as the tracepoint
>> itself are guarded by the condition. So, none of the rcu code should be
>> executed as long as we set is_kexec at the right time.  However, since there
>> are all of 1 tracepoint(s) affecting kexec, it is probably not worth the
>> effort at the moment.
>
> +1, I am good with this change for now. I agree that we should not
> call trace points
> from kexec code, the tracepoint was for other paths, but we should
> definitely avoid
> this path.
>
> Mahesh, is this path specific to hash or do we have similar issues in radix?

The radix code will trigger the tracepoints, so we should fix that. It
may not actually crash but that's secondary.

See:
  mmu_cleanup_all()
  -> radix__mmu_cleanup_all()
     -> radix__flush_tlb_all()
        -> trace_tlbie()

cheers
Michael Ellerman Nov. 24, 2017, 6:10 a.m. UTC | #10
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:

> Michael Ellerman wrote:
>> Balbir Singh <bsingharora@gmail.com> writes:
>> 
>>> On Thu, Nov 23, 2017 at 4:32 AM, Mahesh J Salgaonkar
>>> <mahesh@linux.vnet.ibm.com> wrote:
>>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>>>
>>>> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
>>>> called from native_hpte_clear(). This happens if the running kernel has
>>>> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
>>>> execute few RCU checks regardless of whether tracing is on or off.
>>>> We are already in the last phase of kexec sequence in real mode with
>>>> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
>>>> causes kexec to fail.
>>>>
>>>
>>> Effectively we can't enter the trace point code after we've set
>>> HILE_BE.  Do we need
>>> a fixes tag? Or is this a side-effect of a new generic change?
>> 
>> Yes I added:
>> 
>> Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")
>> Cc: stable@vger.kernel.org # v4.13+
>> 
>>> I think the right thing in the longer run might be to do a TRACE_EVENT_CONDITION
>>> and have the condition do the right thing, but what you have for now is good.
>> 
>> No I think the right thing is to not call trace points from kexec code,
>> it's too fragile. TRACE_EVENT_CONDITION wouldn't have saved us from this
>> RCU breakage.
>
> I agree on the fragile part, though it appears to me that a 
> TRACE_EVENT_CONDITION() with a check for is_kexec (that needs to be 
> added) will prevent breakage since both the LOCKDEP block as well as the 
> tracepoint itself are guarded by the condition. So, none of the rcu code 
> should be executed as long as we set is_kexec at the right time.  

Yes you're right, I misread that.

So maybe that is an option. But it still makes me nervous :)

cheers
Michael Ellerman Nov. 24, 2017, 9:46 a.m. UTC | #11
On Wed, 2017-11-22 at 17:32:07 UTC, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Rebooting into a new kernel with kexec fails in trace_tlbie() which is
> called from native_hpte_clear(). This happens if the running kernel has
> CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always
> execute few RCU checks regardless of whether tracing is on or off.
> We are already in the last phase of kexec sequence in real mode with
> HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and
> causes kexec to fail.
> 
> Fix this by not calling trace_tlbie() from native_hpte_clear().
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/a3961f824cdbe7eb431254dc7d8f6f

cheers
diff mbox series

Patch

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 3848af1..640cf56 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -47,7 +47,8 @@ 
 
 DEFINE_RAW_SPINLOCK(native_tlbie_lock);
 
-static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
+static inline unsigned long  ___tlbie(unsigned long vpn, int psize,
+						int apsize, int ssize)
 {
 	unsigned long va;
 	unsigned int penc;
@@ -100,7 +101,15 @@  static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
 			     : "memory");
 		break;
 	}
-	trace_tlbie(0, 0, va, 0, 0, 0, 0);
+	return va;
+}
+
+static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize)
+{
+	unsigned long rb;
+
+	rb = ___tlbie(vpn, psize, apsize, ssize);
+	trace_tlbie(0, 0, rb, 0, 0, 0, 0);
 }
 
 static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize)
@@ -652,7 +661,7 @@  static void native_hpte_clear(void)
 		if (hpte_v & HPTE_V_VALID) {
 			hpte_decode(hptep, slot, &psize, &apsize, &ssize, &vpn);
 			hptep->v = 0;
-			__tlbie(vpn, psize, apsize, ssize);
+			___tlbie(vpn, psize, apsize, ssize);
 		}
 	}