diff mbox

[RFC] Fix Oops in rtas_stop_self()

Message ID 1398418381.2805.168.camel@ThinkPad-T5421.cn.ibm.com (mailing list archive)
State Accepted
Commit 4fb8d027dca0236c811272d342cf185569d91311
Headers show

Commit Message

Li Zhong April 25, 2014, 9:33 a.m. UTC
When trying offline cpus, I noticed following Oops in rtas_stop_self(),
and it seems caused by commit 41dd03a9. The Oops disappears after
reverting this commit.

After reading the code, I guess it might be caused by moving the
rtas_args to stack. Still need some more time to read enter_rtas to
understand why it happens, but the problem seems could be solved by
moving the rtas_args away from stack by adding static before it.

[  247.194623] cpu 28 (hwid 28) Ready to die...
[  247.194641] Unable to handle kernel paging request for data at address 0xecb213c50
[  247.194642] Faulting instruction address: 0x0f34157c
[  247.194645] Oops: Kernel access of bad area, sig: 11 [#1]
[  247.194648] SMP NR_CPUS=1024 NUMA pSeries
[  247.194664] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables ehea ibmvscsi scsi_transport_srp scsi_tgt
[  247.194667] CPU: 28 PID: 0 Comm: swapper/28 Not tainted 3.15.0-rc1+ #10
[  247.194670] CPU: 27 PID: 1171 Comm: drmgr Not tainted 3.15.0-rc1+ #10
[  247.194673] task: c0000007ca2f0ba0 ti: c000000ecb210000 task.ti: c000000ecb210000
[  247.194679] NIP: 000000000f34157c LR: 000000000000a84c CTR: 0000000000000151
[  247.194680] REGS: c000000ecb213780 TRAP: 0300   Not tainted  (3.15.0-rc1+)
[  247.194684] MSR: 8000000000001000 <SF,ME>  CR: 00000000  XER: 00000000
[  247.194707] CFAR: c00000000000a844 DAR: 0000000ecb213c50 DSISR: 40000000 SOFTE: 0 
[  247.194707] GPR00: 8000000000001030 c000000ecb213a00 c0000000014c7880 0000000ecb213c50 
[  247.194707] GPR04: 000000000f340000 0000000ecb213c50 0000000000000000 8000000000001032 
[  247.194707] GPR08: 0000000000001000 c000000ecb213a00 0000000000000000 0000000000000000 
[  247.194707] GPR12: 0000000000000000 c00000000f227e00 c00000000091f5b0 000000000f394e2c 
[  247.194707] GPR16: 000000000000001c 0000000000000001 000000000000001c c0000000013ad5b0 
[  247.194707] GPR20: 0000000000000001 0000000000000000 c0000000013bfb89 c000000000d57688 
[  247.194707] GPR24: 000000000000001c c000000000d44e2c c000000ecb210000 c0000000013c0f48 
[  247.194707] GPR28: c000000000d44e28 c00000000150ca68 00000000000000e0 c000000ecb210000 
[  247.194710] NIP [000000000f34157c] 0xf34157c
[  247.194712] LR [000000000000a84c] 0xa84c
[  247.194714] Call Trace:
[  247.194716] [c000000ecb213a00] [000000000000001c] 0x1c (unreliable)
[  247.194722] [c000000ecb213be0] [c00000000007ffe0] .pseries_mach_cpu_die+0x150/0x350
[  247.194725] [c000000ecb213cf0] [c0000000000420cc] .cpu_die+0x3c/0x60
[  247.194727] [c000000ecb213d60] [c000000000017e58] .arch_cpu_idle_dead+0x28/0x40
[  247.194732] [c000000ecb213dd0] [c000000000104f40] .cpu_startup_entry+0x570/0x590
[  247.194735] [c000000ecb213ed0] [c000000000041d4c] .start_secondary+0x2dc/0x310
[  247.194738] [c000000ecb213f90] [c00000000000996c] .start_secondary_prolog+0x10/0x14
[  247.194740] Instruction dump:
[  247.194744] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[  247.194748] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[  247.194751] ---[ end trace d1d21584135396ba ]---

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Anton Blanchard April 25, 2014, 12:18 p.m. UTC | #1
Hi,

> When trying offline cpus, I noticed following Oops in
> rtas_stop_self(), and it seems caused by commit 41dd03a9. The Oops
> disappears after reverting this commit.
> 
> After reading the code, I guess it might be caused by moving the
> rtas_args to stack. Still need some more time to read enter_rtas to
> understand why it happens, but the problem seems could be solved by
> moving the rtas_args away from stack by adding static before it.

Nice catch. RTAS is 32bit and if your box has more than 4GB RAM then
your stack could easily be outside 32bit range.

You can add:

Signed-off-by: Anton Blanchard <anton@samba.org>

And also:

Cc: stable@vger.kernel.org # 3.14+

> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 9b8e050..20d6297
> 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -88,13 +88,14 @@ void set_default_offline_state(int cpu)
>  
>  static void rtas_stop_self(void)
>  {
> -	struct rtas_args args = {
> -		.token = cpu_to_be32(rtas_stop_self_token),
> +	static struct rtas_args args = {
>  		.nargs = 0,
>  		.nret = 1,
>  		.rets = &args.args[0],
>  	};
>  
> +	args.token = cpu_to_be32(rtas_stop_self_token);
> +
>  	local_irq_disable();
>  
>  	BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE);
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
Li Zhong April 28, 2014, 12:26 a.m. UTC | #2
On Fri, 2014-04-25 at 22:18 +1000, Anton Blanchard wrote:
> Hi,
> 
> > When trying offline cpus, I noticed following Oops in
> > rtas_stop_self(), and it seems caused by commit 41dd03a9. The Oops
> > disappears after reverting this commit.
> > 
> > After reading the code, I guess it might be caused by moving the
> > rtas_args to stack. Still need some more time to read enter_rtas to
> > understand why it happens, but the problem seems could be solved by
> > moving the rtas_args away from stack by adding static before it.
> 
> Nice catch. RTAS is 32bit and if your box has more than 4GB RAM then
> your stack could easily be outside 32bit range.

Ah, yes, the stack here is obviously at a much higher address than 4GB.

> 
> You can add:
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> 
> And also:
> 
> Cc: stable@vger.kernel.org # 3.14+

OK, 

Thanks, Zhong
> 
> > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 9b8e050..20d6297
> > 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -88,13 +88,14 @@ void set_default_offline_state(int cpu)
> >  
> >  static void rtas_stop_self(void)
> >  {
> > -	struct rtas_args args = {
> > -		.token = cpu_to_be32(rtas_stop_self_token),
> > +	static struct rtas_args args = {
> >  		.nargs = 0,
> >  		.nret = 1,
> >  		.rets = &args.args[0],
> >  	};
> >  
> > +	args.token = cpu_to_be32(rtas_stop_self_token);
> > +
> >  	local_irq_disable();
> >  
> >  	BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE);
> > 
> > 
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev
>
David Laight April 28, 2014, 9:06 a.m. UTC | #3
From: Li Zhong

> On Fri, 2014-04-25 at 22:18 +1000, Anton Blanchard wrote:

> > Hi,

> >

> > > When trying offline cpus, I noticed following Oops in

> > > rtas_stop_self(), and it seems caused by commit 41dd03a9. The Oops

> > > disappears after reverting this commit.

> > >

> > > After reading the code, I guess it might be caused by moving the

> > > rtas_args to stack. Still need some more time to read enter_rtas to

> > > understand why it happens, but the problem seems could be solved by

> > > moving the rtas_args away from stack by adding static before it.

> >

> > Nice catch. RTAS is 32bit and if your box has more than 4GB RAM then

> > your stack could easily be outside 32bit range.

> 

> Ah, yes, the stack here is obviously at a much higher address than 4GB.


Are we talking of physical or virtual addresses here?
(or even user?)

Is there a re-entrancy problem using kernel static data?

	David
Benjamin Herrenschmidt April 28, 2014, 9:17 a.m. UTC | #4
On Mon, 2014-04-28 at 09:06 +0000, David Laight wrote:
> > Ah, yes, the stack here is obviously at a much higher address than
> 4GB.
> 
> Are we talking of physical or virtual addresses here?
> (or even user?)

Real.

> Is there a re-entrancy problem using kernel static data?

Not for this code. This is called as the last thing a CPU does
before returning to firmware.

Cheers,
Ben.
diff mbox

Patch

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 9b8e050..20d6297 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -88,13 +88,14 @@  void set_default_offline_state(int cpu)
 
 static void rtas_stop_self(void)
 {
-	struct rtas_args args = {
-		.token = cpu_to_be32(rtas_stop_self_token),
+	static struct rtas_args args = {
 		.nargs = 0,
 		.nret = 1,
 		.rets = &args.args[0],
 	};
 
+	args.token = cpu_to_be32(rtas_stop_self_token);
+
 	local_irq_disable();
 
 	BUG_ON(rtas_stop_self_token == RTAS_UNKNOWN_SERVICE);