diff mbox

powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at powerpc platform

Message ID 1395992312-23035-1-git-send-email-dongsheng.wang@freescale.com (mailing list archive)
State Rejected
Headers show

Commit Message

Dongsheng Wang March 28, 2014, 7:38 a.m. UTC
From: Wang Dongsheng <dongsheng.wang@freescale.com>

If softirq use hardirq stack, we will get kernel painc when a hard irq coming again
during __do_softirq enable local irq to deal with softirq action. So we need to switch
satck into softirq stack when invoke soft irq.

 Task--->
	| Task stack
	|
	Interrput->EXCEPTION->do_IRQ->
	^			     | Hard irq stack
	|			     |
	|			     irq_exit->__do_softirq->local_irq_enable--   -->local_irq_disable
	|									| Hard irq stack
	|									|
	|									Interrupt coming again
	|		There will get a Interrupt nesting			|
	------------------------------------------------------------------------

Trace 1: Trap 900

Kernel stack overflow in process e8152f40, r1=e8e05ec0
CPU: 0 PID: 2399 Comm: image_compress/ Not tainted 3.13.0-rc3-03475-g2e3f85b #432
task: e8152f40 ti: c080a000 task.ti: ef176000
NIP: c05bec04 LR: c0305590 CTR: 00000010
REGS: e8e05e10 TRAP: 0901   Not tainted  (3.13.0-rc3-03475-g2e3f85b)
MSR: 00029000 <CE,EE,ME>  CR: 22f22722  XER: 20000000

GPR00: c0305590 e8e05ec0 e8152f40 c07e1e2c 00029000 000000ec fffffffc 00000010
GPR08: 0000007f 00000000 00000000 b02539f3 a00ae278
NIP [c05bec04] _raw_spin_unlock_irqrestore+0x10/0x14
LR [c0305590] add_timer_randomness+0x60/0xfc
Call Trace:
[e8e05ec0] [c0305590] add_timer_randomness+0x60/0xfc (unreliable)
[e8e05ee0] [c026c9a8] blk_update_bidi_request+0x64/0x94
[e8e05f00] [c026cd00] blk_end_bidi_request+0x20/0x7c
[e8e05f20] [c032f21c] scsi_io_completion+0xe0/0x5e8
[e8e05f70] [c0272b84] blk_done_softirq+0x98/0xb8
[e8e05f90] [c004893c] __do_softirq+0xf8/0x1f8
[e8e05fe0] [c0048dbc] irq_exit+0xa4/0xc8
[e8e05ff0] [c000d5f4] call_do_irq+0x24/0x3c
[ef177d50] [c00046ec] do_IRQ+0x8c/0xf8
[ef177d70] [c000f6dc] ret_from_except+0x0/0x18
--- Exception: 501 at lzo1x_1_do_compress+0x248/0x40c
    LR = lzo1x_1_compress+0x98/0x268
[ef177e30] [c07c7440] runqueues+0x0/0x540 (unreliable)
[ef177e60] [00000000]   (null)
[ef177ea0] [c0085a9c] lzo_compress_threadfn+0x6c/0x138
[ef177ef0] [c0062a00] kthread+0xc4/0xd8
[ef177f40] [c000f158] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
40a2fff0 4c00012c 2f890000 419e000c 38600000 4e800020 38600001 4e800020
7c0004ac 39200000 91230000 7c800106 <4e800020> 7d201828 35290001 40810010
Kernel panic - not syncing: kernel stack overflow
CPU: 0 PID: 2399 Comm: image_compress/ Not tainted 3.13.0-rc3-03475-g2e3f85b #432
Call Trace:
Rebooting in 180 seconds..

Trace 2: Trap 500

VFS: Mounted root (ext2 filesystem) on device 1:0.
devtmpfs: mounted
Freeing unused kernel memory: 268K (c079a000 - c07dd000)
INIT: version 2.88 booting
Starting udev
udevd[1423]: starting version 182
random: nonblocking pool is initialized
Kernel stack overflow in process e829ca80, r1=e8badf90
CPU: 0 PID: 1553 Comm: mount.sh Not tainted 3.13.0-rc1-148228-gea7ca7c #21
task: e829ca80 ti: c081c000 task.ti: e9d28000
NIP: c00434bc LR: c0043444 CTR: c0018cec
REGS: e8badee0 TRAP: 0501   Not tainted  (3.13.0-rc1-148228-gea7ca7c)
MSR: 00029000 <CE,EE,ME>  CR: 48222422  XER: 20000000

GPR00: c00439a0 e8badf90 e829ca80 00000001 e80cc780 00000001 b92f44af 00000000
GPR08: 00000001 010ba000 010ba000 ddd3e6d1 48222422
NIP [c00434bc] __do_softirq+0x94/0x1f8
LR [c0043444] __do_softirq+0x1c/0x1f8
Call Trace:
[e8badf90] [100f0000] 0x100f0000 (unreliable)
[e8badfe0] [c00439a0] irq_exit+0xa4/0xc8
[e8badff0] [c000ccd8] call_do_irq+0x24/0x3c
[e9d29f20] [c000479c] do_IRQ+0x8c/0xf8
[e9d29f40] [c000eb54] ret_from_except+0x0/0x18
--- Exception: 501 at 0x1003d540
    LR = 0x10041974
Instruction dump:
3e80c082 3f40c07e 3b60000a 3a941040 3aa00000 3b5a9388 7f16c378 812f0008
5529103a 7d3c482e 7eb8492e 7c008146 <3ba00000> 7e9ea378 48000014 57fff87f
Kernel panic - not syncing: kernel stack overflow
CPU: 0 PID: 1553 Comm: mount.sh Not tainted 3.13.0-rc1-148228-gea7ca7c #21
Call Trace:
Rebooting in 180 seconds..

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>

Comments

Kevin Hao March 28, 2014, 8:18 a.m. UTC | #1
On Fri, Mar 28, 2014 at 03:38:32PM +0800, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> If softirq use hardirq stack, we will get kernel painc when a hard irq coming again
> during __do_softirq enable local irq to deal with softirq action. So we need to switch
> satck into softirq stack when invoke soft irq.
> 
>  Task--->
> 	| Task stack
> 	|
> 	Interrput->EXCEPTION->do_IRQ->
> 	^			     | Hard irq stack
> 	|			     |
> 	|			     irq_exit->__do_softirq->local_irq_enable--   -->local_irq_disable
> 	|									| Hard irq stack
> 	|									|
> 	|									Interrupt coming again
> 	|		There will get a Interrupt nesting			|
> 	------------------------------------------------------------------------
> 
> Trace 1: Trap 900
> 
> Kernel stack overflow in process e8152f40, r1=e8e05ec0
> CPU: 0 PID: 2399 Comm: image_compress/ Not tainted 3.13.0-rc3-03475-g2e3f85b #432
> task: e8152f40 ti: c080a000 task.ti: ef176000
> NIP: c05bec04 LR: c0305590 CTR: 00000010
> REGS: e8e05e10 TRAP: 0901   Not tainted  (3.13.0-rc3-03475-g2e3f85b)

Could you double check if you got the following patch applied?

commit 1a18a66446f3f289b05b634f18012424d82aa63a
Author: Kevin Hao <haokexin@gmail.com>
Date:   Fri Jan 17 12:25:28 2014 +0800

    powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
    
    Guenter Roeck has got the following call trace on a p2020 board:
      Kernel stack overflow in process eb3e5a00, r1=eb79df90
      CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
      task: eb3e5a00 ti: c0616000 task.ti: ef440000
      NIP: c003a420 LR: c003a410 CTR: c0017518
      REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
      MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
      GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
      GPR08: 00000000 020b8000 00000000 00000000 44008442
      NIP [c003a420] __do_softirq+0x94/0x1ec
      LR [c003a410] __do_softirq+0x84/0x1ec
      Call Trace:
      [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
      [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
      [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
      [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
      [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
      --- Exception: 501 at 0xfcda524
          LR = 0x10024900
      Instruction dump:
      7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
      5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
      Kernel panic - not syncing: kernel stack overflow
      CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
      Call Trace:
    
    The reason is that we have used the wrong register to calculate the
    ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64).
    Just fix it.
    
    As suggested by Benjamin Herrenschmidt, also add the C prototype of the
    function in the comment in order to avoid such kind of errors in the
    future.
    
    Cc: stable@vger.kernel.org # 3.12
    Reported-by: Guenter Roeck <linux@roeck-us.net>
    Tested-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Kevin Hao <haokexin@gmail.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Thanks,
Kevin
Dongsheng Wang March 28, 2014, 9 a.m. UTC | #2
Thanks Kevin. Your patch works normal. :)

I still have some confused. I think when __do_softirq always get a interrupt, the hard stack will be run out, isn't it?

Regards,
-Dongsheng

> -----Original Message-----
> From: Kevin Hao [mailto:haokexin@gmail.com]
> Sent: Friday, March 28, 2014 4:18 PM
> To: Wang Dongsheng-B40534
> Cc: fweisbec@gmail.com; James Hogan; Andrew Morton; David S. Miller; Peter
> Zijlstra; Helge Deller; H. Peter Anvin; Heiko Carstens; linux-
> kernel@vger.kernel.org; Paul Mackerras; James E.J. Bottomley; Linus Torvalds;
> Jin Zhengxiong-R64188; Wood Scott-B07421; Thomas Gleixner; linuxppc-
> dev@lists.ozlabs.org; Ingo Molnar; Martin Schwidefsky
> Subject: Re: [PATCH] powerpc/irq: Remove HAVE_IRQ_EXIT_ON_IRQ_STACK feature at
> powerpc platform
> 
> On Fri, Mar 28, 2014 at 03:38:32PM +0800, Dongsheng Wang wrote:
> > From: Wang Dongsheng <dongsheng.wang@freescale.com>
> >
> > If softirq use hardirq stack, we will get kernel painc when a hard irq
> > coming again during __do_softirq enable local irq to deal with softirq
> > action. So we need to switch satck into softirq stack when invoke soft irq.
> >
> >  Task--->
> > 	| Task stack
> > 	|
> > 	Interrput->EXCEPTION->do_IRQ->
> > 	^			     | Hard irq stack
> > 	|			     |
> > 	|			     irq_exit->__do_softirq->local_irq_enable--   --
> >local_irq_disable
> > 	|									| Hard irq
> stack
> > 	|									|
> > 	|									Interrupt
> coming again
> > 	|		There will get a Interrupt nesting			|
> >
> > ----------------------------------------------------------------------
> > --
> >
> > Trace 1: Trap 900
> >
> > Kernel stack overflow in process e8152f40, r1=e8e05ec0
> > CPU: 0 PID: 2399 Comm: image_compress/ Not tainted
> > 3.13.0-rc3-03475-g2e3f85b #432
> > task: e8152f40 ti: c080a000 task.ti: ef176000
> > NIP: c05bec04 LR: c0305590 CTR: 00000010
> > REGS: e8e05e10 TRAP: 0901   Not tainted  (3.13.0-rc3-03475-g2e3f85b)
> 
> Could you double check if you got the following patch applied?
> 
> commit 1a18a66446f3f289b05b634f18012424d82aa63a
> Author: Kevin Hao <haokexin@gmail.com>
> Date:   Fri Jan 17 12:25:28 2014 +0800
> 
>     powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
> 
>     Guenter Roeck has got the following call trace on a p2020 board:
>       Kernel stack overflow in process eb3e5a00, r1=eb79df90
>       CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00
> #4
>       task: eb3e5a00 ti: c0616000 task.ti: ef440000
>       NIP: c003a420 LR: c003a410 CTR: c0017518
>       REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
>       MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
>       GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646
> 00000000
>       GPR08: 00000000 020b8000 00000000 00000000 44008442
>       NIP [c003a420] __do_softirq+0x94/0x1ec
>       LR [c003a410] __do_softirq+0x84/0x1ec
>       Call Trace:
>       [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
>       [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
>       [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
>       [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
>       [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
>       --- Exception: 501 at 0xfcda524
>           LR = 0x10024900
>       Instruction dump:
>       7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
>       5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
>       Kernel panic - not syncing: kernel stack overflow
>       CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00
> #4
>       Call Trace:
> 
>     The reason is that we have used the wrong register to calculate the
>     ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64).
>     Just fix it.
> 
>     As suggested by Benjamin Herrenschmidt, also add the C prototype of the
>     function in the comment in order to avoid such kind of errors in the
>     future.
> 
>     Cc: stable@vger.kernel.org # 3.12
>     Reported-by: Guenter Roeck <linux@roeck-us.net>
>     Tested-by: Guenter Roeck <linux@roeck-us.net>
>     Signed-off-by: Kevin Hao <haokexin@gmail.com>
>     Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> Thanks,
> Kevin
Kevin Hao March 28, 2014, 9:54 a.m. UTC | #3
On Fri, Mar 28, 2014 at 09:00:13AM +0000, Dongsheng.Wang@freescale.com wrote:
> Thanks Kevin. Your patch works normal. :)
> 
> I still have some confused. I think when __do_softirq always get a interrupt, the hard stack will be run out, isn't it?

No, it won't. Please see the explanation in the following commit log.

commit cc1f027454929924471bea2f362431072e3c71be
Author: Frederic Weisbecker <fweisbec@gmail.com>
Date:   Tue Sep 24 17:17:47 2013 +0200

    irq: Optimize softirq stack selection in irq exit
    
    If irq_exit() is called on the arch's specified irq stack,
    it should be safe to run softirqs inline under that same
    irq stack as it is near empty by the time we call irq_exit().
    
    For example if we use the same stack for both hard and soft irqs here,
    the worst case scenario is:
    hardirq -> softirq -> hardirq. But then the softirq supersedes the
    first hardirq as the stack user since irq_exit() is called in
    a mostly empty stack. So the stack merge in this case looks acceptable.
    
    Stack overrun still have a chance to happen if hardirqs have more
    opportunities to nest, but then it's another problem to solve.
    
    So lets adapt the irq exit's softirq stack on top of a new Kconfig symbol
    that can be defined when irq_exit() runs on the irq stack. That way
    we can spare some stack switch on irq processing and all the cache
    issues that come along.
    
Thanks,
Kevin
Benjamin Herrenschmidt March 28, 2014, 9:25 p.m. UTC | #4
On Fri, 2014-03-28 at 15:38 +0800, Dongsheng Wang wrote:
> From: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> If softirq use hardirq stack, we will get kernel painc when a hard irq coming again
> during __do_softirq enable local irq to deal with softirq action. So we need to switch
> satck into softirq stack when invoke soft irq.

Yes, an interrupt can potentially nest but we should be near the top of
the stack at that point, as the comment says in softirq.c, it should
be fine. And your backtrace doesn't seem to indicate a major overflow.

The code in do_IRQ() will make sure we don't switch stack again if
we were already on either hard or softirq stack.

I need a better analysis of your problem. Is that really a stack
overflow ? Or is it a false positive due to a bug in the overflow
detection ?

I moved around the code that updates KSP_LIMIT in 32-bit to asm in
misc_32.S a while ago since we don't do that on 64-bit, maybe we are
getting it wrong...

Cheers,
Ben.
Benjamin Herrenschmidt March 28, 2014, 9:27 p.m. UTC | #5
On Fri, 2014-03-28 at 16:18 +0800, Kevin Hao wrote:

>     powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
>    

Kevin. It looks like it was applied to 3.14 and sent to 3.12 stable but
not 3.13 ... can you fix that up ?

Cheers,
Ben.

>     Guenter Roeck has got the following call trace on a p2020 board:
>       Kernel stack overflow in process eb3e5a00, r1=eb79df90
>       CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
>       task: eb3e5a00 ti: c0616000 task.ti: ef440000
>       NIP: c003a420 LR: c003a410 CTR: c0017518
>       REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
>       MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
>       GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
>       GPR08: 00000000 020b8000 00000000 00000000 44008442
>       NIP [c003a420] __do_softirq+0x94/0x1ec
>       LR [c003a410] __do_softirq+0x84/0x1ec
>       Call Trace:
>       [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
>       [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
>       [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
>       [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
>       [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
>       --- Exception: 501 at 0xfcda524
>           LR = 0x10024900
>       Instruction dump:
>       7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
>       5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
>       Kernel panic - not syncing: kernel stack overflow
>       CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
>       Call Trace:
>     
>     The reason is that we have used the wrong register to calculate the
>     ksp_limit in commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64).
>     Just fix it.
>     
>     As suggested by Benjamin Herrenschmidt, also add the C prototype of the
>     function in the comment in order to avoid such kind of errors in the
>     future.
>     
>     Cc: stable@vger.kernel.org # 3.12
>     Reported-by: Guenter Roeck <linux@roeck-us.net>
>     Tested-by: Guenter Roeck <linux@roeck-us.net>
>     Signed-off-by: Kevin Hao <haokexin@gmail.com>
>     Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> Thanks,
> Kevin
Kevin Hao March 29, 2014, 1:20 a.m. UTC | #6
On Sat, Mar 29, 2014 at 08:27:07AM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2014-03-28 at 16:18 +0800, Kevin Hao wrote:
> 
> >     powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
> >    
> 
> Kevin. It looks like it was applied to 3.14 and sent to 3.12 stable but
> not 3.13 ... can you fix that up ?

It was already merged into 3.13 stable since 3.13.6:
  https://lkml.org/lkml/2014/3/4/787

I guess that Dongsheng didn't use the latest 3.13 stable tree.

Thanks,
Kevin
diff mbox

Patch

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 957bf34..ffde3fb 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -139,7 +139,6 @@  config PPC
 	select OLD_SIGSUSPEND
 	select OLD_SIGACTION if PPC32
 	select HAVE_DEBUG_STACKOVERFLOW
-	select HAVE_IRQ_EXIT_ON_IRQ_STACK
 	select ARCH_USE_CMPXCHG_LOCKREF if PPC64
 
 config GENERIC_CSUM