Patchwork [RFC,powerpc] Fix a lazy irq related WARING in arch_local_irq_restore()

login
register
mail settings
Submitter Li Zhong
Date Sept. 24, 2012, 3:56 a.m.
Message ID <1348458962.2475.8.camel@ThinkPad-T420>
Download mbox | patch
Permalink /patch/186294/
State Superseded
Headers show

Comments

Li Zhong - Sept. 24, 2012, 3:56 a.m.
This patch tries to fix a WARNING in irq.c(below), when doing cpu offline/online. 

The reason is that if the preferred offline state is CPU_STATE_INACTIVE, when
cpu offline, pseries_mach_cpu_die() calls extended_cede_processor() to cede
the processor. After the hv call returns, the MSR_EE is enabled, while
the irq_happened in paca should already be set as PACA_IRQ_HARD_DIS.

Then when the cpu is put online again, the warning is reported when
start_secondary() tries to enable irq. [ Sometimes, we don't see this warning,
that's because when we come to local_irq_enable(), there might already have been
some interrupts occurred, e.g. PACA_IRQ_DEC is already set. ]

The patch tries to clear MSR_EE by calling __hard_irq_disable() after cede
returns, and before calling start_secondary_resume(). 
 
[   56.618846] WARNING: at arch/powerpc/kernel/irq.c:240
[   56.618851] Modules linked in: rcutorture ipv6 dm_mod ext3 jbd mbcache sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt ibmveth
[   56.618883] NIP: c00000000000ff94 LR: c00000000067a5e0 CTR: 0000000000000001
[   56.618889] REGS: c0000001fef6bbe0 TRAP: 0700   Not tainted  (3.6.0-rc1-autokern1)
[   56.618894] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 42000082  XER: 20000000
[   56.618913] SOFTE: 1
[   56.618916] CFAR: c00000000067a5dc
[   56.618920] TASK = c0000001feed79a0[0] 'swapper/5' THREAD: c0000001fef68000 CPU: 5
GPR00: 0000000000000001 c0000001fef6be60 c000000000f9ca08 0000000000000001 
GPR04: 0000000000000001 0000000000000008 0000000000000001 0000000000000000 
GPR08: 0000000000000000 c0000001feed79a0 0008a80000000000 0000000000000000 
GPR12: 0000000022000082 c00000000f330f00 c0000001fef6bf90 000000000f394b4c 
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
GPR24: c000000000fe8f80 0000000000000008 0000000000000028 0000000000000000 
GPR28: 0000000000000000 0000000000000020 c000000000f1ab40 0000000000000001 
[   56.619014] NIP [c00000000000ff94] .arch_local_irq_restore+0x34/0xa0
[   56.619020] LR [c00000000067a5e0] .start_secondary+0x368/0x37c
[   56.619025] Call Trace:
[   56.619030] [c0000001fef6be60] [c000000001ba0500] 0xc000000001ba0500 (unreliable)
[   56.619038] [c0000001fef6bed0] [c00000000067a5e0] .start_secondary+0x368/0x37c
[   56.619046] [c0000001fef6bf90] [c000000000009380] .start_secondary_resume+0x10/0x14
[   56.619052] Instruction dump:
[   56.619056] f8010010 f821ff91 986d022a 2fa30000 419e0054 880d022b 78000621 41820048 
[   56.619071] 2f800001 40de0064 7c0000a6 78008fe2 <0b000000> 2fa00000 40de0050 38000000 
[   56.619088] ---[ end trace 0199c0d783d7f9ba ]---

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
Benjamin Herrenschmidt - Sept. 24, 2012, 5:30 a.m.
On Mon, 2012-09-24 at 11:56 +0800, Li Zhong wrote:
> This patch tries to fix a WARNING in irq.c(below), when doing cpu offline/online. 
> 
> The reason is that if the preferred offline state is CPU_STATE_INACTIVE, when
> cpu offline, pseries_mach_cpu_die() calls extended_cede_processor() to cede
> the processor. After the hv call returns, the MSR_EE is enabled, while
> the irq_happened in paca should already be set as PACA_IRQ_HARD_DIS.
> 
> Then when the cpu is put online again, the warning is reported when
> start_secondary() tries to enable irq. [ Sometimes, we don't see this warning,
> that's because when we come to local_irq_enable(), there might already have been
> some interrupts occurred, e.g. PACA_IRQ_DEC is already set. ]
> 
> The patch tries to clear MSR_EE by calling __hard_irq_disable() after cede
> returns, and before calling start_secondary_resume(). 

Is that enough ? Do we know for sure we won't get a stray IPI or
interrupt which then will reach us with an inconsistent HW/SW enable
state ?

We might need to "sanitize" the enable state in the PACA before we
actually enter NAP or in the return from NAP code, like we do for normal
idle code...

Cheers,
Ben.

> [   56.618846] WARNING: at arch/powerpc/kernel/irq.c:240
> [   56.618851] Modules linked in: rcutorture ipv6 dm_mod ext3 jbd mbcache sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt ibmveth
> [   56.618883] NIP: c00000000000ff94 LR: c00000000067a5e0 CTR: 0000000000000001
> [   56.618889] REGS: c0000001fef6bbe0 TRAP: 0700   Not tainted  (3.6.0-rc1-autokern1)
> [   56.618894] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 42000082  XER: 20000000
> [   56.618913] SOFTE: 1
> [   56.618916] CFAR: c00000000067a5dc
> [   56.618920] TASK = c0000001feed79a0[0] 'swapper/5' THREAD: c0000001fef68000 CPU: 5
> GPR00: 0000000000000001 c0000001fef6be60 c000000000f9ca08 0000000000000001 
> GPR04: 0000000000000001 0000000000000008 0000000000000001 0000000000000000 
> GPR08: 0000000000000000 c0000001feed79a0 0008a80000000000 0000000000000000 
> GPR12: 0000000022000082 c00000000f330f00 c0000001fef6bf90 000000000f394b4c 
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> GPR24: c000000000fe8f80 0000000000000008 0000000000000028 0000000000000000 
> GPR28: 0000000000000000 0000000000000020 c000000000f1ab40 0000000000000001 
> [   56.619014] NIP [c00000000000ff94] .arch_local_irq_restore+0x34/0xa0
> [   56.619020] LR [c00000000067a5e0] .start_secondary+0x368/0x37c
> [   56.619025] Call Trace:
> [   56.619030] [c0000001fef6be60] [c000000001ba0500] 0xc000000001ba0500 (unreliable)
> [   56.619038] [c0000001fef6bed0] [c00000000067a5e0] .start_secondary+0x368/0x37c
> [   56.619046] [c0000001fef6bf90] [c000000000009380] .start_secondary_resume+0x10/0x14
> [   56.619052] Instruction dump:
> [   56.619056] f8010010 f821ff91 986d022a 2fa30000 419e0054 880d022b 78000621 41820048 
> [   56.619071] 2f800001 40de0064 7c0000a6 78008fe2 <0b000000> 2fa00000 40de0050 38000000 
> [   56.619088] ---[ end trace 0199c0d783d7f9ba ]---
> 
> Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 64c97d8..8de539a 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -137,6 +137,8 @@ static void pseries_mach_cpu_die(void)
>  		if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
>  			unregister_slb_shadow(hwcpu);
>  
> +			__hard_irq_disable();
> +
>  			/*
>  			 * Call to start_secondary_resume() will not return.
>  			 * Kernel stack will be reset and start_secondary()
Li Zhong - Sept. 24, 2012, 6:42 a.m.
On Mon, 2012-09-24 at 15:30 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2012-09-24 at 11:56 +0800, Li Zhong wrote:
> > This patch tries to fix a WARNING in irq.c(below), when doing cpu offline/online. 
> > 
> > The reason is that if the preferred offline state is CPU_STATE_INACTIVE, when
> > cpu offline, pseries_mach_cpu_die() calls extended_cede_processor() to cede
> > the processor. After the hv call returns, the MSR_EE is enabled, while
> > the irq_happened in paca should already be set as PACA_IRQ_HARD_DIS.
> > 
> > Then when the cpu is put online again, the warning is reported when
> > start_secondary() tries to enable irq. [ Sometimes, we don't see this warning,
> > that's because when we come to local_irq_enable(), there might already have been
> > some interrupts occurred, e.g. PACA_IRQ_DEC is already set. ]
> > 
> > The patch tries to clear MSR_EE by calling __hard_irq_disable() after cede
> > returns, and before calling start_secondary_resume(). 
> 
Hi Ben,

> Is that enough ? Do we know for sure we won't get a stray IPI or
> interrupt which then will reach us with an inconsistent HW/SW enable
> state ?

I thought when coming to here, there should be no IPI or interrupt for
this cpu ...

> We might need to "sanitize" the enable state in the PACA before we
> actually enter NAP or in the return from NAP code, like we do for normal
> idle code...

Do you mean something like this in extended_cede_processor() to make
sure HW/SW enable state consistent? 

+	prep_irq_for_idle();
        rc = cede_processor();
+#ifdef CONFIG_TRACE_IRQFLAGS
+                /* Ensure that H_CEDE returns with IRQs on */
+                if (WARN_ON(!(mfmsr() & MSR_EE)))
+                        __hard_irq_enable();
+#endif  


So, normally, we will have irqs SW/HW enabled consistently. But if the
above happens (some missing IPI/interrupt ), I'm not sure whether we can
leave it to be handled after cpu online again? 

Please correct me if I misunderstood something. 

Thanks, Zhong

> Cheers,
> Ben.
> 
> > [   56.618846] WARNING: at arch/powerpc/kernel/irq.c:240
> > [   56.618851] Modules linked in: rcutorture ipv6 dm_mod ext3 jbd mbcache sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt ibmveth
> > [   56.618883] NIP: c00000000000ff94 LR: c00000000067a5e0 CTR: 0000000000000001
> > [   56.618889] REGS: c0000001fef6bbe0 TRAP: 0700   Not tainted  (3.6.0-rc1-autokern1)
> > [   56.618894] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 42000082  XER: 20000000
> > [   56.618913] SOFTE: 1
> > [   56.618916] CFAR: c00000000067a5dc
> > [   56.618920] TASK = c0000001feed79a0[0] 'swapper/5' THREAD: c0000001fef68000 CPU: 5
> > GPR00: 0000000000000001 c0000001fef6be60 c000000000f9ca08 0000000000000001 
> > GPR04: 0000000000000001 0000000000000008 0000000000000001 0000000000000000 
> > GPR08: 0000000000000000 c0000001feed79a0 0008a80000000000 0000000000000000 
> > GPR12: 0000000022000082 c00000000f330f00 c0000001fef6bf90 000000000f394b4c 
> > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> > GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> > GPR24: c000000000fe8f80 0000000000000008 0000000000000028 0000000000000000 
> > GPR28: 0000000000000000 0000000000000020 c000000000f1ab40 0000000000000001 
> > [   56.619014] NIP [c00000000000ff94] .arch_local_irq_restore+0x34/0xa0
> > [   56.619020] LR [c00000000067a5e0] .start_secondary+0x368/0x37c
> > [   56.619025] Call Trace:
> > [   56.619030] [c0000001fef6be60] [c000000001ba0500] 0xc000000001ba0500 (unreliable)
> > [   56.619038] [c0000001fef6bed0] [c00000000067a5e0] .start_secondary+0x368/0x37c
> > [   56.619046] [c0000001fef6bf90] [c000000000009380] .start_secondary_resume+0x10/0x14
> > [   56.619052] Instruction dump:
> > [   56.619056] f8010010 f821ff91 986d022a 2fa30000 419e0054 880d022b 78000621 41820048 
> > [   56.619071] 2f800001 40de0064 7c0000a6 78008fe2 <0b000000> 2fa00000 40de0050 38000000 
> > [   56.619088] ---[ end trace 0199c0d783d7f9ba ]---
> > 
> > Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/hotplug-cpu.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > index 64c97d8..8de539a 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -137,6 +137,8 @@ static void pseries_mach_cpu_die(void)
> >  		if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
> >  			unregister_slb_shadow(hwcpu);
> >  
> > +			__hard_irq_disable();
> > +
> >  			/*
> >  			 * Call to start_secondary_resume() will not return.
> >  			 * Kernel stack will be reset and start_secondary()
> 
>

Patch

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 64c97d8..8de539a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -137,6 +137,8 @@  static void pseries_mach_cpu_die(void)
 		if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
 			unregister_slb_shadow(hwcpu);
 
+			__hard_irq_disable();
+
 			/*
 			 * Call to start_secondary_resume() will not return.
 			 * Kernel stack will be reset and start_secondary()