Message ID | 20171116160052.18672-3-npiggin@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | acb1feab320e38588fccc568e3767761f494976f |
Headers | show |
Series | interrupt tracing fixes | expand |
Nicholas Piggin <npiggin@gmail.com> writes: > When an interrupt is returning to a soft-disabled context (which can > happen for non-maskable interrupts or synchronous interrupts), it goes > through the motions of soft-disabling again, including calling > TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()). > > This is not necessary, because we must already be soft-disabled in the > interrupt context, it also may be causing crashes in the irq tracing > code to re-enter as an nmi. Replace it with a warning to ensure that > soft-interrupts are still disabled. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > --- > arch/powerpc/kernel/entry_64.S | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) So this patch is the core of the bug fix I gather. Git blames says: Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync") Cc: stable@vger.kernel.org # v3.4+ But I'm wondering how this has been broken that long without us noticing? You hit it doing some sort of perf stress test I think - so is it just that we've never pushed hard enough? Or did something change to expose this? Or we're just not sure? cheers > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S > index 3320bcac7192..36878b6ee8b8 100644 > --- a/arch/powerpc/kernel/entry_64.S > +++ b/arch/powerpc/kernel/entry_64.S > @@ -911,9 +911,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > beq 1f > rlwinm r7,r7,0,~PACA_IRQ_HARD_DIS > stb r7,PACAIRQHAPPENED(r13) > -1: li r0,0 > - stb r0,PACASOFTIRQEN(r13); > - TRACE_DISABLE_INTS > +1: > +#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_BUG) > + /* The interrupt should not have soft enabled. */ > + lbz r7,PACASOFTIRQEN(r13) > +1: tdnei r7,0 > + EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING > +#endif > b .Ldo_restore > > /* > -- > 2.15.0
On Mon, 04 Dec 2017 16:09:57 +1100 Michael Ellerman <mpe@ellerman.id.au> wrote: > Nicholas Piggin <npiggin@gmail.com> writes: > > > When an interrupt is returning to a soft-disabled context (which can > > happen for non-maskable interrupts or synchronous interrupts), it goes > > through the motions of soft-disabling again, including calling > > TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()). > > > > This is not necessary, because we must already be soft-disabled in the > > interrupt context, it also may be causing crashes in the irq tracing > > code to re-enter as an nmi. Replace it with a warning to ensure that > > soft-interrupts are still disabled. > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > > --- > > arch/powerpc/kernel/entry_64.S | 10 +++++++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > So this patch is the core of the bug fix I gather. > > Git blames says: > > Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync") > Cc: stable@vger.kernel.org # v3.4+ > > But I'm wondering how this has been broken that long without us > noticing? You hit it doing some sort of perf stress test I think - so is > it just that we've never pushed hard enough? Or did something change to > expose this? Or we're just not sure? I'm not really sure. A customer hit it, during either a stress test or long running workload with lockdep irq tracing and perf running at the same time. I don't have a lot more details but we might be able to get some offline if necessary. Thanks, Nick
Nicholas Piggin <npiggin@gmail.com> writes: > On Mon, 04 Dec 2017 16:09:57 +1100 > Michael Ellerman <mpe@ellerman.id.au> wrote: > >> Nicholas Piggin <npiggin@gmail.com> writes: >> >> > When an interrupt is returning to a soft-disabled context (which can >> > happen for non-maskable interrupts or synchronous interrupts), it goes >> > through the motions of soft-disabling again, including calling >> > TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()). >> > >> > This is not necessary, because we must already be soft-disabled in the >> > interrupt context, it also may be causing crashes in the irq tracing >> > code to re-enter as an nmi. Replace it with a warning to ensure that >> > soft-interrupts are still disabled. >> > >> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> >> > --- >> > arch/powerpc/kernel/entry_64.S | 10 +++++++--- >> > 1 file changed, 7 insertions(+), 3 deletions(-) >> >> So this patch is the core of the bug fix I gather. >> >> Git blames says: >> >> Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync") >> Cc: stable@vger.kernel.org # v3.4+ >> >> But I'm wondering how this has been broken that long without us >> noticing? You hit it doing some sort of perf stress test I think - so is >> it just that we've never pushed hard enough? Or did something change to >> expose this? Or we're just not sure? > > I'm not really sure. A customer hit it, during either a stress test or > long running workload with lockdep irq tracing and perf running at the > same time. I don't have a lot more details but we might be able to get > some offline if necessary. No worries. I'll put it in and add it to my maybe-for-stable list. We can let it soak a bit and then send it to stable if we think it's necessary and has no issues. cheers
On Mon, 2017-12-04 at 16:09 +1100, Michael Ellerman wrote: > Nicholas Piggin <npiggin@gmail.com> writes: > > > When an interrupt is returning to a soft-disabled context (which can > > happen for non-maskable interrupts or synchronous interrupts), it goes > > through the motions of soft-disabling again, including calling > > TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()). > > > > This is not necessary, because we must already be soft-disabled in the > > interrupt context, it also may be causing crashes in the irq tracing > > code to re-enter as an nmi. Replace it with a warning to ensure that > > soft-interrupts are still disabled. > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> > > --- > > arch/powerpc/kernel/entry_64.S | 10 +++++++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > So this patch is the core of the bug fix I gather. > > Git blames says: > > Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync") > Cc: stable@vger.kernel.org # v3.4+ > > But I'm wondering how this has been broken that long without us > noticing? You hit it doing some sort of perf stress test I think - so is > it just that we've never pushed hard enough? Or did something change to > expose this? Or we're just not sure? We have some traps that do local_irq_enable ... you may want to double check instruction emu, page faults, alignment etc... I wouldn't be surprised if we have case where an interrupt "returns" soft enabled. > cheers > > > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S > > index 3320bcac7192..36878b6ee8b8 100644 > > --- a/arch/powerpc/kernel/entry_64.S > > +++ b/arch/powerpc/kernel/entry_64.S > > @@ -911,9 +911,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > > beq 1f > > rlwinm r7,r7,0,~PACA_IRQ_HARD_DIS > > stb r7,PACAIRQHAPPENED(r13) > > -1: li r0,0 > > - stb r0,PACASOFTIRQEN(r13); > > - TRACE_DISABLE_INTS > > +1: > > +#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_BUG) > > + /* The interrupt should not have soft enabled. */ > > + lbz r7,PACASOFTIRQEN(r13) > > +1: tdnei r7,0 > > + EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING > > +#endif > > b .Ldo_restore > > > > /* > > -- > > 2.15.0
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: > On Mon, 2017-12-04 at 16:09 +1100, Michael Ellerman wrote: >> Nicholas Piggin <npiggin@gmail.com> writes: >> >> > When an interrupt is returning to a soft-disabled context (which can >> > happen for non-maskable interrupts or synchronous interrupts), it goes >> > through the motions of soft-disabling again, including calling >> > TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()). >> > >> > This is not necessary, because we must already be soft-disabled in the >> > interrupt context, it also may be causing crashes in the irq tracing >> > code to re-enter as an nmi. Replace it with a warning to ensure that >> > soft-interrupts are still disabled. >> > >> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> >> > --- >> > arch/powerpc/kernel/entry_64.S | 10 +++++++--- >> > 1 file changed, 7 insertions(+), 3 deletions(-) >> >> So this patch is the core of the bug fix I gather. >> >> Git blames says: >> >> Fixes: 7c0482e3d055 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync") >> Cc: stable@vger.kernel.org # v3.4+ >> >> But I'm wondering how this has been broken that long without us >> noticing? You hit it doing some sort of perf stress test I think - so is >> it just that we've never pushed hard enough? Or did something change to >> expose this? Or we're just not sure? > > We have some traps that do local_irq_enable ... you may want to double > check instruction emu, page faults, alignment etc... I wouldn't be > surprised if we have case where an interrupt "returns" soft enabled. AFAICT those all check that they're coming from a soft-enabled context: if (!arch_irq_disabled_regs(regs)) local_irq_enable(); And the code Nick is patching is the case where we're returning to a soft-disabled context. So I think the patch is good. cheers
On Thu, 2017-11-16 at 16:00:50 UTC, Nicholas Piggin wrote: > When an interrupt is returning to a soft-disabled context (which can > happen for non-maskable interrupts or synchronous interrupts), it goes > through the motions of soft-disabling again, including calling > TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()). > > This is not necessary, because we must already be soft-disabled in the > interrupt context, it also may be causing crashes in the irq tracing > code to re-enter as an nmi. Replace it with a warning to ensure that > soft-interrupts are still disabled. > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/acb1feab320e38588fccc568e37677 cheers
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 3320bcac7192..36878b6ee8b8 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -911,9 +911,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) beq 1f rlwinm r7,r7,0,~PACA_IRQ_HARD_DIS stb r7,PACAIRQHAPPENED(r13) -1: li r0,0 - stb r0,PACASOFTIRQEN(r13); - TRACE_DISABLE_INTS +1: +#if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_BUG) + /* The interrupt should not have soft enabled. */ + lbz r7,PACASOFTIRQEN(r13) +1: tdnei r7,0 + EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING +#endif b .Ldo_restore /*
When an interrupt is returning to a soft-disabled context (which can happen for non-maskable interrupts or synchronous interrupts), it goes through the motions of soft-disabling again, including calling TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()). This is not necessary, because we must already be soft-disabled in the interrupt context, it also may be causing crashes in the irq tracing code to re-enter as an nmi. Replace it with a warning to ensure that soft-interrupts are still disabled. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- arch/powerpc/kernel/entry_64.S | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)