genirq: Fix race condition in ONESHOT irq handler

Submitter Lothar Waßmann
Date Feb. 7, 2012, 1:38 p.m.
Message ID <1328621921-17404-1-git-send-email-LW@KARO-electronics.de>
Permalink /patch/139930/
State New

Comments

Lothar Waßmann - Feb. 7, 2012, 1:38 p.m.
There is a race condition in the threaded IRQ handler code for oneshot
interrupts that may lead to disabling an IRQ indefinitely. IRQs are
masked before calling the hard-irq handler and are unmasked only after
the soft-irq handler has been run. Thus if the hard-irq handler
returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
will not be called, the interrupt will remain masked forever.

This can happen when a short pulse on the interrupt line triggers the
interrupt logic but goes undetected by the hard-irq handler. The
problem can be reproduced with the TSC2007 touch controller driver,
which uses ONESHOT interrupts.

The problem arises also with interrupt controllers that latch a level
triggered IRQ until it is acknowledged (like the i.MX28 does).
In this case the IRQ status bit will remain asserted after the
soft-irq finishes and retrigger the interrupt while the interrupt line
is already deasserted.

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
---
 kernel/irq/chip.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)
Lothar Waßmann - Feb. 8, 2012, 6:05 a.m.
Hi,

Thomas Gleixner writes:
> On Tue, 7 Feb 2012, Lothar Waßmann wrote:
> 
> > There is a race condition in the threaded IRQ handler code for oneshot
> > interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> > masked before calling the hard-irq handler and are unmasked only after
> > the soft-irq handler has been run. Thus if the hard-irq handler
> > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> 
> Well, oneshot mode interrupts always had the semantics that the
> threaded handler needs to run unconditionally. In fact the oneshot
> mode was implemented to handle hardware which cannot do anything in
> hard interrupt context to avoid the ugliness of a primary handler
> calling disable_irq_nosync().
> 
> So it looks like driver developers decided that the oneshot mode might
> be interesting with a primary handler as well. I can see the reason
> why the tsc2007 driver uses it, but that does not make it a bug in the
> core code in the first place.
> 
Then maybe the core code should not check the return value
of the primary handler for IRQ_WAKE_THREAD but call the secondary
handler unconditionally for ONESHOT interrupts.
Or it should be at least documented somewhere that primary handlers
have to return IRQ_WAKE_THREAD in any case for oneshot interrupts.

> > The problem arises also with interrupt controllers that latch a level
> > triggered IRQ until it is acknowledged (like the i.MX28 does).
> > In this case the IRQ status bit will remain asserted after the
> > soft-irq finishes and retrigger the interrupt while the interrupt line
> > is already deasserted.
> 
> This does not make sense. We acknowledge interrupts via mask_ack_irq()
> right on entry of handle_level_irq(). So either the interrupt
> 
That's right. But at that point the IRQ line is still asserted and
since it is a level IRQ this will not actually clear the interrupt
status bit. Normally the IRQ status bit would self-clear when the IRQ
line is being deasserted (in this case by removing the finger from the
touch panel). But the i.MX28 leaves the IRQ status bit set and it
takes another write to the IRQ status register to remove the bogus IRQ
status.

> controller is completely hosed or this explanation is bogus.
>
The first is the case.


Lothar Waßmann
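
The driver-side option raised in the message above (a primary handler that returns IRQ_WAKE_THREAD in every case for oneshot interrupts) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `model_*` names, the `primary_*` handlers and `stuck_after_short_pulse()` are hypothetical and are not kernel API, and the flow handler is reduced to the behaviour described in this thread (mask on entry, unmask only from the thread).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's irqreturn_t values. */
enum model_irqreturn {
	MODEL_IRQ_NONE        = 0,
	MODEL_IRQ_HANDLED     = 1 << 0,
	MODEL_IRQ_WAKE_THREAD = 1 << 1,
};

/* Hypothetical model of a single oneshot interrupt line. */
struct model_irq {
	bool masked;
	bool thread_pending;
};

/* Primary handler that reports "handled" when it sees no event. */
static int primary_checks_device(bool event_seen)
{
	return event_seen ? MODEL_IRQ_WAKE_THREAD : MODEL_IRQ_HANDLED;
}

/* Primary handler that always defers to the threaded handler. */
static int primary_always_wakes(bool event_seen)
{
	(void)event_seen;
	return MODEL_IRQ_WAKE_THREAD;
}

/*
 * Flow handler as described in the thread, before the patch:
 * mask on entry and never unmask a oneshot line here.
 */
static void model_flow_handler(struct model_irq *irq, int primary_ret)
{
	irq->masked = true;
	if (primary_ret & MODEL_IRQ_WAKE_THREAD)
		irq->thread_pending = true;
}

/* Threaded handler: runs only if woken, then unmasks the line. */
static void model_run_thread(struct model_irq *irq)
{
	if (!irq->thread_pending)
		return;
	irq->thread_pending = false;
	irq->masked = false;
}

/* Returns true if the line is still masked after one short pulse. */
static bool stuck_after_short_pulse(int (*primary)(bool))
{
	struct model_irq irq = { false, false };

	/* The pulse is already gone when the primary handler looks. */
	model_flow_handler(&irq, primary(false));
	model_run_thread(&irq);
	return irq.masked;
}
```

In this model the line stays masked with `primary_checks_device` and is unmasked again with `primary_always_wakes`, which is the contract the message asks to have documented.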
Thomas Gleixner - Feb. 8, 2012, 10:38 a.m.
On Wed, 8 Feb 2012, Lothar Waßmann wrote:
> > So it looks like driver developers decided that the oneshot mode might
> > be interesting with a primary handler as well. I can see the reason
> > why the tsc2007 driver uses it, but that does not make it a bug in the
> > core code in the first place.
> > 
> Then maybe the core code should not check the return value
> of the primary handler for IRQ_WAKE_THREAD but call the secondary
> handler unconditionally for ONESHOT interrupts.
> Or it should be at least documented somewhere that primary handlers
> have to return IRQ_WAKE_THREAD in any case for oneshot interrupts.

Well, you know how good we are with documentation :)
 
> > > The problem arises also with interrupt controllers that latch a level
> > > triggered IRQ until it is acknowledged (like the i.MX28 does).
> > > In this case the IRQ status bit will remain asserted after the
> > > soft-irq finishes and retrigger the interrupt while the interrupt line
> > > is already deasserted.
> > 
> > This does not make sense. We acknowledge interrupts via mask_ack_irq()
> > right on entry of handle_level_irq(). So either the interrupt
> > 
> That's right. But at that point the IRQ line is still asserted and
> since it is a level IRQ this will not actually clear the interrupt
> status bit. Normally the IRQ status bit would self-clear when the IRQ
> line is being deasserted (in this case by removing the finger from the
> touch panel). But the i.MX28 leaves the IRQ status bit set and it
> takes another write to the IRQ status register to remove the bogus IRQ
> status.

So the question is whether the imx irq chip implementation should
write to the status register on unmask for level type irqs to avoid
spurious interrupts being generated in the first place. This is not
only an optimization for threaded interrupts, afaict this spurious
effect should happen with non threaded interrupts as well.

Did my patch work for you ?

Thanks,

	tglx
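
The latching behaviour discussed above, and the idea of writing to the status register on unmask, can be sketched as a user-space model. Everything here is an assumption made for illustration: `struct model_chip`, the `chip_*` helpers and `spurious_after_release()` are hypothetical and do not reflect the real i.MX28 register layout or the actual irq chip implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an interrupt controller that latches level IRQs. */
struct model_chip {
	bool line;	/* current level of the interrupt line */
	bool status;	/* latched status bit */
	bool masked;
};

/* The controller latches the status bit while the line is asserted. */
static void chip_sample(struct model_chip *c)
{
	if (c->line)
		c->status = true;
}

/*
 * Write to the status register: clears the latch, but the bit is set
 * again at once if the line is still asserted.
 */
static void chip_ack(struct model_chip *c)
{
	c->status = c->line;
}

/* Unmask without touching the status register. */
static void chip_unmask_plain(struct model_chip *c)
{
	c->masked = false;
}

/* Unmask variant that first clears a stale latch. */
static void chip_unmask_clear(struct model_chip *c)
{
	chip_ack(c);
	c->masked = false;
}

static bool chip_irq_pending(const struct model_chip *c)
{
	return c->status && !c->masked;
}

/*
 * The line is asserted, masked and acked on entry, then deasserted
 * while still masked. Returns true if an interrupt is pending after
 * the unmask although the line is low.
 */
static bool spurious_after_release(bool clear_on_unmask)
{
	struct model_chip c = { false, false, false };

	c.line = true;
	chip_sample(&c);

	c.masked = true;	/* mask and ack on entry: line still high, */
	chip_ack(&c);		/* so the status bit stays set */

	c.line = false;		/* e.g. finger removed from the touch panel */

	if (clear_on_unmask)
		chip_unmask_clear(&c);
	else
		chip_unmask_plain(&c);

	return chip_irq_pending(&c);
}
```

With a plain unmask the model reports a pending interrupt on a deasserted line; with the clearing unmask it does not. If the line were still asserted at unmask time, `chip_ack()` would leave the bit set, so a genuine level interrupt would not be lost in this model.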
Lothar Waßmann - Feb. 9, 2012, 8:40 a.m.
Hi,

Thomas Gleixner writes:
> On Wed, 8 Feb 2012, Lothar Waßmann wrote:
> > > > The problem arises also with interrupt controllers that latch a level
> > > > triggered IRQ until it is acknowledged (like the i.MX28 does).
> > > > In this case the IRQ status bit will remain asserted after the
> > > > soft-irq finishes and retrigger the interrupt while the interrupt line
> > > > is already deasserted.
> > > 
> > > This does not make sense. We acknowledge interrupts via mask_ack_irq()
> > > right on entry of handle_level_irq(). So either the interrupt
> > > 
> > That's right. But at that point the IRQ line is still asserted and
> > since it is a level IRQ this will not actually clear the interrupt
> > status bit. Normally the IRQ status bit would self-clear when the IRQ
> > line is being deasserted (in this case by removing the finger from the
> > touch panel). But the i.MX28 leaves the IRQ status bit set and it
> > takes another write to the IRQ status register to remove the bogus IRQ
> > status.
> 
> So the question is whether the imx irq chip implementation should
> write to the status register on unmask for level type irqs to avoid
> spurious interrupts being generated in the first place. This is not
> only an optimization for threaded interrupts, afaict this spurious
> effect should happen with non threaded interrupts as well.
> 
> Did my patch work for you ?
> 
Sorry, I couldn't test it earlier. Yes, it works.

Lothar Waßmann

Patch

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index f7c543a..74fdef9 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -343,6 +343,8 @@ EXPORT_SYMBOL_GPL(handle_simple_irq);
 void
 handle_level_irq(unsigned int irq, struct irq_desc *desc)
 {
+	irqreturn_t ret;
+
 	raw_spin_lock(&desc->lock);
 	mask_ack_irq(desc);
 
@@ -360,10 +362,13 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
 	if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data)))
 		goto out_unlock;
 
-	handle_irq_event(desc);
+	ret = handle_irq_event(desc);
 
-	if (!irqd_irq_disabled(&desc->irq_data) && !(desc->istate & IRQS_ONESHOT))
+	if (!irqd_irq_disabled(&desc->irq_data) &&
+			(!(desc->istate & IRQS_ONESHOT) ||
+				!(ret & IRQ_WAKE_THREAD)))
 		unmask_irq(desc);
+
 out_unlock:
 	raw_spin_unlock(&desc->lock);
 }
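
The effect of the changed condition in handle_level_irq() can be checked in isolation with a small user-space model. This is a sketch only: the `MODEL_IRQ_*` values and the `unmask_before()` / `unmask_after()` helpers are hypothetical stand-ins for `irqreturn_t`, `irqd_irq_disabled()` and the `IRQS_ONESHOT` test, reduced to plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's irqreturn_t values. */
enum model_irqreturn {
	MODEL_IRQ_NONE        = 0,
	MODEL_IRQ_HANDLED     = 1 << 0,
	MODEL_IRQ_WAKE_THREAD = 1 << 1,
};

/* Unmask decision before the patch: the return value is ignored. */
static bool unmask_before(bool disabled, bool oneshot, int ret)
{
	(void)ret;
	return !disabled && !oneshot;
}

/*
 * Unmask decision after the patch: a oneshot line is also unmasked
 * when the primary handler did not wake the thread.
 */
static bool unmask_after(bool disabled, bool oneshot, int ret)
{
	return !disabled && (!oneshot || !(ret & MODEL_IRQ_WAKE_THREAD));
}
```

For a oneshot interrupt whose primary handler returns only IRQ_HANDLED, `unmask_before()` yields false (the line stays masked) while `unmask_after()` yields true. When the thread is woken, both leave the line masked so that the thread can unmask it later.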