Patchwork [BUG] genirq: Race condition in ONESHOT IRQ handler disabling IRQ forever

Submitter Lothar Waßmann
Date Feb. 6, 2012, 8:14 a.m.
Message ID <20271.35831.227679.177366@ipc1.ka-ro>
Permalink /patch/139687/
State New

Comments

Lothar Waßmann - Feb. 6, 2012, 8:14 a.m.
Hi,

I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
but did not get any response there. So resending to a wider audience
with improved subject line:

there is a race condition in the threaded IRQ handler code for oneshot
interrupts that may lead to disabling an IRQ indefinitely. IRQs are
masked before calling the hard-irq handler and are unmasked only after
the soft-irq handler has been run. Thus if the hard-irq handler
returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
will not be called, the interrupt will remain masked forever.
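
[Editor's sketch] The sequence described above can be modelled in plain
userspace C. This is not kernel code: struct model_irq, model_deliver()
and the two primary handlers are hypothetical names made up for
illustration, and the thread function runs inline here while the kernel
would wake a separate thread. What the message calls the "soft-irq
handler" is the thread function of a threaded IRQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; values mirror the kernel's irqreturn constants. */
enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1, IRQ_WAKE_THREAD = 2 };

struct model_irq {
	bool masked;                            /* line masked at the controller */
	enum irqreturn (*primary)(void *dev);   /* hard-irq handler */
	void (*thread_fn)(void *dev);           /* threaded handler, may be NULL */
	void *dev;
};

/* One delivery of a oneshot interrupt, following the report's description. */
enum irqreturn model_deliver(struct model_irq *irq)
{
	enum irqreturn ret;

	assert(irq != NULL && irq->primary != NULL);

	irq->masked = true;                 /* masked before the hard-irq handler */
	ret = irq->primary(irq->dev);

	if (ret == IRQ_WAKE_THREAD) {
		if (irq->thread_fn)
			irq->thread_fn(irq->dev);
		irq->masked = false;            /* unmasked only after the thread ran */
	}
	/* ret == IRQ_HANDLED: nothing unmasks the line again. */
	return ret;
}

/* Hypothetical primary handler that sees a real event. */
enum irqreturn primary_sees_event(void *dev)
{
	(void)dev;
	return IRQ_WAKE_THREAD;
}

/* Hypothetical primary handler that only sees a short glitch. */
enum irqreturn primary_sees_glitch(void *dev)
{
	(void)dev;
	return IRQ_HANDLED;
}
```

Delivering with primary_sees_event leaves masked == false; delivering
with primary_sees_glitch leaves masked == true, which is the stuck state
the report describes.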

This can happen due to a short pulse on the interrupt line, that
triggers the interrupt logic, but goes undetected by the hard-irq
handler. The problem can be reproduced with the TSC2007 touch
controller driver that uses ONESHOT interrupts.

The problem arises also with interrupt controllers that latch a level
triggered IRQ until it is acknowledged (like the i.MX28 does).
In this case the IRQ status bit will remain asserted after the
soft-irq finishes and retrigger the interrupt while the interrupt line
is already deasserted.

The following patch would solve the problem, but I'm not sure whether
it's the Right Thing(TM) to do. Especially wrt. shared interrupts.


Best regards,

Lothar Wassmann
Lars-Peter Clausen - Feb. 6, 2012, 10:42 a.m.
On 02/06/2012 09:14 AM, Lothar Waßmann wrote:
> Hi,
> 
> I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
> but did not get any response there. So resending to a wider audience
> with improved subject line:
> 
> there is a race condition in the threaded IRQ handler code for oneshot
> interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> masked before calling the hard-irq handler and are unmasked only after
> the soft-irq handler has been run. Thus if the hard-irq handler
> returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> will not be called, the interrupt will remain masked forever.
> 
> This can happen due to a short pulse on the interrupt line, that
> triggers the interrupt logic, but goes undetected by the hard-irq
> handler. The problem can be reproduced with the TSC2007 touch
> controller driver that uses ONESHOT interrupts.
> 
> The problem arises also with interrupt controllers that latch a level
> triggered IRQ until it is acknowledged (like the i.MX28 does).
> In this case the IRQ status bit will remain asserted after the
> soft-irq finishes and retrigger the interrupt while the interrupt line
> is already deasserted.
> 
> The following patch would solve the problem, but I'm not sure whether
> it's the Right Thing(TM) to do. Especially wrt. shared interrupts.
> 
> diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
> index 470d08c..93beadb 100644
> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -146,6 +146,11 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
>  			/* Fall through to add to randomness */
>  		case IRQ_HANDLED:
>  			random |= action->flags;
> +			/* unmask the IRQ that has been left masked
> +			 * due to race condition
> +			 */
> +			if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT))
> +				unmask_irq(desc);
>  			break;
>  
>  		default:

I think a better fix is to check the return value of handle_irq_event in
handle_level_irq and, if the IRQ_WAKE_THREAD bit is not set, unmask the irq.

The same should probably also be done for handle_fasteoi_irq.
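
[Editor's sketch] A minimal userspace model of this suggestion, assuming
the flow handler can see the combined return value of the action
handlers. struct model_desc, model_handle_level_irq() and the two event
stand-ins are hypothetical; only the IRQ_* values mirror the kernel. It
is not the change that was eventually merged, just an illustration of
the idea.

```c
#include <assert.h>
#include <stdbool.h>

#define IRQ_NONE        0u
#define IRQ_HANDLED     1u
#define IRQ_WAKE_THREAD 2u

struct model_desc {
	bool masked;      /* line masked at the controller */
	bool disabled;    /* line disabled by software */
	/* Stands in for handle_irq_event(): ORed result of all handlers. */
	unsigned int (*handle_event)(struct model_desc *desc);
};

/* Level-type flow handler that inspects the handler result. */
void model_handle_level_irq(struct model_desc *desc)
{
	unsigned int ret;

	assert(desc && desc->handle_event);

	desc->masked = true;              /* mask (and ack) the line */
	ret = desc->handle_event(desc);

	/* A thread was woken: it unmasks the line when it has finished. */
	if (ret & IRQ_WAKE_THREAD)
		return;

	/* No thread pending: unmask here unless the line got disabled. */
	if (!desc->disabled)
		desc->masked = false;
}

/* Hypothetical stand-in: primary handler returned IRQ_HANDLED. */
unsigned int model_event_handled(struct model_desc *desc)
{
	(void)desc;
	return IRQ_HANDLED;
}

/* Hypothetical stand-in: primary handler returned IRQ_WAKE_THREAD. */
unsigned int model_event_wake(struct model_desc *desc)
{
	(void)desc;
	return IRQ_WAKE_THREAD;
}
```

Doing the check in the flow handler rather than in
handle_irq_event_percpu() also means the decision is made once per
interrupt, after all handlers of a shared line have run.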
Yong Zhang - Feb. 7, 2012, 9:03 a.m.
On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> Hi,
> 
> I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
> but did not get any response there. So resending to a wider audience
> with improved subject line:
> 
> there is a race condition in the threaded IRQ handler code for oneshot
> interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> masked before calling the hard-irq handler and are unmasked only after
> the soft-irq handler has been run. Thus if the hard-irq handler
> returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> will not be called, the interrupt will remain masked forever.
> 
> This can happen due to a short pulse on the interrupt line, that
> triggers the interrupt logic, but goes undetected by the hard-irq
> handler. The problem can be reproduced with the TSC2007 touch
> controller driver that uses ONESHOT interrupts.

Isn't it the responsibility of the driver (say TSC2007)?

In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.

Thanks,
Yong


> 
> The problem arises also with interrupt controllers that latch a level
> triggered IRQ until it is acknowledged (like the i.MX28 does).
> In this case the IRQ status bit will remain asserted after the
> soft-irq finishes and retrigger the interrupt while the interrupt line
> is already deasserted.
> 
> The following patch would solve the problem, but I'm not sure whether
> it's the Right Thing(TM) to do. Especially wrt. shared interrupts.
> 
> diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
> index 470d08c..93beadb 100644
> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -146,6 +146,11 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
>  			/* Fall through to add to randomness */
>  		case IRQ_HANDLED:
>  			random |= action->flags;
> +			/* unmask the IRQ that has been left masked
> +			 * due to race condition
> +			 */
> +			if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT))
> +				unmask_irq(desc);
>  			break;
>  
>  		default:
> 
> Best regards,
> 
> Lothar Wassmann
Lothar Waßmann - Feb. 7, 2012, 10:01 a.m.
Hi,

> On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> > Hi,
> > 
> > I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
> > but did not get any response there. So resending to a wider audience
> > with improved subject line:
> > 
> > there is a race condition in the threaded IRQ handler code for oneshot
> > interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> > masked before calling the hard-irq handler and are unmasked only after
> > the soft-irq handler has been run. Thus if the hard-irq handler
> > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> > will not be called, the interrupt will remain masked forever.
> > 
> > This can happen due to a short pulse on the interrupt line, that
> > triggers the interrupt logic, but goes undetected by the hard-irq
> > handler. The problem can be reproduced with the TSC2007 touch
> > controller driver that uses ONESHOT interrupts.
> 
> Isn't it the responsibility of the driver (say TSC2007)?
> 
> In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.
> 
That would mean it would have to return IRQ_WAKE_THREAD unconditionally,
making the return code useless.
It would also cause an extra, useless pass through the softirq
handler.
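
[Editor's sketch] The trade-off discussed here, modelled in userspace C.
struct model_touch, its pen_down field and all function names are
hypothetical and not taken from the TSC2007 driver; the thread function
runs inline instead of in a separate kernel thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1, IRQ_WAKE_THREAD = 2 };

struct model_touch {
	bool pen_down;             /* hypothetical: current pen state */
	unsigned int thread_runs;  /* how often the thread function ran */
};

/* Conditional primary handler: wake the thread only for a real touch. */
enum irqreturn hard_irq_conditional(struct model_touch *ts)
{
	return ts->pen_down ? IRQ_WAKE_THREAD : IRQ_HANDLED;
}

/* Unconditional variant: always defer to the thread. */
enum irqreturn hard_irq_always_wake(struct model_touch *ts)
{
	(void)ts;
	return IRQ_WAKE_THREAD;
}

/* Thread function: with the unconditional variant it must re-check itself. */
void model_thread_fn(struct model_touch *ts)
{
	ts->thread_runs++;
	if (!ts->pen_down)
		return;    /* woken by a glitch: nothing to read */
	/* A real driver would read the touch coordinates here. */
}

/* Deliver one interrupt and return the running count of thread runs. */
unsigned int deliver(struct model_touch *ts,
		     enum irqreturn (*primary)(struct model_touch *))
{
	assert(ts != NULL && primary != NULL);
	if (primary(ts) == IRQ_WAKE_THREAD)
		model_thread_fn(ts);
	return ts->thread_runs;
}
```

For a glitch (pen_down == false) the conditional handler never runs the
thread function, while the unconditional one runs it once only to find
nothing to do. That is the extra pass objected to above.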


Lothar Waßmann
Yong Zhang - Feb. 7, 2012, 12:34 p.m.
On Tue, Feb 07, 2012 at 11:01:06AM +0100, Lothar Waßmann wrote:
> Hi,
> 
> > On Mon, Feb 06, 2012 at 09:14:47AM +0100, Lothar Waßmann wrote:
> > > Hi,
> > > 
> > > I already sent this to <linux-kernel@vger.kernel.org> on Feb. 1, 2012
> > > but did not get any response there. So resending to a wider audience
> > > with improved subject line:
> > > 
> > > there is a race condition in the threaded IRQ handler code for oneshot
> > > interrupts that may lead to disabling an IRQ indefinitely. IRQs are
> > > masked before calling the hard-irq handler and are unmasked only after
> > > the soft-irq handler has been run. Thus if the hard-irq handler
> > > returns IRQ_HANDLED instead of IRQ_WAKE_THREAD, meaning the soft-irq
> > > will not be called, the interrupt will remain masked forever.
> > > 
> > > This can happen due to a short pulse on the interrupt line, that
> > > triggers the interrupt logic, but goes undetected by the hard-irq
> > > handler. The problem can be reproduced with the TSC2007 touch
> > > controller driver that uses ONESHOT interrupts.
> > 
> > Isn't it the responsibility of the driver (say TSC2007)?
> > 
> > In this case, TSC2007 should return IRQ_WAKE_THREAD IMHO.
> > 
> That would mean it had to return IRQ_WAKE_THREAD unconditionally
> making the return code useless.
> And it would cause an extra useless loop through the softirq
> handler.

Yeah, it's the default behavior when we introduce 'threadirqs',
and it's safe.

You know, in your patch unmask_irq() is called locklessly, and
it will introduce another race.
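
[Editor's sketch] The message does not say which race is meant. One
plausible interleaving, assumed here purely for illustration: while the
handler runs without the descriptor lock, another CPU disables the line
(and the chip masks it on disable), then the patched path unmasks it
again without looking at the disabled state. The model below replays
that order sequentially in userspace C; struct model_desc and all
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_desc {
	bool disabled;   /* software state: line disabled */
	bool masked;     /* hardware state: line masked at the controller */
};

/* Other CPU: disable the line; assumed to mask it at the chip as well. */
void model_disable_irq(struct model_desc *d)
{
	assert(d != NULL);
	d->disabled = true;
	d->masked = true;
}

/* Patched path: unmask with no lock held and no state check. */
void model_unmask_lockless(struct model_desc *d)
{
	assert(d != NULL);
	d->masked = false;
}

/*
 * Alternative: unmask only if the line is still enabled. In the kernel
 * this check would have to run under the descriptor lock to be reliable.
 */
void model_unmask_checked(struct model_desc *d)
{
	assert(d != NULL);
	if (!d->disabled)
		d->masked = false;
}
```

Calling model_disable_irq() followed by model_unmask_lockless() ends
with disabled == true and masked == false, so a disabled line can fire
again. With model_unmask_checked() the line stays masked.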

Thanks,
Yong

Patch

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 470d08c..93beadb 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -146,6 +146,11 @@  handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
 			/* Fall through to add to randomness */
 		case IRQ_HANDLED:
 			random |= action->flags;
+			/* unmask the IRQ that has been left masked
+			 * due to race condition
+			 */
+			if (res == IRQ_HANDLED && (action->flags & IRQF_ONESHOT))
+				unmask_irq(desc);
 			break;
 
 		default: