diff mbox

Re: [PATCH/RFC] rtc: rtc-twl: Fixed nested IRQ handling in resume from suspend

Message ID alpine.DEB.2.10.1409161318130.4247@nanos
State Not Applicable
Headers show

Commit Message

Thomas Gleixner Sept. 17, 2014, 9:57 p.m. UTC
On Sat, 13 Sep 2014, Laurent Pinchart wrote:

> The TWL RTC interrupt is a double-nested threaded interrupt, handled
> through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
> Interrupt Handler).
> 
> When the system is woken up from suspend by a TWL RTC alarm interrupt,
> the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
> sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
> before the TWL RTC interrupt gets enabled. This results on the interrupt
> being processed by the TWL primary interrupt handler, forwarded to the
> nested SIH, and then marked as pending for the RTC by handle_nested_irq
> called from the SIH.
> 
> The RTC interrupt then eventually gets reenabled the kernel, which will
> try to call its top half interrupt handler. As the interrupt is a nested
> threaded IRQ, its primary handler has been set to the
> irq_nested_primary_handler function, which is never supposed to be
> called and generates a WARN_ON, without waking the IRQ thread up.

Right. It CANNOT wake up the thread, because there is no thread
associated to that particular interrupt. It's handler is called in the
context of the parent (demultiplexing) interrupt thread. Of course
twl4030 does not call irq_set_parent() for the nested
interrupts. That's there so the resend of a nested thread irq will be
targeted to its parent.
 
Using IRQF_EARLY_RESUME is really, really wrong for device drivers
simply because at the point where early resume is called the devices
have not yet been resumed, so a interrupt delivered at this point
might run into a stale device and cause a machine stall or any other
undesired side effect. It was added for a special case with Xen where
Xen needs the IPIs working early in resume. And it's definitely not
meant to solve ordering issues of interrupts on resume.

That said, looking at that twl4030 driver, there seems to be a double
nesting issue. So also the simple one level parent redirection of the
irq resend wont work. I really wonder why this only triggers in the
context of resume.

Now the resend issue is simple to fix. The resume time ordering
constraints is a bit more involved, but it should be possible w/o
inflicting anything more complex on drivers than requiring them to use
irq_set_parent(), which should be name irq_set_nested_parent().

Completely untested patch below. It applies on top of

 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/pm

So what twl4030 needs on top of that are calls to
irq_set_nested_parent() for the nested interrupts.

That will automatically establish the nesting depth, redirect the
retrigger to the top most interrupt and solve the resume ordering.

The resume ordering is the reverse of the nesting:

    top-irq1 - nested irq10   - nested irq20 (parent = 10)
    	     | (parent = 1)   - nested irq21 (parent = 10)
	     |
    	     - nested irq11   - nested irq22 (parent = 11)	
	     | (parent = 1)   - nested irq23 (parent = 11)
	     |
	     - nested irq12   - nested irq24 (parent = 12)
	       (parent = 1)   - nested irq25 (parent = 12)

So the resume ordering is

   20-21-22-23-24-25 - 10-11-12 - 1

Thanks,

	tglx

---
 drivers/mfd/twl6030-irq.c |    3 --
 include/linux/irq.h       |    9 ------
 include/linux/irqdesc.h   |    5 +++
 kernel/irq/manage.c       |   28 ++++++++++++++++++---
 kernel/irq/pm.c           |   60 +++++++++++++++++++++++++++++++++++++++++-----
 kernel/irq/resend.c       |    4 +--
 kernel/irq/settings.h     |    5 +++
 7 files changed, 92 insertions(+), 22 deletions(-)

Comments

Rafael J. Wysocki Sept. 21, 2014, 1:04 a.m. UTC | #1
On Wednesday, September 17, 2014 02:57:11 PM Thomas Gleixner wrote:
> On Sat, 13 Sep 2014, Laurent Pinchart wrote:
> 
> > The TWL RTC interrupt is a double-nested threaded interrupt, handled
> > through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
> > Interrupt Handler).
> > 
> > When the system is woken up from suspend by a TWL RTC alarm interrupt,
> > the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
> > sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
> > before the TWL RTC interrupt gets enabled. This results on the interrupt
> > being processed by the TWL primary interrupt handler, forwarded to the
> > nested SIH, and then marked as pending for the RTC by handle_nested_irq
> > called from the SIH.
> > 
> > The RTC interrupt then eventually gets reenabled the kernel, which will
> > try to call its top half interrupt handler. As the interrupt is a nested
> > threaded IRQ, its primary handler has been set to the
> > irq_nested_primary_handler function, which is never supposed to be
> > called and generates a WARN_ON, without waking the IRQ thread up.
> 
> Right. It CANNOT wake up the thread, because there is no thread
> associated to that particular interrupt. It's handler is called in the
> context of the parent (demultiplexing) interrupt thread. Of course
> twl4030 does not call irq_set_parent() for the nested
> interrupts. That's there so the resend of a nested thread irq will be
> targeted to its parent.
>  
> Using IRQF_EARLY_RESUME is really, really wrong for device drivers
> simply because at the point where early resume is called the devices
> have not yet been resumed, so a interrupt delivered at this point
> might run into a stale device and cause a machine stall or any other
> undesired side effect. It was added for a special case with Xen where
> Xen needs the IPIs working early in resume. And it's definitely not
> meant to solve ordering issues of interrupts on resume.
> 
> That said, looking at that twl4030 driver, there seems to be a double
> nesting issue. So also the simple one level parent redirection of the
> irq resend wont work. I really wonder why this only triggers in the
> context of resume.
> 
> Now the resend issue is simple to fix. The resume time ordering
> constraints is a bit more involved, but it should be possible w/o
> inflicting anything more complex on drivers than requiring them to use
> irq_set_parent(), which should be name irq_set_nested_parent().
> 
> Completely untested patch below. It applies on top of
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/pm
> 
> So what twl4030 needs on top of that are calls to
> irq_set_nested_parent() for the nested interrupts.
> 
> That will automatically establish the nesting depth, redirect the
> retrigger to the top most interrupt and solve the resume ordering.
> 
> The resume ordering is the reverse of the nesting:
> 
>     top-irq1 - nested irq10   - nested irq20 (parent = 10)
>     	     | (parent = 1)   - nested irq21 (parent = 10)
> 	     |
>     	     - nested irq11   - nested irq22 (parent = 11)	
> 	     | (parent = 1)   - nested irq23 (parent = 11)
> 	     |
> 	     - nested irq12   - nested irq24 (parent = 12)
> 	       (parent = 1)   - nested irq25 (parent = 12)
> 
> So the resume ordering is
> 
>    20-21-22-23-24-25 - 10-11-12 - 1

I was thinking about this earlier today.

Can't we do something like this (pseudo code) during resume:

resume_irq(irq) {
	if (has parent_irq)
		resume_irq(parent_irq);

	do stuff;
}

which will get us the right ordering without using bitmaps and visiting
the same one twice (once from the child and once from the main resume loop)
doesn't matter?

Rafael
Gerlando Falauto Feb. 26, 2015, 1:05 p.m. UTC | #2
Hi Thomas,

did anything ever come out of this?

I'm encountering a similar but different problem, where a nested 
interrupt handler is called directly from the resend tasklet (and 
therefore -- if I'm not mistaken -- in an atomic context, which is 
unexpected). This issue however appears during system startup, when the 
interrupt is first enabled, and is very, very hard to reproduce. So I 
don't have much room for further investigation.

Do you have any clue what might be the sequence of events leading to 
this behavior?

While I reckon that using irq_set_parent() on the nested IRQ might as 
well fix the issue (or just as well hide it), I just don't understand 
how this all works. Plus, I wonder why it is (almost) never used... I 
only found two occurrences of irq_set_parent() on 3.19-rc5 (gpiolib.c 
and twl6030-irq.c), whereas exactly zero on 3.10.60.
Is there any known way to trigger/test interrupt resending (which, I 
believe, has been around for a very long time)?

Thank you!
Gerlando

On 09/17/2014 11:57 PM, Thomas Gleixner wrote:
> On Sat, 13 Sep 2014, Laurent Pinchart wrote:
>
>> The TWL RTC interrupt is a double-nested threaded interrupt, handled
>> through the TWL SIH (Secondary Interrupt Handler) and PIH (Primary
>> Interrupt Handler).
>>
>> When the system is woken up from suspend by a TWL RTC alarm interrupt,
>> the TWL PIH and SIH are enabled first (due to the normal IRQ enabling
>> sequence for the PIH and to the IRQF_EARLY_RESUME flag for the SIH)
>> before the TWL RTC interrupt gets enabled. This results on the interrupt
>> being processed by the TWL primary interrupt handler, forwarded to the
>> nested SIH, and then marked as pending for the RTC by handle_nested_irq
>> called from the SIH.
>>
>> The RTC interrupt then eventually gets reenabled the kernel, which will
>> try to call its top half interrupt handler. As the interrupt is a nested
>> threaded IRQ, its primary handler has been set to the
>> irq_nested_primary_handler function, which is never supposed to be
>> called and generates a WARN_ON, without waking the IRQ thread up.
>
> Right. It CANNOT wake up the thread, because there is no thread
> associated to that particular interrupt. It's handler is called in the
> context of the parent (demultiplexing) interrupt thread. Of course
> twl4030 does not call irq_set_parent() for the nested
> interrupts. That's there so the resend of a nested thread irq will be
> targeted to its parent.
>
> Using IRQF_EARLY_RESUME is really, really wrong for device drivers
> simply because at the point where early resume is called the devices
> have not yet been resumed, so a interrupt delivered at this point
> might run into a stale device and cause a machine stall or any other
> undesired side effect. It was added for a special case with Xen where
> Xen needs the IPIs working early in resume. And it's definitely not
> meant to solve ordering issues of interrupts on resume.
>
> That said, looking at that twl4030 driver, there seems to be a double
> nesting issue. So also the simple one level parent redirection of the
> irq resend wont work. I really wonder why this only triggers in the
> context of resume.
>
> Now the resend issue is simple to fix. The resume time ordering
> constraints is a bit more involved, but it should be possible w/o
> inflicting anything more complex on drivers than requiring them to use
> irq_set_parent(), which should be name irq_set_nested_parent().
>
> Completely untested patch below. It applies on top of
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/pm
>
> So what twl4030 needs on top of that are calls to
> irq_set_nested_parent() for the nested interrupts.
>
> That will automatically establish the nesting depth, redirect the
> retrigger to the top most interrupt and solve the resume ordering.
>
> The resume ordering is the reverse of the nesting:
>
>      top-irq1 - nested irq10   - nested irq20 (parent = 10)
>      	     | (parent = 1)   - nested irq21 (parent = 10)
> 	     |
>      	     - nested irq11   - nested irq22 (parent = 11)	
> 	     | (parent = 1)   - nested irq23 (parent = 11)
> 	     |
> 	     - nested irq12   - nested irq24 (parent = 12)
> 	       (parent = 1)   - nested irq25 (parent = 12)
>
> So the resume ordering is
>
>     20-21-22-23-24-25 - 10-11-12 - 1
>
> Thanks,
>
> 	tglx
>
> ---
>   drivers/mfd/twl6030-irq.c |    3 --
>   include/linux/irq.h       |    9 ------
>   include/linux/irqdesc.h   |    5 +++
>   kernel/irq/manage.c       |   28 ++++++++++++++++++---
>   kernel/irq/pm.c           |   60 +++++++++++++++++++++++++++++++++++++++++-----
>   kernel/irq/resend.c       |    4 +--
>   kernel/irq/settings.h     |    5 +++
>   7 files changed, 92 insertions(+), 22 deletions(-)
>
> Index: tip/drivers/mfd/twl6030-irq.c
> ===================================================================
> --- tip.orig/drivers/mfd/twl6030-irq.c
> +++ tip/drivers/mfd/twl6030-irq.c
> @@ -350,8 +350,7 @@ static int twl6030_irq_map(struct irq_do
>
>   	irq_set_chip_data(virq, pdata);
>   	irq_set_chip_and_handler(virq,  &pdata->irq_chip, handle_simple_irq);
> -	irq_set_nested_thread(virq, true);
> -	irq_set_parent(virq, pdata->twl_irq);
> +	irq_set_nested_parent(virq, pdata->twl_irq);
>
>   #ifdef CONFIG_ARM
>   	/*
> Index: tip/include/linux/irq.h
> ===================================================================
> --- tip.orig/include/linux/irq.h
> +++ tip/include/linux/irq.h
> @@ -415,14 +415,7 @@ static inline void irq_move_masked_irq(s
>
>   extern int no_irq_affinity;
>
> -#ifdef CONFIG_HARDIRQS_SW_RESEND
> -int irq_set_parent(int irq, int parent_irq);
> -#else
> -static inline int irq_set_parent(int irq, int parent_irq)
> -{
> -	return 0;
> -}
> -#endif
> +int irq_set_nested_parent(int irq, int parent_irq);
>
>   /*
>    * Built-in IRQ handlers for various IRQ types,
> Index: tip/include/linux/irqdesc.h
> ===================================================================
> --- tip.orig/include/linux/irqdesc.h
> +++ tip/include/linux/irqdesc.h
> @@ -41,6 +41,9 @@ struct irq_desc;
>    *			IRQF_NO_SUSPEND set
>    * @force_resume_depth:	number of irqactions on a irq descriptor with
>    *			IRQF_FORCE_RESUME set
> + * @parent_irq:		Parent interrupt in case of nested threads
> + * @parent_top:		Top parent interrupt in case of multiple nested threads
> + * @parent_depth:	Nest level of multiple nested threads for resume ordering
>    * @dir:		/proc/irq/ procfs entry
>    * @name:		flow handler name for /proc/interrupts output
>    */
> @@ -82,6 +85,8 @@ struct irq_desc {
>   	struct proc_dir_entry	*dir;
>   #endif
>   	int			parent_irq;
> +	int			parent_depth;
> +	int			parent_top;
>   	struct module		*owner;
>   	const char		*name;
>   } ____cacheline_internodealigned_in_smp;
> Index: tip/kernel/irq/manage.c
> ===================================================================
> --- tip.orig/kernel/irq/manage.c
> +++ tip/kernel/irq/manage.c
> @@ -624,21 +624,41 @@ int __irq_set_trigger(struct irq_desc *d
>   	return ret;
>   }
>
> -#ifdef CONFIG_HARDIRQS_SW_RESEND
> -int irq_set_parent(int irq, int parent_irq)
> +int irq_set_nested_parent(int irq, int parent_irq)
>   {
> +	struct irq_desc *desc;
>   	unsigned long flags;
> -	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
> +	int parent_depth, parent_top;
>
> +	/*
> +	 * Get the parent irq first to retrieve the parent depth and
> +	 * the top level parent irq number. The depth is required for
> +	 * resume ordering, the top level irq number for software
> +	 * resend.
> +	 */
> +	desc = irq_get_desc_lock(parent_irq, &flags, 0);
> +	if (!desc)
> +		return -EINVAL;
> +	parent_depth = desc->parent_depth;
> +	parent_top = desc->parent_top;
> +	irq_put_desc_unlock(desc, flags);
> +
> +	/* Setup the info in the child */
> +	desc = irq_get_desc_lock(parent_irq, &flags, 0);
>   	if (!desc)
>   		return -EINVAL;
>
>   	desc->parent_irq = parent_irq;
> +	desc->parent_top = parent_top ? parent_top : parent_irq;
> +	desc->parent_depth = parent_depth + 1;
> +
> +	/* Set the nested thread bit as well */
> +	irq_settings_set_nested_thread(desc);
>
>   	irq_put_desc_unlock(desc, flags);
> +
>   	return 0;
>   }
> -#endif
>
>   /*
>    * Default primary interrupt handler for threaded interrupts. Is
> Index: tip/kernel/irq/pm.c
> ===================================================================
> --- tip.orig/kernel/irq/pm.c
> +++ tip/kernel/irq/pm.c
> @@ -8,6 +8,7 @@
>
>   #include <linux/irq.h>
>   #include <linux/module.h>
> +#include <linux/bitmap.h>
>   #include <linux/interrupt.h>
>   #include <linux/suspend.h>
>   #include <linux/syscore_ops.h>
> @@ -93,6 +94,37 @@ static bool suspend_device_irq(struct ir
>   	return true;
>   }
>
> +#define IRQ_NEST_DEPTH 4
> +
> +struct resume_level {
> +	unsigned long	map[BITS_TO_LONGS(IRQ_BITMAP_BITS)];
> +	int		top;
> +};
> +
> +static struct resume_level resume_order_lvls[IRQ_NEST_DEPTH];
> +static struct resume_level resume_early_lvl;
> +static int resume_order_max_depth;
> +
> +static void update_resume_order(struct irq_desc *desc, int irq)
> +{
> +	bool early = desc->action && desc->action->flags & IRQF_EARLY_RESUME;
> +	int depth = desc->parent_depth;
> +	struct resume_level *l;
> +
> +	if (!early) {
> +		if (WARN_ON_ONCE(depth >= IRQ_NEST_DEPTH))
> +			depth = IRQ_NEST_DEPTH - 1;
> +		l = resume_order_lvls + depth;
> +		if (depth > resume_order_max_depth)
> +			resume_order_max_depth = depth;
> +	} else {
> +		WARN_ON_ONCE(depth != 0);
> +		l = &resume_early_lvl;
> +	}
> +	set_bit(irq, l->map);
> +	l->top = irq;
> +}
> +
>   /**
>    * suspend_device_irqs - disable all currently enabled interrupt lines
>    *
> @@ -114,11 +146,16 @@ void suspend_device_irqs(void)
>   	struct irq_desc *desc;
>   	int irq;
>
> +	memset(resume_order_lvls, 0, sizeof(resume_order_lvls));
> +	memset(resume_early_lvl, 0, sizeof(resume_early_lvl));
> +	resume_order_max_depth = 0;
> +
>   	for_each_irq_desc(irq, desc) {
>   		unsigned long flags;
>   		bool sync;
>
>   		raw_spin_lock_irqsave(&desc->lock, flags);
> +		update_resume_order(desc, irq);
>   		sync = suspend_device_irq(desc, irq);
>   		raw_spin_unlock_irqrestore(&desc->lock, flags);
>
> @@ -146,17 +183,16 @@ resume:
>   	__enable_irq(desc, irq);
>   }
>
> -static void resume_irqs(bool want_early)
> +static void resume_irq_lvl(struct resume_level *l)
>   {
>   	struct irq_desc *desc;
> +	unsigned long flags;
>   	int irq;
>
> -	for_each_irq_desc(irq, desc) {
> -		unsigned long flags;
> -		bool is_early = desc->action &&
> -			desc->action->flags & IRQF_EARLY_RESUME;
> +	for_each_set_bit(irq, l->map, l->top + 1) {
>
> -		if (!is_early && want_early)
> +		desc = irq_to_desc(irq);
> +		if (WARN_ON_ONCE(!desc))
>   			continue;
>
>   		raw_spin_lock_irqsave(&desc->lock, flags);
> @@ -165,6 +201,18 @@ static void resume_irqs(bool want_early)
>   	}
>   }
>
> +static void resume_irqs(bool early)
> +{
> +	int i;
> +
> +	if (early) {
> +		resume_irq_lvl(&resume_early_lvl);
> +	} else {
> +		for (i = resume_order_max_depth; i >= 0; i--)
> +			resume_irq_lvl(resume_order_lvls + i);
> +	}
> +}
> +
>   /**
>    * irq_pm_syscore_ops - enable interrupt lines early
>    *
> Index: tip/kernel/irq/resend.c
> ===================================================================
> --- tip.orig/kernel/irq/resend.c
> +++ tip/kernel/irq/resend.c
> @@ -79,9 +79,9 @@ void check_irq_resend(struct irq_desc *d
>   			 * in the thread context of the parent irq,
>   			 * retrigger the parent.
>   			 */
> -			if (desc->parent_irq &&
> +			if (desc->parent_top &&
>   			    irq_settings_is_nested_thread(desc))
> -				irq = desc->parent_irq;
> +				irq = desc->parent_top;
>   			/* Set it pending and activate the softirq: */
>   			set_bit(irq, irqs_resend);
>   			tasklet_schedule(&resend_tasklet);
> Index: tip/kernel/irq/settings.h
> ===================================================================
> --- tip.orig/kernel/irq/settings.h
> +++ tip/kernel/irq/settings.h
> @@ -53,6 +53,11 @@ static inline void irq_settings_set_per_
>   	desc->status_use_accessors |= _IRQ_PER_CPU;
>   }
>
> +static inline void irq_settings_set_nested_thread(struct irq_desc *desc)
> +{
> +	desc->status_use_accessors |= _IRQ_NESTED_THREAD;
> +}
> +
>   static inline void irq_settings_set_no_balancing(struct irq_desc *desc)
>   {
>   	desc->status_use_accessors |= _IRQ_NO_BALANCING;
>
>
>
diff mbox

Patch

Index: tip/drivers/mfd/twl6030-irq.c
===================================================================
--- tip.orig/drivers/mfd/twl6030-irq.c
+++ tip/drivers/mfd/twl6030-irq.c
@@ -350,8 +350,7 @@  static int twl6030_irq_map(struct irq_do
 
 	irq_set_chip_data(virq, pdata);
 	irq_set_chip_and_handler(virq,  &pdata->irq_chip, handle_simple_irq);
-	irq_set_nested_thread(virq, true);
-	irq_set_parent(virq, pdata->twl_irq);
+	irq_set_nested_parent(virq, pdata->twl_irq);
 
 #ifdef CONFIG_ARM
 	/*
Index: tip/include/linux/irq.h
===================================================================
--- tip.orig/include/linux/irq.h
+++ tip/include/linux/irq.h
@@ -415,14 +415,7 @@  static inline void irq_move_masked_irq(s
 
 extern int no_irq_affinity;
 
-#ifdef CONFIG_HARDIRQS_SW_RESEND
-int irq_set_parent(int irq, int parent_irq);
-#else
-static inline int irq_set_parent(int irq, int parent_irq)
-{
-	return 0;
-}
-#endif
+int irq_set_nested_parent(int irq, int parent_irq);
 
 /*
  * Built-in IRQ handlers for various IRQ types,
Index: tip/include/linux/irqdesc.h
===================================================================
--- tip.orig/include/linux/irqdesc.h
+++ tip/include/linux/irqdesc.h
@@ -41,6 +41,9 @@  struct irq_desc;
  *			IRQF_NO_SUSPEND set
  * @force_resume_depth:	number of irqactions on a irq descriptor with
  *			IRQF_FORCE_RESUME set
+ * @parent_irq:		Parent interrupt in case of nested threads
+ * @parent_top:		Top parent interrupt in case of multiple nested threads
+ * @parent_depth:	Nest level of multiple nested threads for resume ordering
  * @dir:		/proc/irq/ procfs entry
  * @name:		flow handler name for /proc/interrupts output
  */
@@ -82,6 +85,8 @@  struct irq_desc {
 	struct proc_dir_entry	*dir;
 #endif
 	int			parent_irq;
+	int			parent_depth;
+	int			parent_top;
 	struct module		*owner;
 	const char		*name;
 } ____cacheline_internodealigned_in_smp;
Index: tip/kernel/irq/manage.c
===================================================================
--- tip.orig/kernel/irq/manage.c
+++ tip/kernel/irq/manage.c
@@ -624,21 +624,41 @@  int __irq_set_trigger(struct irq_desc *d
 	return ret;
 }
 
-#ifdef CONFIG_HARDIRQS_SW_RESEND
-int irq_set_parent(int irq, int parent_irq)
+int irq_set_nested_parent(int irq, int parent_irq)
 {
+	struct irq_desc *desc;
 	unsigned long flags;
-	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
+	int parent_depth, parent_top;
 
+	/*
+	 * Get the parent irq first to retrieve the parent depth and
+	 * the top level parent irq number. The depth is required for
+	 * resume ordering, the top level irq number for software
+	 * resend.
+	 */
+	desc = irq_get_desc_lock(parent_irq, &flags, 0);
+	if (!desc)
+		return -EINVAL;
+	parent_depth = desc->parent_depth;
+	parent_top = desc->parent_top;
+	irq_put_desc_unlock(desc, flags);
+
+	/* Setup the info in the child */
+	desc = irq_get_desc_lock(parent_irq, &flags, 0);
 	if (!desc)
 		return -EINVAL;
 
 	desc->parent_irq = parent_irq;
+	desc->parent_top = parent_top ? parent_top : parent_irq;
+	desc->parent_depth = parent_depth + 1;
+
+	/* Set the nested thread bit as well */
+	irq_settings_set_nested_thread(desc);
 
 	irq_put_desc_unlock(desc, flags);
+
 	return 0;
 }
-#endif
 
 /*
  * Default primary interrupt handler for threaded interrupts. Is
Index: tip/kernel/irq/pm.c
===================================================================
--- tip.orig/kernel/irq/pm.c
+++ tip/kernel/irq/pm.c
@@ -8,6 +8,7 @@ 
 
 #include <linux/irq.h>
 #include <linux/module.h>
+#include <linux/bitmap.h>
 #include <linux/interrupt.h>
 #include <linux/suspend.h>
 #include <linux/syscore_ops.h>
@@ -93,6 +94,37 @@  static bool suspend_device_irq(struct ir
 	return true;
 }
 
+#define IRQ_NEST_DEPTH 4
+
+struct resume_level {
+	unsigned long	map[BITS_TO_LONGS(IRQ_BITMAP_BITS)];
+	int		top;
+};
+
+static struct resume_level resume_order_lvls[IRQ_NEST_DEPTH];
+static struct resume_level resume_early_lvl;
+static int resume_order_max_depth;
+
+static void update_resume_order(struct irq_desc *desc, int irq)
+{
+	bool early = desc->action && desc->action->flags & IRQF_EARLY_RESUME;
+	int depth = desc->parent_depth;
+	struct resume_level *l;
+
+	if (!early) {
+		if (WARN_ON_ONCE(depth >= IRQ_NEST_DEPTH))
+			depth = IRQ_NEST_DEPTH - 1;
+		l = resume_order_lvls + depth;
+		if (depth > resume_order_max_depth)
+			resume_order_max_depth = depth;
+	} else {
+		WARN_ON_ONCE(depth != 0);
+		l = &resume_early_lvl;
+	}
+	set_bit(irq, l->map);
+	l->top = irq;
+}
+
 /**
  * suspend_device_irqs - disable all currently enabled interrupt lines
  *
@@ -114,11 +146,16 @@  void suspend_device_irqs(void)
 	struct irq_desc *desc;
 	int irq;
 
+	memset(resume_order_lvls, 0, sizeof(resume_order_lvls));
+	memset(resume_early_lvl, 0, sizeof(resume_early_lvl));
+	resume_order_max_depth = 0;
+
 	for_each_irq_desc(irq, desc) {
 		unsigned long flags;
 		bool sync;
 
 		raw_spin_lock_irqsave(&desc->lock, flags);
+		update_resume_order(desc, irq);
 		sync = suspend_device_irq(desc, irq);
 		raw_spin_unlock_irqrestore(&desc->lock, flags);
 
@@ -146,17 +183,16 @@  resume:
 	__enable_irq(desc, irq);
 }
 
-static void resume_irqs(bool want_early)
+static void resume_irq_lvl(struct resume_level *l)
 {
 	struct irq_desc *desc;
+	unsigned long flags;
 	int irq;
 
-	for_each_irq_desc(irq, desc) {
-		unsigned long flags;
-		bool is_early = desc->action &&
-			desc->action->flags & IRQF_EARLY_RESUME;
+	for_each_set_bit(irq, l->map, l->top + 1) {
 
-		if (!is_early && want_early)
+		desc = irq_to_desc(irq);
+		if (WARN_ON_ONCE(!desc))
 			continue;
 
 		raw_spin_lock_irqsave(&desc->lock, flags);
@@ -165,6 +201,18 @@  static void resume_irqs(bool want_early)
 	}
 }
 
+static void resume_irqs(bool early)
+{
+	int i;
+
+	if (early) {
+		resume_irq_lvl(&resume_early_lvl);
+	} else {
+		for (i = resume_order_max_depth; i >= 0; i--)
+			resume_irq_lvl(resume_order_lvls + i);
+	}
+}
+
 /**
  * irq_pm_syscore_ops - enable interrupt lines early
  *
Index: tip/kernel/irq/resend.c
===================================================================
--- tip.orig/kernel/irq/resend.c
+++ tip/kernel/irq/resend.c
@@ -79,9 +79,9 @@  void check_irq_resend(struct irq_desc *d
 			 * in the thread context of the parent irq,
 			 * retrigger the parent.
 			 */
-			if (desc->parent_irq &&
+			if (desc->parent_top &&
 			    irq_settings_is_nested_thread(desc))
-				irq = desc->parent_irq;
+				irq = desc->parent_top;
 			/* Set it pending and activate the softirq: */
 			set_bit(irq, irqs_resend);
 			tasklet_schedule(&resend_tasklet);
Index: tip/kernel/irq/settings.h
===================================================================
--- tip.orig/kernel/irq/settings.h
+++ tip/kernel/irq/settings.h
@@ -53,6 +53,11 @@  static inline void irq_settings_set_per_
 	desc->status_use_accessors |= _IRQ_PER_CPU;
 }
 
+static inline void irq_settings_set_nested_thread(struct irq_desc *desc)
+{
+	desc->status_use_accessors |= _IRQ_NESTED_THREAD;
+}
+
 static inline void irq_settings_set_no_balancing(struct irq_desc *desc)
 {
 	desc->status_use_accessors |= _IRQ_NO_BALANCING;