Patchwork [v3] powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule

login
register
mail settings
Submitter Benjamin Herrenschmidt
Date Oct. 27, 2009, 5:41 a.m.
Message ID <1256622077.11607.85.camel@pasglop>
Download mbox | patch
Permalink /patch/36965/
State Accepted, archived
Commit 4f917ba3d5ee9c98d60fa357e799942df8412de3
Headers show

Comments

Benjamin Herrenschmidt - Oct. 27, 2009, 5:41 a.m.
> So I _think_ that the irqs on/off accounting for lockdep isn't quite
> right. What do you think of this slightly modified version ? I've only
> done a quick boot test on a G5 with lockdep enabled and a played a bit,
> nothing shows up so far but it's definitely not conclusive.
> 
> The main difference is that I call trace_hardirqs_off to "advertise"
> the fact that we are soft-disabling (it could be a dup, but at this
> stage this is no big deal, but it's not always, like in syscall return
> the kernel thinks we have interrupts enabled and could thus get out
> of sync without that).
> 
> I also mark the PACA hard disable to reflect the MSR:EE state before
> calling into preempt_schedule_irq().

Allright, second thought :-)

It's probably simpler to just keep hardirqs off. Code is smaller and
simpler and the scheduler will re-enable them soon enough anyways.

This version of the patch also spaces the code a bit and adds comments
which makes them (the code and the patch) more readable.

Cheers,
Ben.
 
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH v3] powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule

Based on an original patch by Valentine Barshak <vbarshak@ru.mvista.com>

Use preempt_schedule_irq to prevent infinite irq-entry and
eventual stack overflow problems with fast-paced IRQ sources.

This kind of problems has been observed on the PASemi Electra IDE
controller. We have to make sure we are soft-disabled before calling
preempt_schedule_irq and hard disable interrupts after that
to avoid unrecoverable exceptions.

This patch also moves the "clrrdi r9,r1,THREAD_SHIFT" out of
the #ifdef CONFIG_PPC_BOOK3E scope, since r9 is clobbered
and has to be restored in both cases.
---
 arch/powerpc/kernel/entry_64.S |   41 ++++++++++++++++++++-------------------
 1 files changed, 21 insertions(+), 20 deletions(-)
Valentine Barshak - Oct. 28, 2009, 7:19 p.m.
Benjamin Herrenschmidt wrote:
>> So I _think_ that the irqs on/off accounting for lockdep isn't quite
>> right. What do you think of this slightly modified version ? I've only
>> done a quick boot test on a G5 with lockdep enabled and a played a bit,
>> nothing shows up so far but it's definitely not conclusive.
>>
>> The main difference is that I call trace_hardirqs_off to "advertise"
>> the fact that we are soft-disabling (it could be a dup, but at this
>> stage this is no big deal, but it's not always, like in syscall return
>> the kernel thinks we have interrupts enabled and could thus get out
>> of sync without that).
>>
>> I also mark the PACA hard disable to reflect the MSR:EE state before
>> calling into preempt_schedule_irq().
> 
> Allright, second thought :-)
> 
> It's probably simpler to just keep hardirqs off. Code is smaller and
> simpler and the scheduler will re-enable them soon enough anyways.
> 
> This version of the patch also spaces the code a bit and adds comments
> which makes them (the code and the patch) more readable.
> 

This one seems to work fine on pasemi and another maple-compatible board.

> Cheers,
> Ben.
>  
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> [PATCH v3] powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule
> 
> Based on an original patch by Valentine Barshak <vbarshak@ru.mvista.com>
> 
> Use preempt_schedule_irq to prevent infinite irq-entry and
> eventual stack overflow problems with fast-paced IRQ sources.
> 
> This kind of problems has been observed on the PASemi Electra IDE
> controller. We have to make sure we are soft-disabled before calling
> preempt_schedule_irq and hard disable interrupts after that
> to avoid unrecoverable exceptions.
> 
> This patch also moves the "clrrdi r9,r1,THREAD_SHIFT" out of
> the #ifdef CONFIG_PPC_BOOK3E scope, since r9 is clobbered
> and has to be restored in both cases.
> ---
>  arch/powerpc/kernel/entry_64.S |   41 ++++++++++++++++++++-------------------
>  1 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index f9fd54b..9763267 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -658,42 +658,43 @@ do_work:
>  	cmpdi	r0,0
>  	crandc	eq,cr1*4+eq,eq
>  	bne	restore
> -	/* here we are preempting the current task */
> -1:
> -#ifdef CONFIG_TRACE_IRQFLAGS
> -	bl	.trace_hardirqs_on
> -	/* Note: we just clobbered r10 which used to contain the previous
> -	 * MSR before the hard-disabling done by the caller of do_work.
> -	 * We don't have that value anymore, but it doesn't matter as
> -	 * we will hard-enable unconditionally, we can just reload the
> -	 * current MSR into r10
> +
> +	/* Here we are preempting the current task.
> +	 *
> +	 * Ensure interrupts are soft-disabled. We also properly mark
> +	 * the PACA to reflect the fact that they are hard-disabled
> +	 * and trace the change
>  	 */
> -	mfmsr	r10
> -#endif /* CONFIG_TRACE_IRQFLAGS */
> -	li	r0,1
> +	li	r0,0
>  	stb	r0,PACASOFTIRQEN(r13)
>  	stb	r0,PACAHARDIRQEN(r13)

I'm just not sure that we need to clear HARDIRQEN here, since we don't 
really hard-disable the the interrupts.

Thanks,
Val.

> +	TRACE_DISABLE_INTS
> +
> +	/* Call the scheduler with soft IRQs off */
> +1:	bl	.preempt_schedule_irq
> +
> +	/* Hard-disable interrupts again (and update PACA) */
>  #ifdef CONFIG_PPC_BOOK3E
> -	wrteei	1
> -	bl	.preempt_schedule
>  	wrteei	0
>  #else
> -	ori	r10,r10,MSR_EE
> -	mtmsrd	r10,1		/* reenable interrupts */
> -	bl	.preempt_schedule
>  	mfmsr	r10
> -	clrrdi	r9,r1,THREAD_SHIFT
> -	rldicl	r10,r10,48,1	/* disable interrupts again */
> +	rldicl	r10,r10,48,1
>  	rotldi	r10,r10,16
>  	mtmsrd	r10,1
>  #endif /* CONFIG_PPC_BOOK3E */
> +	li	r0,0
> +	stb	r0,PACAHARDIRQEN(r13)
> +
> +	/* Re-test flags and eventually loop */
> +	clrrdi	r9,r1,THREAD_SHIFT
>  	ld	r4,TI_FLAGS(r9)
>  	andi.	r0,r4,_TIF_NEED_RESCHED
>  	bne	1b
>  	b	restore
>  
>  user_work:
> -#endif
> +#endif /* CONFIG_PREEMPT */
> +
>  	/* Enable interrupts */
>  #ifdef CONFIG_PPC_BOOK3E
>  	wrteei	1
Benjamin Herrenschmidt - Oct. 28, 2009, 8:30 p.m.
On Wed, 2009-10-28 at 22:19 +0300, Valentine wrote:

> I'm just not sure that we need to clear HARDIRQEN here, since we don't 
> really hard-disable the the interrupts.

We do, or rather, we come in with the interrupts hard disabled, no ?

Ben.

> Thanks,
> Val.
> 
> > +	TRACE_DISABLE_INTS
> > +
> > +	/* Call the scheduler with soft IRQs off */
> > +1:	bl	.preempt_schedule_irq
> > +
> > +	/* Hard-disable interrupts again (and update PACA) */
> >  #ifdef CONFIG_PPC_BOOK3E
> > -	wrteei	1
> > -	bl	.preempt_schedule
> >  	wrteei	0
> >  #else
> > -	ori	r10,r10,MSR_EE
> > -	mtmsrd	r10,1		/* reenable interrupts */
> > -	bl	.preempt_schedule
> >  	mfmsr	r10
> > -	clrrdi	r9,r1,THREAD_SHIFT
> > -	rldicl	r10,r10,48,1	/* disable interrupts again */
> > +	rldicl	r10,r10,48,1
> >  	rotldi	r10,r10,16
> >  	mtmsrd	r10,1
> >  #endif /* CONFIG_PPC_BOOK3E */
> > +	li	r0,0
> > +	stb	r0,PACAHARDIRQEN(r13)
> > +
> > +	/* Re-test flags and eventually loop */
> > +	clrrdi	r9,r1,THREAD_SHIFT
> >  	ld	r4,TI_FLAGS(r9)
> >  	andi.	r0,r4,_TIF_NEED_RESCHED
> >  	bne	1b
> >  	b	restore
> >  
> >  user_work:
> > -#endif
> > +#endif /* CONFIG_PREEMPT */
> > +
> >  	/* Enable interrupts */
> >  #ifdef CONFIG_PPC_BOOK3E
> >  	wrteei	1
Valentine Barshak - Oct. 28, 2009, 9:28 p.m.
Benjamin Herrenschmidt wrote:
> On Wed, 2009-10-28 at 22:19 +0300, Valentine wrote:
> 
>> I'm just not sure that we need to clear HARDIRQEN here, since we don't 
>> really hard-disable the the interrupts.
> 
> We do, or rather, we come in with the interrupts hard disabled, no ?

Yes, looks like the interrupts are disabled at this point (before 
preempt_schedule_irq) most of the times, but we don't hard-disable them 
here. We just soft-disable them to make preempt_schedule_irq happy. Even 
if an interrupt fires, it will be hard-disabled and the hardirq flag 
will be cleared by the exception handler right away. I just think that 
there's no need to clear hardirq flag if we don't clear MSR_EE.

Thanks,
Val.
> 
> Ben.
> 
>> Thanks,
>> Val.
>>
>>> +	TRACE_DISABLE_INTS
>>> +
>>> +	/* Call the scheduler with soft IRQs off */
>>> +1:	bl	.preempt_schedule_irq
>>> +
>>> +	/* Hard-disable interrupts again (and update PACA) */
>>>  #ifdef CONFIG_PPC_BOOK3E
>>> -	wrteei	1
>>> -	bl	.preempt_schedule
>>>  	wrteei	0
>>>  #else
>>> -	ori	r10,r10,MSR_EE
>>> -	mtmsrd	r10,1		/* reenable interrupts */
>>> -	bl	.preempt_schedule
>>>  	mfmsr	r10
>>> -	clrrdi	r9,r1,THREAD_SHIFT
>>> -	rldicl	r10,r10,48,1	/* disable interrupts again */
>>> +	rldicl	r10,r10,48,1
>>>  	rotldi	r10,r10,16
>>>  	mtmsrd	r10,1
>>>  #endif /* CONFIG_PPC_BOOK3E */
>>> +	li	r0,0
>>> +	stb	r0,PACAHARDIRQEN(r13)
>>> +
>>> +	/* Re-test flags and eventually loop */
>>> +	clrrdi	r9,r1,THREAD_SHIFT
>>>  	ld	r4,TI_FLAGS(r9)
>>>  	andi.	r0,r4,_TIF_NEED_RESCHED
>>>  	bne	1b
>>>  	b	restore
>>>  
>>>  user_work:
>>> -#endif
>>> +#endif /* CONFIG_PREEMPT */
>>> +
>>>  	/* Enable interrupts */
>>>  #ifdef CONFIG_PPC_BOOK3E
>>>  	wrteei	1
> 
>
Benjamin Herrenschmidt - Oct. 28, 2009, 9:37 p.m.
On Thu, 2009-10-29 at 00:28 +0300, Valentine wrote:
> Benjamin Herrenschmidt wrote:
> > On Wed, 2009-10-28 at 22:19 +0300, Valentine wrote:
> > 
> >> I'm just not sure that we need to clear HARDIRQEN here, since we don't 
> >> really hard-disable the the interrupts.
> > 
> > We do, or rather, we come in with the interrupts hard disabled, no ?
> 
> Yes, looks like the interrupts are disabled at this point (before 
> preempt_schedule_irq) most of the times, but we don't hard-disable them 
> here. We just soft-disable them to make preempt_schedule_irq happy. Even 
> if an interrupt fires, it will be hard-disabled and the hardirq flag 
> will be cleared by the exception handler right away. I just think that 
> there's no need to clear hardirq flag if we don't clear MSR_EE.

My point is that MSR_EE _is_ already clear... isn't it ? So either we
set it back, or we clear HARDIRQEN to reflect it. It will be re-enable
as soon as preempt_schedule_irq() calls local_irq_enable() which is soon
enough anyways.

Also that avoids perf interrupt sneaking in since those act as NMIs in
that regard and -will- get in even when soft disabled.

Cheers,
Ben.

> Thanks,
> Val.
> > 
> > Ben.
> > 
> >> Thanks,
> >> Val.
> >>
> >>> +	TRACE_DISABLE_INTS
> >>> +
> >>> +	/* Call the scheduler with soft IRQs off */
> >>> +1:	bl	.preempt_schedule_irq
> >>> +
> >>> +	/* Hard-disable interrupts again (and update PACA) */
> >>>  #ifdef CONFIG_PPC_BOOK3E
> >>> -	wrteei	1
> >>> -	bl	.preempt_schedule
> >>>  	wrteei	0
> >>>  #else
> >>> -	ori	r10,r10,MSR_EE
> >>> -	mtmsrd	r10,1		/* reenable interrupts */
> >>> -	bl	.preempt_schedule
> >>>  	mfmsr	r10
> >>> -	clrrdi	r9,r1,THREAD_SHIFT
> >>> -	rldicl	r10,r10,48,1	/* disable interrupts again */
> >>> +	rldicl	r10,r10,48,1
> >>>  	rotldi	r10,r10,16
> >>>  	mtmsrd	r10,1
> >>>  #endif /* CONFIG_PPC_BOOK3E */
> >>> +	li	r0,0
> >>> +	stb	r0,PACAHARDIRQEN(r13)
> >>> +
> >>> +	/* Re-test flags and eventually loop */
> >>> +	clrrdi	r9,r1,THREAD_SHIFT
> >>>  	ld	r4,TI_FLAGS(r9)
> >>>  	andi.	r0,r4,_TIF_NEED_RESCHED
> >>>  	bne	1b
> >>>  	b	restore
> >>>  
> >>>  user_work:
> >>> -#endif
> >>> +#endif /* CONFIG_PREEMPT */
> >>> +
> >>>  	/* Enable interrupts */
> >>>  #ifdef CONFIG_PPC_BOOK3E
> >>>  	wrteei	1
> > 
> >
Valentine Barshak - Oct. 28, 2009, 10:49 p.m.
Benjamin Herrenschmidt wrote:
> On Thu, 2009-10-29 at 00:28 +0300, Valentine wrote:
>> Benjamin Herrenschmidt wrote:
>>> On Wed, 2009-10-28 at 22:19 +0300, Valentine wrote:
>>>
>>>> I'm just not sure that we need to clear HARDIRQEN here, since we don't 
>>>> really hard-disable the the interrupts.
>>> We do, or rather, we come in with the interrupts hard disabled, no ?
>> Yes, looks like the interrupts are disabled at this point (before 
>> preempt_schedule_irq) most of the times, but we don't hard-disable them 
>> here. We just soft-disable them to make preempt_schedule_irq happy. Even 
>> if an interrupt fires, it will be hard-disabled and the hardirq flag 
>> will be cleared by the exception handler right away. I just think that 
>> there's no need to clear hardirq flag if we don't clear MSR_EE.
> 
> My point is that MSR_EE _is_ already clear... isn't it ? 

Yes, the MSR_EE is cleared before we jump to do_work. I'm OK with 
clearing the hardirqenable flag. I just assumed that the hardirq flag 
was supposed to reflect the MSR_EE state, so it looked a bit odd 
clearing the MSR_EE at one place and then reflecting the change at another.

Anyway, the patch works fine.

Thanks,
Val.

So either we
> set it back, or we clear HARDIRQEN to reflect it. It will be re-enable
> as soon as preempt_schedule_irq() calls local_irq_enable() which is soon
> enough anyways.
> 
> Also that avoids perf interrupt sneaking in since those act as NMIs in
> that regard and -will- get in even when soft disabled.
> 
> Cheers,
> Ben.
> 
>> Thanks,
>> Val.
>>> Ben.
>>>
>>>> Thanks,
>>>> Val.
>>>>
>>>>> +	TRACE_DISABLE_INTS
>>>>> +
>>>>> +	/* Call the scheduler with soft IRQs off */
>>>>> +1:	bl	.preempt_schedule_irq
>>>>> +
>>>>> +	/* Hard-disable interrupts again (and update PACA) */
>>>>>  #ifdef CONFIG_PPC_BOOK3E
>>>>> -	wrteei	1
>>>>> -	bl	.preempt_schedule
>>>>>  	wrteei	0
>>>>>  #else
>>>>> -	ori	r10,r10,MSR_EE
>>>>> -	mtmsrd	r10,1		/* reenable interrupts */
>>>>> -	bl	.preempt_schedule
>>>>>  	mfmsr	r10
>>>>> -	clrrdi	r9,r1,THREAD_SHIFT
>>>>> -	rldicl	r10,r10,48,1	/* disable interrupts again */
>>>>> +	rldicl	r10,r10,48,1
>>>>>  	rotldi	r10,r10,16
>>>>>  	mtmsrd	r10,1
>>>>>  #endif /* CONFIG_PPC_BOOK3E */
>>>>> +	li	r0,0
>>>>> +	stb	r0,PACAHARDIRQEN(r13)
>>>>> +
>>>>> +	/* Re-test flags and eventually loop */
>>>>> +	clrrdi	r9,r1,THREAD_SHIFT
>>>>>  	ld	r4,TI_FLAGS(r9)
>>>>>  	andi.	r0,r4,_TIF_NEED_RESCHED
>>>>>  	bne	1b
>>>>>  	b	restore
>>>>>  
>>>>>  user_work:
>>>>> -#endif
>>>>> +#endif /* CONFIG_PREEMPT */
>>>>> +
>>>>>  	/* Enable interrupts */
>>>>>  #ifdef CONFIG_PPC_BOOK3E
>>>>>  	wrteei	1
>>>
> 
>
Benjamin Herrenschmidt - Oct. 29, 2009, 12:49 a.m.
> Yes, the MSR_EE is cleared before we jump to do_work. I'm OK with 
> clearing the hardirqenable flag. I just assumed that the hardirq flag 
> was supposed to reflect the MSR_EE state, so it looked a bit odd 
> clearing the MSR_EE at one place and then reflecting the change at another.

Yeah well, it is supposed to reflect EE in the "general case", it's just
that in the exception entry/exit, we take shortcuts when turning EE off
for short amount of times without reflecting it in the PACA. This is
why, in this case, since we are going back to C code, I want to have it
"fixed up" to reflect reality.

Cheers,
Ben.

> Anyway, the patch works fine.
> 
> Thanks,
> Val.
> 
> So either we
> > set it back, or we clear HARDIRQEN to reflect it. It will be re-enable
> > as soon as preempt_schedule_irq() calls local_irq_enable() which is soon
> > enough anyways.
> > 
> > Also that avoids perf interrupt sneaking in since those act as NMIs in
> > that regard and -will- get in even when soft disabled.
> > 
> > Cheers,
> > Ben.
> > 
> >> Thanks,
> >> Val.
> >>> Ben.
> >>>
> >>>> Thanks,
> >>>> Val.
> >>>>
> >>>>> +	TRACE_DISABLE_INTS
> >>>>> +
> >>>>> +	/* Call the scheduler with soft IRQs off */
> >>>>> +1:	bl	.preempt_schedule_irq
> >>>>> +
> >>>>> +	/* Hard-disable interrupts again (and update PACA) */
> >>>>>  #ifdef CONFIG_PPC_BOOK3E
> >>>>> -	wrteei	1
> >>>>> -	bl	.preempt_schedule
> >>>>>  	wrteei	0
> >>>>>  #else
> >>>>> -	ori	r10,r10,MSR_EE
> >>>>> -	mtmsrd	r10,1		/* reenable interrupts */
> >>>>> -	bl	.preempt_schedule
> >>>>>  	mfmsr	r10
> >>>>> -	clrrdi	r9,r1,THREAD_SHIFT
> >>>>> -	rldicl	r10,r10,48,1	/* disable interrupts again */
> >>>>> +	rldicl	r10,r10,48,1
> >>>>>  	rotldi	r10,r10,16
> >>>>>  	mtmsrd	r10,1
> >>>>>  #endif /* CONFIG_PPC_BOOK3E */
> >>>>> +	li	r0,0
> >>>>> +	stb	r0,PACAHARDIRQEN(r13)
> >>>>> +
> >>>>> +	/* Re-test flags and eventually loop */
> >>>>> +	clrrdi	r9,r1,THREAD_SHIFT
> >>>>>  	ld	r4,TI_FLAGS(r9)
> >>>>>  	andi.	r0,r4,_TIF_NEED_RESCHED
> >>>>>  	bne	1b
> >>>>>  	b	restore
> >>>>>  
> >>>>>  user_work:
> >>>>> -#endif
> >>>>> +#endif /* CONFIG_PREEMPT */
> >>>>> +
> >>>>>  	/* Enable interrupts */
> >>>>>  #ifdef CONFIG_PPC_BOOK3E
> >>>>>  	wrteei	1
> >>>
> > 
> >
Valentine Barshak - Nov. 6, 2009, 10:38 p.m.
Benjamin Herrenschmidt wrote:
>> Yes, the MSR_EE is cleared before we jump to do_work. I'm OK with 
>> clearing the hardirqenable flag. I just assumed that the hardirq flag 
>> was supposed to reflect the MSR_EE state, so it looked a bit odd 
>> clearing the MSR_EE at one place and then reflecting the change at another.
> 
> Yeah well, it is supposed to reflect EE in the "general case", it's just
> that in the exception entry/exit, we take shortcuts when turning EE off
> for short amount of times without reflecting it in the PACA. This is
> why, in this case, since we are going back to C code, I want to have it
> "fixed up" to reflect reality.
> 

Ben, this one works fine. Are you going to pick it?

Thanks,
Val.
Benjamin Herrenschmidt - Nov. 6, 2009, 10:49 p.m.
On Sat, 2009-11-07 at 01:38 +0300, Valentine wrote:
> Benjamin Herrenschmidt wrote:
> >> Yes, the MSR_EE is cleared before we jump to do_work. I'm OK with 
> >> clearing the hardirqenable flag. I just assumed that the hardirq flag 
> >> was supposed to reflect the MSR_EE state, so it looked a bit odd 
> >> clearing the MSR_EE at one place and then reflecting the change at another.
> > 
> > Yeah well, it is supposed to reflect EE in the "general case", it's just
> > that in the exception entry/exit, we take shortcuts when turning EE off
> > for short amount of times without reflecting it in the PACA. This is
> > why, in this case, since we are going back to C code, I want to have it
> > "fixed up" to reflect reality.
> > 
> 
> Ben, this one works fine. Are you going to pick it?

Already upstream:

Gitweb:
http://git.kernel.org/linus/4f917ba3d5ee9c98d60fa357e799942df8412de3
Commit:     4f917ba3d5ee9c98d60fa357e799942df8412de3
Parent:     01deab98e3ad8ff27243a8d5f8dd746c7110ae4f
Author:     Benjamin Herrenschmidt <benh@kernel.crashing.org>
AuthorDate: Mon Oct 26 19:41:17 2009 +0000
Committer:  Benjamin Herrenschmidt <benh@kernel.crashing.org>
CommitDate: Tue Oct 27 16:42:43 2009 +1100

    powerpc/ppc64: Use preempt_schedule_irq instead of preempt_schedule
    
    Based on an original patch by Valentine Barshak <vbarshak@ru.mvista.com>
    
    Use preempt_schedule_irq to prevent infinite irq-entry and
    eventual stack overflow problems with fast-paced IRQ sources.

    .../...

Now, it might be a good idea to do a -stable variant of it for 2.6.31
and back, but that will have to be a separate patch due to the new
Book3E churn in .32

Cheers,
Ben.

Patch

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index f9fd54b..9763267 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -658,42 +658,43 @@  do_work:
 	cmpdi	r0,0
 	crandc	eq,cr1*4+eq,eq
 	bne	restore
-	/* here we are preempting the current task */
-1:
-#ifdef CONFIG_TRACE_IRQFLAGS
-	bl	.trace_hardirqs_on
-	/* Note: we just clobbered r10 which used to contain the previous
-	 * MSR before the hard-disabling done by the caller of do_work.
-	 * We don't have that value anymore, but it doesn't matter as
-	 * we will hard-enable unconditionally, we can just reload the
-	 * current MSR into r10
+
+	/* Here we are preempting the current task.
+	 *
+	 * Ensure interrupts are soft-disabled. We also properly mark
+	 * the PACA to reflect the fact that they are hard-disabled
+	 * and trace the change
 	 */
-	mfmsr	r10
-#endif /* CONFIG_TRACE_IRQFLAGS */
-	li	r0,1
+	li	r0,0
 	stb	r0,PACASOFTIRQEN(r13)
 	stb	r0,PACAHARDIRQEN(r13)
+	TRACE_DISABLE_INTS
+
+	/* Call the scheduler with soft IRQs off */
+1:	bl	.preempt_schedule_irq
+
+	/* Hard-disable interrupts again (and update PACA) */
 #ifdef CONFIG_PPC_BOOK3E
-	wrteei	1
-	bl	.preempt_schedule
 	wrteei	0
 #else
-	ori	r10,r10,MSR_EE
-	mtmsrd	r10,1		/* reenable interrupts */
-	bl	.preempt_schedule
 	mfmsr	r10
-	clrrdi	r9,r1,THREAD_SHIFT
-	rldicl	r10,r10,48,1	/* disable interrupts again */
+	rldicl	r10,r10,48,1
 	rotldi	r10,r10,16
 	mtmsrd	r10,1
 #endif /* CONFIG_PPC_BOOK3E */
+	li	r0,0
+	stb	r0,PACAHARDIRQEN(r13)
+
+	/* Re-test flags and eventually loop */
+	clrrdi	r9,r1,THREAD_SHIFT
 	ld	r4,TI_FLAGS(r9)
 	andi.	r0,r4,_TIF_NEED_RESCHED
 	bne	1b
 	b	restore
 
 user_work:
-#endif
+#endif /* CONFIG_PREEMPT */
+
 	/* Enable interrupts */
 #ifdef CONFIG_PPC_BOOK3E
 	wrteei	1