5.7-rc interrupt_return Unrecoverable exception 380

Message ID alpine.LSU.2.11.2005011253250.3734@eggly.anvils (mailing list archive)
State Not Applicable

Checks

Context                          Check    Description
snowpatch_ozlabs/apply_patch     success  Successfully applied on branch powerpc/merge (54dc28ff5e0b3585224d49a31b53e030342ca5c3)
snowpatch_ozlabs/build-ppc64le   success  Build succeeded
snowpatch_ozlabs/build-ppc64be   success  Build succeeded
snowpatch_ozlabs/build-ppc64e    success  Build succeeded
snowpatch_ozlabs/build-pmac32    warning  Upstream build failed, couldn't test patch
snowpatch_ozlabs/checkpatch      warning  total: 1 errors, 0 warnings, 0 checks, 8 lines checked
snowpatch_ozlabs/needsstable     success  Patch has no Fixes tags

Commit Message

Hugh Dickins May 1, 2020, 8:38 p.m. UTC
Hi Nick,

I've been getting an "Unrecoverable exception 380" after a few hours
of load on the G5 (yes, that G5!) with 5.7-rc: when interrupt_return
checks lazy_irq_pending, it crashes at check_preemption_disabled+0x24
with CONFIG_DEBUG_PREEMPT=y.

check_preemption_disabled():
lib/smp_processor_id.c:13
   0:	7c 08 02 a6 	mflr    r0
   4:	fb e1 ff f8 	std     r31,-8(r1)
   8:	fb 61 ff d8 	std     r27,-40(r1)
   c:	fb 81 ff e0 	std     r28,-32(r1)
  10:	fb a1 ff e8 	std     r29,-24(r1)
  14:	fb c1 ff f0 	std     r30,-16(r1)
get_current():
arch/powerpc/include/asm/current.h:20
  18:	eb ed 01 88 	ld      r31,392(r13)
check_preemption_disabled():
lib/smp_processor_id.c:13
  1c:	f8 01 00 10 	std     r0,16(r1)
  20:	f8 21 ff 61 	stdu    r1,-160(r1)
__read_once_size():
include/linux/compiler.h:199
  24:	81 3f 00 00 	lwz     r9,0(r31)
check_preemption_disabled():
lib/smp_processor_id.c:14
  28:	a3 cd 00 02 	lhz     r30,2(r13)

I don't read ppc assembly, and have not jotted down the registers,
but hope you can make sense of it. I get around it with the patch
below (just avoiding the debug), but have no idea whether it's a
necessary fix or a hacky workaround.

Hugh

Comments

Nicholas Piggin May 2, 2020, 2:40 a.m. UTC | #1
Excerpts from Hugh Dickins's message of May 2, 2020 6:38 am:
> Hi Nick,
> 
> I've been getting an "Unrecoverable exception 380" after a few hours
> of load on the G5 (yes, that G5!) with 5.7-rc: when interrupt_return
> checks lazy_irq_pending, it crashes at check_preemption_disabled+0x24
> with CONFIG_DEBUG_PREEMPT=y.
> 
> check_preemption_disabled():
> lib/smp_processor_id.c:13
>    0:	7c 08 02 a6 	mflr    r0
>    4:	fb e1 ff f8 	std     r31,-8(r1)
>    8:	fb 61 ff d8 	std     r27,-40(r1)
>    c:	fb 81 ff e0 	std     r28,-32(r1)
>   10:	fb a1 ff e8 	std     r29,-24(r1)
>   14:	fb c1 ff f0 	std     r30,-16(r1)
> get_current():
> arch/powerpc/include/asm/current.h:20
>   18:	eb ed 01 88 	ld      r31,392(r13)
> check_preemption_disabled():
> lib/smp_processor_id.c:13
>   1c:	f8 01 00 10 	std     r0,16(r1)
>   20:	f8 21 ff 61 	stdu    r1,-160(r1)
> __read_once_size():
> include/linux/compiler.h:199
>   24:	81 3f 00 00 	lwz     r9,0(r31)
> check_preemption_disabled():
> lib/smp_processor_id.c:14
>   28:	a3 cd 00 02 	lhz     r30,2(r13)
> 
> I don't read ppc assembly, and have not jotted down the registers,
> but hope you can make sense of it. I get around it with the patch
> below (just avoiding the debug), but have no idea whether it's a
> necessary fix or a hacky workaround.

Hi Hugh,

Thanks for the report, nice catch. Your fix is actually the correct one 
(well, we probably want a __lazy_irq_pending() variant which is to be 
used in these cases).
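
As a rough illustration of what such a variant could look like, here is a user-space mock, not kernel code. The helper name lazy_irq_pending_nocheck(), the mock paca, the bit values, and the debug_checks counter are all assumptions made for this sketch; only the masking expression and the get_paca() versus local_paca distinction come from the patch in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock bit values, assumed for illustration only. */
#define PACA_IRQ_HARD_DIS 0x01
#define PACA_IRQ_EE       0x04

/* Minimal stand-in for the per-CPU paca. */
struct paca_struct {
	uint8_t irq_happened;
};

static struct paca_struct mock_paca;
static struct paca_struct *local_paca = &mock_paca;

/* Counts how often the simulated debug check runs. */
static unsigned int debug_checks;

/*
 * Stand-in for get_paca() with CONFIG_DEBUG_PREEMPT=y: the real one
 * runs a preemption debug check before handing back the paca pointer.
 */
static inline struct paca_struct *get_paca(void)
{
	debug_checks++;
	return local_paca;
}

/* Shared test: anything pending other than the hard-disable marker? */
static inline bool __lazy_irq_pending(uint8_t irq_happened)
{
	return !!(irq_happened & ~PACA_IRQ_HARD_DIS);
}

/* Normal callers keep the debug check. */
static inline bool lazy_irq_pending(void)
{
	return __lazy_irq_pending(get_paca()->irq_happened);
}

/* Hypothetical variant for RI=0 code: reads the paca directly. */
static inline bool lazy_irq_pending_nocheck(void)
{
	return __lazy_irq_pending(local_paca->irq_happened);
}
```

Both helpers return the same answer; the only difference in this mock is whether the debug counter moves, which stands in for the extra memory accesses the real check performs.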

Problem is MSR[RI] is cleared here, ready to do the last few things for 
interrupt return where we're not allowed to take any other interrupts.

SLB misses can be taken on just about any access aside from kernel text, 
global variables, and stack. When one hits here, it is treated as 
unrecoverable because RI=0.

We could clear just MSR[EE] for asynchronous interrupts, then check 
lazy_irq_pending(), and then clear MSR[RI] ready to return, and the
SLB miss in the debug check would be fine. But that's two mtmsr 
instructions, which is slower. So we'll skip the check.
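
The trade-off above can be sketched with a toy model in plain C. Everything here is an assumption for illustration: the cpu_model struct, the function names, and the worst-case rule that any access outside bolted memory takes an SLB miss. The paca read is modelled as never missing, which is what the fix relies on. It is not the kernel's exit path.

```c
#include <assert.h>
#include <stdbool.h>

#define MSR_EE 0x8000UL /* external interrupts enabled */
#define MSR_RI 0x0002UL /* interrupts are recoverable */

/* Toy CPU state: just enough to count mtmsr and record a fatal miss. */
struct cpu_model {
	unsigned long msr;
	unsigned int mtmsr_count;
	bool unrecoverable;
};

static inline void mtmsr(struct cpu_model *cpu, unsigned long val)
{
	cpu->msr = val;
	cpu->mtmsr_count++;
}

/*
 * Worst case for an access outside text/globals/stack: assume it
 * always takes an SLB miss, which is fatal if RI is already clear.
 */
static inline void touch_unbolted_memory(struct cpu_model *cpu)
{
	if (!(cpu->msr & MSR_RI))
		cpu->unrecoverable = true;
}

/* Reported behaviour: RI cleared first, then the debug check runs. */
static inline void exit_checked_with_ri_clear(struct cpu_model *cpu)
{
	mtmsr(cpu, cpu->msr & ~(MSR_EE | MSR_RI));
	touch_unbolted_memory(cpu); /* debug check reads through current */
}

/* Alternative: clear EE, check while RI=1, then clear RI. */
static inline void exit_two_step(struct cpu_model *cpu)
{
	mtmsr(cpu, cpu->msr & ~MSR_EE);
	touch_unbolted_memory(cpu); /* a miss here is recoverable */
	mtmsr(cpu, cpu->msr & ~MSR_RI);
}

/* Chosen approach: one mtmsr, and the check touches only the paca. */
static inline void exit_nocheck(struct cpu_model *cpu)
{
	mtmsr(cpu, cpu->msr & ~(MSR_EE | MSR_RI));
	/* paca read only: modelled as never missing */
}
```

In this model the two-step path stays recoverable at the cost of a second mtmsr, while skipping the check keeps a single mtmsr and avoids the risky access altogether.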

I tested hash, and preempt, possibly even preempt+hash, but clearly not 
preempt+preempt_debug+hash+slb thrashing!

Thanks,
Nick

> 
> Hugh
> 
> --- 5.7-rc3/arch/powerpc/include/asm/hw_irq.h	2020-04-12 16:24:29.802769727 -0700
> +++ linux/arch/powerpc/include/asm/hw_irq.h	2020-04-27 11:31:10.000000000 -0700
> @@ -252,7 +252,7 @@ static inline bool arch_irqs_disabled(vo
>  
>  static inline bool lazy_irq_pending(void)
>  {
> -	return !!(get_paca()->irq_happened & ~PACA_IRQ_HARD_DIS);
> +	return !!(local_paca->irq_happened & ~PACA_IRQ_HARD_DIS);
>  }
>  
>  /*
>
Patch

--- 5.7-rc3/arch/powerpc/include/asm/hw_irq.h	2020-04-12 16:24:29.802769727 -0700
+++ linux/arch/powerpc/include/asm/hw_irq.h	2020-04-27 11:31:10.000000000 -0700
@@ -252,7 +252,7 @@  static inline bool arch_irqs_disabled(vo
 
 static inline bool lazy_irq_pending(void)
 {
-	return !!(get_paca()->irq_happened & ~PACA_IRQ_HARD_DIS);
+	return !!(local_paca->irq_happened & ~PACA_IRQ_HARD_DIS);
 }
 
 /*