external/mambo: add helper for machine checks

Submitted by Nicholas Piggin on March 24, 2017, 4:04 a.m.

Details

Message ID 20170324040414.3796-1-npiggin@gmail.com
State New

Commit Message

Nicholas Piggin March 24, 2017, 4:04 a.m.
Add helpers to construct machine checks with registers set up properly.
exc_mce raises a machine check exception that can be stepped into. This
is useful for testing the machine check handler.

Also add a similar exc_sreset for system reset.

inject_mce does the same but runs immediately and stops when execution
returns to the NIP where it was injected (which can get tangled up if the
machine check re-enters this code). This is useful for testing robustness
to interleaving machine checks.

inject_mce_step allows injecting MCEs between each instruction and stepping
over them. inject_mce_step_ri does the same but only when MSR has RI set.
This can be useful to test correctness of low level code. For example,
testing system call vs machine check:

systemsim % b 0xC000000000004c00
systemsim % c
0xC000000000004C00 (0x0000000000004C00) Enc:0xA64BB17D : mtspr   HSPRG1,r13
systemsim % inject_mce_step_ri 100
0xC000000000004C04 (0x0000000000004C04) Enc:0xA64AB07D : mfspr   r13,HSPRG0
0xC000000000004C08 (0x0000000000004C08) Enc:0x80002DF9 : std     r9,0x80(r13)
0xC000000000004C0C (0x0000000000004C0C) Enc:0xA6E2207D : mfspr   r9,PPR
0xC000000000004C10 (0x0000000000004C10) Enc:0x7813427C : mr      r2,r2
0xC000000000004C14 (0x0000000000004C14) Enc:0x88004DF9 : std     r10,0x88(r13)
0xC000000000004C18 (0x0000000000004C18) Enc:0xD8002DF9 : std     r9,0xD8(r13)
0xC000000000004C1C (0x0000000000004C1C) Enc:0x2600207D : mfcr    r9
0xC000000000004C20 (0x0000000000004C20) Enc:0xE8074D89 : lbz     r10,0x7E8(r13)
0xC000000000004C24 (0x0000000000004C24) Enc:0x00000A2C : cmpwi   cr0,r10,0
0xC000000000004C28 (0x0000000000004C28) Enc:0xA80F8240 : bne     cr0,$+0xFA8  (bc 0x4,0x2,0xFA8,0,0)
0xC000000000004C2C (0x0000000000004C2C) Enc:0xA64AB17D : mfspr   r13,HSPRG1
0xC000000000004C30 (0x0000000000004C30) Enc:0xBE1E202C : cmpdi   cr0,r0,7870
0xC000000000004C34 (0x0000000000004C34) Enc:0x2000C241 : beq     cr0,$+0x20  (bc 0xE,0x2,0x20,0,0)
0xC000000000004C38 (0x0000000000004C38) Enc:0x786BA97D : mr      r9,r13
0xC000000000004C3C (0x0000000000004C3C) Enc:0xA64AB07D : mfspr   r13,HSPRG0
0xC000000000004C40 (0x0000000000004C40) Enc:0xA6027A7D : mfspr   r11,SRR0
0xC000000000004C44 (0x0000000000004C44) Enc:0xA6029B7D : mfspr   r12,SRR1
0xC000000000004C48 (0x0000000000004C48) Enc:0x02004039 : li      r10,2
0xC000000000004C4C (0x0000000000004C4C) Enc:0x6401417D : mtmsrd  r10,1
0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
236380163: (212143620): Disabling lock debugging due to kernel taint
0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
[...]

Every instruction after 0xC000000000004C4C (the mtmsrd that sets MSR[RI])
gets an interleaving MCE. After this injection run the kernel prints a lot
of MCE reports and continues working properly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Hi,

If anybody would find this useful or has a better way to do it, let me
know. This is a polished up and improved version of what I've been using
for testing.

It should be noted that upstream mambo currently does not quite work
properly with this because of a quirk in how it injects MCE interrupts.
I was kind of hacking around that in the script, but took out that code
because the mambo developers will be fixing that or giving us an option
to change the behaviour soon.

Thanks,
Nick

 external/mambo/mambo_utils.tcl | 190 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 187 insertions(+), 3 deletions(-)
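
The re-entry caveat in the commit message is what stop_stack_match in the patch is meant to handle: the PC trigger fires every time the injection address is hit, but the simulator is only stopped once r1 is back at the value saved when the MCE was injected. Below is a minimal stand-alone sketch of that check in plain Tcl. The proc name first_stop and the trace values are hypothetical, made up for illustration; they are not simulator output.

```tcl
# Model of the stop_stack_match idea without the simulator.
# trace is a list of {pc r1} pairs in execution order (hypothetical data).
# Returns the index of the first entry where both the PC and the stack
# pointer match the values recorded at injection time, or -1 if none does.
proc first_stop { trace pc saved_r1 } {
    set i 0
    foreach step $trace {
        lassign $step step_pc step_r1
        if { $step_pc == $pc && $step_r1 == $saved_r1 } {
            return $i
        }
        incr i
    }
    return -1
}

# Entry 0: machine check vector. Entry 1: the handler path re-enters the
# injection address on a different stack. Entry 2: the interrupted context
# resumes at the injection address with the original r1.
set trace {
    {0x200  0xf000}
    {0x4c50 0xe000}
    {0x4c50 0xf000}
}
puts [first_stop $trace 0x4c50 0xf000]
```

A PC-only match would stop at entry 1, inside the nested context; requiring r1 to match skips it and stops at entry 2.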

Comments

Michael Ellerman April 4, 2017, 1:52 a.m.
Nicholas Piggin <npiggin@gmail.com> writes:

> Add helpers to construct machine checks with registers set up properly.
> exc_mce raises a machine check exception that can be stepped into. This
> is useful for testing the machine check handler.
>
> Also add a similar exc_sreset for system reset.
>
> inject_mce does the same but runs immediately and stops when execution
> returns to the NIP where it was injected (which can get tangled up if the
> machine check re-enters this code). This is useful for testing robustness
> to interleaving machine checks.
>
> inject_mce_step allows injecting MCEs between each instruction and stepping
> over them. inject_mce_step_ri does the same but only when MSR has RI set.
> This can be useful to test correctness of low level code. For example,
> testing system call vs machine check:
>
> systemsim % b 0xC000000000004c00
> systemsim % c
> 0xC000000000004C00 (0x0000000000004C00) Enc:0xA64BB17D : mtspr   HSPRG1,r13
> systemsim % inject_mce_step_ri 100
> 0xC000000000004C04 (0x0000000000004C04) Enc:0xA64AB07D : mfspr   r13,HSPRG0
> 0xC000000000004C08 (0x0000000000004C08) Enc:0x80002DF9 : std     r9,0x80(r13)
> 0xC000000000004C0C (0x0000000000004C0C) Enc:0xA6E2207D : mfspr   r9,PPR
> 0xC000000000004C10 (0x0000000000004C10) Enc:0x7813427C : mr      r2,r2
> 0xC000000000004C14 (0x0000000000004C14) Enc:0x88004DF9 : std     r10,0x88(r13)
> 0xC000000000004C18 (0x0000000000004C18) Enc:0xD8002DF9 : std     r9,0xD8(r13)
> 0xC000000000004C1C (0x0000000000004C1C) Enc:0x2600207D : mfcr    r9
> 0xC000000000004C20 (0x0000000000004C20) Enc:0xE8074D89 : lbz     r10,0x7E8(r13)
> 0xC000000000004C24 (0x0000000000004C24) Enc:0x00000A2C : cmpwi   cr0,r10,0
> 0xC000000000004C28 (0x0000000000004C28) Enc:0xA80F8240 : bne     cr0,$+0xFA8  (bc 0x4,0x2,0xFA8,0,0)
> 0xC000000000004C2C (0x0000000000004C2C) Enc:0xA64AB17D : mfspr   r13,HSPRG1
> 0xC000000000004C30 (0x0000000000004C30) Enc:0xBE1E202C : cmpdi   cr0,r0,7870
> 0xC000000000004C34 (0x0000000000004C34) Enc:0x2000C241 : beq     cr0,$+0x20  (bc 0xE,0x2,0x20,0,0)
> 0xC000000000004C38 (0x0000000000004C38) Enc:0x786BA97D : mr      r9,r13
> 0xC000000000004C3C (0x0000000000004C3C) Enc:0xA64AB07D : mfspr   r13,HSPRG0
> 0xC000000000004C40 (0x0000000000004C40) Enc:0xA6027A7D : mfspr   r11,SRR0
> 0xC000000000004C44 (0x0000000000004C44) Enc:0xA6029B7D : mfspr   r12,SRR1
> 0xC000000000004C48 (0x0000000000004C48) Enc:0x02004039 : li      r10,2
> 0xC000000000004C4C (0x0000000000004C4C) Enc:0x6401417D : mtmsrd  r10,1
> 0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
> 236380163: (212143620): Disabling lock debugging due to kernel taint
> 0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
> 0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
> 0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
> [...]
>
> Every instruction after 0xC000000000004C4C (the mtmsrd that sets MSR[RI])
> gets an interleaving MCE. After this injection run the kernel prints a lot
> of MCE reports and continues working properly.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>


This is really helpful for tripping these otherwise infrequently tested
code paths.

Tested-by: Michael Ellerman <mpe@ellerman.id.au>

cheers

Patch

diff --git a/external/mambo/mambo_utils.tcl b/external/mambo/mambo_utils.tcl
index d8825bc3..86dfda91 100644
--- a/external/mambo/mambo_utils.tcl
+++ b/external/mambo/mambo_utils.tcl
@@ -100,9 +100,11 @@  proc pa { spr } {
     }
 }
 
-proc s { } {
-    mysim step 1
-    ipca
+proc s { {nr 1} } {
+    for { set i 0 } { $i < $nr } { incr i 1 } {
+        mysim step 1
+        ipca
+    }
 }
 
 proc z { count } {
@@ -342,3 +344,185 @@  proc start_qtrace { { qtfile qtrace.qt } } {
     ereader start $env(EXEC_DIR)/emitter/qtracer [pid] -outfile $qtfile
 }
 
+proc current_insn { { t 0 } { c 0 } } {
+    set pc [mysim cpu $c thread $t display spr pc]
+    set pc_laddr [mysim cpu $c util itranslate $pc]
+    set inst [mysim cpu $c memory display $pc_laddr 4]
+    set disasm [mysim cpu $c util ppc_disasm $inst $pc]
+    return $disasm
+}
+
+global SRR1
+global DSISR
+global DAR
+
+proc sreset_trigger { args } {
+    variable SRR1
+
+    mysim trigger clear pc 0x100
+    set s [expr [mysim cpu 0 display spr srr1] & ~0x00000000003c0002]
+    set SRR1 [expr $SRR1 | $s]
+    mysim cpu 0 set spr srr1 $SRR1
+}
+
+proc exc_sreset { } {
+    variable SRR1
+    variable DSISR
+    variable DAR
+
+    # In case of recoverable MCE, idle wakeup always sets RI, others get
+    # RI from current environment. For unrecoverable, RI would always be
+    # clear by hardware.
+    if { [current_insn] in { "stop" "nap" "sleep" "winkle" } } {
+        set msr_ri 0x2
+        set SRR1_powersave [expr (0x2 << (63-47))]
+    } else {
+        set msr_ri [expr [mysim cpu 0 display spr msr] & 0x2]
+        set SRR1_powersave 0
+    }
+
+    # reason system reset
+    set SRR1_reason 0x4
+
+    set SRR1 [expr 0x0 | $msr_ri | $SRR1_powersave]
+    set SRR1 [expr $SRR1 | ((($SRR1_reason >> 3) & 0x1) << (63-42))]
+    set SRR1 [expr $SRR1 | ((($SRR1_reason >> 2) & 0x1) << (63-43))]
+    set SRR1 [expr $SRR1 | ((($SRR1_reason >> 1) & 0x1) << (63-44))]
+    set SRR1 [expr $SRR1 | ((($SRR1_reason >> 0) & 0x1) << (63-45))]
+
+    if { [current_insn] in { "stop" "nap" "sleep" "winkle" } } {
+        # mambo has a quirk that interrupts from idle wake immediately
+        mysim trigger set pc 0x100 "sreset_trigger"
+        mysim cpu 0 interrupt SystemReset
+        # XXX: only trigger if pc is 0x100
+        sreset_trigger
+    } else {
+        mysim trigger set pc 0x100 "sreset_trigger"
+        mysim cpu 0 interrupt SystemReset
+    }
+}
+
+proc mce_trigger { args } {
+    variable SRR1
+    variable DSISR
+    variable DAR
+
+    mysim trigger clear pc 0x200
+
+    set s [expr [mysim cpu 0 display spr srr1] & ~0x00000000801f0002]
+    set SRR1 [expr $SRR1 | $s]
+    mysim cpu 0 set spr srr1 $SRR1
+    mysim cpu 0 set spr dsisr $DSISR
+    mysim cpu 0 set spr dar $DAR
+}
+
+#
+# Inject a machine check. Recoverable MCE types can be forced to unrecoverable
+# by clearing MSR_RI bit from SRR1 (which hardware may do).
+# If d_side is 0, then cause goes into SRR1. Otherwise it gets put into DSISR.
+# DAR is hardcoded to 0xdeadbeefdeadbeef for d-side and to 0 for i-side.
+#
+# Default with no arguments is a recoverable i-side TLB multi-hit
+# Other options:
+# d_side=1 cause=0x80 - recoverable d-side SLB multi-hit
+# d_side=0 cause=0xd  - unrecoverable i-side async store timeout (POWER9 only)
+# d_side=0 cause=0x1  - unrecoverable i-side ifetch
+#
+proc exc_mce { { d_side 0 } { cause 0x5 } { recoverable 1 } } {
+    variable SRR1
+    variable DSISR
+    variable DAR
+
+    # In case of recoverable MCE, idle wakeup always sets RI, others get
+    # RI from current environment. For unrecoverable, RI would always be
+    # clear by hardware.
+    if { [current_insn] in { "stop" "nap" "sleep" "winkle" } } {
+        set msr_ri 0x2
+        set SRR1_powersave [expr (0x2 << (63-47))]
+    } else {
+        set msr_ri [expr [mysim cpu 0 display spr msr] & 0x2]
+        set SRR1_powersave 0
+    }
+
+    if { !$recoverable } {
+        set msr_ri 0x0
+    }
+
+    # d-side reports the cause in DSISR, i-side reports it in SRR1
+    if { $d_side } {
+        set is_dside 1
+        set SRR1_mc_cause 0x0
+        set DSISR $cause
+        set DAR 0xdeadbeefdeadbeef
+    } else {
+        set is_dside 0
+        set SRR1_mc_cause $cause
+        set DSISR 0x0
+        set DAR 0x0
+    }
+
+    set SRR1 [expr 0x0 | $msr_ri | $SRR1_powersave]
+
+    set SRR1 [expr $SRR1 | ($is_dside << (63-42))]
+    set SRR1 [expr $SRR1 | ((($SRR1_mc_cause >> 3) & 0x1) << (63-36))]
+    set SRR1 [expr $SRR1 | ((($SRR1_mc_cause >> 2) & 0x1) << (63-43))]
+    set SRR1 [expr $SRR1 | ((($SRR1_mc_cause >> 1) & 0x1) << (63-44))]
+    set SRR1 [expr $SRR1 | ((($SRR1_mc_cause >> 0) & 0x1) << (63-45))]
+
+    if { [current_insn] in { "stop" "nap" "sleep" "winkle" } } {
+        # mambo has a quirk that interrupts from idle wake immediately
+        mysim trigger set pc 0x200 "mce_trigger"
+        mysim cpu 0 interrupt MachineCheck
+        # XXX: only trigger if pc is 0x200
+        mce_trigger
+    } else {
+        mysim trigger set pc 0x200 "mce_trigger"
+        mysim cpu 0 interrupt MachineCheck
+    }
+}
+
+global R1
+
+# Avoid stopping if we re-enter the same code. Wait until r1 matches.
+# This helps stepping over exceptions or function calls etc.
+proc stop_stack_match { args } {
+    variable R1
+
+    set r1 [mysim cpu 0 display gpr 1]
+    if { $R1 == $r1 } {
+        simstop
+        ipca
+    }
+}
+
+# inject default recoverable MCE and step over it. Useful for testing whether
+# code copes with taking an interleaving MCE.
+proc inject_mce { } {
+    variable R1
+
+    set R1 [mysim cpu 0 display gpr 1]
+    set pc [mysim cpu 0 display spr pc]
+    mysim trigger set pc $pc "stop_stack_match"
+    exc_mce
+    c
+    mysim trigger clear pc $pc ; list
+}
+
+# inject and step over one instruction, and repeat.
+proc inject_mce_step { {nr 1} } {
+    for { set i 0 } { $i < $nr } { incr i 1 } {
+        inject_mce
+        s
+    }
+}
+
+# inject if RI is set and step over one instruction, and repeat.
+proc inject_mce_step_ri { {nr 1} } {
+    for { set i 0 } { $i < $nr } { incr i 1 } {
+        if { [expr [mysim cpu 0 display spr msr] & 0x2] } {
+            inject_mce
+        }
+        s
+    }
+}
+
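
The comment block above exc_mce lists a few d_side/cause combinations. To see which register values they produce, the sketch below copies the shift arithmetic from exc_mce into a stand-alone proc that runs in plain tclsh. The name mce_regs is hypothetical, and msr_ri and powersave are passed as arguments instead of being read from the simulator; DAR is left out. Bit positions use the same (63-n) convention as the patch.

```tcl
# Stand-alone model of the SRR1/DSISR packing in exc_mce (no simulator).
# msr_ri is 0x2 when MSR[RI] is set and 0x0 otherwise; powersave is the
# already-shifted SRR1 power-save field (0 when not waking from idle).
# Returns a two-element list: {SRR1 DSISR}, both formatted as hex.
proc mce_regs { d_side cause msr_ri {powersave 0} } {
    set is_dside [expr { $d_side ? 1 : 0 }]
    if { $is_dside } {
        # d-side: the cause is reported in DSISR
        set srr1_cause 0x0
        set dsisr $cause
    } else {
        # i-side: the cause is spread over SRR1 bit 36 and bits 43:45
        set srr1_cause $cause
        set dsisr 0x0
    }
    set srr1 [expr { $msr_ri | $powersave }]
    set srr1 [expr { $srr1 | ($is_dside << (63-42)) }]
    set srr1 [expr { $srr1 | ((($srr1_cause >> 3) & 0x1) << (63-36)) }]
    set srr1 [expr { $srr1 | ((($srr1_cause >> 2) & 0x1) << (63-43)) }]
    set srr1 [expr { $srr1 | ((($srr1_cause >> 1) & 0x1) << (63-44)) }]
    set srr1 [expr { $srr1 | ((($srr1_cause >> 0) & 0x1) << (63-45)) }]
    return [list [format 0x%x $srr1] [format 0x%x $dsisr]]
}

puts [mce_regs 0 0x5  0x2]   ;# default i-side case with RI set: 0x140002 0x0
puts [mce_regs 1 0x80 0x2]   ;# d-side, cause in DSISR:          0x200002 0x80
puts [mce_regs 0 0xd  0x0]   ;# i-side, RI clear:                0x8140000 0x0
puts [mce_regs 0 0x1  0x0]   ;# i-side, RI clear:                0x40000 0x0
```

In the patch itself the RI value comes from the current MSR (or is forced to 0x2 on idle wakeup, or to 0 when recoverable is 0), so the RI arguments here are only example inputs.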