Re: sparc64 lazy conditional codes evaluation

Submitter Blue Swirl
Date May 3, 2010, 7:24 p.m.
Message ID <r2pf43fc5581005031224o2b430a8bo5c5605734ea64b52@mail.gmail.com>
Permalink /patch/51522/
State New

Comments

Blue Swirl - May 3, 2010, 7:24 p.m.
On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
> Hi!
>
>  There is an issue with lazy conditional code evaluation where
>  we return from a trap handler with mismatching condition codes.
>
>  I can seldom reproduce it here by dragging the qemu window while
>  the machine works through silo initialization. I use the gentoo minimal cd
>  install-sparc64-minimal-20100322.iso, but I think anything booting via silo
>  would experience the same. Once in a while it reports a crc error,
>  is unable to open the cd partition, or fails to decompress the image.

I think I've also seen this.

>  The pattern that fails appears to be a compare insn,
>  possibly followed by a few instructions which do not touch the condition
>  codes, then a conditional branch insn. If we happen to trap while processing
>  the conditional branch insn, so that it is restarted after returning from
>  the trap, then occasionally the condition codes are calculated incorrectly.
>
>  I cannot point to the exact cause, but it appears that after a trap return
>  we may have a CC_OP and CC_SRC* mismatch somewhere,
>  since adding more condition-code evaluation flushes across the code helps.
>
>  We have already tried flushing more frequently and it is still not
>  complete, so the question is how to finally do this once and do it right :)
>
>  Obviously I do not fully get the design of the lazy evaluation, but
>  the following list appears to be a good start. The plan is to prepare
>  a change to qemu and find a way to test it.
>
>  1. Since SPARC* is a RISC CPU, it does not seem profitable to
>    use DisasContext->cc_op to predict whether flag evaluation can be skipped
>    because a later insn overrides the flags. Instead we can drop cc_op from
>    the disassembler context and simplify the code to use only env->cc_op.

Not currently, but in the future we may use it for even lazier
flags computation. For example, the sequence 'cmp x, y; bne target'
could be made much more efficient by having the branch perform the
comparison itself. Here's an old unfinished patch that does some of this.
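
The rough shape would be something like this (just a sketch: the
function name is made up, while cpu_cc_src/cpu_cc_src2 are the existing
TCG globals holding the last subcc operands):

/* When the flags come from a subcc (CC_OP_SUB), a following conditional
   branch can compare the saved operands directly instead of
   materializing N/Z/V/C first. */
static void gen_fused_cmp_branch(DisasContext *dc, TCGCond cond, int l_taken)
{
    if (dc->cc_op == CC_OP_SUB) {
        /* branch on the original comparison operands */
        tcg_gen_brcond_tl(cond, cpu_cc_src, cpu_cc_src2, l_taken);
    } else {
        /* flags state unknown: fall back to full evaluation, then
           branch on the evaluated condition as done today */
        gen_helper_compute_psr();
        dc->cc_op = CC_OP_FLAGS;
    }
}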

>    Another point is that we would always write to env->cc_op when
>    translating *cc insns.
>    This should resolve any issue with the dc->cc_op prediction going
>    out of sync with env->cc_op and cpu_cc_src*.

I think this is what is happening now.
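
For reference, point 1 would make every *cc translation look roughly
like this (a sketch; the helper shape mirrors the existing gen_op_*_cc
functions, and cpu_cc_op is the TCG global backing env->cc_op):

static void gen_op_subcc_sketch(TCGv dst, TCGv src1, TCGv src2)
{
    tcg_gen_mov_tl(cpu_cc_src, src1);        /* save operands for lazy eval */
    tcg_gen_mov_tl(cpu_cc_src2, src2);
    tcg_gen_sub_tl(cpu_cc_dst, src1, src2);  /* result is the N/Z source */
    tcg_gen_mov_tl(dst, cpu_cc_dst);
    tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUB);  /* unconditional env->cc_op write */
}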

>  2. We must flush the lazy evaluation back to CC_OP_FLAGS in a few cases:
>    a. a condition code is required by an insn (like addc, a cond branch, etc.)
>       - here we can optimize by evaluating only the specific bits (carry?)
>       - not sure if that works when we have two flag-consuming insns,
>         where the first needs carry and the other needs the rest of the flags

Here's another patch to optimize C flag handling. It doesn't pass my
tests, though.
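
For 2a, the carry-only case could stay very small when the flags came
from an addition (a sketch; it assumes CC_OP_ADD state and covers xcc
only -- icc would need the operands truncated to 32 bits first):

static void gen_carry_from_add_sketch(TCGv carry)
{
    /* for dst = src1 + src2, carry out means dst < src1 (unsigned) */
    tcg_gen_setcond_tl(TCG_COND_LTU, carry, cpu_cc_dst, cpu_cc_src);
}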

>    b. the CCR is read by rdccr (helper_rdccr)
>       - have to compute all flags
>    c. a trap occurs and we prepare the trap level context (saving pstate)
>       - have to compute all flags
>    d. control leaves the tcg runtime (so the gdbstub reads correct values from env)
>       - have to compute all flags

Fully agree.
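
For b-d the flush is the same operation; roughly (a sketch, with a
made-up wrapper name around the existing gen_helper_compute_psr()):

static void gen_flush_cc_sketch(DisasContext *dc)
{
    if (dc->cc_op != CC_OP_FLAGS) {
        gen_helper_compute_psr();  /* evaluate N/Z/V/C into env */
        dc->cc_op = CC_OP_FLAGS;   /* env flags are now authoritative */
    }
}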
Igor V. Kovalenko - May 3, 2010, 7:46 p.m.
On Mon, May 3, 2010 at 11:24 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
>> [...]
>
> [...]
>
> Fully agree.

Cool.

Still, I'd propose to kill dc->cc_op, find a reliable way to test the
result, and then add it back, possibly with more optimizations.
I'm lost in the code, to the point where I believe we need to
save/restore cc_op and cpu_cc* while switching trap levels.
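
Something along these lines at trap entry, before TSTATE captures the
CCR, is what I have in mind (a sketch; helper_compute_psr() is the
existing helper, while the wrapper name and call site are assumptions):

/* flush lazy flags so the CCR saved into TSTATE for the new trap
   level is the architectural value */
static void flush_cc_for_trap_sketch(CPUState *env)
{
    if (env->cc_op != CC_OP_FLAGS) {
        helper_compute_psr();      /* fold CC_SRC, CC_SRC2, CC_DST into flags */
        env->cc_op = CC_OP_FLAGS;  /* redundant if the helper already sets it */
    }
}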

Patch

From 93cce43be043ca25770165b8c06546eafc320716 Mon Sep 17 00:00:00 2001
From: Blue Swirl <blauwirbel@gmail.com>
Date: Mon, 3 May 2010 19:21:59 +0000
Subject: [PATCH] Branch optimization BROKEN

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
---
 target-sparc/translate.c |  110 ++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 94c343d..57bda12 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -1115,6 +1115,106 @@  static inline void gen_cond_reg(TCGv r_dst, int cond, TCGv r_src)
 }
 #endif
 
+// Inverted logic: the TCG condition for the negated test of each SPARC condition; -1 = no direct mapping
+static const int gen_tcg_cond[16] = {
+    -1,
+    TCG_COND_NE,
+    TCG_COND_GT,
+    TCG_COND_GE,
+    TCG_COND_GTU,
+    TCG_COND_GEU,
+    -1,
+    -1,
+    -1,
+    TCG_COND_EQ,
+    TCG_COND_LE,
+    TCG_COND_LT,
+    TCG_COND_LEU,
+    TCG_COND_LTU,
+    -1,
+    -1,
+};
+
+/* generate a conditional jump to label 'l1' according to branch condition
+   'cond'; with inverted logic, l1 is taken when the condition does NOT hold */
+static inline void gen_brcond(DisasContext *dc, int cond, int l1, int cc, TCGv r_cond)
+{
+    //printf("gen_brcond: cc_op %d\n", dc->cc_op);
+    switch (dc->cc_op) {
+        /* we optimize the cmp/br case */
+    case CC_OP_SUB:
+        // Inverted logic
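+        // NOTE: cpu_cc_src/cpu_cc_src2/cpu_cc_dst are 64-bit globals, so
+        // the _i32 comparisons below are only valid once the operands are
+        // truncated to 32 bits for the icc (cc == 0) case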
+        switch (cond) {
+        case 0x0: // n
+            tcg_gen_br(l1);
+            break;
+        case 0x1: // e
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_NE, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_NE, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0x2: // le
+        case 0x3: // l
+        case 0x4: // leu
+        case 0x5: // cs/lu
+        case 0xa: // g
+        case 0xb: // ge
+        case 0xc: // gu
+        case 0xd: // cc/geu
+            if (cc == 1) {
+                tcg_gen_brcond_i64(gen_tcg_cond[cond], cpu_cc_src, cpu_cc_src2, l1);
+            } else {
+                tcg_gen_brcond_i32(gen_tcg_cond[cond], cpu_cc_src, cpu_cc_src2, l1);
+            }
+            break;
+        case 0x6: // neg
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_GE, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_GE, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0x7: // vs
+        case 0xf: // vc
+            // overflow tests need the full flags: compute them, then
+            // branch to l1 when the condition does not hold
+            gen_helper_compute_psr();
+            dc->cc_op = CC_OP_FLAGS;
+            gen_cond(r_cond, cc, cond, dc);
+            tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
+            break;
+        case 0x8: // a
+            // nop
+            break;
+        case 0x9: // ne
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_EQ, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0xe: // pos
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_LT, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_LT, cpu_cc_dst, 0, l1);
+            }
+            break;
+        }
+        break;
+    case CC_OP_FLAGS:
+    default:
+        gen_cond(r_cond, cc, cond, dc);
+        tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
+        break;
+    }
+}
+
 /* XXX: potentially incorrect if dynamic npc */
 static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc,
                       TCGv r_cond)
@@ -1143,11 +1241,17 @@  static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc,
         }
     } else {
         flush_cond(dc, r_cond);
-        gen_cond(r_cond, cc, cond, dc);
         if (a) {
-            gen_branch_a(dc, target, dc->npc, r_cond);
+            int l1 = gen_new_label();
+
+            gen_brcond(dc, cond, l1, cc, r_cond);
+            gen_goto_tb(dc, 0, dc->npc, target);
+
+            gen_set_label(l1);
+            gen_goto_tb(dc, 1, dc->npc + 4, dc->npc + 8);
             dc->is_br = 1;
         } else {
+            gen_cond(r_cond, cc, cond, dc);
             dc->pc = dc->npc;
             dc->jump_pc[0] = target;
             dc->jump_pc[1] = dc->npc + 4;
-- 
1.5.6.5