diff mbox

[12/12] target-arm: A64: add support for compare and branch imm

Message ID 1386107477-24165-13-git-send-email-peter.maydell@linaro.org
State New
Headers show

Commit Message

Peter Maydell Dec. 3, 2013, 9:51 p.m. UTC
From: Alexander Graf <agraf@suse.de>

This patch adds emulation for the compare and branch insns,
CBZ and CBNZ.

Signed-off-by: Alexander Graf <agraf@suse.de>
[claudio: adapted to new decoder,
          compare with immediate 0,
	  introduce read_cpu_reg to get the 0 extension on (!sf)]
Signed-off-by: Claudio Fontana <claudio.fontana@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c |   45 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

Comments

Richard Henderson Dec. 4, 2013, 12:10 a.m. UTC | #1
On 12/04/2013 10:51 AM, Peter Maydell wrote:
> @@ -184,6 +184,18 @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg)
>      }
>  }
>  
> +/* read a cpu register in 32bit/64bit mode to dst */
> +static void read_cpu_reg(DisasContext *s, TCGv_i64 dst, int reg, int sf)
> +{
> +    if (reg == 31) {
> +        tcg_gen_movi_i64(dst, 0);
> +    } else if (sf) {
> +        tcg_gen_mov_i64(dst, cpu_X[reg]);
> +    } else { /* (!sf) */
> +        tcg_gen_ext32u_i64(dst, cpu_X[reg]);
> +    }
> +}

I think this should be more like cpu_reg and return a TCGv instead of force a
copy into a newly generated temporary.  You've got a pool of auto-freeing
temporaries, after all.

> +    if (op) { /* CBNZ */
> +        tcg_gen_brcondi_i64(TCG_COND_EQ, tcg_cmp, 0, label_nomatch);
> +    } else { /* CBZ */
> +        tcg_gen_brcondi_i64(TCG_COND_NE, tcg_cmp, 0, label_nomatch);
> +    }

Similar comments to TBNZ/TBZ apply here.


r~
Peter Maydell Dec. 4, 2013, 12:32 a.m. UTC | #2
On 4 December 2013 00:10, Richard Henderson <rth@twiddle.net> wrote:
> On 12/04/2013 10:51 AM, Peter Maydell wrote:
>> @@ -184,6 +184,18 @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg)
>>      }
>>  }
>>
>> +/* read a cpu register in 32bit/64bit mode to dst */
>> +static void read_cpu_reg(DisasContext *s, TCGv_i64 dst, int reg, int sf)
>> +{
>> +    if (reg == 31) {
>> +        tcg_gen_movi_i64(dst, 0);
>> +    } else if (sf) {
>> +        tcg_gen_mov_i64(dst, cpu_X[reg]);
>> +    } else { /* (!sf) */
>> +        tcg_gen_ext32u_i64(dst, cpu_X[reg]);
>> +    }
>> +}
>
> I think this should be more like cpu_reg and return a TCGv instead of force a
> copy into a newly generated temporary.  You've got a pool of auto-freeing
> temporaries, after all.

You're right that we can just make this function return the TCGv
temp rather than making the caller pass one in. Are you suggesting
the 64-bit case should return cpu_X[reg] rather than a copy of it,
though? I think it would be pretty hard to reason about if you had to
remember that sometimes you got a trashable copy and sometimes
the real register.

(I'm still a bit on the fence about that pool of auto-freeing temporaries.
Manual temp management is certainly a fertile source of decoder bugs,
but in the longer term we might want to push the functionality down into
the common code rather than having an ad-hoc thing in one decoder.)

>> +    if (op) { /* CBNZ */
>> +        tcg_gen_brcondi_i64(TCG_COND_EQ, tcg_cmp, 0, label_nomatch);
>> +    } else { /* CBZ */
>> +        tcg_gen_brcondi_i64(TCG_COND_NE, tcg_cmp, 0, label_nomatch);
>> +    }
>
> Similar comments to TBNZ/TBZ apply here.

Agreed.

thanks
-- PMM
Richard Henderson Dec. 4, 2013, 12:48 a.m. UTC | #3
On 12/04/2013 01:32 PM, Peter Maydell wrote:
> You're right that we can just make this function return the TCGv
> temp rather than making the caller pass one in. Are you suggesting
> the 64-bit case should return cpu_X[reg] rather than a copy of it,
> though? I think it would be pretty hard to reason about if you had to
> remember that sometimes you got a trashable copy and sometimes
> the real register.

I think that the normal case is to write to the output in one step, and thus
having an input overlap an output isn't your problem but tcg's.  I would think
the case of multi-step output to be fairly rare, and easily solvable by always
allocating an output temporary.

Am I off-base on the multi-output thing?  Modulo flags setting, of course, but
I think that special case ought to be relatively solvable.

> (I'm still a bit on the fence about that pool of auto-freeing temporaries.
> Manual temp management is certainly a fertile source of decoder bugs,
> but in the longer term we might want to push the functionality down into
> the common code rather than having an ad-hoc thing in one decoder.)

See also Sparc and S390.  ;-)

I'm open to suggestions for doing this generically, but I've no great ideas
off-hand.


r~
Peter Maydell Dec. 4, 2013, 11:05 a.m. UTC | #4
On 4 December 2013 00:48, Richard Henderson <rth@twiddle.net> wrote:
> On 12/04/2013 01:32 PM, Peter Maydell wrote:
>> You're right that we can just make this function return the TCGv
>> temp rather than making the caller pass one in. Are you suggesting
>> the 64-bit case should return cpu_X[reg] rather than a copy of it,
>> though? I think it would be pretty hard to reason about if you had to
>> remember that sometimes you got a trashable copy and sometimes
>> the real register.
>
> I think that the normal case is to write to the output in one step, and thus
> having an input overlap an output isn't your problem but tcg's.  I would think
> the case of multi-step output to be fairly rare, and easily solvable by always
> allocating an output temporary.

Does a copy to temp actually cost us anything, though? I was
under the impression the TCG optimizer would basically throw
it away again.

I suppose you could just treat the semantics of it as "assume you
can't trash this TCGv" in all cases and rely on the auto-free.

> Am I off-base on the multi-output thing?  Modulo flags setting, of course, but
> I think that special case ought to be relatively solvable.
>
>> (I'm still a bit on the fence about that pool of auto-freeing temporaries.
>> Manual temp management is certainly a fertile source of decoder bugs,
>> but in the longer term we might want to push the functionality down into
>> the common code rather than having an ad-hoc thing in one decoder.)
>
> See also Sparc and S390.  ;-)

Ah, I hadn't noticed we were already doing this in other frontends.

-- PMM
Peter Maydell Dec. 4, 2013, 5:02 p.m. UTC | #5
On 4 December 2013 00:48, Richard Henderson <rth@twiddle.net> wrote:
> On 12/04/2013 01:32 PM, Peter Maydell wrote:
>> You're right that we can just make this function return the TCGv
>> temp rather than making the caller pass one in. Are you suggesting
>> the 64-bit case should return cpu_X[reg] rather than a copy of it,
>> though? I think it would be pretty hard to reason about if you had to
>> remember that sometimes you got a trashable copy and sometimes
>> the real register.
>
> I think that the normal case is to write to the output in one step, and thus
> having an input overlap an output isn't your problem but tcg's.  I would think
> the case of multi-step output to be fairly rare, and easily solvable by always
> allocating an output temporary.

So I had a look at this, and it seems to make code like this (from
the handling for logic ops with shifted registers):

    tcg_rm = read_cpu_reg(s, rm, sf);

    if (shift_amount) {
        shift_reg_imm(tcg_rm, tcg_rm, sf,
                      shift_type, shift_amount);
    }

    if (invert) {
        tcg_gen_not_i64(tcg_rm, tcg_rm);
        /* we zero extend later on (!sf) */
    }

a bit awkward if the value returned from read_cpu_reg() isn't a
trashable copy, because of the sequence of "maybe we need
to change the value like this" conditionals. The code basically
needs to copy the return value from read_cpu_reg() into its
own temporary immediately. There's a similar thing in the
conditional-select code where the else-case's value might be
inverted and might be incremented and might be neither.

I think that kind of thing tips the balance in favour of having
read_cpu_reg() always return a trashable temporary.

thanks
-- PMM
Richard Henderson Dec. 4, 2013, 5:31 p.m. UTC | #6
On 12/05/2013 06:02 AM, Peter Maydell wrote:
> I think that kind of thing tips the balance in favour of having
> read_cpu_reg() always return a trashable temporary.

Fair enough.


r~
diff mbox

Patch

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index bcb59db..37dc462 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -184,6 +184,18 @@  static TCGv_i64 cpu_reg(DisasContext *s, int reg)
     }
 }
 
+/* read a cpu register in 32bit/64bit mode to dst */
+static void read_cpu_reg(DisasContext *s, TCGv_i64 dst, int reg, int sf)
+{
+    if (reg == 31) {
+        tcg_gen_movi_i64(dst, 0);
+    } else if (sf) {
+        tcg_gen_mov_i64(dst, cpu_X[reg]);
+    } else { /* (!sf) */
+        tcg_gen_ext32u_i64(dst, cpu_X[reg]);
+    }
+}
+
 /*
  * the instruction disassembly implemented here matches
  * the instruction encoding classifications in chapter 3 (C3)
@@ -209,10 +221,39 @@  static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
     gen_goto_tb(s, 0, addr);
 }
 
-/* Compare & branch (immediate) */
+/* C3.2.1 Compare & branch (immediate)
+ *   31  30         25  24  23                  5 4      0
+ * +----+-------------+----+---------------------+--------+
+ * | sf | 0 1 1 0 1 0 | op |         imm19       |   Rt   |
+ * +----+-------------+----+---------------------+--------+
+ */
 static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    unsigned int sf, op, rt;
+    uint64_t addr;
+    int label_nomatch;
+    TCGv_i64 tcg_cmp;
+
+    sf = extract32(insn, 31, 1);
+    op = extract32(insn, 24, 1);
+    rt = extract32(insn, 0, 5);
+    addr = s->pc + sextract32(insn, 5, 19) * 4 - 4;
+
+    tcg_cmp = tcg_temp_new_i64();
+    read_cpu_reg(s, tcg_cmp, rt, sf);
+    label_nomatch = gen_new_label();
+
+    if (op) { /* CBNZ */
+        tcg_gen_brcondi_i64(TCG_COND_EQ, tcg_cmp, 0, label_nomatch);
+    } else { /* CBZ */
+        tcg_gen_brcondi_i64(TCG_COND_NE, tcg_cmp, 0, label_nomatch);
+    }
+
+    tcg_temp_free_i64(tcg_cmp);
+
+    gen_goto_tb(s, 0, addr);
+    gen_set_label(label_nomatch);
+    gen_goto_tb(s, 1, s->pc);
 }
 
 /* C3.2.5 Test & branch (immediate)