[v3,08/20] tcg-arm: Implement deposit for armv7

Message ID 1364484781-15561-9-git-send-email-rth@twiddle.net
State New

Commit Message

Richard Henderson March 28, 2013, 3:32 p.m. UTC
We have BFI and BFC available for implementing it.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
 tcg/arm/tcg-target.h |  5 ++++-
 2 files changed, 40 insertions(+), 1 deletion(-)
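
For readers unfamiliar with the op, the operation that BFI implements can be modelled in plain C roughly as below. This is only a sketch: `deposit32_model` is a hypothetical name, not a function from the QEMU tree, and the asserts encode the operand ranges assumed later in this thread (non-zero length, field entirely within 32 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model, not QEMU code: replace the LEN-bit field
   of DST that starts at bit OFS with the low LEN bits of SRC.  */
static uint32_t deposit32_model(uint32_t dst, uint32_t src, int ofs, int len)
{
    /* Assumed operand ranges: a zero-length field is not representable.  */
    assert(len >= 1 && len <= 32);
    assert(ofs >= 0 && ofs + len <= 32);

    uint32_t mask = (2u << (len - 1)) - 1;      /* low LEN bits set */
    return (dst & ~(mask << ofs)) | ((src & mask) << ofs);
}
```

With a source of zero the model simply clears the field, which is the case the patch maps onto BFC.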

Comments

Peter Maydell March 28, 2013, 4:15 p.m. UTC | #1
On 28 March 2013 15:32, Richard Henderson <rth@twiddle.net> wrote:
> We have BFI and BFC available for implementing it.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
>  tcg/arm/tcg-target.h |  5 ++++-
>  2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 88f5689..4950eaf 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -702,6 +702,35 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
>      }
>  }
>
> +bool tcg_target_deposit_valid(int ofs, int len)
> +{
> +    /* ??? Without bfi, we could improve over generic code by combining
> +       the right-shift from a non-zero ofs with the orr.  We do run into
> +       problems when rd == rs, and the mask generated from ofs+len don't
> +       fit into an immediate.  We would have to be careful not to pessimize
> +       wrt the optimizations performed on the expanded code.  */
> +    return use_armv7_instructions;

Strictly speaking BFI is v6T2, but there doesn't seem much point
in making the distinction given it would only affect the rare
ARM1156. (Personally I don't think there's much point worrying about
optimising codegen for anything pre-v7 at all.)

> +}
> +
> +static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
> +                                   TCGArg a1, int ofs, int len, bool const_a1)
> +{
> +    if (const_a1) {
> +        uint32_t mask = (2u << (len - 1)) - 1;

What guarantees us that we won't see a length of 0?
The tcg/README description doesn't say that's invalid
and I don't think the optimize pass handles it (maybe I
missed it).

-- PMM

Richard Henderson March 28, 2013, 4:22 p.m. UTC | #2
On 03/28/2013 09:15 AM, Peter Maydell wrote:
>> +    /* ??? Without bfi, we could improve over generic code by combining
>> +       the right-shift from a non-zero ofs with the orr.  We do run into
>> +       problems when rd == rs, and the mask generated from ofs+len don't
>> +       fit into an immediate.  We would have to be careful not to pessimize
>> +       wrt the optimizations performed on the expanded code.  */
>> +    return use_armv7_instructions;
> 
> Strictly speaking BFI is v6T2, but there doesn't seem much point
> in making the distinction given it would only affect the rare
> ARM1156. (Personally I don't think there's much point worrying about
> optimising codegen for anything pre-v7 at all.)

Fair enough.  I could update the comment to include v6t2, since I've done
similar for e.g. v6k (while retaining the v7 test) elsewhere in the patch set.

> What guarantees us that we won't see a length of 0?
> The tcg/README description doesn't say that's invalid
> and I don't think the optimize pass handles it (maybe I
> missed it).

We can patch the readme, and the asserts in tcg-op.h if you like.

I've assumed elsewhere that we won't see a zero length.  E.g. none of the other
cpus -- ppc, hppa, ia64 -- can encode that either.


r~

Peter Maydell March 28, 2013, 4:59 p.m. UTC | #3
On 28 March 2013 16:22, Richard Henderson <rth@twiddle.net> wrote:
> On 03/28/2013 09:15 AM, Peter Maydell wrote:
>> What guarantees us that we won't see a length of 0?
>> The tcg/README description doesn't say that's invalid
>> and I don't think the optimize pass handles it (maybe I
>> missed it).
>
> We can patch the readme, and the asserts in tcg-op.h if you like.

That would be nice, but I think I was getting confused with
the other edge case (length == 32), which we do handle
correctly.

-- PMM
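
To make the two edge cases discussed above concrete, here is a small standalone sketch. `deposit_args_ok` and `low_bits_mask` are hypothetical helpers written for illustration; they are not the asserts in tcg-op.h. The mask expression is the one from the patch: for len == 32 the shift count is 31, which is well defined for a 32-bit unsigned value and wraps to zero before the subtraction, whereas the more obvious `(1u << len) - 1` would shift by the full width. For len == 0 the patch's expression would shift by -1, which is undefined, so that case has to be excluded up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check of the operand ranges assumed in this thread:
   a non-empty field that lies entirely within a 32-bit word.  */
static bool deposit_args_ok(int ofs, int len)
{
    return ofs >= 0 && ofs < 32 && len >= 1 && len <= 32 - ofs;
}

/* Same expression as the patch.  Valid for 1 <= len <= 32 only.  */
static uint32_t low_bits_mask(int len)
{
    assert(len >= 1 && len <= 32);
    return (2u << (len - 1)) - 1;
}
```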

Patch

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 88f5689..4950eaf 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -702,6 +702,35 @@  static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
     }
 }
 
+bool tcg_target_deposit_valid(int ofs, int len)
+{
+    /* ??? Without bfi, we could improve over generic code by combining
+       the right-shift from a non-zero ofs with the orr.  We do run into
+       problems when rd == rs, and the mask generated from ofs+len doesn't
+       fit into an immediate.  We would have to be careful not to pessimize
+       wrt the optimizations performed on the expanded code.  */
+    return use_armv7_instructions;
+}
+
+static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
+                                   TCGArg a1, int ofs, int len, bool const_a1)
+{
+    if (const_a1) {
+        uint32_t mask = (2u << (len - 1)) - 1;
+        a1 &= mask;
+        if (a1 == 0) {
+            /* bfi becomes bfc with rn == 15.  */
+            a1 = 15;
+        } else {
+            tcg_out_movi32(s, cond, TCG_REG_R8, a1);
+            a1 = TCG_REG_R8;
+        }
+    }
+    /* bfi/bfc */
+    tcg_out32(s, 0x07c00010 | (cond << 28) | (rd << 12) | a1
+              | (ofs << 7) | ((ofs + len - 1) << 16));
+}
+
 static inline void tcg_out_ld32_12(TCGContext *s, int cond,
                 int rd, int rn, tcg_target_long im)
 {
@@ -1873,6 +1902,11 @@  static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_ext16u(s, COND_AL, args[0], args[1]);
         break;
 
+    case INDEX_op_deposit_i32:
+        tcg_out_deposit(s, COND_AL, args[0], args[2],
+                        args[3], args[4], const_args[2]);
+        break;
+
     default:
         tcg_abort();
     }
@@ -1957,6 +1991,8 @@  static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_ext16s_i32, { "r", "r" } },
     { INDEX_op_ext16u_i32, { "r", "r" } },
 
+    { INDEX_op_deposit_i32, { "r", "0", "ri" } },
+
     { -1 },
 };
 
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 354dd8a..209f585 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -71,10 +71,13 @@  typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_deposit_i32      0
+#define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_muls2_i32        1
 
+extern bool tcg_target_deposit_valid(int ofs, int len);
+#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
+
 enum {
     TCG_AREG0 = TCG_REG_R6,
 };
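
As an aid to reviewing the `tcg_out32` call in `tcg_out_deposit`, the instruction word can be reproduced outside of TCG with a few lines of C. This is a sketch under the same field layout the patch uses (condition in bits 31:28, msb in 20:16, rd in 15:12, lsb in 11:7, rn in 3:0); `encode_bfi` is a hypothetical name and the range asserts are assumptions added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone encoder mirroring the expression in the patch.
   Passing rn == 15 yields BFC instead of BFI.  */
static uint32_t encode_bfi(unsigned cond, unsigned rd, unsigned rn,
                           unsigned lsb, unsigned width)
{
    assert(cond <= 0xf && rd < 15 && rn <= 15);
    assert(width >= 1 && lsb + width <= 32);

    unsigned msb = lsb + width - 1;
    return 0x07c00010u | (cond << 28) | (msb << 16) | (rd << 12)
           | (lsb << 7) | rn;
}
```

For example, with the "always" condition (0xe), `encode_bfi(0xe, 0, 1, 8, 8)` corresponds to `bfi r0, r1, #8, #8`, and `encode_bfi(0xe, 2, 15, 0, 16)` to `bfc r2, #0, #16`.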