Message ID | 1364484781-15561-9-git-send-email-rth@twiddle.net |
---|---|
State | New |
On 28 March 2013 15:32, Richard Henderson <rth@twiddle.net> wrote:
> We have BFI and BFC available for implementing it.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
>  tcg/arm/tcg-target.h |  5 ++++-
>  2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 88f5689..4950eaf 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -702,6 +702,35 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
>      }
>  }
>
> +bool tcg_target_deposit_valid(int ofs, int len)
> +{
> +    /* ??? Without bfi, we could improve over generic code by combining
> +       the right-shift from a non-zero ofs with the orr. We do run into
> +       problems when rd == rs, and the mask generated from ofs+len don't
> +       fit into an immediate. We would have to be careful not to pessimize
> +       wrt the optimizations performed on the expanded code. */
> +    return use_armv7_instructions;

Strictly speaking BFI is v6T2, but there doesn't seem much point
in making the distinction given it would only affect the rare
ARM1156. (Personally I don't think there's much point worrying about
optimising codegen for anything pre-v7 at all.)

> +}
> +
> +static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
> +                                   TCGArg a1, int ofs, int len, bool const_a1)
> +{
> +    if (const_a1) {
> +        uint32_t mask = (2u << (len - 1)) - 1;

What guarantees us that we won't see a length of 0?
The tcg/README description doesn't say that's invalid
and I don't think the optimize pass handles it (maybe I
missed it).

-- PMM
On 03/28/2013 09:15 AM, Peter Maydell wrote:
>> +    /* ??? Without bfi, we could improve over generic code by combining
>> +       the right-shift from a non-zero ofs with the orr. We do run into
>> +       problems when rd == rs, and the mask generated from ofs+len don't
>> +       fit into an immediate. We would have to be careful not to pessimize
>> +       wrt the optimizations performed on the expanded code. */
>> +    return use_armv7_instructions;
>
> Strictly speaking BFI is v6T2, but there doesn't seem much point
> in making the distinction given it would only affect the rare
> ARM1156. (Personally I don't think there's much point worrying about
> optimising codegen for anything pre-v7 at all.)

Fair enough. I could update the comment to include v6T2, since I've done
similar for e.g. v6k (while retaining the v7 test) elsewhere in the patch set.

> What guarantees us that we won't see a length of 0?
> The tcg/README description doesn't say that's invalid
> and I don't think the optimize pass handles it (maybe I
> missed it).

We can patch the readme, and the asserts in tcg-op.h if you like.

I've assumed elsewhere that we won't see a zero length. E.g. none of the
other cpus -- ppc, hppa, ia64 -- can encode that either.

r~
On 28 March 2013 16:22, Richard Henderson <rth@twiddle.net> wrote:
> On 03/28/2013 09:15 AM, Peter Maydell wrote:
>> What guarantees us that we won't see a length of 0?
>> The tcg/README description doesn't say that's invalid
>> and I don't think the optimize pass handles it (maybe I
>> missed it).
>
> We can patch the readme, and the asserts in tcg-op.h if you like.

That would be nice, but I think I was getting confused with
the other edge case (length == 32), which we do handle correctly.

-- PMM
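
The two edge cases discussed above can be checked in isolation. What follows is a minimal sketch and not part of the patch: `low_mask` and `deposit32_ref` are hypothetical helper names introduced here for illustration, and the `assert` encodes the assumption stated in the thread that a zero length is never passed. With `len == 32` the shift `2u << 31` wraps to 0 in unsigned arithmetic, so subtracting 1 yields an all-ones mask. With `len == 0` the shift count would be -1, which is undefined behaviour in C, hence the precondition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask covering the low `len` bits.
   Uses the same expression as the patch; valid only for 1 <= len <= 32.
   A plain (1u << len) - 1 would instead be undefined for len == 32. */
static uint32_t low_mask(int len)
{
    assert(len >= 1 && len <= 32);
    return (2u << (len - 1)) - 1;
}

/* Hypothetical reference model of a 32-bit deposit: replace the `len`
   bits of `value` starting at bit `ofs` with the low bits of `field`. */
static uint32_t deposit32_ref(uint32_t value, int ofs, int len,
                              uint32_t field)
{
    assert(ofs >= 0 && ofs + len <= 32);
    uint32_t mask = low_mask(len) << ofs;
    return (value & ~mask) | ((field << ofs) & mask);
}
```

For example, `low_mask(32)` evaluates to `0xffffffff`, and `deposit32_ref(0xaabbccdd, 8, 8, 0x12)` evaluates to `0xaabb12dd`.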
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 88f5689..4950eaf 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -702,6 +702,35 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
     }
 }
 
+bool tcg_target_deposit_valid(int ofs, int len)
+{
+    /* ??? Without bfi, we could improve over generic code by combining
+       the right-shift from a non-zero ofs with the orr. We do run into
+       problems when rd == rs, and the mask generated from ofs+len don't
+       fit into an immediate. We would have to be careful not to pessimize
+       wrt the optimizations performed on the expanded code. */
+    return use_armv7_instructions;
+}
+
+static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
+                                   TCGArg a1, int ofs, int len, bool const_a1)
+{
+    if (const_a1) {
+        uint32_t mask = (2u << (len - 1)) - 1;
+        a1 &= mask;
+        if (a1 == 0) {
+            /* bfi becomes bfc with rn == 15. */
+            a1 = 15;
+        } else {
+            tcg_out_movi32(s, cond, TCG_REG_R8, a1);
+            a1 = TCG_REG_R8;
+        }
+    }
+    /* bfi/bfc */
+    tcg_out32(s, 0x07c00010 | (cond << 28) | (rd << 12) | a1
+              | (ofs << 7) | ((ofs + len - 1) << 16));
+}
+
 static inline void tcg_out_ld32_12(TCGContext *s, int cond,
                 int rd, int rn, tcg_target_long im)
 {
@@ -1873,6 +1902,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_ext16u(s, COND_AL, args[0], args[1]);
         break;
 
+    case INDEX_op_deposit_i32:
+        tcg_out_deposit(s, COND_AL, args[0], args[2],
+                        args[3], args[4], const_args[2]);
+        break;
+
     default:
         tcg_abort();
     }
@@ -1957,6 +1991,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_ext16s_i32, { "r", "r" } },
     { INDEX_op_ext16u_i32, { "r", "r" } },
 
+    { INDEX_op_deposit_i32, { "r", "0", "ri" } },
+
     { -1 },
 };
 
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 354dd8a..209f585 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -71,10 +71,13 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32 0
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32 0
-#define TCG_TARGET_HAS_deposit_i32 0
+#define TCG_TARGET_HAS_deposit_i32 1
 #define TCG_TARGET_HAS_movcond_i32 1
 #define TCG_TARGET_HAS_muls2_i32 1
 
+extern bool tcg_target_deposit_valid(int ofs, int len);
+#define TCG_TARGET_deposit_i32_valid tcg_target_deposit_valid
+
 enum {
     TCG_AREG0 = TCG_REG_R6,
 };
We have BFI and BFC available for implementing it.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
 tcg/arm/tcg-target.h |  5 ++++-
 2 files changed, 40 insertions(+), 1 deletion(-)
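
To make the instruction word built by `tcg_out_deposit` easier to follow, here is a standalone sketch of the same bit layout. It is illustrative only and not part of the patch: `encode_bfi`, `ARM_COND_AL` and `ARM_RN_BFC` are hypothetical names chosen for this example. The field positions mirror the `tcg_out32` expression in the patch: condition in bits 31:28, the most significant bit of the field (`ofs + len - 1`) in bits 20:16, `rd` in bits 15:12, the least significant bit (`ofs`) in bits 11:7, and `rn` in bits 3:0. Setting `rn` to 15 turns the encoding into BFC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for this sketch. */
enum {
    ARM_COND_AL = 0xe,  /* "always" condition code */
    ARM_RN_BFC  = 15    /* rn == 15 selects BFC instead of BFI */
};

/* Hypothetical encoder mirroring the expression in tcg_out_deposit:
   0x07c00010 | cond << 28 | rd << 12 | rn | lsb << 7 | msb << 16. */
static uint32_t encode_bfi(int cond, int rd, int rn, int ofs, int len)
{
    /* Assumes a non-zero length that fits in the 32-bit register. */
    assert(len >= 1 && ofs >= 0 && ofs + len <= 32);
    return 0x07c00010u
         | ((uint32_t)cond << 28)
         | ((uint32_t)rd << 12)
         | (uint32_t)rn
         | ((uint32_t)ofs << 7)
         | ((uint32_t)(ofs + len - 1) << 16);
}
```

Under these assumptions, `encode_bfi(ARM_COND_AL, 0, 1, 8, 8)` gives `0xe7cf0411`, the word for `bfi r0, r1, #8, #8`, and `encode_bfi(ARM_COND_AL, 0, ARM_RN_BFC, 0, 32)` gives `0xe7df001f`, which clears all of `r0`.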