From patchwork Tue Jan 25 22:07:15 2011
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 80410
Message-ID: <4D3F4993.4010109@twiddle.net>
Date: Tue, 25 Jan 2011 14:07:15 -0800
From: Richard Henderson
To: "Edgar E. Iglesias"
Subject: Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
References: <1294716228-9299-1-git-send-email-rth@twiddle.net>
 <1294716228-9299-6-git-send-email-rth@twiddle.net>
 <20110125122749.GA19736@edde.se.axis.com>
 <4D3EF6C1.3080502@twiddle.net>
 <20110125164816.GA23569@laped.lan>
In-Reply-To: <20110125164816.GA23569@laped.lan>
Cc: qemu-devel@nongnu.org, aurelien@aurel32.net, agraf@suse.de

On 01/25/2011 08:48 AM, Edgar E. Iglesias wrote:
> OK, I see. Maybe we should try to emit an insn sequence more similar
> to what tcg was emitting (for the non 8 & 16-bit deposits)?
> That ought to at least give similar results as before for those and
> give us a speedup for the byte and word moves.

Please try this replacement version for tcg/i386/*.

I was able to run your cris testsuite, and stuff looked ok there.
But for some reason the microblaze kernel would not boot. It seems
that the kernel command line isn't in place properly and it isn't
finding the disk image.

r~

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index bb19a95..ec49b34 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -258,7 +258,8 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define OPC_JMP_long    (0xe9)
 #define OPC_JMP_short   (0xeb)
 #define OPC_LEA         (0x8d)
-#define OPC_MOVB_EvGv   (0x88)          /* stores, more or less */
+#define OPC_MOVB_EbGb   (0x88)          /* stores, more or less */
+#define OPC_MOVB_GbEb   (0x8a)          /* loads, more or less */
 #define OPC_MOVL_EvGv   (0x89)          /* stores, more or less */
 #define OPC_MOVL_GvEv   (0x8b)          /* loads, more or less */
 #define OPC_MOVL_EvIz   (0xc7)
@@ -277,6 +278,7 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define OPC_SHIFT_1     (0xd1)
 #define OPC_SHIFT_Ib    (0xc1)
 #define OPC_SHIFT_cl    (0xd3)
+#define OPC_SHRD_Ib     (0xac | P_EXT)
 #define OPC_TESTL       (0x85)
 #define OPC_XCHG_ax_r32 (0x90)
 
@@ -710,6 +712,107 @@ static void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
     }
 }
 
+static void tcg_out_deposit(TCGContext *s, int inout, int val,
+                            unsigned ofs, unsigned len, int rexw)
+{
+    TCGType type = rexw ? TCG_TYPE_I64 : TCG_TYPE_I32;
+    tcg_target_ulong imask, vmask;
+    TCGRegSet live;
+    int scratch;
+
+    /* Look for MOVB w/ %reg_h special case.  */
+    if (ofs == 8 && len == 8 && inout < 4 && val < 4) {
+        tcg_out_modrm(s, OPC_MOVB_GbEb, inout + 4, val);
+        return;
+    }
+
+    /* Look for MOVB/MOVW special cases.  */
+    if (len == 16
+        || (len == 8
+            && (TCG_TARGET_REG_BITS == 64 || (inout < 4 && val < 4)))) {
+        /* If the offset is non-zero, and we have a deposit from self,
+           then we need a temporary.  */
+        if (ofs != 0 && inout == val) {
+            tcg_regset_clear(live);
+            tcg_regset_set_reg(live, inout);
+            val = tcg_scratch_alloc(s, type, live);
+            tcg_out_mov(s, type, val, inout);
+        }
+
+        /* If the offset is non-zero, rotate the destination into place.  */
+        if (ofs != 0) {
+            tcg_out_shifti(s, SHIFT_ROR + rexw, inout, ofs);
+        }
+
+        if (len == 8) {
+            tcg_out_modrm(s, OPC_MOVB_GbEb + P_REXB_R + P_REXB_RM, inout, val);
+        } else {
+            tcg_out_modrm(s, OPC_MOVL_GvEv + P_DATA16, inout, val);
+        }
+
+        /* Restore the destination to its proper location.  */
+        if (ofs != 0) {
+            tcg_out_shifti(s, SHIFT_ROL + rexw, inout, ofs);
+        }
+        return;
+    }
+
+    /* Otherwise we can't support this operation natively.  It's possible to
+       play tricks with rotates and shld in order to implement this.  While
+       that is much smaller than using masks, it turns out that shld is too
+       slow on many cpus.  */
+    tcg_regset_clear(live);
+    tcg_regset_set_reg(live, inout);
+    tcg_regset_set_reg(live, val);
+    scratch = tcg_scratch_alloc(s, type, live);
+
+    vmask = ((tcg_target_ulong)1 << len) - 1;
+    imask = ~(vmask << ofs);
+
+    /* Careful, some 64-bit masks cannot use immediate operands.  */
+    if (type == TCG_TYPE_I64 && imask != (int32_t)imask) {
+        bool val_scratch = false;
+
+        /* Since we are going to clobber INOUT first, the destination
+           bitfield cannot overlap the input bits.  */
+        if (inout == val && ofs < len) {
+            tcg_regset_set_reg(live, scratch);
+            val = tcg_scratch_alloc(s, type, live);
+            tcg_out_mov(s, type, val, inout);
+            val_scratch = true;
+        }
+
+        tcg_out_movi(s, type, scratch, imask);
+        tgen_arithr(s, ARITH_AND + rexw, inout, scratch);
+
+        if (vmask > 0xffffffffu) {
+            tcg_out_movi(s, type, scratch, vmask);
+            tgen_arithr(s, ARITH_AND + P_REXW, scratch, val);
+        } else {
+            if (val_scratch) {
+                scratch = val;
+            } else {
+                tcg_out_mov(s, TCG_TYPE_I32, scratch, val);
+            }
+            tgen_arithi(s, ARITH_AND, scratch, vmask, 0);
+        }
+
+        tcg_out_shifti(s, SHIFT_SHL + P_REXW, scratch, ofs);
+        tgen_arithr(s, ARITH_OR + P_REXW, inout, scratch);
+        return;
+    }
+
+    /* Both IMASK and VMASK are valid immediate operands, which means that
+       VAL may be treated as a 32-bit value.  */
+    tcg_out_mov(s, TCG_TYPE_I32, scratch, val);
+    tgen_arithi(s, ARITH_AND, scratch, vmask, 0);
+    tcg_out_shifti(s, SHIFT_SHL + rexw, scratch, ofs);
+
+    tgen_arithi(s, ARITH_AND + rexw, inout, imask, 0);
+    tgen_arithr(s, ARITH_OR + rexw, inout, scratch);
+}
+
+
 /* Use SMALL != 0 to force a short forward branch.  */
 static void tcg_out_jxx(TCGContext *s, int opc, int label_index, int small)
 {
@@ -1266,7 +1369,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int datalo, int datahi,
 
     switch (sizeop) {
     case 0:
-        tcg_out_modrm_offset(s, OPC_MOVB_EvGv + P_REXB_R, datalo, base, ofs);
+        tcg_out_modrm_offset(s, OPC_MOVB_EbGb + P_REXB_R, datalo, base, ofs);
         break;
     case 1:
         if (bswap) {
@@ -1504,7 +1607,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(st8):
-        tcg_out_modrm_offset(s, OPC_MOVB_EvGv | P_REXB_R,
+        tcg_out_modrm_offset(s, OPC_MOVB_EbGb | P_REXB_R,
                              args[0], args[1], args[2]);
         break;
     OP_32_64(st16):
@@ -1603,6 +1706,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    OP_32_64(deposit):
+        tcg_out_deposit(s, args[0], args[2], args[3], args[4], rexw);
+        break;
+
     case INDEX_op_brcond_i32:
         tcg_out_brcond32(s, args[2], args[0], args[1], const_args[1],
                          args[3], 0);
@@ -1783,6 +1890,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "0", "ci" } },
     { INDEX_op_rotl_i32, { "r", "0", "ci" } },
     { INDEX_op_rotr_i32, { "r", "0", "ci" } },
+    { INDEX_op_deposit_i32, { "r", "0", "r" } },
 
     { INDEX_op_brcond_i32, { "r", "ri" } },
 
@@ -1835,6 +1943,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i64, { "r", "0", "ci" } },
     { INDEX_op_rotl_i64, { "r", "0", "ci" } },
     { INDEX_op_rotr_i64, { "r", "0", "ci" } },
+    { INDEX_op_deposit_i64, { "r", "0", "r" } },
 
     { INDEX_op_brcond_i64, { "r", "re" } },
     { INDEX_op_setcond_i64, { "r", "r", "re" } },
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index bfafbfc..9f90d17 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -77,6 +77,7 @@ enum {
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32
 #define TCG_TARGET_HAS_rot_i32
+#define TCG_TARGET_HAS_deposit_i32
 #define TCG_TARGET_HAS_ext8s_i32
 #define TCG_TARGET_HAS_ext16s_i32
 #define TCG_TARGET_HAS_ext8u_i32
@@ -94,6 +95,7 @@ enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_div2_i64
 #define TCG_TARGET_HAS_rot_i64
+#define TCG_TARGET_HAS_deposit_i64
 #define TCG_TARGET_HAS_ext8s_i64
 #define TCG_TARGET_HAS_ext16s_i64
 #define TCG_TARGET_HAS_ext32s_i64
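
For anyone checking the two code paths in tcg_out_deposit against each
other, here is a minimal host-side sketch of what they are meant to
compute. It is not part of the patch: the names deposit32_ref,
deposit32_rot, ror32 and rol32 are made up for illustration, and it
only models the 32-bit case with ofs + len <= 32. The first function
follows the mask path (vmask/imask), the second models the
rotate / MOVB-or-MOVW / rotate-back path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not from the patch: 32-bit rotates. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask path: clear LEN bits of INOUT at OFS, then OR in the low
   LEN bits of VAL shifted into place.  Assumes 0 < len < 32. */
static uint32_t deposit32_ref(uint32_t inout, uint32_t val,
                              unsigned ofs, unsigned len)
{
    assert(len > 0 && len < 32 && ofs + len <= 32);
    uint32_t vmask = (UINT32_C(1) << len) - 1;
    uint32_t imask = ~(vmask << ofs);
    return (inout & imask) | ((val & vmask) << ofs);
}

/* Rotate path: rotate the field down to bit 0, overwrite the low
   8 or 16 bits (the effect of MOVB/MOVW), rotate back.  In C, VAL is
   a separate copy; in registers a self-deposit would see the rotated
   value, which is why the patch allocates a temporary in that case. */
static uint32_t deposit32_rot(uint32_t inout, uint32_t val,
                              unsigned ofs, unsigned len)
{
    assert((len == 8 || len == 16) && ofs + len <= 32);
    uint32_t low = (len == 8) ? UINT32_C(0xff) : UINT32_C(0xffff);
    uint32_t r = ror32(inout, ofs);
    r = (r & ~low) | (val & low);
    return rol32(r, ofs);
}
```

Under those assumptions both functions should agree for every 8-bit
and 16-bit field, e.g. depositing 0xab at ofs 8, len 8 into 0x12345678
yields 0x1234ab78 either way.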