From patchwork Wed Jan 6 00:31:11 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 42247 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [199.232.76.165]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id DF276B6EF0 for ; Wed, 6 Jan 2010 13:14:58 +1100 (EST) Received: from localhost ([127.0.0.1]:51311 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1NSLQF-0007yz-0H for incoming@patchwork.ozlabs.org; Tue, 05 Jan 2010 21:14:55 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1NSKLG-0001C1-Dh for qemu-devel@nongnu.org; Tue, 05 Jan 2010 20:05:42 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1NSKLF-0001B5-18 for qemu-devel@nongnu.org; Tue, 05 Jan 2010 20:05:41 -0500 Received: from [199.232.76.173] (port=57915 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1NSKLE-0001At-Sw for qemu-devel@nongnu.org; Tue, 05 Jan 2010 20:05:40 -0500 Received: from are.twiddle.net ([75.149.56.221]:43089) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1NSKLE-00031N-3g for qemu-devel@nongnu.org; Tue, 05 Jan 2010 20:05:40 -0500 Received: by are.twiddle.net (Postfix, from userid 5000) id 9C9D2CBB; Tue, 5 Jan 2010 17:05:37 -0800 (PST) From: Richard Henderson Date: Tue, 5 Jan 2010 16:31:11 -0800 To: qemu-devel@nongnu.org Message-Id: <20100106010537.9C9D2CBB@are.twiddle.net> X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 2) Cc: laurent.desnogues@gmail.com, aurelien@aurel32.net Subject: [Qemu-devel] [PATCH 2/2] tcg-x86_64: Avoid unnecessary REX.B prefixes. X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org A while ago Laurent pointed out that the setcc opcode emitted by the setcond patch had unnecessary REX prefixes. The existing P_REXB internal opcode flag unconditionally emits the REX prefix. Technically it's not needed if the register in question is %al, %bl, %cl, %dl. Eliding the prefix requires splitting the P_REXB flag into two, in order to indicate whether the byte register in question is in the REG or the R/M field. Within TCG, the byte register is in the REG field only for stores. Signed-off-by: Richard Henderson --- tcg/x86_64/tcg-target.c | 46 ++++++++++++++++++++++++++++++---------------- 1 files changed, 30 insertions(+), 16 deletions(-) diff --git a/tcg/x86_64/tcg-target.c b/tcg/x86_64/tcg-target.c index f584c94..8b6b68c 100644 --- a/tcg/x86_64/tcg-target.c +++ b/tcg/x86_64/tcg-target.c @@ -217,9 +217,10 @@ static inline int tcg_target_const_match(tcg_target_long val, #define JCC_JLE 0xe #define JCC_JG 0xf -#define P_EXT 0x100 /* 0x0f opcode prefix */ -#define P_REXW 0x200 /* set rex.w = 1 */ -#define P_REXB 0x400 /* force rex use for byte registers */ +#define P_EXT 0x100 /* 0x0f opcode prefix */ +#define P_REXW 0x200 /* set rex.w = 1 */ +#define P_REXB_R 0x400 /* REG field as byte register */ +#define P_REXB_RM 0x800 /* R/M field as byte register */ static const uint8_t tcg_cond_to_jcc[10] = { [TCG_COND_EQ] = JCC_JE, @@ -234,17 +235,30 @@ static const uint8_t tcg_cond_to_jcc[10] = { [TCG_COND_GTU] = JCC_JA, }; -static inline void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x) +static void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x) { - int rex; - rex = ((opc >> 6) & 0x8) | ((r >> 1) & 0x4) | - ((x >> 2) & 2) | ((rm >> 3) & 1); - if (rex || (opc & P_REXB)) { + int rex = 0; + + rex |= (opc & P_REXW) >> 6; /* REX.W */ + rex |= (r & 8) >> 1; /* REX.R */ + rex |= (x & 8) >> 2; /* REX.X */ + rex |= (rm & 8) >> 3; /* REX.B */ + + /* P_REXB_{R,RM} indicates that the given register is the low byte. + For %[abcd]l we need no REX prefix, but for %{si,di,bp,sp}l we do, + as otherwise the encoding indicates %[abcd]h. Note that the values + that are ORed in merely indicate that the REX byte must be present; + those bits get discarded in output. */ + rex |= (r >= 4 ? opc & P_REXB_R : 0); + rex |= (rm >= 4 ? opc & P_REXB_RM : 0); + + if (rex) { tcg_out8(s, rex | 0x40); } - if (opc & P_EXT) + if (opc & P_EXT) { tcg_out8(s, 0x0f); - tcg_out8(s, opc & 0xff); + } + tcg_out8(s, opc); } static inline void tcg_out_modrm(TCGContext *s, int opc, int r, int rm) @@ -408,7 +422,7 @@ static inline void tgen_arithi32(TCGContext *s, int c, int r0, int32_t val) tcg_out8(s, val); } else if (c == ARITH_AND && val == 0xffu) { /* movzbl */ - tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB, r0, r0); + tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB_RM, r0, r0); } else if (c == ARITH_AND && val == 0xffffu) { /* movzwl */ tcg_out_modrm(s, 0xb7 | P_EXT, r0, r0); @@ -776,7 +790,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, switch(opc) { case 0: /* movzbl */ - tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB, TCG_REG_RSI, data_reg); + tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB_RM, TCG_REG_RSI, data_reg); break; case 1: /* movzwl */ @@ -829,7 +843,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, switch(opc) { case 0: /* movb */ - tcg_out_modrm_offset(s, 0x88 | P_REXB, data_reg, r0, offset); + tcg_out_modrm_offset(s, 0x88 | P_REXB_R, data_reg, r0, offset); break; case 1: if (bswap) { @@ -964,7 +978,7 @@ static inline void tcg_out_op(TCGContext *s, int opc, const TCGArg *args, case INDEX_op_st8_i32: case INDEX_op_st8_i64: /* movb */ - tcg_out_modrm_offset(s, 0x88 | P_REXB, args[0], args[1], args[2]); + tcg_out_modrm_offset(s, 0x88 | P_REXB_R, args[0], args[1], args[2]); break; case INDEX_op_st16_i32: case INDEX_op_st16_i64: @@ -1161,7 +1175,7 @@ static inline void tcg_out_op(TCGContext *s, int opc, const TCGArg *args, break; case INDEX_op_ext8s_i32: - tcg_out_modrm(s, 0xbe | P_EXT | P_REXB, args[0], args[1]); + tcg_out_modrm(s, 0xbe | P_EXT | P_REXB_RM, args[0], args[1]); break; case INDEX_op_ext16s_i32: tcg_out_modrm(s, 0xbf | P_EXT, args[0], args[1]); @@ -1177,7 +1191,7 @@ static inline void tcg_out_op(TCGContext *s, int opc, const TCGArg *args, break; case INDEX_op_ext8u_i32: case INDEX_op_ext8u_i64: - tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB, args[0], args[1]); + tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB_RM, args[0], args[1]); break; case INDEX_op_ext16u_i32: case INDEX_op_ext16u_i64: