From patchwork Tue Feb 19 17:40:30 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 221919 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id B540A2C007E for ; Wed, 20 Feb 2013 10:46:09 +1100 (EST) Received: from localhost ([::1]:38688 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1U7rDH-0007Cs-T1 for incoming@patchwork.ozlabs.org; Tue, 19 Feb 2013 12:42:43 -0500 Received: from eggs.gnu.org ([208.118.235.92]:44639) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1U7rD3-00072L-B3 for qemu-devel@nongnu.org; Tue, 19 Feb 2013 12:42:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1U7rCv-0008N4-Lu for qemu-devel@nongnu.org; Tue, 19 Feb 2013 12:42:29 -0500 Received: from mail-pb0-f53.google.com ([209.85.160.53]:54762) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1U7rCv-0008Mt-7l for qemu-devel@nongnu.org; Tue, 19 Feb 2013 12:42:21 -0500 Received: by mail-pb0-f53.google.com with SMTP id un1so2350410pbc.40 for ; Tue, 19 Feb 2013 09:42:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:sender:from:to:cc:subject:date:message-id:x-mailer :in-reply-to:references; bh=AF5NFKZN0tKb51hSg/acuN4h1OhjjL0cXl+znaBa1JM=; b=RPFo2iIuB8H/GAQ5PbloRI4YMhAuFkkzVlp1yX4lmv8N/zz03V1UR7iDeelIx9lOn3 LZSgpCehswPCJj2V0DkhleDAsNsanQ/UfQeMrQyoPCXGNZ6eJ5+labbjSyVBfEDIk+I6 KeFCsmFvGBlZPKbgIy7CtK6iJwZNadkrc4bf73ceUmwBeay+XHeu1Y67WVo+4EcjWSWF oXBQgqUMvuybGw/JS700oBqni4OMbVNkc6oKE6dmj6ozfDiP4Uzt01mynmtxhFfaUHlH MJx7myf9X+J41YISpMWLzQsFLn4stQIPAfONGqmeYFjzh3RANcX3o0F87q7Rfk1ZP7E4 UAcA== X-Received: by 10.68.236.130 with SMTP id uu2mr42216673pbc.152.1361295740504; Tue, 19 Feb 2013 09:42:20 -0800 (PST) Received: from anchor.twiddle.net (50-194-63-110-static.hfc.comcastbusiness.net. [50.194.63.110]) by mx.google.com with ESMTPS id 1sm18659295pbg.18.2013.02.19.09.42.18 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 19 Feb 2013 09:42:19 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 19 Feb 2013 09:40:30 -0800 Message-Id: <1361295631-21316-57-git-send-email-rth@twiddle.net> X-Mailer: git-send-email 1.8.1.2 In-Reply-To: <1361295631-21316-1-git-send-email-rth@twiddle.net> References: <1361295631-21316-1-git-send-email-rth@twiddle.net> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 209.85.160.53 Cc: blauwirbel@gmail.com, pbonzini@redhat.com, afaerber@suse.de, aurelien@aurel32.net Subject: [Qemu-devel] [PATCH 56/57] target-i386: Implement tzcnt and fix lzcnt X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org We weren't computing flags for lzcnt at all. At the same time, adjust the implementation of bsf/bsr to avoid the local branch, using movcond instead. Signed-off-by: Richard Henderson --- target-i386/helper.h | 5 ++- target-i386/int_helper.c | 11 ++----- target-i386/translate.c | 86 +++++++++++++++++++++++++++--------------------- 3 files changed, 54 insertions(+), 48 deletions(-) diff --git a/target-i386/helper.h b/target-i386/helper.h index e1ecdb8..26a0cc8 100644 --- a/target-i386/helper.h +++ b/target-i386/helper.h @@ -195,9 +195,8 @@ DEF_HELPER_3(frstor, void, env, tl, int) DEF_HELPER_3(fxsave, void, env, tl, int) DEF_HELPER_3(fxrstor, void, env, tl, int) -DEF_HELPER_FLAGS_1(bsf, TCG_CALL_NO_RWG_SE, tl, tl) -DEF_HELPER_FLAGS_1(bsr, TCG_CALL_NO_RWG_SE, tl, tl) -DEF_HELPER_FLAGS_2(lzcnt, TCG_CALL_NO_RWG_SE, tl, tl, int) +DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, tl, tl) +DEF_HELPER_FLAGS_1(ctz, TCG_CALL_NO_RWG_SE, tl, tl) DEF_HELPER_FLAGS_2(pdep, TCG_CALL_NO_RWG_SE, tl, tl, tl) DEF_HELPER_FLAGS_2(pext, TCG_CALL_NO_RWG_SE, tl, tl, tl) diff --git a/target-i386/int_helper.c b/target-i386/int_helper.c index 7bec4eb..3b56075 100644 --- a/target-i386/int_helper.c +++ b/target-i386/int_helper.c @@ -456,19 +456,14 @@ void helper_idivq_EAX(CPUX86State *env, target_ulong t0) #endif /* bit operations */ -target_ulong helper_bsf(target_ulong t0) +target_ulong helper_ctz(target_ulong t0) { return ctztl(t0); } -target_ulong helper_lzcnt(target_ulong t0, int wordsize) +target_ulong helper_clz(target_ulong t0) { - return clztl(t0) - (TARGET_LONG_BITS - wordsize); -} - -target_ulong helper_bsr(target_ulong t0) -{ - return clztl(t0) ^ (TARGET_LONG_BITS - 1); + return clztl(t0); } target_ulong helper_pdep(target_ulong src, target_ulong mask) diff --git a/target-i386/translate.c b/target-i386/translate.c index 7edfb55..30e88da 100644 --- a/target-i386/translate.c +++ b/target-i386/translate.c @@ -7157,46 +7157,58 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s, tcg_gen_movi_tl(cpu_cc_dst, 0); } break; - case 0x1bc: /* bsf */ - case 0x1bd: /* bsr */ - { - int label1; - TCGv t0; - - ot = dflag + OT_WORD; - modrm = cpu_ldub_code(env, s->pc++); - reg = ((modrm >> 3) & 7) | rex_r; - gen_ldst_modrm(env, s,modrm, ot, OR_TMP0, 0); - gen_extu(ot, cpu_T[0]); - t0 = tcg_temp_local_new(); - tcg_gen_mov_tl(t0, cpu_T[0]); - if ((b & 1) && (prefixes & PREFIX_REPZ) && - (s->cpuid_ext3_features & CPUID_EXT3_ABM)) { - switch(ot) { - case OT_WORD: gen_helper_lzcnt(cpu_T[0], t0, - tcg_const_i32(16)); break; - case OT_LONG: gen_helper_lzcnt(cpu_T[0], t0, - tcg_const_i32(32)); break; - case OT_QUAD: gen_helper_lzcnt(cpu_T[0], t0, - tcg_const_i32(64)); break; - } - gen_op_mov_reg_T0(ot, reg); + case 0x1bc: /* bsf / tzcnt */ + case 0x1bd: /* bsr / lzcnt */ + ot = dflag + OT_WORD; + modrm = cpu_ldub_code(env, s->pc++); + reg = ((modrm >> 3) & 7) | rex_r; + gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0); + gen_extu(ot, cpu_T[0]); + + /* Note that lzcnt and tzcnt are in different extensions. */ + if ((prefixes & PREFIX_REPZ) + && (b & 1 + ? s->cpuid_ext3_features & CPUID_EXT3_ABM + : s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI1)) { + int size = 8 << ot; + tcg_gen_mov_tl(cpu_cc_src, cpu_T[0]); + if (b & 1) { + /* For lzcnt, reduce the target_ulong result by the + number of zeros that we expect to find at the top. */ + gen_helper_clz(cpu_T[0], cpu_T[0]); + tcg_gen_subi_tl(cpu_T[0], cpu_T[0], TARGET_LONG_BITS - size); } else { - label1 = gen_new_label(); - tcg_gen_movi_tl(cpu_cc_dst, 0); - tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, label1); - if (b & 1) { - gen_helper_bsr(cpu_T[0], t0); - } else { - gen_helper_bsf(cpu_T[0], t0); - } - gen_op_mov_reg_T0(ot, reg); - tcg_gen_movi_tl(cpu_cc_dst, 1); - gen_set_label(label1); - set_cc_op(s, CC_OP_LOGICB + ot); + /* For tzcnt, a zero input must return the operand size: + force all bits outside the operand size to 1. */ + target_ulong mask = (target_ulong)-2 << (size - 1); + tcg_gen_ori_tl(cpu_T[0], cpu_T[0], mask); + gen_helper_ctz(cpu_T[0], cpu_T[0]); + } + /* For lzcnt/tzcnt, C and Z bits are defined and are + related to the result. */ + gen_op_update1_cc(); + set_cc_op(s, CC_OP_BMILGB + ot); + } else { + /* For bsr/bsf, only the Z bit is defined and it is related + to the input and not the result. */ + tcg_gen_mov_tl(cpu_cc_dst, cpu_T[0]); + set_cc_op(s, CC_OP_LOGICB + ot); + if (b & 1) { + /* For bsr, return the bit index of the first 1 bit, + not the count of leading zeros. */ + gen_helper_clz(cpu_T[0], cpu_T[0]); + tcg_gen_xori_tl(cpu_T[0], cpu_T[0], TARGET_LONG_BITS - 1); + } else { + gen_helper_ctz(cpu_T[0], cpu_T[0]); } - tcg_temp_free(t0); + /* ??? The manual says that the output is undefined when the + input is zero, but real hardware leaves it unchanged, and + real programs appear to depend on that. */ + tcg_gen_movi_tl(cpu_tmp0, 0); + tcg_gen_movcond_tl(TCG_COND_EQ, cpu_T[0], cpu_cc_dst, cpu_tmp0, + cpu_regs[reg], cpu_T[0]); } + gen_op_mov_reg_T0(ot, reg); break; /************************/ /* bcd */