From patchwork Tue Sep 29 09:58:04 2009
X-Patchwork-Submitter: Laurent Desnogues
X-Patchwork-Id: 34415
Date: Tue, 29 Sep 2009 11:58:04 +0200
Message-ID: <761ea48b0909290258x4313e24ahb8ca1e4237e62072@mail.gmail.com>
From: Laurent Desnogues
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [PATCH] x86: use globals for CPU registers (rev 2)
List-Id: qemu-devel.nongnu.org

Hello,

this is a revision of my proposal to use globals for the 8 or 16 CPU
registers on i386 and x86_64. The changes with respect to my previous
patch are:

 - take into account Aurélien's comments;
 - get rid of the USE_REGS define (so the old way of accessing the x86
   registers is gone);
 - respect the original order of the functions;
 - simplify the code in some places.

It would probably be good if someone could give this a test in system
mode, as I only tested user mode i386/x86_64 on an x86_64 host and
i386 on i386.
Laurent

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 3eae9bc..5b11d7f 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -62,6 +62,7 @@
 static TCGv_ptr cpu_env;
 static TCGv cpu_A0, cpu_cc_src, cpu_cc_dst, cpu_cc_tmp;
 static TCGv_i32 cpu_cc_op;
+static TCGv cpu_regs[CPU_NB_REGS];
 /* local temps */
 static TCGv cpu_T[2], cpu_T3;
 /* local register indexes (only used inside old micro ops) */
@@ -271,32 +272,38 @@ static inline void gen_op_andl_A0_ffff(void)
 
 static inline void gen_op_mov_reg_v(int ot, int reg, TCGv t0)
 {
+    TCGv tmp;
+
     switch(ot) {
     case OT_BYTE:
+        tmp = tcg_temp_new();
+        tcg_gen_ext8u_tl(tmp, t0);
         if (reg < 4 X86_64_DEF( || reg >= 8 || x86_64_hregs)) {
-            tcg_gen_st8_tl(t0, cpu_env, offsetof(CPUState, regs[reg]) + REG_B_OFFSET);
+            tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xff);
+            tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
         } else {
-            tcg_gen_st8_tl(t0, cpu_env, offsetof(CPUState, regs[reg - 4]) + REG_H_OFFSET);
+            tcg_gen_shli_tl(tmp, tmp, 8);
+            tcg_gen_andi_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], ~0xff00);
+            tcg_gen_or_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], tmp);
         }
+        tcg_temp_free(tmp);
         break;
     case OT_WORD:
-        tcg_gen_st16_tl(t0, cpu_env, offsetof(CPUState, regs[reg]) + REG_W_OFFSET);
+        tmp = tcg_temp_new();
+        tcg_gen_ext16u_tl(tmp, t0);
+        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
+        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
+        tcg_temp_free(tmp);
         break;
-#ifdef TARGET_X86_64
+    default: /* XXX this shouldn't be reached; abort? */
     case OT_LONG:
-        tcg_gen_st32_tl(t0, cpu_env, offsetof(CPUState, regs[reg]) + REG_L_OFFSET);
-        /* high part of register set to zero */
-        tcg_gen_movi_tl(cpu_tmp0, 0);
-        tcg_gen_st32_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]) + REG_LH_OFFSET);
+        /* For x86_64, this sets the higher half of register to zero.
+           For i386, this is equivalent to a mov. */
+        tcg_gen_ext32u_tl(cpu_regs[reg], t0);
         break;
-    default:
+#ifdef TARGET_X86_64
     case OT_QUAD:
-        tcg_gen_st_tl(t0, cpu_env, offsetof(CPUState, regs[reg]));
-        break;
-#else
-    default:
-    case OT_LONG:
-        tcg_gen_st32_tl(t0, cpu_env, offsetof(CPUState, regs[reg]) + REG_L_OFFSET);
+        tcg_gen_mov_tl(cpu_regs[reg], t0);
         break;
 #endif
     }
@@ -314,25 +321,25 @@ static inline void gen_op_mov_reg_T1(int ot, int reg)
 
 static inline void gen_op_mov_reg_A0(int size, int reg)
 {
+    TCGv tmp;
+
     switch(size) {
     case 0:
-        tcg_gen_st16_tl(cpu_A0, cpu_env, offsetof(CPUState, regs[reg]) + REG_W_OFFSET);
+        tmp = tcg_temp_new();
+        tcg_gen_ext16u_tl(tmp, cpu_A0);
+        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
+        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
+        tcg_temp_free(tmp);
         break;
-#ifdef TARGET_X86_64
+    default: /* XXX this shouldn't be reached; abort? */
     case 1:
-        tcg_gen_st32_tl(cpu_A0, cpu_env, offsetof(CPUState, regs[reg]) + REG_L_OFFSET);
-        /* high part of register set to zero */
-        tcg_gen_movi_tl(cpu_tmp0, 0);
-        tcg_gen_st32_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]) + REG_LH_OFFSET);
+        /* For x86_64, this sets the higher half of register to zero.
+           For i386, this is equivalent to a mov. */
+        tcg_gen_ext32u_tl(cpu_regs[reg], cpu_A0);
         break;
-    default:
+#ifdef TARGET_X86_64
     case 2:
-        tcg_gen_st_tl(cpu_A0, cpu_env, offsetof(CPUState, regs[reg]));
-        break;
-#else
-    default:
-    case 1:
-        tcg_gen_st32_tl(cpu_A0, cpu_env, offsetof(CPUState, regs[reg]) + REG_L_OFFSET);
+        tcg_gen_mov_tl(cpu_regs[reg], cpu_A0);
         break;
 #endif
     }
@@ -345,12 +352,13 @@ static inline void gen_op_mov_v_reg(int ot, TCGv t0, int reg)
         if (reg < 4 X86_64_DEF( || reg >= 8 || x86_64_hregs)) {
             goto std_case;
         } else {
-            tcg_gen_ld8u_tl(t0, cpu_env, offsetof(CPUState, regs[reg - 4]) + REG_H_OFFSET);
+            tcg_gen_shri_tl(t0, cpu_regs[reg - 4], 8);
+            tcg_gen_ext8u_tl(t0, t0);
         }
         break;
     default:
     std_case:
-        tcg_gen_ld_tl(t0, cpu_env, offsetof(CPUState, regs[reg]));
+        tcg_gen_mov_tl(t0, cpu_regs[reg]);
         break;
     }
 }
@@ -362,7 +370,7 @@ static inline void gen_op_mov_TN_reg(int ot, int t_index, int reg)
 
 static inline void gen_op_movl_A0_reg(int reg)
 {
-    tcg_gen_ld32u_tl(cpu_A0, cpu_env, offsetof(CPUState, regs[reg]) + REG_L_OFFSET);
+    tcg_gen_mov_tl(cpu_A0, cpu_regs[reg]);
 }
 
 static inline void gen_op_addl_A0_im(int32_t val)
@@ -404,23 +412,21 @@ static inline void gen_op_add_reg_im(int size, int reg, int32_t val)
 {
     switch(size) {
     case 0:
-        tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
-        tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, val);
-        tcg_gen_st16_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]) + REG_W_OFFSET);
+        tcg_gen_addi_tl(cpu_tmp0, cpu_regs[reg], val);
+        tcg_gen_ext16u_tl(cpu_tmp0, cpu_tmp0);
+        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
+        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0);
         break;
     case 1:
-        tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
-        tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, val);
-#ifdef TARGET_X86_64
-        tcg_gen_andi_tl(cpu_tmp0, cpu_tmp0, 0xffffffff);
-#endif
-        tcg_gen_st_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
+        tcg_gen_addi_tl(cpu_tmp0, cpu_regs[reg], val);
+        /* For x86_64, this sets the higher half of register to zero.
+           For i386, this is equivalent to a nop. */
+        tcg_gen_ext32u_tl(cpu_tmp0, cpu_tmp0);
+        tcg_gen_mov_tl(cpu_regs[reg], cpu_tmp0);
         break;
 #ifdef TARGET_X86_64
     case 2:
-        tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
-        tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, val);
-        tcg_gen_st_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
+        tcg_gen_addi_tl(cpu_regs[reg], cpu_regs[reg], val);
         break;
 #endif
     }
@@ -430,23 +436,21 @@ static inline void gen_op_add_reg_T0(int size, int reg)
 {
     switch(size) {
     case 0:
-        tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
-        tcg_gen_add_tl(cpu_tmp0, cpu_tmp0, cpu_T[0]);
-        tcg_gen_st16_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]) + REG_W_OFFSET);
+        tcg_gen_add_tl(cpu_tmp0, cpu_regs[reg], cpu_T[0]);
+        tcg_gen_ext16u_tl(cpu_tmp0, cpu_tmp0);
+        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
+        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0);
         break;
     case 1:
-        tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
-        tcg_gen_add_tl(cpu_tmp0, cpu_tmp0, cpu_T[0]);
-#ifdef TARGET_X86_64
-        tcg_gen_andi_tl(cpu_tmp0, cpu_tmp0, 0xffffffff);
-#endif
-        tcg_gen_st_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
+        tcg_gen_add_tl(cpu_tmp0, cpu_regs[reg], cpu_T[0]);
+        /* For x86_64, this sets the higher half of register to zero.
+           For i386, this is equivalent to a nop. */
+        tcg_gen_ext32u_tl(cpu_tmp0, cpu_tmp0);
+        tcg_gen_mov_tl(cpu_regs[reg], cpu_tmp0);
         break;
 #ifdef TARGET_X86_64
     case 2:
-        tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
-        tcg_gen_add_tl(cpu_tmp0, cpu_tmp0, cpu_T[0]);
-        tcg_gen_st_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
+        tcg_gen_add_tl(cpu_regs[reg], cpu_regs[reg], cpu_T[0]);
        break;
 #endif
     }
@@ -459,13 +463,13 @@ static inline void gen_op_set_cc_op(int32_t val)
 
 static inline void gen_op_addl_A0_reg_sN(int shift, int reg)
 {
-    tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
-    if (shift != 0) 
+    tcg_gen_mov_tl(cpu_tmp0, cpu_regs[reg]);
+    if (shift != 0)
         tcg_gen_shli_tl(cpu_tmp0, cpu_tmp0, shift);
     tcg_gen_add_tl(cpu_A0, cpu_A0, cpu_tmp0);
-#ifdef TARGET_X86_64
-    tcg_gen_andi_tl(cpu_A0, cpu_A0, 0xffffffff);
-#endif
+    /* For x86_64, this sets the higher half of register to zero.
+       For i386, this is equivalent to a nop. */
+    tcg_gen_ext32u_tl(cpu_A0, cpu_A0);
 }
 
 static inline void gen_op_movl_A0_seg(int reg)
@@ -496,13 +500,13 @@ static inline void gen_op_addq_A0_seg(int reg)
 
 static inline void gen_op_movq_A0_reg(int reg)
 {
-    tcg_gen_ld_tl(cpu_A0, cpu_env, offsetof(CPUState, regs[reg]));
+    tcg_gen_mov_tl(cpu_A0, cpu_regs[reg]);
 }
 
 static inline void gen_op_addq_A0_reg_sN(int shift, int reg)
 {
-    tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]));
-    if (shift != 0) 
+    tcg_gen_mov_tl(cpu_tmp0, cpu_regs[reg]);
+    if (shift != 0)
         tcg_gen_shli_tl(cpu_tmp0, cpu_tmp0, shift);
     tcg_gen_add_tl(cpu_A0, cpu_A0, cpu_tmp0);
 }
@@ -701,14 +705,14 @@ static void gen_exts(int ot, TCGv reg)
 
 static inline void gen_op_jnz_ecx(int size, int label1)
 {
-    tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[R_ECX]));
+    tcg_gen_mov_tl(cpu_tmp0, cpu_regs[R_ECX]);
     gen_extu(size + 1, cpu_tmp0);
     tcg_gen_brcondi_tl(TCG_COND_NE, cpu_tmp0, 0, label1);
 }
 
 static inline void gen_op_jz_ecx(int size, int label1)
 {
-    tcg_gen_ld_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[R_ECX]));
+    tcg_gen_mov_tl(cpu_tmp0, cpu_regs[R_ECX]);
     gen_extu(size + 1, cpu_tmp0);
     tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_tmp0, 0, label1);
 }
@@ -4834,8 +4838,7 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
             rm = 0; /* avoid warning */
         }
         label1 = gen_new_label();
-        tcg_gen_ld_tl(t2, cpu_env, offsetof(CPUState, regs[R_EAX]));
-        tcg_gen_sub_tl(t2, t2, t0);
+        tcg_gen_sub_tl(t2, cpu_regs[R_EAX], t0);
         gen_extu(ot, t2);
         tcg_gen_brcondi_tl(TCG_COND_EQ, t2, 0, label1);
         if (mod == 3) {
@@ -5409,7 +5412,7 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
             val = ldub_code(s->pc++);
             tcg_gen_movi_tl(cpu_T3, val);
         } else {
-            tcg_gen_ld_tl(cpu_T3, cpu_env, offsetof(CPUState, regs[R_ECX]));
+            tcg_gen_mov_tl(cpu_T3, cpu_regs[R_ECX]);
         }
         gen_shiftd_rm_T1_T3(s, ot, opreg, op);
         break;
@@ -6317,10 +6320,9 @@ static target_ulong disas_insn(DisasContext *s, target_ulong pc_start)
             /* XXX: specific Intel behaviour ? */
             l1 = gen_new_label();
             gen_jcc1(s, s->cc_op, b ^ 1, l1);
-            tcg_gen_st32_tl(t0, cpu_env, offsetof(CPUState, regs[reg]) + REG_L_OFFSET);
+            tcg_gen_mov_tl(cpu_regs[reg], t0);
             gen_set_label(l1);
-            tcg_gen_movi_tl(cpu_tmp0, 0);
-            tcg_gen_st32_tl(cpu_tmp0, cpu_env, offsetof(CPUState, regs[reg]) + REG_LH_OFFSET);
+            tcg_gen_ext32u_tl(cpu_regs[reg], cpu_regs[reg]);
         } else
 #endif
         {
@@ -7588,6 +7590,58 @@ void optimize_flags_init(void)
     cpu_cc_tmp = tcg_global_mem_new(TCG_AREG0, offsetof(CPUState, cc_tmp),
                                     "cc_tmp");
 
+#ifdef TARGET_X86_64
+    cpu_regs[R_EAX] = tcg_global_mem_new_i64(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EAX]), "rax");
+    cpu_regs[R_ECX] = tcg_global_mem_new_i64(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_ECX]), "rcx");
+    cpu_regs[R_EDX] = tcg_global_mem_new_i64(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EDX]), "rdx");
+    cpu_regs[R_EBX] = tcg_global_mem_new_i64(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EBX]), "rbx");
+    cpu_regs[R_ESP] = tcg_global_mem_new_i64(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_ESP]), "rsp");
+    cpu_regs[R_EBP] = tcg_global_mem_new_i64(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EBP]), "rbp");
+    cpu_regs[R_ESI] = tcg_global_mem_new_i64(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_ESI]), "rsi");
+    cpu_regs[R_EDI] = tcg_global_mem_new_i64(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EDI]), "rdi");
+    cpu_regs[8] = tcg_global_mem_new_i64(TCG_AREG0,
+                                         offsetof(CPUState, regs[8]), "r8");
+    cpu_regs[9] = tcg_global_mem_new_i64(TCG_AREG0,
+                                         offsetof(CPUState, regs[9]), "r9");
+    cpu_regs[10] = tcg_global_mem_new_i64(TCG_AREG0,
+                                          offsetof(CPUState, regs[10]), "r10");
+    cpu_regs[11] = tcg_global_mem_new_i64(TCG_AREG0,
+                                          offsetof(CPUState, regs[11]), "r11");
+    cpu_regs[12] = tcg_global_mem_new_i64(TCG_AREG0,
+                                          offsetof(CPUState, regs[12]), "r12");
+    cpu_regs[13] = tcg_global_mem_new_i64(TCG_AREG0,
+                                          offsetof(CPUState, regs[13]), "r13");
+    cpu_regs[14] = tcg_global_mem_new_i64(TCG_AREG0,
+                                          offsetof(CPUState, regs[14]), "r14");
+    cpu_regs[15] = tcg_global_mem_new_i64(TCG_AREG0,
+                                          offsetof(CPUState, regs[15]), "r15");
+#else
+    cpu_regs[R_EAX] = tcg_global_mem_new_i32(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EAX]), "eax");
+    cpu_regs[R_ECX] = tcg_global_mem_new_i32(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_ECX]), "ecx");
+    cpu_regs[R_EDX] = tcg_global_mem_new_i32(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EDX]), "edx");
+    cpu_regs[R_EBX] = tcg_global_mem_new_i32(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EBX]), "ebx");
+    cpu_regs[R_ESP] = tcg_global_mem_new_i32(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_ESP]), "esp");
+    cpu_regs[R_EBP] = tcg_global_mem_new_i32(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EBP]), "ebp");
+    cpu_regs[R_ESI] = tcg_global_mem_new_i32(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_ESI]), "esi");
+    cpu_regs[R_EDI] = tcg_global_mem_new_i32(TCG_AREG0,
+                                             offsetof(CPUState, regs[R_EDI]), "edi");
+#endif
+
 /* register helpers */
 #define GEN_HELPER 2
 #include "helper.h"