From patchwork Sat Jul 14 10:23:34 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: YeongKyoon Lee
X-Patchwork-Id: 170996
From: Yeongkyoon Lee
To: qemu-devel@nongnu.org
Date: Sat, 14 Jul 2012 19:23:34 +0900
Message-id: <1342261414-6069-4-git-send-email-yeongkyoon.lee@samsung.com>
X-Mailer: git-send-email 1.7.4.1
In-reply-to: <1342261414-6069-1-git-send-email-yeongkyoon.lee@samsung.com>
References: <1342261414-6069-1-git-send-email-yeongkyoon.lee@samsung.com>
Cc: laurent.desnogues@gmail.com, peter.maydell@linaro.org, Yeongkyoon Lee
Subject: [Qemu-devel] [RFC][PATCH v3 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block

Add optimized TCG qemu_ld/st generation that places the TLB-miss (slow-path)
code at the end of a translation block, after all other IRs have been
generated. Currently this optimization supports only i386 and x86_64 hosts.

---
 tcg/i386/tcg-target.c | 475 +++++++++++++++++++++++++++++++------------------
 tcg/tcg.c             |  12 ++
 tcg/tcg.h             |  35 ++++
 3 files changed, 353 insertions(+), 169 deletions(-)

-- 
1.7.4.1

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index da17bba..ead60d6 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -966,43 +966,53 @@ static void tcg_out_jmp(TCGContext *s, tcg_target_long dest)
 
 #include "../../softmmu_defs.h"
 
 #ifdef CONFIG_TCG_PASS_AREG0
-/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
-   int mmu_idx) */
+/* extended helper signature: ext_helper_ld_mmu(CPUState *env,
+   target_ulong addr, int mmu_idx, uintptr_t raddr) */
 static const void *qemu_ld_helpers[4] = {
-    helper_ldb_mmu,
-    helper_ldw_mmu,
-    helper_ldl_mmu,
-    helper_ldq_mmu,
+    ext_helper_ldb_mmu,
+    ext_helper_ldw_mmu,
+    ext_helper_ldl_mmu,
+    ext_helper_ldq_mmu,
 };
 
-/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
-   uintxx_t val, int mmu_idx) */
+/* extended helper signature: ext_helper_st_mmu(CPUState *env,
+   target_ulong addr, uintxx_t val, int mmu_idx, uintptr_t raddr) */
 static const void *qemu_st_helpers[4] = {
-    helper_stb_mmu,
-    helper_stw_mmu,
-    helper_stl_mmu,
-    helper_stq_mmu,
+    ext_helper_stb_mmu,
+    ext_helper_stw_mmu,
+    ext_helper_stl_mmu,
+    ext_helper_stq_mmu,
 };
 #else
 
-/* legacy helper signature: __ld_mmu(target_ulong addr, int
-   mmu_idx) */
+/* extended legacy helper signature: ext_ld_mmu(target_ulong addr,
+   int mmu_idx, uintptr_t raddr) */
 static void *qemu_ld_helpers[4] = {
-    __ldb_mmu,
-    __ldw_mmu,
-    __ldl_mmu,
-    __ldq_mmu,
+    ext_ldb_mmu,
+    ext_ldw_mmu,
+    ext_ldl_mmu,
+    ext_ldq_mmu,
 };
 
-/* legacy helper signature: __st_mmu(target_ulong addr, uintxx_t val,
-   int mmu_idx) */
+/* extended legacy helper signature: ext_st_mmu(target_ulong addr,
+   uintxx_t val, int mmu_idx, uintptr_t raddr) */
 static void *qemu_st_helpers[4] = {
-    __stb_mmu,
-    __stw_mmu,
-    __stl_mmu,
-    __stq_mmu,
+    ext_stb_mmu,
+    ext_stw_mmu,
+    ext_stl_mmu,
+    ext_stq_mmu,
 };
 #endif
 
+static void add_qemu_ldst_label(TCGContext *s,
+                                int opc_ext,
+                                int data_reg,
+                                int data_reg2,
+                                int addrlo_reg,
+                                int addrhi_reg,
+                                int mem_index,
+                                uint8_t *raddr,
+                                uint8_t **label_ptr);
+
 /* Perform the TLB load and compare.
 
    Inputs:
@@ -1061,19 +1071,21 @@ static inline void tcg_out_tlb_load(TCGContext *s, int addrlo_idx,
 
     tcg_out_mov(s, type, r0, addrlo);
 
-    /* jne label1 */
-    tcg_out8(s, OPC_JCC_short + JCC_JNE);
+    /* jne slow_path */
+    /* XXX: How to avoid using OPC_JCC_long for peephole optimization? */
+    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
     label_ptr[0] = s->code_ptr;
-    s->code_ptr++;
+    s->code_ptr += 4;
 
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         /* cmp 4(r1), addrhi */
         tcg_out_modrm_offset(s, OPC_CMP_GvEv, args[addrlo_idx+1], r1, 4);
-        /* jne label1 */
-        tcg_out8(s, OPC_JCC_short + JCC_JNE);
+        /* jne slow_path */
+        /* XXX: How to avoid using OPC_JCC_long for peephole optimization? */
+        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
         label_ptr[1] = s->code_ptr;
-        s->code_ptr++;
+        s->code_ptr += 4;
     }
 
     /* TLB Hit. */
@@ -1171,12 +1183,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     int addrlo_idx;
 #if defined(CONFIG_SOFTMMU)
     int mem_index, s_bits;
-#if TCG_TARGET_REG_BITS == 64
-    int arg_idx;
-#else
-    int stack_adjust;
-#endif
-    uint8_t *label_ptr[3];
+    uint8_t *label_ptr[2];
 #endif
 
     data_reg = args[0];
@@ -1197,101 +1204,16 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
 
     tcg_out_qemu_ld_direct(s, data_reg, data_reg2,
                            tcg_target_call_iarg_regs[0], 0, opc);
-    /* jmp label2 */
-    tcg_out8(s, OPC_JMP_short);
-    label_ptr[2] = s->code_ptr;
-    s->code_ptr++;
-
-    /* TLB Miss. */
-
-    /* label1: */
-    *label_ptr[0] = s->code_ptr - label_ptr[0] - 1;
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        *label_ptr[1] = s->code_ptr - label_ptr[1] - 1;
-    }
-
-    /* XXX: move that code at the end of the TB */
-#if TCG_TARGET_REG_BITS == 32
-    tcg_out_pushi(s, mem_index);
-    stack_adjust = 4;
-    if (TARGET_LONG_BITS == 64) {
-        tcg_out_push(s, args[addrlo_idx + 1]);
-        stack_adjust += 4;
-    }
-    tcg_out_push(s, args[addrlo_idx]);
-    stack_adjust += 4;
-#ifdef CONFIG_TCG_PASS_AREG0
-    tcg_out_push(s, TCG_AREG0);
-    stack_adjust += 4;
-#endif
-#else
-    /* The first argument is already loaded with addrlo. */
-    arg_idx = 1;
-    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[arg_idx],
-                 mem_index);
-#ifdef CONFIG_TCG_PASS_AREG0
-    /* XXX/FIXME: suboptimal */
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3],
-                tcg_target_call_iarg_regs[2]);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
-                tcg_target_call_iarg_regs[1]);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[1],
-                tcg_target_call_iarg_regs[0]);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0],
-                TCG_AREG0);
-#endif
-#endif
-
-    tcg_out_calli(s, (tcg_target_long)qemu_ld_helpers[s_bits]);
-
-#if TCG_TARGET_REG_BITS == 32
-    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
-        /* Pop and discard. This is 2 bytes smaller than the add. */
-        tcg_out_pop(s, TCG_REG_ECX);
-    } else if (stack_adjust != 0) {
-        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
-    }
-#endif
-
-    switch(opc) {
-    case 0 | 4:
-        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-    case 1 | 4:
-        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-    case 0:
-        tcg_out_ext8u(s, data_reg, TCG_REG_EAX);
-        break;
-    case 1:
-        tcg_out_ext16u(s, data_reg, TCG_REG_EAX);
-        break;
-    case 2:
-        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-        break;
-#if TCG_TARGET_REG_BITS == 64
-    case 2 | 4:
-        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
-        break;
-#endif
-    case 3:
-        if (TCG_TARGET_REG_BITS == 64) {
-            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
-        } else if (data_reg == TCG_REG_EDX) {
-            /* xchg %edx, %eax */
-            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EAX);
-        } else {
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EDX);
-        }
-        break;
-    default:
-        tcg_abort();
-    }
-
-    /* label2: */
-    *label_ptr[2] = s->code_ptr - label_ptr[2] - 1;
+    /* Record the current context of a load into ldst label */
+    add_qemu_ldst_label(s,
+                        opc,
+                        data_reg,
+                        data_reg2,
+                        args[addrlo_idx],
+                        args[addrlo_idx + 1],
+                        mem_index,
+                        s->code_ptr,
+                        label_ptr);
 #else
     {
         int32_t offset = GUEST_BASE;
@@ -1385,8 +1307,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     int addrlo_idx;
 #if defined(CONFIG_SOFTMMU)
     int mem_index, s_bits;
-    int stack_adjust;
-    uint8_t *label_ptr[3];
+    uint8_t *label_ptr[2];
 #endif
 
     data_reg = args[0];
@@ -1407,34 +1328,242 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
 
     tcg_out_qemu_st_direct(s, data_reg, data_reg2,
                            tcg_target_call_iarg_regs[0], 0, opc);
-    /* jmp label2 */
-    tcg_out8(s, OPC_JMP_short);
-    label_ptr[2] = s->code_ptr;
-    s->code_ptr++;
+    /* Record the current context of a store into ldst label */
+    add_qemu_ldst_label(s,
+                        opc | HL_ST_MASK,
+                        data_reg,
+                        data_reg2,
+                        args[addrlo_idx],
+                        args[addrlo_idx + 1],
+                        mem_index,
+                        s->code_ptr,
+                        label_ptr);
+#else
+    {
+        int32_t offset = GUEST_BASE;
+        int base = args[addrlo_idx];
+
+        if (TCG_TARGET_REG_BITS == 64) {
+            /* ??? We assume all operations have left us with register
+               contents that are zero extended. So far this appears to
+               be true. If we want to enforce this, we can either do
+               an explicit zero-extension here, or (if GUEST_BASE == 0)
+               use the ADDR32 prefix. For now, do nothing. */
+
+            if (offset != GUEST_BASE) {
+                tcg_out_movi(s, TCG_TYPE_I64,
+                             tcg_target_call_iarg_regs[0], GUEST_BASE);
+                tgen_arithr(s, ARITH_ADD + P_REXW,
+                            tcg_target_call_iarg_regs[0], base);
+                base = tcg_target_call_iarg_regs[0];
+                offset = 0;
+            }
+        }
+
+        tcg_out_qemu_st_direct(s, data_reg, data_reg2, base, offset, opc);
+    }
+#endif
+}
+
+#if defined(CONFIG_SOFTMMU)
+/*
+ * Record the context of a call to the out of line helper code for the slow path
+ * for a load or store, so that we can later generate the correct helper code
+ */
+static void add_qemu_ldst_label(TCGContext *s,
+                                int opc_ext,
+                                int data_reg,
+                                int data_reg2,
+                                int addrlo_reg,
+                                int addrhi_reg,
+                                int mem_index,
+                                uint8_t *raddr,
+                                uint8_t **label_ptr)
+{
+    int idx;
+    TCGLabelQemuLdst *label;
-    /* TLB Miss. */
+    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST) {
+        tcg_abort();
+    }
-    /* label1: */
-    *label_ptr[0] = s->code_ptr - label_ptr[0] - 1;
+    idx = s->nb_qemu_ldst_labels++;
+    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
+    label->opc_ext = opc_ext;
+    label->datalo_reg = data_reg;
+    label->datahi_reg = data_reg2;
+    label->addrlo_reg = addrlo_reg;
+    label->addrhi_reg = addrhi_reg;
+    label->mem_index = mem_index;
+    label->raddr = raddr;
+    label->label_ptr[0] = label_ptr[0];
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        *label_ptr[1] = s->code_ptr - label_ptr[1] - 1;
+        label->label_ptr[1] = label_ptr[1];
     }
+}
 
-    /* XXX: move that code at the end of the TB */
+/*
+ * Generate code for the slow path for a load at the end of block
+ */
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
+{
+    int s_bits;
+    int opc = label->opc_ext & HL_OPC_MASK;
+    int mem_index = label->mem_index;
 #if TCG_TARGET_REG_BITS == 32
-    tcg_out_pushi(s, mem_index);
+    int stack_adjust;
+    int addrlo_reg = label->addrlo_reg;
+    int addrhi_reg = label->addrhi_reg;
+#endif
+    int data_reg = label->datalo_reg;
+    int data_reg2 = label->datahi_reg;
+    uint8_t *raddr = label->raddr;
+    uint8_t **label_ptr = &label->label_ptr[0];
+
+    s_bits = opc & 3;
+
+    /* resolve label address */
+    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
+    }
+
+    /* extended legacy helper signature (w/o CONFIG_TCG_PASS_AREG0):
+       ext_ld_mmu(target_ulong addr, int mmu_idx, uintptr_t raddr) */
+#if TCG_TARGET_REG_BITS == 32
+    tcg_out_pushi(s, (tcg_target_ulong)(raddr - 1)); /* return address */
     stack_adjust = 4;
+    tcg_out_pushi(s, mem_index); /* mmu index */
+    stack_adjust += 4;
     if (TARGET_LONG_BITS == 64) {
+        tcg_out_push(s, addrhi_reg);
+        stack_adjust += 4;
+    }
+    tcg_out_push(s, addrlo_reg); /* guest addr */
+    stack_adjust += 4;
 #ifdef CONFIG_TCG_PASS_AREG0
+    /* extended helper signature (w/ CONFIG_TCG_PASS_AREG0):
+       ext_helper_ld_mmu(CPUState *env, target_ulong addr, int mmu_idx,
+       uintptr_t raddr) */
+    tcg_out_push(s, TCG_AREG0);
+    stack_adjust += 4;
+#endif
+#else
+    /* The first argument is already loaded with addrlo. */
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[1],
+                 mem_index);
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2],
+                 (tcg_target_ulong)(raddr - 1)); /* return address */
+#ifdef CONFIG_TCG_PASS_AREG0
+    /* XXX/FIXME: suboptimal */
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3],
+                tcg_target_call_iarg_regs[2]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
+                tcg_target_call_iarg_regs[1]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[1],
+                tcg_target_call_iarg_regs[0]);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0],
+                TCG_AREG0);
+#endif
+#endif
+
+    tcg_out_calli(s, (tcg_target_long)qemu_ld_helpers[s_bits]);
+
+#if TCG_TARGET_REG_BITS == 32
+    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
+        /* Pop and discard. This is 2 bytes smaller than the add. */
+        tcg_out_pop(s, TCG_REG_ECX);
+    } else if (stack_adjust != 0) {
+        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
+    }
+#endif
+
+    switch (opc) {
+    case 0 | 4:
+        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
+        break;
+    case 1 | 4:
+        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
+        break;
+    case 0:
+        tcg_out_ext8u(s, data_reg, TCG_REG_EAX);
+        break;
+    case 1:
+        tcg_out_ext16u(s, data_reg, TCG_REG_EAX);
+        break;
+    case 2:
+        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
+        break;
+#if TCG_TARGET_REG_BITS == 64
+    case 2 | 4:
+        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
+        break;
+#endif
+    case 3:
+        if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
+        } else if (data_reg == TCG_REG_EDX) {
+            /* xchg %edx, %eax */
+            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EAX);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EDX);
+        }
+        break;
+    default:
+        tcg_abort();
+    }
+
+    /* Jump back to the original code accessing a guest memory */
+    tcg_out_jmp(s, (tcg_target_long) raddr);
+}
+
+/*
+ * Generate code for the slow path for a store at the end of block
+ */
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
+{
+    int s_bits;
+    int stack_adjust;
+    int opc = label->opc_ext & HL_OPC_MASK;
+    int mem_index = label->mem_index;
+    int data_reg = label->datalo_reg;
+#if TCG_TARGET_REG_BITS == 32
+    int data_reg2 = label->datahi_reg;
+    int addrlo_reg = label->addrlo_reg;
+    int addrhi_reg = label->addrhi_reg;
+#endif
+    uint8_t *raddr = label->raddr;
+    uint8_t **label_ptr = &label->label_ptr[0];
+
+    s_bits = opc & 3;
+
+    /* resolve label address */
+    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
+    }
+
+    /* extended legacy helper signature (w/o CONFIG_TCG_PASS_AREG0):
+       ext_st_mmu(target_ulong addr, uintxx_t val, int mmu_idx,
+       uintptr_t raddr) */
+#if TCG_TARGET_REG_BITS == 32
+    tcg_out_pushi(s, (tcg_target_ulong)(raddr - 1)); /* return address */
+    stack_adjust = 4;
+    tcg_out_pushi(s, mem_index); /* mmu index */
+    stack_adjust += 4;
     if (opc == 3) {
         tcg_out_push(s, data_reg2);
         stack_adjust += 4;
     }
-    tcg_out_push(s, data_reg);
+    tcg_out_push(s, data_reg); /* guest data */
     stack_adjust += 4;
     if (TARGET_LONG_BITS == 64) {
-        tcg_out_push(s, args[addrlo_idx + 1]);
+        tcg_out_push(s, addrhi_reg);
         stack_adjust += 4;
     }
-    tcg_out_push(s, args[addrlo_idx]);
+    tcg_out_push(s, addrlo_reg); /* guest addr */
     stack_adjust += 4;
 #ifdef CONFIG_TCG_PASS_AREG0
     tcg_out_push(s, TCG_AREG0);
     stack_adjust += 4;
@@ -1444,9 +1573,23 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
                 tcg_target_call_iarg_regs[1], data_reg);
     tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], mem_index);
+#if defined(_WIN64)
+    tcg_out_pushi(s, (tcg_target_ulong)(raddr - 1)); /* return address */
+    stack_adjust += 8;
+#else
+    tcg_out_movi(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3],
+                 (tcg_target_ulong)(raddr - 1)); /* return address */
     stack_adjust = 0;
+#endif
 #ifdef CONFIG_TCG_PASS_AREG0
+    /* extended helper signature (w/ CONFIG_TCG_PASS_AREG0):
+       ext_helper_st_mmu(CPUState *env, target_ulong addr, uintxx_t val,
+       int mmu_idx, uintptr_t raddr) */
     /* XXX/FIXME: suboptimal */
+#if !defined(_WIN64)
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[4],
+                tcg_target_call_iarg_regs[3]);
+#endif
     tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3],
                 tcg_target_call_iarg_regs[2]);
     tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
@@ -1467,34 +1610,28 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
         tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
     }
 
-    /* label2: */
-    *label_ptr[2] = s->code_ptr - label_ptr[2] - 1;
-#else
-    {
-        int32_t offset = GUEST_BASE;
-        int base = args[addrlo_idx];
+    /* Jump back to the original code accessing a guest memory */
+    tcg_out_jmp(s, (tcg_target_long) raddr);
+}
 
-        if (TCG_TARGET_REG_BITS == 64) {
-            /* ??? We assume all operations have left us with register
-               contents that are zero extended. So far this appears to
-               be true. If we want to enforce this, we can either do
-               an explicit zero-extension here, or (if GUEST_BASE == 0)
-               use the ADDR32 prefix. For now, do nothing. */
+/*
+ * Generate all of the slow paths of qemu_ld/st at the end of block
+ */
+void tcg_out_qemu_ldst_slow_path(TCGContext *s)
+{
+    int i;
+    TCGLabelQemuLdst *label;
 
-            if (offset != GUEST_BASE) {
-                tcg_out_movi(s, TCG_TYPE_I64,
-                             tcg_target_call_iarg_regs[0], GUEST_BASE);
-                tgen_arithr(s, ARITH_ADD + P_REXW,
-                            tcg_target_call_iarg_regs[0], base);
-                base = tcg_target_call_iarg_regs[0];
-                offset = 0;
+    for (i = 0; i < s->nb_qemu_ldst_labels; i++) {
+        label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[i];
+        if (IS_QEMU_LD_LABEL(label)) {
+            tcg_out_qemu_ld_slow_path(s, label);
+        } else {
+            tcg_out_qemu_st_slow_path(s, label);
         }
     }
-
-    tcg_out_qemu_st_direct(s, data_reg, data_reg2, base, offset, opc);
-    }
-#endif
 }
+#endif /* CONFIG_SOFTMMU */
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                               const int *const_args)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8386b70..bd4a0db 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -301,6 +301,13 @@ void tcg_func_start(TCGContext *s)
     gen_opc_ptr = gen_opc_buf;
     gen_opparam_ptr = gen_opparam_buf;
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+    /* Initialize qemu_ld/st labels to assist code generation at the end of TB
+       for TLB miss cases at the end of TB */
+    s->qemu_ldst_labels = tcg_malloc(sizeof(TCGLabelQemuLdst) *
+                                     TCG_MAX_QEMU_LDST);
+    s->nb_qemu_ldst_labels = 0;
+#endif
 }
 
 static inline void tcg_temp_alloc(TCGContext *s, int n)
@@ -2169,6 +2176,11 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
 #endif
     }
 the_end:
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+    /* Generate slow paths of qemu_ld/st IRs which call MMU helpers at
+       the end of block */
+    tcg_out_qemu_ldst_slow_path(s);
+#endif
     return -1;
 }
diff --git a/tcg/tcg.h b/tcg/tcg.h
index d710694..1a8bb5e 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -187,6 +187,29 @@ typedef tcg_target_ulong TCGArg;
    are aliases for target_ulong and host pointer sized values
    respectively. */
 
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Macros/structures for qemu_ld/st IR code optimization:
+   TCG_MAX_QEMU_LDST is defined to match OPC_BUF_SIZE in exec-all.h. */
+#define TCG_MAX_QEMU_LDST 640
+#define HL_LDST_SHIFT 4
+#define HL_LDST_MASK (1 << HL_LDST_SHIFT)
+#define HL_ST_MASK HL_LDST_MASK
+#define HL_OPC_MASK (HL_LDST_MASK - 1)
+#define IS_QEMU_LD_LABEL(L) (!((L)->opc_ext & HL_LDST_MASK))
+#define IS_QEMU_ST_LABEL(L) ((L)->opc_ext & HL_LDST_MASK)
+
+typedef struct TCGLabelQemuLdst {
+    int opc_ext;            /* | 27bit(reserved) | 1bit(ld/st) | 4bit(opc) | */
+    int addrlo_reg;         /* reg index for low word of guest virtual addr */
+    int addrhi_reg;         /* reg index for high word of guest virtual addr */
+    int datalo_reg;         /* reg index for low word to be loaded or stored */
+    int datahi_reg;         /* reg index for high word to be loaded or stored */
+    int mem_index;          /* soft MMU memory index */
+    uint8_t *raddr;         /* gen code addr corresponding to qemu_ld/st IR */
+    uint8_t *label_ptr[2];  /* label pointers to be updated */
+} TCGLabelQemuLdst;
+#endif /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 #ifdef CONFIG_DEBUG_TCG
 #define DEBUG_TCGV 1
 #endif
@@ -389,6 +412,13 @@ struct TCGContext {
 #ifdef CONFIG_DEBUG_TCG
     int temps_in_use;
 #endif
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+    /* labels info for qemu_ld/st IRs
+       The labels help to generate TLB miss case codes at the end of TB */
+    TCGLabelQemuLdst *qemu_ldst_labels;
+    int nb_qemu_ldst_labels;
+#endif
 };
 
 extern TCGContext tcg_ctx;
@@ -588,3 +618,8 @@ extern uint8_t code_gen_prologue[];
 #endif
 
 void tcg_register_jit(void *buf, size_t buf_size);
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION)
+/* Generate all of the slow paths of qemu_ld/st at the end of block */
+void tcg_out_qemu_ldst_slow_path(TCGContext *s);
+#endif
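For reviewers who want to see the opc_ext encoding above in isolation, here is a
minimal standalone sketch. It is illustrative only and not part of the patch:
"FakeLabel" is a stand-in for TCGLabelQemuLdst, and only mirrors the HL_* macros
from the tcg.h hunk, where bits [3:0] carry the qemu_ld/st opcode and bit 4 marks
a store. This is the bit that tcg_out_qemu_ldst_slow_path uses to dispatch
between the load and store slow-path generators.

/* Illustrative sketch only -- not part of the patch. */
#include <stdio.h>

#define HL_LDST_SHIFT 4
#define HL_LDST_MASK  (1 << HL_LDST_SHIFT)
#define HL_ST_MASK    HL_LDST_MASK
#define HL_OPC_MASK   (HL_LDST_MASK - 1)
#define IS_QEMU_LD_LABEL(L) (!((L)->opc_ext & HL_LDST_MASK))
#define IS_QEMU_ST_LABEL(L) ((L)->opc_ext & HL_LDST_MASK)

/* Stand-in for TCGLabelQemuLdst; only the field the macros look at. */
typedef struct {
    int opc_ext;   /* | reserved | 1 bit ld/st | 4 bit opc | */
} FakeLabel;

int main(void)
{
    FakeLabel ld = { 2 | 4 };             /* 32-bit sign-extending load */
    FakeLabel st = { 3 | HL_ST_MASK };    /* 64-bit store */

    printf("ld: is_load=%d  opc=%d\n",
           IS_QEMU_LD_LABEL(&ld) ? 1 : 0, ld.opc_ext & HL_OPC_MASK);
    printf("st: is_store=%d opc=%d\n",
           IS_QEMU_ST_LABEL(&st) ? 1 : 0, st.opc_ext & HL_OPC_MASK);
    return 0;
}

The slow-path generation then keys off the same two fields: the ld/st bit selects
tcg_out_qemu_ld_slow_path vs tcg_out_qemu_st_slow_path, and the low opcode bits
select the helper (s_bits = opc & 3) and the sign/zero extension of the result.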