From patchwork Thu Oct 31 20:22:04 2013
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 287613
From: Richard Henderson <rth@twiddle.net>
To: qemu-devel@nongnu.org
Cc: aliguori@amazon.com, aurelien@aurel32.net
Subject: [Qemu-devel] [PATCH 15/20] tcg-ia64: Move bswap for store into tlb load
Date: Thu, 31 Oct 2013 13:22:04 -0700
Message-Id: <1383250929-10288-16-git-send-email-rth@twiddle.net>
In-Reply-To: <1383250929-10288-1-git-send-email-rth@twiddle.net>
References: <1383250929-10288-1-git-send-email-rth@twiddle.net>

Saving at least two cycles per store, and cleaning up the code.
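
For illustration only (not part of the patch): with MO_BSWAP the two I-slots
that previously held NOPs in the TLB-load bundles now perform a full 64-bit
byte swap and, for stores narrower than 64 bits, an extr.u that right-justifies
the swapped bytes. A minimal, self-contained C sketch of that value
transformation, using a hypothetical swap_for_store() helper:

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    /* Hypothetical model of the two I-slot insns: a full byte swap
       (tcg_opc_bswap64_i) followed, for s_bits < MO_64, by an extr.u
       with shift = 64 - (8 << s_bits).  s_bits is 0/1/2/3 for
       1/2/4/8-byte stores. */
    static uint64_t swap_for_store(uint64_t data, unsigned s_bits)
    {
        uint64_t v = __builtin_bswap64(data);   /* GCC/Clang builtin */
        if (s_bits < 3) {
            unsigned shift = 64 - (8u << s_bits);
            v >>= shift;                        /* extr.u v, v, shift, 63 - shift */
        }
        return v;
    }

    int main(void)
    {
        printf("%#" PRIx64 "\n", swap_for_store(0x1234, 1));     /* 0x3412 */
        printf("%#" PRIx64 "\n", swap_for_store(0x12345678, 2)); /* 0x78563412 */
        return 0;
    }

The fast path then stores the right-justified result from R56, while the
slow-path helper still receives the original, unswapped value in R58.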
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ia64/tcg-target.c | 96 +++++++++++++++++----------------------------------
 1 file changed, 32 insertions(+), 64 deletions(-)

diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index b4bb305..985e213 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -1565,9 +1565,11 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
 /* Load and compare a TLB entry, and return the result in (p6, p7).
    R2 is loaded with the address of the addend TLB entry.
    R57 is loaded with the address, zero extented on 32-bit targets.
-   R1, R3 are clobbered. */
+   R1, R3 are clobbered, leaving R56 free for...
+   BSWAP_1, BSWAP_2 and I-slot insns for swapping data for store. */
 static inline void tcg_out_qemu_tlb(TCGContext *s, TCGReg addr_reg,
-                                    TCGMemOp s_bits, int off_rw, int off_add)
+                                    TCGMemOp s_bits, int off_rw, int off_add,
+                                    uint64_t bswap1, uint64_t bswap2)
 {
     /*
        .mii
@@ -1615,12 +1617,12 @@ static inline void tcg_out_qemu_tlb(TCGContext *s, TCGReg addr_reg,
                                (TARGET_LONG_BITS == 32
                                 ? OPC_LD4_M3 : OPC_LD8_M3), TCG_REG_R3,
                                TCG_REG_R2, off_add - off_rw),
-                   INSN_NOP_I);
+                   bswap1);
     tcg_out_bundle(s, mmI,
                    INSN_NOP_M,
                    tcg_opc_a6 (TCG_REG_P0, OPC_CMP_EQ_A6, TCG_REG_P6,
                                TCG_REG_P7, TCG_REG_R1, TCG_REG_R3),
-                   INSN_NOP_I);
+                   bswap2);
 }
 
 /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
@@ -1650,7 +1652,8 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     /* Read the TLB entry */
     tcg_out_qemu_tlb(s, addr_reg, s_bits,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_read),
-                     offsetof(CPUArchState, tlb_table[mem_index][0].addend));
+                     offsetof(CPUArchState, tlb_table[mem_index][0].addend),
+                     INSN_NOP_I, INSN_NOP_I);
 
     /* P6 is the fast path, and P7 the slow path */
     tcg_out_bundle(s, mLX,
@@ -1721,17 +1724,31 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     static const uint64_t opc_st_m4[4] = {
         OPC_ST1_M4, OPC_ST2_M4, OPC_ST4_M4, OPC_ST8_M4
     };
-    int addr_reg, data_reg, mem_index;
+    TCGReg addr_reg, data_reg, store_reg;
+    int mem_index;
+    uint64_t bswap1, bswap2;
     TCGMemOp s_bits;
 
-    data_reg = *args++;
+    store_reg = data_reg = *args++;
     addr_reg = *args++;
     mem_index = *args;
     s_bits = opc & MO_SIZE;
 
+    bswap1 = bswap2 = INSN_NOP_I;
+    if (opc & MO_BSWAP) {
+        store_reg = TCG_REG_R56;
+        bswap1 = tcg_opc_bswap64_i(TCG_REG_P0, store_reg, data_reg);
+        if (s_bits < MO_64) {
+            int shift = 64 - (8 << s_bits);
+            bswap2 = tcg_opc_i11(TCG_REG_P0, OPC_EXTR_U_I11,
+                                 store_reg, store_reg, shift, 63 - shift);
+        }
+    }
+
     tcg_out_qemu_tlb(s, addr_reg, s_bits,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_write),
-                     offsetof(CPUArchState, tlb_table[mem_index][0].addend));
+                     offsetof(CPUArchState, tlb_table[mem_index][0].addend),
+                     bswap1, bswap2);
 
     /* P6 is the fast path, and P7 the slow path */
     tcg_out_bundle(s, mLX,
@@ -1746,63 +1763,14 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
                                 TCG_REG_R3, TCG_REG_R57),
                    tcg_opc_i21(TCG_REG_P7, OPC_MOV_I21, TCG_REG_B6,
                                TCG_REG_R3, 0));
-
-    switch (opc) {
-    case MO_8:
-    case MO_16:
-    case MO_32:
-    case MO_64:
-        tcg_out_bundle(s, mii,
-                       tcg_opc_m1 (TCG_REG_P7, OPC_LD8_M1,
-                                   TCG_REG_R1, TCG_REG_R2),
-                       tcg_opc_mov_a(TCG_REG_P7, TCG_REG_R58, data_reg),
-                       INSN_NOP_I);
-        break;
-
-    case MO_16 | MO_BSWAP:
-        tcg_out_bundle(s, miI,
-                       tcg_opc_m1 (TCG_REG_P7, OPC_LD8_M1,
-                                   TCG_REG_R1, TCG_REG_R2),
-                       INSN_NOP_I,
-                       tcg_opc_i12(TCG_REG_P6, OPC_DEP_Z_I12,
-                                   TCG_REG_R2, data_reg, 15, 15));
-        tcg_out_bundle(s, miI,
-                       tcg_opc_mov_a(TCG_REG_P7, TCG_REG_R58, data_reg),
-                       INSN_NOP_I,
-                       tcg_opc_bswap64_i(TCG_REG_P6, TCG_REG_R2, TCG_REG_R2));
-        data_reg = TCG_REG_R2;
-        break;
-
-    case MO_32 | MO_BSWAP:
-        tcg_out_bundle(s, miI,
-                       tcg_opc_m1 (TCG_REG_P7, OPC_LD8_M1,
-                                   TCG_REG_R1, TCG_REG_R2),
-                       INSN_NOP_I,
-                       tcg_opc_i12(TCG_REG_P6, OPC_DEP_Z_I12,
-                                   TCG_REG_R2, data_reg, 31, 31));
-        tcg_out_bundle(s, miI,
-                       tcg_opc_mov_a(TCG_REG_P7, TCG_REG_R58, data_reg),
-                       INSN_NOP_I,
-                       tcg_opc_bswap64_i(TCG_REG_P6, TCG_REG_R2, TCG_REG_R2));
-        data_reg = TCG_REG_R2;
-        break;
-
-    case MO_64 | MO_BSWAP:
-        tcg_out_bundle(s, miI,
-                       tcg_opc_m1 (TCG_REG_P7, OPC_LD8_M1,
-                                   TCG_REG_R1, TCG_REG_R2),
-                       tcg_opc_mov_a(TCG_REG_P7, TCG_REG_R58, data_reg),
-                       tcg_opc_bswap64_i(TCG_REG_P6, TCG_REG_R2, data_reg));
-        data_reg = TCG_REG_R2;
-        break;
-
-    default:
-        tcg_abort();
-    }
-
+    tcg_out_bundle(s, mii,
+                   tcg_opc_m1 (TCG_REG_P7, OPC_LD8_M1,
+                               TCG_REG_R1, TCG_REG_R2),
+                   tcg_opc_mov_a(TCG_REG_P7, TCG_REG_R58, data_reg),
+                   INSN_NOP_I);
     tcg_out_bundle(s, miB,
-                   tcg_opc_m4 (TCG_REG_P6, opc_st_m4[opc],
-                               data_reg, TCG_REG_R3),
+                   tcg_opc_m4 (TCG_REG_P6, opc_st_m4[s_bits],
+                               store_reg, TCG_REG_R3),
                    tcg_opc_movi_a(TCG_REG_P7, TCG_REG_R59, mem_index),
                    tcg_opc_b5 (TCG_REG_P7, OPC_BR_CALL_SPTK_MANY_B5,
                                TCG_REG_B0, TCG_REG_B6));