From patchwork Wed Oct 21 10:18:10 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Juha.Riihimaki@nokia.com
X-Patchwork-Id: 36523
Date: Wed, 21 Oct 2009 12:18:10 +0200
Subject: [Qemu-devel] [PATCH 11/12] target-arm: optimize neon vld/vst ops
List-Id: qemu-devel.nongnu.org

Reduce the amount of tcg ops generated from NEON vld/vst instructions by
simplifying the code generation.

Signed-off-by: Juha Riihimäki
---
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 1734fae..fa03df8 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3692,6 +3692,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
     TCGv tmp;
     TCGv tmp2;
     TCGv_i64 tmp64;
+    TCGv stride_v;
 
     if (!vfp_enabled(env))
       return 1;
@@ -3710,6 +3711,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
         interleave = neon_ls_element_type[op].interleave;
         load_reg_var(s, addr, rn);
         stride = (1 << size) * interleave;
+        stride_v = tcg_const_i32(stride);
         for (reg = 0; reg < nregs; reg++) {
             if (interleave > 2 || (interleave == 2 && nregs == 2)) {
                 load_reg_var(s, addr, rn);
@@ -3728,7 +3730,7 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                     neon_load_reg64(tmp64, rd);
                     gen_st64(tmp64, addr, IS_USER(s));
                 }
-                tcg_gen_addi_i32(addr, addr, stride);
+                tcg_gen_add_i32(addr, addr, stride_v);
             } else {
                 for (pass = 0; pass < 2; pass++) {
                     if (size == 2) {
@@ -3739,58 +3741,57 @@ static int disas_neon_ls_insn(CPUState * env, DisasContext *s, uint32_t insn)
                             tmp = neon_load_reg(rd, pass);
                             gen_st32(tmp, addr, IS_USER(s));
                         }
-                        tcg_gen_addi_i32(addr, addr, stride);
+                        tcg_gen_add_i32(addr, addr, stride_v);
                     } else if (size == 1) {
                         if (load) {
                             tmp = gen_ld16u(addr, IS_USER(s));
                             tcg_gen_addi_i32(addr, addr, stride);
                             tmp2 = gen_ld16u(addr, IS_USER(s));
-                            tcg_gen_addi_i32(addr, addr, stride);
-                            gen_bfi(tmp, tmp, tmp2, 16, 0xffff);
+                            tcg_gen_add_i32(addr, addr, stride_v);
+                            tcg_gen_shli_i32(tmp2, tmp2, 16);
+                            tcg_gen_or_i32(tmp, tmp, tmp2);
                             dead_tmp(tmp2);
                             neon_store_reg(rd, pass, tmp);
                         } else {
                             tmp = neon_load_reg(rd, pass);
-                            tmp2 = new_tmp();
-                            tcg_gen_shri_i32(tmp2, tmp, 16);
-                            gen_st16(tmp, addr, IS_USER(s));
-                            tcg_gen_addi_i32(addr, addr, stride);
-                            gen_st16(tmp2, addr, IS_USER(s));
-                            tcg_gen_addi_i32(addr, addr, stride);
+                            tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
+                            tcg_gen_add_i32(addr, addr, stride_v);
+                            tcg_gen_shri_i32(tmp, tmp, 16);
+                            tcg_gen_qemu_st16(tmp, addr, IS_USER(s));
+                            tcg_gen_add_i32(addr, addr, stride_v);
+                            dead_tmp(tmp);
                         }
                     } else /* size == 0 */ {
                         if (load) {
-                            TCGV_UNUSED(tmp2);
-                            for (n = 0; n < 4; n++) {
-                                tmp = gen_ld8u(addr, IS_USER(s));
-                                tcg_gen_addi_i32(addr, addr, stride);
-                                if (n == 0) {
-                                    tmp2 = tmp;
-                                } else {
-                                    gen_bfi(tmp2, tmp2, tmp, n * 8, 0xff);
-                                    dead_tmp(tmp);
-                                }
+                            tmp = gen_ld8u(addr, IS_USER(s));
+                            tcg_gen_add_i32(addr, addr, stride_v);
+                            for (n = 1; n < 4; n++) {
+                                tmp2 = gen_ld8u(addr, IS_USER(s));
+                                tcg_gen_add_i32(addr, addr, stride_v);
+                                tcg_gen_shli_i32(tmp2, tmp2, n * 8);
+                                tcg_gen_or_i32(tmp, tmp, tmp2);
+                                dead_tmp(tmp2);
                             }
-                            neon_store_reg(rd, pass, tmp2);
+                            neon_store_reg(rd, pass, tmp);
                         } else {
-                            tmp2 = neon_load_reg(rd, pass);
-                            for (n = 0; n < 4; n++) {
-                                tmp = new_tmp();
-                                if (n == 0) {
-                                    tcg_gen_mov_i32(tmp, tmp2);
-                                } else {
-                                    tcg_gen_shri_i32(tmp, tmp2, n * 8);
-                                }
-                                gen_st8(tmp, addr, IS_USER(s));
-                                tcg_gen_addi_i32(addr, addr, stride);
+                            tmp2 = tcg_const_i32(8);
+                            tmp = neon_load_reg(rd, pass);
+                            for (n = 0; n < 3; n++) {
+                                tcg_gen_qemu_st8(tmp, addr, IS_USER (s));
+                                tcg_gen_add_i32(addr, addr, stride_v);
+                                tcg_gen_shr_i32(tmp, tmp, tmp2);
                             }
-                            dead_tmp(tmp2);
+                            tcg_gen_qemu_st8(tmp, addr, IS_USER(s));
+                            tcg_gen_add_i32(addr, addr, stride_v);
+                            dead_tmp(tmp);
+                            tcg_temp_free_i32(tmp2);
                         }
                     }
                 }
             }
             rd += neon_ls_element_type[op].spacing;
         }
+        tcg_temp_free_i32(stride_v);
         stride = nregs * 8;
     } else {
         size = (insn >> 10) & 3;
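
Note on the load/store rewrite: the patch replaces the gen_bfi() calls with a
plain shift followed by an OR, and replaces the per-element temporaries in the
store paths with repeated right shifts of a single value. The sketch below
illustrates, in ordinary host C rather than TCG ops, why the two packing forms
should give the same result when every loaded element is already zero-extended
(as an unsigned 8-bit or 16-bit load would produce). All function names here
are hypothetical stand-ins chosen for illustration; none of them exist in QEMU,
and bfi() is only an assumed model of a bitfield insert.

```c
#include <stdint.h>

/* Hypothetical model of a bitfield insert: replace the field selected by
 * (mask << shift) in base with the low bits of val. */
uint32_t bfi(uint32_t base, uint32_t val, int shift, uint32_t mask)
{
    return (base & ~(mask << shift)) | ((val & mask) << shift);
}

/* Pack four bytes with one bitfield insert per byte (the shape of the
 * removed code). */
uint32_t pack_bfi(const uint8_t b[4])
{
    uint32_t acc = b[0];                 /* zero-extended first element */
    for (int n = 1; n < 4; n++) {
        acc = bfi(acc, b[n], n * 8, 0xff);
    }
    return acc;
}

/* Pack the same bytes with shift + OR (the shape of the added code).
 * This only matches pack_bfi() because each element is zero-extended,
 * so the target field in acc is still all zeroes when it is ORed in. */
uint32_t pack_shift_or(const uint8_t b[4])
{
    uint32_t acc = b[0];
    for (int n = 1; n < 4; n++) {
        acc |= (uint32_t)b[n] << (n * 8);
    }
    return acc;
}

/* Unpack by writing the low byte and then shifting the same value right
 * by 8, instead of computing v >> (n * 8) into a fresh temporary. */
void unpack_shift(uint32_t v, uint8_t out[4])
{
    for (int n = 0; n < 4; n++) {
        out[n] = (uint8_t)v;             /* an 8-bit store keeps only the low byte */
        v >>= 8;
    }
}
```

For example, packing the bytes 0x11, 0x22, 0x33, 0x44 gives 0x44332211 with
either packing function, and unpack_shift() recovers the original bytes in
order. The saving in the real patch comes from emitting fewer ops per element,
which this host-side sketch does not measure.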