From patchwork Sun Sep 22 17:28:59 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Max Filippov
X-Patchwork-Id: 277004
Message-ID: <523F28DB.5080703@gmail.com>
Date: Sun, 22 Sep 2013 21:28:59 +0400
From: Max Filippov
To: Richard Henderson
Cc: Blue Swirl, qemu-devel, Aurelien Jarno
References: <1379625908-27964-1-git-send-email-rth@twiddle.net>
In-Reply-To: <1379625908-27964-1-git-send-email-rth@twiddle.net>
Subject: Re: [Qemu-devel] [RFC 00/16] TCG indirect registers

On Fri, Sep 20, 2013 at 1:24 AM, Richard Henderson wrote:
> This is an attempt to improve performance of target-sparc
> by exposing the windowed registers as TCG globals, and all
> the optimization that we can do there.
>
> This is done via allowing tcg_global_mem_new to be used
> with any base pointer, not just off of a fixed register.
> Thus the sparc windowed registers are globals off cpu_regwptr.
>
> In the process of working through this, I attempt to remove
> as many uses of "int" as I can throughout the TCG code gen
> paths, replacing them with TCGReg when we're talking about
> hard registers, and TCGTemp pointers when we're talking about
> temporaries.
> This, IMO, reduces confusion as to what kind of
> "int" we mean at any given time.
>
> By the time we get to patch 14, actually implementing the
> indirect temps, it's fairly easy to recurse in order to
> load the base pointer when we need to load or store an
> indirect temp.
>
> I've not yet tried to measure the performance. As far as
> testing, linux-user-0.3 and sparc-test-0.2 works. I've
> scanned some of the dumps from those. In the cases where
> no real optimization was possible, we generate practically
> the same code -- usually with different registers selected.
> In the cases where we can optimize, I've seen some TB's
> cut in half.
>
> Anyway, I wanted some feedback before I take this any further.

Hi Richard,

I've reimplemented the xtensa windowed registers on top of this series,
in the same way as is done for sparc. I haven't seen any measurable
performance change. Judging by the op,out_asm output, most TBs got
longer by 1-4 instructions and all temp indices doubled.

--->8---
From 73300be7dd6b3d31cbfa45225714d5e43c52f077 Mon Sep 17 00:00:00 2001
From: Max Filippov
Date: Sun, 22 Sep 2013 18:54:53 +0400
Subject: [PATCH] target-xtensa: reimplement windowed registers

Signed-off-by: Max Filippov
---
 target-xtensa/cpu.c       |  1 +
 target-xtensa/cpu.h       |  5 +++--
 target-xtensa/op_helper.c | 46 ++++++++++------------------------------------
 target-xtensa/translate.c |  7 +++++--
 4 files changed, 19 insertions(+), 40 deletions(-)

diff --git a/target-xtensa/cpu.c b/target-xtensa/cpu.c
index c19d17a..a30511d 100644
--- a/target-xtensa/cpu.c
+++ b/target-xtensa/cpu.c
@@ -59,6 +59,7 @@ static void xtensa_cpu_reset(CPUState *s)
     env->sregs[CACHEATTR] = 0x22222222;
     env->sregs[ATOMCTL] = xtensa_option_enabled(env->config,
             XTENSA_OPTION_ATOMCTL) ? 0x28 : 0x15;
+    rotate_window_abs(env, env->sregs[WINDOW_BASE]);
 
     env->pending_irq_level = 0;
     reset_mmu(env);
diff --git a/target-xtensa/cpu.h b/target-xtensa/cpu.h
index 95103e9..8100f18 100644
--- a/target-xtensa/cpu.h
+++ b/target-xtensa/cpu.h
@@ -334,11 +334,11 @@ typedef struct XtensaConfigList {
 
 typedef struct CPUXtensaState {
     const XtensaConfig *config;
-    uint32_t regs[16];
+    uint32_t *regs;
     uint32_t pc;
     uint32_t sregs[256];
     uint32_t uregs[256];
-    uint32_t phys_regs[MAX_NAREG];
+    uint32_t phys_regs[MAX_NAREG + 12];
     float32 fregs[16];
     float_status fp_status;
 
@@ -396,6 +396,7 @@ void xtensa_timer_irq(CPUXtensaState *env, uint32_t id, uint32_t active);
 void xtensa_rearm_ccompare_timer(CPUXtensaState *env);
 int cpu_xtensa_signal_handler(int host_signum, void *pinfo, void *puc);
 void xtensa_cpu_list(FILE *f, fprintf_function cpu_fprintf);
+void rotate_window_abs(CPUXtensaState *env, uint32_t position);
 void xtensa_sync_window_from_phys(CPUXtensaState *env);
 void xtensa_sync_phys_from_window(CPUXtensaState *env);
 uint32_t xtensa_tlb_get_addr_mask(const CPUXtensaState *env, bool dtlb, uint32_t way);
diff --git a/target-xtensa/op_helper.c b/target-xtensa/op_helper.c
index cf97025..ee21550 100644
--- a/target-xtensa/op_helper.c
+++ b/target-xtensa/op_helper.c
@@ -166,39 +166,6 @@ uint32_t HELPER(nsau)(uint32_t v)
     return v ? clz32(v) : 32;
 }
 
-static void copy_window_from_phys(CPUXtensaState *env,
-        uint32_t window, uint32_t phys, uint32_t n)
-{
-    assert(phys < env->config->nareg);
-    if (phys + n <= env->config->nareg) {
-        memcpy(env->regs + window, env->phys_regs + phys,
-                n * sizeof(uint32_t));
-    } else {
-        uint32_t n1 = env->config->nareg - phys;
-        memcpy(env->regs + window, env->phys_regs + phys,
-                n1 * sizeof(uint32_t));
-        memcpy(env->regs + window + n1, env->phys_regs,
-                (n - n1) * sizeof(uint32_t));
-    }
-}
-
-static void copy_phys_from_window(CPUXtensaState *env,
-        uint32_t phys, uint32_t window, uint32_t n)
-{
-    assert(phys < env->config->nareg);
-    if (phys + n <= env->config->nareg) {
-        memcpy(env->phys_regs + phys, env->regs + window,
-                n * sizeof(uint32_t));
-    } else {
-        uint32_t n1 = env->config->nareg - phys;
-        memcpy(env->phys_regs + phys, env->regs + window,
-                n1 * sizeof(uint32_t));
-        memcpy(env->phys_regs, env->regs + window + n1,
-                (n - n1) * sizeof(uint32_t));
-    }
-}
-
-
 static inline unsigned windowbase_bound(unsigned a, const CPUXtensaState *env)
 {
     return a & (env->config->nareg / 4 - 1);
@@ -211,18 +178,25 @@ static inline unsigned windowstart_bit(unsigned a, const CPUXtensaState *env)
 
 void xtensa_sync_window_from_phys(CPUXtensaState *env)
 {
-    copy_window_from_phys(env, 0, env->sregs[WINDOW_BASE] * 4, 16);
+    if (env->sregs[WINDOW_BASE] * 4 + 16 > env->config->nareg)
+        memcpy(env->phys_regs + env->config->nareg, env->phys_regs,
+               (env->sregs[WINDOW_BASE] * 4 + 16 - env->config->nareg) *
+               sizeof(uint32_t));
 }
 
 void xtensa_sync_phys_from_window(CPUXtensaState *env)
 {
-    copy_phys_from_window(env, env->sregs[WINDOW_BASE] * 4, 0, 16);
+    if (env->sregs[WINDOW_BASE] * 4 + 16 > env->config->nareg)
+        memcpy(env->phys_regs, env->phys_regs + env->config->nareg,
+               (env->sregs[WINDOW_BASE] * 4 + 16 - env->config->nareg) *
+               sizeof(uint32_t));
 }
 
-static void rotate_window_abs(CPUXtensaState *env, uint32_t position)
+void rotate_window_abs(CPUXtensaState *env, uint32_t position)
 {
     xtensa_sync_phys_from_window(env);
     env->sregs[WINDOW_BASE] = windowbase_bound(position, env);
+    env->regs = env->phys_regs + env->sregs[WINDOW_BASE] * 4;
     xtensa_sync_window_from_phys(env);
 }
 
diff --git a/target-xtensa/translate.c b/target-xtensa/translate.c
index bb7dfd0..61be622 100644
--- a/target-xtensa/translate.c
+++ b/target-xtensa/translate.c
@@ -70,6 +70,7 @@ typedef struct DisasContext {
 } DisasContext;
 
 static TCGv_ptr cpu_env;
+static TCGv_ptr cpu_regs;
 static TCGv_i32 cpu_pc;
 static TCGv_i32 cpu_R[16];
 static TCGv_i32 cpu_FR[16];
@@ -208,12 +209,14 @@ void xtensa_translate_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
+    cpu_regs = tcg_global_mem_new_ptr(cpu_env,
+            offsetof(CPUXtensaState, regs), "regs");
     cpu_pc = tcg_global_mem_new_i32(cpu_env,
             offsetof(CPUXtensaState, pc), "pc");
 
     for (i = 0; i < 16; i++) {
-        cpu_R[i] = tcg_global_mem_new_i32(cpu_env,
-                offsetof(CPUXtensaState, regs[i]),
+        cpu_R[i] = tcg_global_mem_new_i32(cpu_regs,
+                i * sizeof(uint32_t),
                 regnames[i]);
     }
 
-- 
1.8.1.4
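
The window handling in the patch above can be illustrated outside QEMU.
What follows is a minimal standalone sketch, not the QEMU code: WinState,
NAREG, MIRROR and overflow_regs are hypothetical names, and NAREG is fixed
at 32 purely for illustration. It assumes, as the patch does, that the
physical register file is extended by 12 mirror slots so that a
16-register window starting near the end of the file stays contiguous, and
that the wrapped part is copied between the mirror and the start of the
file around every rotation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAREG       32u                 /* hypothetical physical register count */
#define WINDOW_SIZE 16u                 /* registers visible through the window */
#define MIRROR      (WINDOW_SIZE - 4u)  /* worst-case overhang past NAREG: 12 */

typedef struct {
    uint32_t phys[NAREG + MIRROR];  /* register file plus mirror slots */
    uint32_t *regs;                 /* current window, points into phys[] */
    uint32_t window_base;           /* window start, in units of 4 registers */
} WinState;

/* Number of window registers that lie past the end of the real file. */
static uint32_t overflow_regs(const WinState *s)
{
    uint32_t end = s->window_base * 4 + WINDOW_SIZE;
    return end > NAREG ? end - NAREG : 0;
}

/* Make the mirror slots reflect the registers they alias at phys[0..]. */
static void sync_window_from_phys(WinState *s)
{
    memcpy(s->phys + NAREG, s->phys, overflow_regs(s) * sizeof(uint32_t));
}

/* Write values modified through the mirror back to phys[0..]. */
static void sync_phys_from_window(WinState *s)
{
    memcpy(s->phys, s->phys + NAREG, overflow_regs(s) * sizeof(uint32_t));
}

/* Move the window: flush the old mirror, repoint regs, fill the new mirror. */
static void rotate_window_abs(WinState *s, uint32_t position)
{
    sync_phys_from_window(s);
    s->window_base = position & (NAREG / 4 - 1);
    s->regs = s->phys + s->window_base * 4;
    sync_window_from_phys(s);
}

/* Usage example: returns 0 if a write through a wrapped window survives. */
static int window_demo(void)
{
    WinState s;
    memset(&s, 0, sizeof(s));
    for (uint32_t i = 0; i < NAREG; i++) {
        s.phys[i] = i;
    }
    rotate_window_abs(&s, 0);   /* window covers phys[0..15], no wrap */
    rotate_window_abs(&s, 6);   /* window covers phys[24..39], 8 regs wrap */
    if (s.regs[0] != 24 || s.regs[8] != 0) {
        return 1;
    }
    s.regs[9] = 0xdead;         /* lands in the mirror, aliases phys[1] */
    rotate_window_abs(&s, 0);   /* flushes the mirror back */
    return s.regs[1] == 0xdead ? 0 : 1;
}
```

Compared with the removed copy_window_from_phys/copy_phys_from_window
helpers, which copied all 16 registers in each direction on every
rotation, this layout copies at most 12 registers, and only when the
window wraps.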