From patchwork Fri Jun 24 03:48:22 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 639999
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: mark.cave-ayland@ilande.co.uk, atar4qemu@gmail.com, aurelien@aurel32.net
Date: Thu, 23 Jun 2016 20:48:22 -0700
Message-Id: <1466740107-15042-5-git-send-email-rth@twiddle.net>
In-Reply-To: <1466740107-15042-1-git-send-email-rth@twiddle.net>
References: <1466740107-15042-1-git-send-email-rth@twiddle.net>
Subject: [Qemu-devel] [PATCH v3 4/9] tcg: Compress liveness data to 16 bits

This reduces both memory usage and per-insn cacheline usage
during code generation.

Signed-off-by: Richard Henderson
Reviewed-by: Aurelien Jarno
---
 tcg/tcg.c | 58 ++++++++++++++++++++++------------------------------------
 tcg/tcg.h | 16 ++++++++++------
 2 files changed, 32 insertions(+), 42 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 64060c6..400e69c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1329,7 +1329,7 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
     }
 }
 
-/* Liveness analysis : update the opc_dead_args array to tell if a
+/* Liveness analysis : update the opc_arg_life array to tell if a
    given input arguments is dead. Instructions updating dead
    temporaries are removed. */
 static void tcg_liveness_analysis(TCGContext *s)
@@ -1338,9 +1338,8 @@ static void tcg_liveness_analysis(TCGContext *s)
     int oi, oi_prev, nb_ops;
 
     nb_ops = s->gen_next_op_idx;
-    s->op_dead_args = tcg_malloc(nb_ops * sizeof(uint16_t));
-    s->op_sync_args = tcg_malloc(nb_ops * sizeof(uint8_t));
-
+    s->op_arg_life = tcg_malloc(nb_ops * sizeof(TCGLifeData));
+
     dead_temps = tcg_malloc(s->nb_temps);
     mem_temps = tcg_malloc(s->nb_temps);
     tcg_la_func_end(s, dead_temps, mem_temps);
@@ -1349,8 +1348,7 @@ static void tcg_liveness_analysis(TCGContext *s)
         int i, nb_iargs, nb_oargs;
         TCGOpcode opc_new, opc_new2;
         bool have_opc_new2;
-        uint16_t dead_args;
-        uint8_t sync_args;
+        TCGLifeData arg_life = 0;
         TCGArg arg;
 
         TCGOp * const op = &s->gen_op_buf[oi];
@@ -1382,15 +1380,13 @@ static void tcg_liveness_analysis(TCGContext *s)
                 do_not_remove_call:
 
                     /* output args are dead */
-                    dead_args = 0;
-                    sync_args = 0;
                     for (i = 0; i < nb_oargs; i++) {
                         arg = args[i];
                         if (dead_temps[arg]) {
-                            dead_args |= (1 << i);
+                            arg_life |= DEAD_ARG << i;
                         }
                         if (mem_temps[arg]) {
-                            sync_args |= (1 << i);
+                            arg_life |= SYNC_ARG << i;
                         }
                         dead_temps[arg] = 1;
                         mem_temps[arg] = 0;
@@ -1411,7 +1407,7 @@ static void tcg_liveness_analysis(TCGContext *s)
                         arg = args[i];
                         if (arg != TCG_CALL_DUMMY_ARG) {
                             if (dead_temps[arg]) {
-                                dead_args |= (1 << i);
+                                arg_life |= DEAD_ARG << i;
                             }
                         }
                     }
@@ -1420,8 +1416,6 @@ static void tcg_liveness_analysis(TCGContext *s)
                         arg = args[i];
                         dead_temps[arg] = 0;
                     }
-                    s->op_dead_args[oi] = dead_args;
-                    s->op_sync_args[oi] = sync_args;
                 }
             }
             break;
@@ -1532,15 +1526,13 @@ static void tcg_liveness_analysis(TCGContext *s)
             } else {
             do_not_remove:
                 /* output args are dead */
-                dead_args = 0;
-                sync_args = 0;
                 for (i = 0; i < nb_oargs; i++) {
                     arg = args[i];
                     if (dead_temps[arg]) {
-                        dead_args |= (1 << i);
+                        arg_life |= DEAD_ARG << i;
                     }
                     if (mem_temps[arg]) {
-                        sync_args |= (1 << i);
+                        arg_life |= SYNC_ARG << i;
                     }
                     dead_temps[arg] = 1;
                     mem_temps[arg] = 0;
@@ -1558,7 +1550,7 @@ static void tcg_liveness_analysis(TCGContext *s)
                 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
                     arg = args[i];
                     if (dead_temps[arg]) {
-                        dead_args |= (1 << i);
+                        arg_life |= DEAD_ARG << i;
                     }
                 }
                 /* input arguments are live for preceding opcodes */
@@ -1566,11 +1558,10 @@ static void tcg_liveness_analysis(TCGContext *s)
                     arg = args[i];
                     dead_temps[arg] = 0;
                 }
-                s->op_dead_args[oi] = dead_args;
-                s->op_sync_args[oi] = sync_args;
             }
             break;
         }
+        s->op_arg_life[oi] = arg_life;
     }
 }
 
@@ -1891,11 +1882,11 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
     save_globals(s, allocated_regs);
 }
 
-#define IS_DEAD_ARG(n) ((dead_args >> (n)) & 1)
-#define NEED_SYNC_ARG(n) ((sync_args >> (n)) & 1)
+#define IS_DEAD_ARG(n)   (arg_life & (DEAD_ARG << (n)))
+#define NEED_SYNC_ARG(n) (arg_life & (SYNC_ARG << (n)))
 
 static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
-                               uint16_t dead_args, uint8_t sync_args)
+                               TCGLifeData arg_life)
 {
     TCGTemp *ots;
     tcg_target_ulong val;
@@ -1924,8 +1915,7 @@ static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
 }
 
 static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
-                              const TCGArg *args, uint16_t dead_args,
-                              uint8_t sync_args)
+                              const TCGArg *args, TCGLifeData arg_life)
 {
     TCGRegSet allocated_regs;
     TCGTemp *ts, *ots;
@@ -2010,8 +2000,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
 
 static void tcg_reg_alloc_op(TCGContext *s,
                              const TCGOpDef *def, TCGOpcode opc,
-                             const TCGArg *args, uint16_t dead_args,
-                             uint8_t sync_args)
+                             const TCGArg *args, TCGLifeData arg_life)
 {
     TCGRegSet allocated_regs;
     int i, k, nb_iargs, nb_oargs;
@@ -2176,8 +2165,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
 #endif
 
 static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
-                               const TCGArg * const args, uint16_t dead_args,
-                               uint8_t sync_args)
+                               const TCGArg * const args, TCGLifeData arg_life)
 {
     int flags, nb_regs, i;
     TCGReg reg;
@@ -2397,8 +2385,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         TCGArg * const args = &s->gen_opparam_buf[op->args];
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
-        uint16_t dead_args = s->op_dead_args[oi];
-        uint8_t sync_args = s->op_sync_args[oi];
+        TCGLifeData arg_life = s->op_arg_life[oi];
 
         oi_next = op->next;
 #ifdef CONFIG_PROFILER
@@ -2408,11 +2395,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         switch (opc) {
         case INDEX_op_mov_i32:
         case INDEX_op_mov_i64:
-            tcg_reg_alloc_mov(s, def, args, dead_args, sync_args);
+            tcg_reg_alloc_mov(s, def, args, arg_life);
             break;
         case INDEX_op_movi_i32:
         case INDEX_op_movi_i64:
-            tcg_reg_alloc_movi(s, args, dead_args, sync_args);
+            tcg_reg_alloc_movi(s, args, arg_life);
             break;
         case INDEX_op_insn_start:
             if (num_insns >= 0) {
@@ -2437,8 +2424,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
             tcg_out_label(s, arg_label(args[0]), s->code_ptr);
             break;
         case INDEX_op_call:
-            tcg_reg_alloc_call(s, op->callo, op->calli, args,
-                               dead_args, sync_args);
+            tcg_reg_alloc_call(s, op->callo, op->calli, args, arg_life);
             break;
         default:
             /* Sanity check that we've not introduced any unhandled opcodes. */
@@ -2448,7 +2434,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
             /* Note: in order to speed up the code, it would be much
                faster to have specialized register allocator functions for
                some common argument patterns */
-            tcg_reg_alloc_op(s, def, opc, args, dead_args, sync_args);
+            tcg_reg_alloc_op(s, def, opc, args, arg_life);
             break;
         }
 #ifdef CONFIG_DEBUG_TCG
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 66d7fc0..cc14560 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -505,6 +505,14 @@ typedef struct TCGTempSet {
     unsigned long l[BITS_TO_LONGS(TCG_MAX_TEMPS)];
 } TCGTempSet;
 
+/* While we limit helpers to 6 arguments, for 32-bit hosts, with padding,
+   this implies a max of 6*2 (64-bit in) + 2 (64-bit out) = 14 operands.
+   There are never more than 2 outputs, which means that we can store all
+   dead + sync data within 16 bits.  */
+#define DEAD_ARG  4
+#define SYNC_ARG  1
+typedef uint16_t TCGLifeData;
+
 typedef struct TCGOp {
     TCGOpcode opc   : 8;
 
@@ -538,12 +546,8 @@ struct TCGContext {
     uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_addr if !USE_DIRECT_JUMP */
 
     /* liveness analysis */
-    uint16_t *op_dead_args; /* for each operation, each bit tells if the
-                               corresponding argument is dead */
-    uint8_t *op_sync_args;  /* for each operation, each bit tells if the
-                               corresponding output argument needs to be
-                               sync to memory. */
-
+    TCGLifeData *op_arg_life;
+
     TCGRegSet reserved_regs;
     intptr_t current_frame_offset;
     intptr_t frame_start;