From patchwork Thu Jun 23 18:03:02 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 639818
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: aurelien@aurel32.net
Date: Thu, 23 Jun 2016 11:03:02 -0700
Message-Id: <1466704982-5919-5-git-send-email-rth@twiddle.net>
In-Reply-To: <1466704982-5919-1-git-send-email-rth@twiddle.net>
References: <1466704982-5919-1-git-send-email-rth@twiddle.net>
X-Mailer: git-send-email 2.5.5
Subject: [Qemu-devel] [PATCH 4/4] tcg: Compress dead_temps and mem_temps into a single array

We only need two bits per temporary.  Fold the two bytes into one,
and reduce the memory and cachelines required during compilation.

Signed-off-by: Richard Henderson
---
 tcg/tcg.c | 119 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 60 insertions(+), 59 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index b0c9dca..6397a37 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -333,7 +333,7 @@ void tcg_context_init(TCGContext *s)
 
     memset(s, 0, sizeof(*s));
     s->nb_globals = 0;
-    
+
     /* Count total number of arguments and allocate the corresponding
        space */
     total_args = 0;
@@ -825,16 +825,16 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
                 real_args++;
             }
 #endif
-	    /* If stack grows up, then we will be placing successive
-	       arguments at lower addresses, which means we need to
-	       reverse the order compared to how we would normally
-	       treat either big or little-endian.  For those arguments
-	       that will wind up in registers, this still works for
-	       HPPA (the only current STACK_GROWSUP target) since the
-	       argument registers are *also* allocated in decreasing
-	       order.  If another such target is added, this logic may
-	       have to get more complicated to differentiate between
-	       stack arguments and register arguments.  */
+            /* If stack grows up, then we will be placing successive
+               arguments at lower addresses, which means we need to
+               reverse the order compared to how we would normally
+               treat either big or little-endian.  For those arguments
+               that will wind up in registers, this still works for
+               HPPA (the only current STACK_GROWSUP target) since the
+               argument registers are *also* allocated in decreasing
+               order.  If another such target is added, this logic may
+               have to get more complicated to differentiate between
+               stack arguments and register arguments.  */
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TCG_TARGET_STACK_GROWSUP)
             s->gen_opparam_buf[pi++] = args[i] + 1;
             s->gen_opparam_buf[pi++] = args[i];
@@ -1299,27 +1299,29 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
 }
 
 #ifdef USE_LIVENESS_ANALYSIS
+
+#define TS_DEAD  1
+#define TS_SYNC  2
+
 /* liveness analysis: end of function: all temps are dead, and globals
    should be in memory. */
-static inline void tcg_la_func_end(TCGContext *s, uint8_t *dead_temps,
-                                   uint8_t *mem_temps)
+static inline void tcg_la_func_end(TCGContext *s, uint8_t *temp_state)
 {
-    memset(dead_temps, 1, s->nb_temps);
-    memset(mem_temps, 1, s->nb_globals);
-    memset(mem_temps + s->nb_globals, 0, s->nb_temps - s->nb_globals);
+    memset(temp_state, TS_DEAD | TS_SYNC, s->nb_globals);
+    memset(temp_state + s->nb_globals, TS_DEAD, s->nb_temps - s->nb_globals);
 }
 
 /* liveness analysis: end of basic block: all temps are dead, globals
    and local temps should be in memory. */
-static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
-                                 uint8_t *mem_temps)
+static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
 {
-    int i;
+    int i, n;
 
-    memset(dead_temps, 1, s->nb_temps);
-    memset(mem_temps, 1, s->nb_globals);
-    for(i = s->nb_globals; i < s->nb_temps; i++) {
-        mem_temps[i] = s->temps[i].temp_local;
+    tcg_la_func_end(s, temp_state);
+    for (i = s->nb_globals, n = s->nb_temps; i < n; i++) {
+        if (s->temps[i].temp_local) {
+            temp_state[i] |= TS_SYNC;
+        }
     }
 }
 
@@ -1328,12 +1330,12 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
    temporaries are removed. */
 static void tcg_liveness_analysis(TCGContext *s)
 {
-    uint8_t *dead_temps, *mem_temps;
+    uint8_t *temp_state;
     int oi, oi_prev;
+    int nb_globals = s->nb_globals;
 
-    dead_temps = tcg_malloc(s->nb_temps);
-    mem_temps = tcg_malloc(s->nb_temps);
-    tcg_la_func_end(s, dead_temps, mem_temps);
+    temp_state = tcg_malloc(s->nb_temps);
+    tcg_la_func_end(s, temp_state);
 
     for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
         int i, nb_iargs, nb_oargs;
@@ -1362,7 +1364,7 @@ static void tcg_liveness_analysis(TCGContext *s)
                 if (call_flags & TCG_CALL_NO_SIDE_EFFECTS) {
                     for (i = 0; i < nb_oargs; i++) {
                         arg = args[i];
-                        if (!dead_temps[arg] || mem_temps[arg]) {
+                        if (temp_state[arg] != TS_DEAD) {
                             goto do_not_remove_call;
                         }
                     }
@@ -1373,39 +1375,41 @@ static void tcg_liveness_analysis(TCGContext *s)
                     /* output args are dead */
                     for (i = 0; i < nb_oargs; i++) {
                         arg = args[i];
-                        if (dead_temps[arg]) {
+                        if (temp_state[arg] & TS_DEAD) {
                             arg_life |= DEAD_ARG << i;
                         }
-                        if (mem_temps[arg]) {
+                        if (temp_state[arg] & TS_SYNC) {
                             arg_life |= SYNC_ARG << i;
                         }
-                        dead_temps[arg] = 1;
-                        mem_temps[arg] = 0;
+                        temp_state[arg] = TS_DEAD;
                     }
 
-                    if (!(call_flags & TCG_CALL_NO_READ_GLOBALS)) {
-                        /* globals should be synced to memory */
-                        memset(mem_temps, 1, s->nb_globals);
-                    }
                     if (!(call_flags & (TCG_CALL_NO_WRITE_GLOBALS |
                                         TCG_CALL_NO_READ_GLOBALS))) {
                         /* globals should go back to memory */
-                        memset(dead_temps, 1, s->nb_globals);
+                        memset(temp_state, TS_DEAD | TS_SYNC, nb_globals);
+                    } else if (!(call_flags & TCG_CALL_NO_READ_GLOBALS)) {
+                        /* globals should be synced to memory */
+                        for (i = 0; i < nb_globals; i++) {
+                            temp_state[i] |= TS_SYNC;
+                        }
                     }
 
                     /* record arguments that die in this helper */
                     for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
                         arg = args[i];
                         if (arg != TCG_CALL_DUMMY_ARG) {
-                            if (dead_temps[arg]) {
+                            if (temp_state[arg] & TS_DEAD) {
                                 arg_life |= DEAD_ARG << i;
                             }
                         }
                     }
                     /* input arguments are live for preceding opcodes */
-                    for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
+                    for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
                         arg = args[i];
-                        dead_temps[arg] = 0;
+                        if (arg != TCG_CALL_DUMMY_ARG) {
+                            temp_state[arg] &= ~TS_DEAD;
+                        }
                     }
                 }
             }
@@ -1414,8 +1418,7 @@ static void tcg_liveness_analysis(TCGContext *s)
             break;
         case INDEX_op_discard:
             /* mark the temporary as dead */
-            dead_temps[args[0]] = 1;
-            mem_temps[args[0]] = 0;
+            temp_state[args[0]] = TS_DEAD;
             break;
 
         case INDEX_op_add2_i32:
@@ -1436,8 +1439,8 @@ static void tcg_liveness_analysis(TCGContext *s)
                the low part.  The result can be optimized to a simple
                add or sub.  This happens often for x86_64 guest when the
                cpu mode is set to 32 bit.  */
-            if (dead_temps[args[1]] && !mem_temps[args[1]]) {
-                if (dead_temps[args[0]] && !mem_temps[args[0]]) {
+            if (temp_state[args[1]] == TS_DEAD) {
+                if (temp_state[args[0]] == TS_DEAD) {
                     goto do_remove;
                 }
                 /* Replace the opcode and adjust the args in place,
@@ -1474,8 +1477,8 @@ static void tcg_liveness_analysis(TCGContext *s)
         do_mul2:
             nb_iargs = 2;
             nb_oargs = 2;
-            if (dead_temps[args[1]] && !mem_temps[args[1]]) {
-                if (dead_temps[args[0]] && !mem_temps[args[0]]) {
+            if (temp_state[args[1]] == TS_DEAD) {
+                if (temp_state[args[0]] == TS_DEAD) {
                     /* Both parts of the operation are dead.  */
                     goto do_remove;
                 }
@@ -1483,8 +1486,7 @@ static void tcg_liveness_analysis(TCGContext *s)
                 op->opc = opc = opc_new;
                 args[1] = args[2];
                 args[2] = args[3];
-            } else if (have_opc_new2 && dead_temps[args[0]]
-                       && !mem_temps[args[0]]) {
+            } else if (temp_state[args[0]] == TS_DEAD && have_opc_new2) {
                 /* The low part of the operation is dead; generate the high. */
                 op->opc = opc = opc_new2;
                 args[0] = args[1];
@@ -1507,8 +1509,7 @@ static void tcg_liveness_analysis(TCGContext *s)
                implies side effects */
             if (!(def->flags & TCG_OPF_SIDE_EFFECTS) && nb_oargs != 0) {
                 for (i = 0; i < nb_oargs; i++) {
-                    arg = args[i];
-                    if (!dead_temps[arg] || mem_temps[arg]) {
+                    if (temp_state[args[i]] != TS_DEAD) {
                         goto do_not_remove;
                     }
                 }
@@ -1519,35 +1520,35 @@ static void tcg_liveness_analysis(TCGContext *s)
                 /* output args are dead */
                 for (i = 0; i < nb_oargs; i++) {
                     arg = args[i];
-                    if (dead_temps[arg]) {
+                    if (temp_state[arg] & TS_DEAD) {
                         arg_life |= DEAD_ARG << i;
                     }
-                    if (mem_temps[arg]) {
+                    if (temp_state[arg] & TS_SYNC) {
                         arg_life |= SYNC_ARG << i;
                     }
-                    dead_temps[arg] = 1;
-                    mem_temps[arg] = 0;
+                    temp_state[arg] = TS_DEAD;
                 }
 
                 /* if end of basic block, update */
                 if (def->flags & TCG_OPF_BB_END) {
-                    tcg_la_bb_end(s, dead_temps, mem_temps);
+                    tcg_la_bb_end(s, temp_state);
                 } else if (def->flags & TCG_OPF_SIDE_EFFECTS) {
                     /* globals should be synced to memory */
-                    memset(mem_temps, 1, s->nb_globals);
+                    for (i = 0; i < nb_globals; i++) {
+                        temp_state[i] |= TS_SYNC;
+                    }
                 }
 
                 /* record arguments that die in this opcode */
                 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
                     arg = args[i];
-                    if (dead_temps[arg]) {
+                    if (temp_state[arg] & TS_DEAD) {
                         arg_life |= DEAD_ARG << i;
                     }
                 }
                 /* input arguments are live for preceding opcodes */
                 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-                    arg = args[i];
-                    dead_temps[arg] = 0;
+                    temp_state[args[i]] &= ~TS_DEAD;
                 }
             }
             break;
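
For readers who want to try the encoding outside of QEMU, here is a minimal standalone sketch of the packed per-temporary state. Only the TS_DEAD and TS_SYNC bit values are taken from the patch. The helper names (la_func_end, la_bb_end, is_removable, mark_live), the plain uint8_t array and the is_local flag array are assumptions made for illustration; they stand in for TCGContext and are not part of tcg.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same bit values as the patch; everything else here is hypothetical. */
#define TS_DEAD  1   /* value is not needed by any later op */
#define TS_SYNC  2   /* value must also be present in memory */

/* End of function: every temp is dead, and globals must be in memory. */
static void la_func_end(uint8_t *state, int nb_globals, int nb_temps)
{
    memset(state, TS_DEAD | TS_SYNC, nb_globals);
    memset(state + nb_globals, TS_DEAD, nb_temps - nb_globals);
}

/* End of basic block: as above, but local temps must be in memory too. */
static void la_bb_end(uint8_t *state, const bool *is_local,
                      int nb_globals, int nb_temps)
{
    la_func_end(state, nb_globals, nb_temps);
    for (int i = nb_globals; i < nb_temps; i++) {
        if (is_local[i]) {
            state[i] |= TS_SYNC;
        }
    }
}

/* "Dead and not needing a sync" is one comparison on the packed byte;
   it corresponds to the two-array test dead[i] && !mem[i]. */
static bool is_removable(const uint8_t *state, int i)
{
    return state[i] == TS_DEAD;
}

/* An input argument is live for preceding ops: clear only the DEAD bit,
   leaving any pending SYNC request untouched. */
static void mark_live(uint8_t *state, int i)
{
    state[i] &= ~TS_DEAD;
}

/* Small self-check with two globals and two temps (index 2 is local). */
static void temp_state_selfcheck(void)
{
    uint8_t state[4];
    const bool is_local[4] = { false, false, true, false };

    la_bb_end(state, is_local, 2, 4);
    assert(state[0] == (TS_DEAD | TS_SYNC));
    assert(!is_removable(state, 2));
    assert(is_removable(state, 3));
}
```

Calling temp_state_selfcheck() from a main() is enough to exercise the helpers. The point of the sketch is that "state == TS_DEAD" and "state != TS_DEAD" replace the paired tests on two arrays, so each temporary costs one byte instead of two.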