From patchwork Fri Dec 16 20:48:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Simpson X-Patchwork-Id: 1716681 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=quicinc.com header.i=@quicinc.com header.a=rsa-sha256 header.s=qcppdkim1 header.b=COUL0RCj; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4NYhG16TZZz240h for ; Sat, 17 Dec 2022 07:55:53 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1p6Hdy-0002k2-0s; Fri, 16 Dec 2022 15:49:22 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p6Hdx-0002jo-2P for qemu-devel@nongnu.org; Fri, 16 Dec 2022 15:49:21 -0500 Received: from mx0a-0031df01.pphosted.com ([205.220.168.131]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p6Hdu-0004PC-La for qemu-devel@nongnu.org; Fri, 16 Dec 2022 15:49:20 -0500 Received: from pps.filterd (m0279865.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BGKaUCo024937; Fri, 16 Dec 2022 20:49:15 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=qcppdkim1; bh=N+lOjYqD3o/v7v1GRwB5KR2los9HFrQfm5Ekr3zArpk=; b=COUL0RCjUsMIic7akSzr4T2ST/emCMxJCrCgAugiXziXX3ovMpEzAguIu2rADLxtCVk/ YdOpaTWn8QTJU1iTq54SA/43SJmHYxURo6/GWTimO0gg+cDa+hduHNBfF+sEUhfySF3x OvGee0hnYGkZBYFoy64VTr+MB/cnwSF7MQRVTBxd4AoCVoWhyw214EzX6M/MmsStpFga IBkfGbFokfM+sT0ktFmVzJQppEC0uQfTwHvBuiFyJ/PTVgN95bwU7qyjgET50I8+Nrsw EXPJ93bCGNAURxIkRZNYA4+UELDjPd1VfolLdDNDSPPiKJYoPxjhz/otfzL/B4GZ1G29 dQ== Received: from nalasppmta04.qualcomm.com (Global_NAT1.qualcomm.com [129.46.96.20]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 3mg2vwd6cn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 16 Dec 2022 20:49:14 +0000 Received: from pps.filterd (NALASPPMTA04.qualcomm.com [127.0.0.1]) by NALASPPMTA04.qualcomm.com (8.17.1.5/8.17.1.5) with ESMTP id 2BGKgNrO016068; Fri, 16 Dec 2022 20:49:14 GMT Received: from pps.reinject (localhost [127.0.0.1]) by NALASPPMTA04.qualcomm.com (PPS) with ESMTP id 3mgw840n74-1; Fri, 16 Dec 2022 20:49:14 +0000 Received: from NALASPPMTA04.qualcomm.com (NALASPPMTA04.qualcomm.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2BGKiGTo017732; Fri, 16 Dec 2022 20:49:13 GMT Received: from hu-devc-lv-u18-c.qualcomm.com (hu-tsimpson-lv.qualcomm.com [10.47.235.220]) by NALASPPMTA04.qualcomm.com (PPS) with ESMTP id 2BGKnDhS024103; Fri, 16 Dec 2022 20:49:13 +0000 Received: by hu-devc-lv-u18-c.qualcomm.com (Postfix, from userid 47164) id F39CA500118; Fri, 16 Dec 2022 12:49:12 -0800 (PST) From: Taylor Simpson To: qemu-devel@nongnu.org Cc: tsimpson@quicinc.com, richard.henderson@linaro.org, philmd@linaro.org, peter.maydell@linaro.org, bcain@quicinc.com, quic_mathbern@quicinc.com, stefanha@redhat.com Subject: [PULL 11/21] Hexagon (target/hexagon) Use direct block chaining for tight loops Date: Fri, 16 Dec 2022 12:48:35 -0800 Message-Id: <20221216204845.19290-12-tsimpson@quicinc.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221216204845.19290-1-tsimpson@quicinc.com> References: <20221216204845.19290-1-tsimpson@quicinc.com> MIME-Version: 1.0 X-QCInternal: smtphost X-QCInternal: smtphost X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-GUID: uXZSkEGj1WzEhXkGzl_AD5Zu0E1O0uk0 X-Proofpoint-ORIG-GUID: uXZSkEGj1WzEhXkGzl_AD5Zu0E1O0uk0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-16_14,2022-12-15_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 suspectscore=0 clxscore=1015 phishscore=0 malwarescore=0 lowpriorityscore=0 spamscore=0 bulkscore=0 impostorscore=0 mlxlogscore=608 adultscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212160185 Received-SPF: pass client-ip=205.220.168.131; envelope-from=tsimpson@qualcomm.com; helo=mx0a-0031df01.pphosted.com X-Spam_score_int: -24 X-Spam_score: -2.5 X-Spam_bar: -- X-Spam_report: (-2.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Direct block chaining is documented here https://qemu.readthedocs.io/en/latest/devel/tcg.html#direct-block-chaining Hexagon inner loops end with the endloop0 instruction To go back to the beginning of the loop, this instructions writes to PC from register SA0 (start address 0). To use direct block chaining, we have to assign PC with a constant value. So, we specialize the code generation when the start of the translation block is equal to SA0. When this is the case, we defer the compare/branch from endloop0 to gen_end_tb. When this is done, we can assign the start address of the TB to PC. Reviewed-by: Richard Henderson Signed-off-by: Taylor Simpson Message-Id: <20221108162906.3166-12-tsimpson@quicinc.com> --- target/hexagon/cpu.h | 13 +++--- target/hexagon/gen_tcg.h | 3 ++ target/hexagon/translate.h | 1 + target/hexagon/genptr.c | 84 ++++++++++++++++++++++++++++++++++++++ target/hexagon/translate.c | 33 +++++++++++++++ 5 files changed, 129 insertions(+), 5 deletions(-) diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index ff8c26272d..1d89e11a1a 100644 --- a/target/hexagon/cpu.h +++ b/target/hexagon/cpu.h @@ -25,6 +25,7 @@ #include "mmvec/mmvec.h" #include "qom/object.h" #include "hw/core/cpu.h" +#include "hw/registerfields.h" #define NUM_PREGS 4 #define TOTAL_PER_THREAD_REGS 64 @@ -152,16 +153,18 @@ struct ArchCPU { #include "cpu_bits.h" +FIELD(TB_FLAGS, IS_TIGHT_LOOP, 0, 1) + static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, target_ulong *pc, target_ulong *cs_base, uint32_t *flags) { + uint32_t hex_flags = 0; *pc = env->gpr[HEX_REG_PC]; *cs_base = 0; -#ifdef CONFIG_USER_ONLY - *flags = 0; -#else -#error System mode not supported on Hexagon yet -#endif + if (*pc == env->gpr[HEX_REG_SA0]) { + hex_flags = FIELD_DP32(hex_flags, TB_FLAGS, IS_TIGHT_LOOP, 1); + } + *flags = hex_flags; } static inline int cpu_mmu_index(CPUHexagonState *env, bool ifetch) diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h index 3583f390e9..19697b42a5 100644 --- a/target/hexagon/gen_tcg.h +++ b/target/hexagon/gen_tcg.h @@ -620,6 +620,9 @@ #define fGEN_TCG_J2_callf(SHORTCODE) \ gen_cond_call(ctx, PuV, TCG_COND_NE, riV) +#define fGEN_TCG_J2_endloop0(SHORTCODE) \ + gen_endloop0(ctx) + /* * Compound compare and jump instructions * Here is a primer to understand the tag names diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h index aacf0b0921..d971f4f095 100644 --- a/target/hexagon/translate.h +++ b/target/hexagon/translate.h @@ -59,6 +59,7 @@ typedef struct DisasContext { bool pre_commit; TCGCond branch_cond; target_ulong branch_dest; + bool is_tight_loop; } DisasContext; static inline void ctx_log_reg_write(DisasContext *ctx, int rnum) diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c index ee0f86fab2..a4a79c8454 100644 --- a/target/hexagon/genptr.c +++ b/target/hexagon/genptr.c @@ -497,6 +497,33 @@ static void gen_write_new_pc_pcrel(DisasContext *ctx, int pc_off, } } +static void gen_set_usr_field(int field, TCGv val) +{ + tcg_gen_deposit_tl(hex_new_value[HEX_REG_USR], hex_new_value[HEX_REG_USR], + val, + reg_field_info[field].offset, + reg_field_info[field].width); +} + +static void gen_set_usr_fieldi(int field, int x) +{ + if (reg_field_info[field].width == 1) { + target_ulong bit = 1 << reg_field_info[field].offset; + if ((x & 1) == 1) { + tcg_gen_ori_tl(hex_new_value[HEX_REG_USR], + hex_new_value[HEX_REG_USR], + bit); + } else { + tcg_gen_andi_tl(hex_new_value[HEX_REG_USR], + hex_new_value[HEX_REG_USR], + ~bit); + } + } else { + TCGv val = tcg_constant_tl(x); + gen_set_usr_field(field, val); + } +} + static void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2) { TCGv one = tcg_constant_tl(0xff); @@ -636,6 +663,63 @@ static void gen_cond_call(DisasContext *ctx, TCGv pred, gen_set_label(skip); } +static void gen_endloop0(DisasContext *ctx) +{ + TCGv lpcfg = tcg_temp_local_new(); + + GET_USR_FIELD(USR_LPCFG, lpcfg); + + /* + * if (lpcfg == 1) { + * hex_new_pred_value[3] = 0xff; + * hex_pred_written |= 1 << 3; + * } + */ + TCGLabel *label1 = gen_new_label(); + tcg_gen_brcondi_tl(TCG_COND_NE, lpcfg, 1, label1); + { + tcg_gen_movi_tl(hex_new_pred_value[3], 0xff); + tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << 3); + } + gen_set_label(label1); + + /* + * if (lpcfg) { + * SET_USR_FIELD(USR_LPCFG, lpcfg - 1); + * } + */ + TCGLabel *label2 = gen_new_label(); + tcg_gen_brcondi_tl(TCG_COND_EQ, lpcfg, 0, label2); + { + tcg_gen_subi_tl(lpcfg, lpcfg, 1); + SET_USR_FIELD(USR_LPCFG, lpcfg); + } + gen_set_label(label2); + + /* + * If we're in a tight loop, we'll do this at the end of the TB to take + * advantage of direct block chaining. + */ + if (!ctx->is_tight_loop) { + /* + * if (hex_gpr[HEX_REG_LC0] > 1) { + * PC = hex_gpr[HEX_REG_SA0]; + * hex_new_value[HEX_REG_LC0] = hex_gpr[HEX_REG_LC0] - 1; + * } + */ + TCGLabel *label3 = gen_new_label(); + tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC0], 1, label3); + { + gen_jumpr(ctx, hex_gpr[HEX_REG_SA0]); + tcg_gen_subi_tl(hex_new_value[HEX_REG_LC0], + hex_gpr[HEX_REG_LC0], 1); + } + gen_set_label(label3); + } + + tcg_temp_free(lpcfg); +} + static void gen_cmp_jumpnv(DisasContext *ctx, TCGCond cond, TCGv val, TCGv src, int pc_off) { diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c index f5ef54f039..75f28e08ad 100644 --- a/target/hexagon/translate.c +++ b/target/hexagon/translate.c @@ -135,6 +135,8 @@ static void gen_goto_tb(DisasContext *ctx, int idx, target_ulong dest) static void gen_end_tb(DisasContext *ctx) { + Packet *pkt = ctx->pkt; + gen_exec_counters(ctx); if (ctx->branch_cond != TCG_COND_NEVER) { @@ -147,6 +149,18 @@ static void gen_end_tb(DisasContext *ctx) } else { gen_goto_tb(ctx, 0, ctx->branch_dest); } + } else if (ctx->is_tight_loop && + pkt->insn[pkt->num_insns - 1].opcode == J2_endloop0) { + /* + * When we're in a tight loop, we defer the endloop0 processing + * to take advantage of direct block chaining + */ + TCGLabel *skip = gen_new_label(); + tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC0], 1, skip); + tcg_gen_subi_tl(hex_gpr[HEX_REG_LC0], hex_gpr[HEX_REG_LC0], 1); + gen_goto_tb(ctx, 0, ctx->base.tb->pc); + gen_set_label(skip); + gen_goto_tb(ctx, 1, ctx->next_PC); } else { tcg_gen_lookup_and_goto_ptr(); } @@ -337,6 +351,15 @@ static void mark_implicit_reg_write(DisasContext *ctx, int attrib, int rnum) */ bool is_predicated = GET_ATTRIB(opcode, A_CONDEXEC) || rnum == HEX_REG_USR; + + /* LC0/LC1 is conditionally written by endloop instructions */ + if ((rnum == HEX_REG_LC0 || rnum == HEX_REG_LC1) && + (opcode == J2_endloop0 || + opcode == J2_endloop1 || + opcode == J2_endloop01)) { + is_predicated = true; + } + if (is_predicated && !is_preloaded(ctx, rnum)) { tcg_gen_mov_tl(hex_new_value[rnum], hex_gpr[rnum]); } @@ -420,6 +443,14 @@ static void gen_reg_writes(DisasContext *ctx) int reg_num = ctx->reg_log[i]; tcg_gen_mov_tl(hex_gpr[reg_num], hex_new_value[reg_num]); + + /* + * ctx->is_tight_loop is set when SA0 points to the beginning of the TB. + * If we write to SA0, we have to turn off tight loop handling. + */ + if (reg_num == HEX_REG_SA0) { + ctx->is_tight_loop = false; + } } } @@ -833,12 +864,14 @@ static void hexagon_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) { DisasContext *ctx = container_of(dcbase, DisasContext, base); + uint32_t hex_flags = dcbase->tb->flags; ctx->mem_idx = MMU_USER_IDX; ctx->num_packets = 0; ctx->num_insns = 0; ctx->num_hvx_insns = 0; ctx->branch_cond = TCG_COND_NEVER; + ctx->is_tight_loop = FIELD_EX32(hex_flags, TB_FLAGS, IS_TIGHT_LOOP); } static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu)