From patchwork Wed Oct 19 22:37:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Simpson X-Patchwork-Id: 1692209 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=quicinc.com header.i=@quicinc.com header.a=rsa-sha256 header.s=qcppdkim1 header.b=GgBxS3Su; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Mt5Kd5zv0z23kS for ; Thu, 20 Oct 2022 09:40:37 +1100 (AEDT) Received: from localhost ([::1]:48304 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1olHjn-0002QJ-HR for incoming@patchwork.ozlabs.org; Wed, 19 Oct 2022 18:40:35 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:48522) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1olHhQ-0002Lc-07 for qemu-devel@nongnu.org; Wed, 19 Oct 2022 18:38:08 -0400 Received: from mx0a-0031df01.pphosted.com ([205.220.168.131]:45830) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1olHh7-00029l-VI for qemu-devel@nongnu.org; Wed, 19 Oct 2022 18:38:07 -0400 Received: from pps.filterd (m0279864.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 29JL4gFs018336; Wed, 19 Oct 2022 22:37:44 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=qcppdkim1; bh=2tAe4aunpofBEInt+B67LqN3+dVmUagUZZA3HNspg+A=; b=GgBxS3SuiFU0LbxJ7/5nfA3pHkqwvWebsNpR09K8Xs4rE8/8+u2E5XUocgCT/lxPBqZo l3SSlFqE+SKSn6F2H4YfSkm05Xl2PijUUaKZkFcxKVPj1kdh4t0vbjaPDxfHODiyM5Qv n2r3YOoAE+wgT0a2i7Dk3A4OZTmA9Fp5WE1Z8dUhTZNB3MPHf1U5DjN32mCLmBeVe++b T7KRCAvr6qGelb4rqGEcsjSobij9ygC3um1icDe3HFjPZKaEohbbiWVlL72/4wOZIiT8 lCrn5SGmMA/MbcNZt/M0rwVRbcyvbqNlIDIslk9oHPoe5xcL8k4JNLjV8UMQCK94TOv7 vw== Received: from nalasppmta05.qualcomm.com (Global_NAT1.qualcomm.com [129.46.96.20]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 3kaed8u11r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 19 Oct 2022 22:37:43 +0000 Received: from pps.filterd (NALASPPMTA05.qualcomm.com [127.0.0.1]) by NALASPPMTA05.qualcomm.com (8.17.1.5/8.17.1.5) with ESMTP id 29JMbhJm004645; Wed, 19 Oct 2022 22:37:43 GMT Received: from pps.reinject (localhost [127.0.0.1]) by NALASPPMTA05.qualcomm.com (PPS) with ESMTPS id 3k7nxkwacj-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 19 Oct 2022 22:37:43 +0000 Received: from NALASPPMTA05.qualcomm.com (NALASPPMTA05.qualcomm.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 29JMbgD9004629; Wed, 19 Oct 2022 22:37:42 GMT Received: from hu-devc-lv-u18-c.qualcomm.com (hu-tsimpson-lv.qualcomm.com [10.47.235.220]) by NALASPPMTA05.qualcomm.com (PPS) with ESMTP id 29JMbglg004618; Wed, 19 Oct 2022 22:37:42 +0000 Received: by hu-devc-lv-u18-c.qualcomm.com (Postfix, from userid 47164) id C663B500105; Wed, 19 Oct 2022 15:37:41 -0700 (PDT) From: Taylor Simpson To: qemu-devel@nongnu.org Cc: tsimpson@quicinc.com, richard.henderson@linaro.org, philmd@linaro.org, ale@rev.ng, anjo@rev.ng, bcain@quicinc.com, quic_mathbern@quicinc.com Subject: [PATCH 8/8] Hexagon (target/hexagon) Use direct block chaining for tight loops Date: Wed, 19 Oct 2022 15:37:39 -0700 Message-Id: <20221019223739.3868-9-tsimpson@quicinc.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221019223739.3868-1-tsimpson@quicinc.com> References: <20221019223739.3868-1-tsimpson@quicinc.com> MIME-Version: 1.0 X-QCInternal: smtphost X-QCInternal: smtphost X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-ORIG-GUID: fkUGBzMgWf-5GC-a3CkqIiQgYVaJ2n4i X-Proofpoint-GUID: fkUGBzMgWf-5GC-a3CkqIiQgYVaJ2n4i X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-10-19_12,2022-10-19_04,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 adultscore=0 lowpriorityscore=0 malwarescore=0 mlxscore=0 phishscore=0 mlxlogscore=414 spamscore=0 impostorscore=0 bulkscore=0 suspectscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2209130000 definitions=main-2210190125 Received-SPF: pass client-ip=205.220.168.131; envelope-from=tsimpson@qualcomm.com; helo=mx0a-0031df01.pphosted.com X-Spam_score_int: -17 X-Spam_score: -1.8 X-Spam_bar: - X-Spam_report: (-1.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Direct block chaining is documented here https://qemu.readthedocs.io/en/latest/devel/tcg.html#direct-block-chaining Hexagon inner loops end with the endloop0 instruction To go back to the beginning of the loop, this instructions writes to PC from register SA0 (start address 0). To use direct block chaining, we have to assign PC with a constant value. So, we specialize the code generation when the start of the translation block is equal to SA0. When this is the case, we defer the compare/branch from endloop0 to gen_end_tb. When this is done, we can assign the start address of the TB to PC. Signed-off-by: Taylor Simpson --- target/hexagon/cpu.h | 17 ++++++--- target/hexagon/gen_tcg.h | 3 ++ target/hexagon/translate.h | 1 + target/hexagon/genptr.c | 71 ++++++++++++++++++++++++++++++++++++++ target/hexagon/translate.c | 41 +++++++++++++++++++--- 5 files changed, 124 insertions(+), 9 deletions(-) diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h index ff8c26272d..5260e0f127 100644 --- a/target/hexagon/cpu.h +++ b/target/hexagon/cpu.h @@ -152,16 +152,23 @@ struct ArchCPU { #include "cpu_bits.h" +typedef union { + uint32_t i; + struct { + bool is_tight_loop:1; + }; +} HexStateFlags; + static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, target_ulong *pc, target_ulong *cs_base, uint32_t *flags) { + HexStateFlags hex_flags = { 0 }; *pc = env->gpr[HEX_REG_PC]; *cs_base = 0; -#ifdef CONFIG_USER_ONLY - *flags = 0; -#else -#error System mode not supported on Hexagon yet -#endif + if (*pc == env->gpr[HEX_REG_SA0]) { + hex_flags.is_tight_loop = true; + } + *flags = hex_flags.i; } static inline int cpu_mmu_index(CPUHexagonState *env, bool ifetch) diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h index dbafcae2de..d9c345801b 100644 --- a/target/hexagon/gen_tcg.h +++ b/target/hexagon/gen_tcg.h @@ -620,6 +620,9 @@ #define fGEN_TCG_J2_callf(SHORTCODE) \ gen_cond_call(ctx, pkt, PuV, false, riV) +#define fGEN_TCG_J2_endloop0(SHORTCODE) \ + gen_endloop0(ctx, pkt) + /* * Compound compare and jump instructions * Here is a primer to understand the tag names diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h index e60dbf0e7a..34abe86b5c 100644 --- a/target/hexagon/translate.h +++ b/target/hexagon/translate.h @@ -57,6 +57,7 @@ typedef struct DisasContext { bool has_single_direct_branch; TCGv branch_cond; target_ulong branch_dest; + bool is_tight_loop; } DisasContext; static inline void ctx_log_reg_write(DisasContext *ctx, int rnum) diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c index 07b4326e56..252ed52d1b 100644 --- a/target/hexagon/genptr.c +++ b/target/hexagon/genptr.c @@ -516,6 +516,20 @@ static void gen_write_new_pc_pcrel(DisasContext *ctx, Packet *pkt, } } +static void gen_set_usr_field(int field, TCGv val) +{ + tcg_gen_deposit_tl(hex_new_value[HEX_REG_USR], hex_new_value[HEX_REG_USR], + val, + reg_field_info[field].offset, + reg_field_info[field].width); +} + +static void gen_set_usr_fieldi(int field, int x) +{ + TCGv val = tcg_constant_tl(x); + gen_set_usr_field(field, val); +} + static void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2) { TCGv one = tcg_constant_tl(0xff); @@ -645,6 +659,63 @@ static void gen_cond_call(DisasContext *ctx, Packet *pkt, gen_set_label(skip); } +static void gen_endloop0(DisasContext *ctx, Packet *pkt) +{ + TCGv lpcfg = tcg_temp_local_new(); + + GET_USR_FIELD(USR_LPCFG, lpcfg); + + /* + * if (lpcfg == 1) { + * hex_new_pred_value[3] = 0xff; + * hex_pred_written |= 1 << 3; + * } + */ + TCGLabel *label1 = gen_new_label(); + tcg_gen_brcondi_tl(TCG_COND_NE, lpcfg, 1, label1); + { + tcg_gen_movi_tl(hex_new_pred_value[3], 0xff); + tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << 3); + } + gen_set_label(label1); + + /* + * if (lpcfg) { + * SET_USR_FIELD(USR_LPCFG, lpcfg - 1); + * } + */ + TCGLabel *label2 = gen_new_label(); + tcg_gen_brcondi_tl(TCG_COND_EQ, lpcfg, 0, label2); + { + tcg_gen_subi_tl(lpcfg, lpcfg, 1); + SET_USR_FIELD(USR_LPCFG, lpcfg); + } + gen_set_label(label2); + + /* + * If we're in a tight loop, we'll do this at the end of the TB to take + * advantage of direct block chaining. + */ + if (!ctx->is_tight_loop) { + /* + * if (hex_gpr[HEX_REG_LC0] > 1) { + * PC = hex_gpr[HEX_REG_SA0]; + * hex_new_value[HEX_REG_LC0] = hex_gpr[HEX_REG_LC0] - 1; + * } + */ + TCGLabel *label3 = gen_new_label(); + tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC0], 1, label3); + { + gen_jumpr(ctx, pkt, hex_gpr[HEX_REG_SA0]); + tcg_gen_subi_tl(hex_new_value[HEX_REG_LC0], + hex_gpr[HEX_REG_LC0], 1); + } + gen_set_label(label3); + } + + tcg_temp_free(lpcfg); +} + static void gen_cmp_jumpnv(DisasContext *ctx, Packet *pkt, TCGCond cond, TCGv val, TCGv src, int pc_off) { diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c index 29e2caaf0f..18eb27c651 100644 --- a/target/hexagon/translate.c +++ b/target/hexagon/translate.c @@ -133,7 +133,7 @@ static void gen_goto_tb(DisasContext *ctx, int idx, target_ulong dest) } } -static void gen_end_tb(DisasContext *ctx) +static void gen_end_tb(DisasContext *ctx, Packet *pkt) { gen_exec_counters(ctx); @@ -149,6 +149,18 @@ static void gen_end_tb(DisasContext *ctx) } else { gen_goto_tb(ctx, 0, ctx->branch_dest); } + } else if (ctx->is_tight_loop && + pkt->insn[pkt->num_insns - 1].opcode == J2_endloop0) { + /* + * When we're in a tight loop, we defer the endloop0 processing + * to take advantage of direct block chaining + */ + TCGLabel *skip = gen_new_label(); + tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC0], 1, skip); + tcg_gen_subi_tl(hex_gpr[HEX_REG_LC0], hex_gpr[HEX_REG_LC0], 1); + gen_goto_tb(ctx, 0, ctx->base.tb->pc); + gen_set_label(skip); + gen_goto_tb(ctx, 1, ctx->next_PC); } else { tcg_gen_lookup_and_goto_ptr(); } @@ -328,13 +340,23 @@ bool is_gather_store_insn(Insn *insn, Packet *pkt) static void mark_implicit_reg_write(DisasContext *ctx, Insn *insn, int attrib, int rnum) { - if (GET_ATTRIB(insn->opcode, attrib)) { + uint16_t opcode = insn->opcode; + if (GET_ATTRIB(opcode, attrib)) { /* * USR is used to set overflow and FP exceptions, * so treat it as conditional */ - bool is_predicated = GET_ATTRIB(insn->opcode, A_CONDEXEC) || + bool is_predicated = GET_ATTRIB(opcode, A_CONDEXEC) || rnum == HEX_REG_USR; + + /* LC0/LC1 is conditionally written by endloop instructions */ + if ((rnum == HEX_REG_LC0 || rnum == HEX_REG_LC1) && + (opcode == J2_endloop0 || + opcode == J2_endloop1 || + opcode == J2_endloop01)) { + is_predicated = true; + } + if (is_predicated && !is_preloaded(ctx, rnum)) { tcg_gen_mov_tl(hex_new_value[rnum], hex_gpr[rnum]); } @@ -420,6 +442,14 @@ static void gen_reg_writes(DisasContext *ctx) int reg_num = ctx->reg_log[i]; tcg_gen_mov_tl(hex_gpr[reg_num], hex_new_value[reg_num]); + + /* + * ctx->is_tight_loop is set when SA0 points to the beginning of the TB. + * If we write to SA0, we have to turn off tight loop handling. + */ + if (reg_num == HEX_REG_SA0) { + ctx->is_tight_loop = false; + } } } @@ -793,7 +823,7 @@ static void gen_commit_packet(CPUHexagonState *env, DisasContext *ctx, } if (pkt->pkt_has_cof) { - gen_end_tb(ctx); + gen_end_tb(ctx, pkt); } } @@ -838,8 +868,11 @@ static void hexagon_tr_init_disas_context(DisasContextBase *dcbase, static void hexagon_tr_tb_start(DisasContextBase *db, CPUState *cpu) { DisasContext *ctx = container_of(db, DisasContext, base); + HexStateFlags hex_flags = { db->tb->flags }; + ctx->has_single_direct_branch = false; ctx->branch_cond = NULL; + ctx->is_tight_loop = hex_flags.is_tight_loop; } static void hexagon_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)