From patchwork Tue Aug 14 18:18:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 957682 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41qhk25TjXz9s9N for ; Wed, 15 Aug 2018 05:00:10 +1000 (AEST) Received: from localhost ([::1]:45817 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fpeY4-0001T4-Ej for incoming@patchwork.ozlabs.org; Tue, 14 Aug 2018 15:00:08 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52948) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fpdw3-0005Oj-7N for qemu-devel@nongnu.org; Tue, 14 Aug 2018 14:22:30 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fpduO-0006ua-Q0 for qemu-devel@nongnu.org; Tue, 14 Aug 2018 14:20:51 -0400 Received: from orth.archaic.org.uk ([2001:8b0:1d0::2]:44436) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fpduO-0006uC-H3 for qemu-devel@nongnu.org; Tue, 14 Aug 2018 14:19:08 -0400 Received: from pm215 by orth.archaic.org.uk with local (Exim 4.89) (envelope-from ) id 1fpduN-0007Qi-FY for qemu-devel@nongnu.org; Tue, 14 Aug 2018 19:19:07 +0100 From: Peter Maydell To: qemu-devel@nongnu.org Date: Tue, 14 Aug 2018 19:18:14 +0100 Message-Id: <20180814181815.23348-45-peter.maydell@linaro.org> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180814181815.23348-1-peter.maydell@linaro.org> References: <20180814181815.23348-1-peter.maydell@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2001:8b0:1d0::2 Subject: [Qemu-devel] [PULL 44/45] target/arm: Reorganize SVE WHILE X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" From: Richard Henderson The pseudocode for this operation is an increment + compare loop, so comparing <= the maximum integer produces an all-true predicate. Rather than bound in both the inline code and the helper, pass the helper the number of predicate bits to set instead of the number of predicate elements to set. Reported-by: Laurent Desnogues Signed-off-by: Richard Henderson Reviewed-by: Laurent Desnogues Tested-by: Alex Bennée Tested-by: Laurent Desnogues Message-id: 20180801123111.3595-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell --- target/arm/sve_helper.c | 5 ---- target/arm/translate-sve.c | 49 +++++++++++++++++++++++++------------- 2 files changed, 32 insertions(+), 22 deletions(-) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 9bd0694d55e..87594a8adb4 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2846,11 +2846,6 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc) return flags; } - /* Scale from predicate element count to bits. */ - count <<= esz; - /* Bound to the bits in the predicate. */ - count = MIN(count, oprsz * 8); - /* Set all of the requested bits. */ for (i = 0; i < count / 64; ++i) { d->p[i] = esz_mask; diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 9dd4c38bab7..89efc80ee70 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3173,19 +3173,19 @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a, uint32_t insn) static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn) { - if (!sve_access_check(s)) { - return true; - } - - TCGv_i64 op0 = read_cpu_reg(s, a->rn, 1); - TCGv_i64 op1 = read_cpu_reg(s, a->rm, 1); - TCGv_i64 t0 = tcg_temp_new_i64(); - TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2, t3; TCGv_ptr ptr; unsigned desc, vsz = vec_full_reg_size(s); TCGCond cond; + if (!sve_access_check(s)) { + return true; + } + + op0 = read_cpu_reg(s, a->rn, 1); + op1 = read_cpu_reg(s, a->rm, 1); + if (!a->sf) { if (a->u) { tcg_gen_ext32u_i64(op0, op0); @@ -3198,32 +3198,47 @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn) /* For the helper, compress the different conditions into a computation * of how many iterations for which the condition is true. - * - * This is slightly complicated by 0 <= UINT64_MAX, which is nominally - * 2**64 iterations, overflowing to 0. Of course, predicate registers - * aren't that large, so any value >= predicate size is sufficient. */ + t0 = tcg_temp_new_i64(); + t1 = tcg_temp_new_i64(); tcg_gen_sub_i64(t0, op1, op0); - /* t0 = MIN(op1 - op0, vsz). */ - tcg_gen_movi_i64(t1, vsz); - tcg_gen_umin_i64(t0, t0, t1); + tmax = tcg_const_i64(vsz >> a->esz); if (a->eq) { /* Equality means one more iteration. */ tcg_gen_addi_i64(t0, t0, 1); + + /* If op1 is max (un)signed integer (and the only time the addition + * above could overflow), then we produce an all-true predicate by + * setting the count to the vector length. This is because the + * pseudocode is described as an increment + compare loop, and the + * max integer would always compare true. + */ + tcg_gen_movi_i64(t1, (a->sf + ? (a->u ? UINT64_MAX : INT64_MAX) + : (a->u ? UINT32_MAX : INT32_MAX))); + tcg_gen_movcond_i64(TCG_COND_EQ, t0, op1, t1, tmax, t0); } - /* t0 = (condition true ? t0 : 0). */ + /* Bound to the maximum. */ + tcg_gen_umin_i64(t0, t0, tmax); + tcg_temp_free_i64(tmax); + + /* Set the count to zero if the condition is false. */ cond = (a->u ? (a->eq ? TCG_COND_LEU : TCG_COND_LTU) : (a->eq ? TCG_COND_LE : TCG_COND_LT)); tcg_gen_movi_i64(t1, 0); tcg_gen_movcond_i64(cond, t0, op0, op1, t0, t1); + tcg_temp_free_i64(t1); + /* Since we're bounded, pass as a 32-bit type. */ t2 = tcg_temp_new_i32(); tcg_gen_extrl_i64_i32(t2, t0); tcg_temp_free_i64(t0); - tcg_temp_free_i64(t1); + + /* Scale elements to bits. */ + tcg_gen_shli_i32(t2, t2, a->esz); desc = (vsz / 8) - 2; desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);