From patchwork Wed Jun 27 04:32:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935265 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="Jnf7PCpC"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fqst1FsJz9s0w for ; Wed, 27 Jun 2018 14:38:26 +1000 (AEST) Received: from localhost ([::1]:56498 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2Dn-0007SR-QR for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:38:23 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60066) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29D-00043K-01 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY298-0008Qr-8D for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:38 -0400 Received: from mail-pf0-x243.google.com ([2607:f8b0:400e:c00::243]:46456) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY297-0008PU-VD for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:34 -0400 Received: by mail-pf0-x243.google.com with SMTP id q1-v6so379034pff.13 for ; Tue, 26 Jun 2018 21:33:33 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=XTPkV6RQtrRYMAiDv7oAO3OIlUw/Z9HU7TpzGlzU21M=; b=Jnf7PCpCEmXLcgHgylMk0Z/TWHcdAgsOx/X7JSqP9dN/6/9Jm5HwaDW5pWEOtPHW+7 iGd3/q9PKOSQxOM5O4lUzmeFUCTMatNxScD+e50QTrVPYEtHI8+YI64CCxG9qMR4+9UY pEINctPX8TH5m4LysJc5S/41UM5jU9bcqnjbo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=XTPkV6RQtrRYMAiDv7oAO3OIlUw/Z9HU7TpzGlzU21M=; b=e02NqXENpvjJ9XMG9KLRzcEuugAiynesJcacTwj+bLIJBq1nW4VMyMQE94EhUBI9OV 10ebCunHj96vPotKgsd8vIb2Ff0CoFwF3pl2qc9eWRo4tEXUAay8mjXNHnDt7zaJ12Dr tHPaqXaRaXFdNPdLG3w+ldoCtv3ESUx8Mv7C0dU0iGVttB0bZVwJ0AdJ+hRnMBLRBRN5 jyIfJbPRmfLpBxbzgPF+XJQHkCRxXLhw/RmjUl2bJdLvchvkXRq3fnQHKQkpTs5d33jK Xk6oY5wIXyCxitN6Fm6Jp273BMy8ulIOyJypEkMdbGytXxpyROKR9PUksPA7kLd0Hb+Y Nrvg== X-Gm-Message-State: APt69E3fz81QTUXEsz/epg4jwG5Zl608Rex/ZWsN91KS5tTeONIKgs+o QT2L4v0gv5w/bQqHWCo5hed4eQVuFWc= X-Google-Smtp-Source: ADUXVKIRV6gCFszFUlywEAw/2wZJoIhRNKQrqgplZo841Bew4HIyu5CdOZ/rZeZj2mwEI4iLmE2OIA== X-Received: by 2002:a65:444d:: with SMTP id e13-v6mr3705522pgq.122.1530074012569; Tue, 26 Jun 2018 21:33:32 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.31 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:31 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:32:54 -0700 Message-Id: <20180627043328.11531-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::243 Subject: [Qemu-devel] [PATCH v6 01/35] target/arm: Implement SVE Memory Contiguous Load Group X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Alex Bennée Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 35 +++++++++ target/arm/sve_helper.c | 153 +++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 121 +++++++++++++++++++++++++++++ target/arm/sve.decode | 34 +++++++++ 4 files changed, 343 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 2e76084992..fcc9ba5f50 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -719,3 +719,38 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4bb_r, 
TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1bdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 128bbf9b04..4e6ad282f9 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c 
@@ -2810,3 +2810,156 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc) return predtest_ones(d, oprsz, esz_mask); } + +/* + * Load contiguous data, protected by a governing predicate. + */ +#define DO_LD1(NAME, FN, TYPEE, TYPEM, H) \ +static void do_##NAME(CPUARMState *env, void *vd, void *vg, \ + target_ulong addr, intptr_t oprsz, \ + uintptr_t ra) \ +{ \ + intptr_t i = 0; \ + do { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + TYPEM m = 0; \ + if (pg & 1) { \ + m = FN(env, addr, ra); \ + } \ + *(TYPEE *)(vd + H(i)) = m; \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += sizeof(TYPEM); \ + } while (i & 15); \ + } while (i < oprsz); \ +} \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + do_##NAME(env, &env->vfp.zregs[simd_data(desc)], vg, \ + addr, simd_oprsz(desc), GETPC()); \ +} + +#define DO_LD2(NAME, FN, TYPEE, TYPEM, H) \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + intptr_t ra = GETPC(); \ + unsigned rd = simd_data(desc); \ + void *d1 = &env->vfp.zregs[rd]; \ + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ + for (i = 0; i < oprsz; ) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + TYPEM m1 = 0, m2 = 0; \ + if (pg & 1) { \ + m1 = FN(env, addr, ra); \ + m2 = FN(env, addr + sizeof(TYPEM), ra); \ + } \ + *(TYPEE *)(d1 + H(i)) = m1; \ + *(TYPEE *)(d2 + H(i)) = m2; \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += 2 * sizeof(TYPEM); \ + } while (i & 15); \ + } \ +} + +#define DO_LD3(NAME, FN, TYPEE, TYPEM, H) \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + intptr_t ra = GETPC(); \ + unsigned rd = simd_data(desc); \ + void *d1 = &env->vfp.zregs[rd]; \ + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ + void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ + for (i = 0; i < oprsz; ) 
{ \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + TYPEM m1 = 0, m2 = 0, m3 = 0; \ + if (pg & 1) { \ + m1 = FN(env, addr, ra); \ + m2 = FN(env, addr + sizeof(TYPEM), ra); \ + m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \ + } \ + *(TYPEE *)(d1 + H(i)) = m1; \ + *(TYPEE *)(d2 + H(i)) = m2; \ + *(TYPEE *)(d3 + H(i)) = m3; \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += 3 * sizeof(TYPEM); \ + } while (i & 15); \ + } \ +} + +#define DO_LD4(NAME, FN, TYPEE, TYPEM, H) \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + intptr_t ra = GETPC(); \ + unsigned rd = simd_data(desc); \ + void *d1 = &env->vfp.zregs[rd]; \ + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ + void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ + void *d4 = &env->vfp.zregs[(rd + 3) & 31]; \ + for (i = 0; i < oprsz; ) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + TYPEM m1 = 0, m2 = 0, m3 = 0, m4 = 0; \ + if (pg & 1) { \ + m1 = FN(env, addr, ra); \ + m2 = FN(env, addr + sizeof(TYPEM), ra); \ + m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \ + m4 = FN(env, addr + 3 * sizeof(TYPEM), ra); \ + } \ + *(TYPEE *)(d1 + H(i)) = m1; \ + *(TYPEE *)(d2 + H(i)) = m2; \ + *(TYPEE *)(d3 + H(i)) = m3; \ + *(TYPEE *)(d4 + H(i)) = m4; \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += 4 * sizeof(TYPEM); \ + } while (i & 15); \ + } \ +} + +DO_LD1(sve_ld1bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2) +DO_LD1(sve_ld1bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2) +DO_LD1(sve_ld1bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4) +DO_LD1(sve_ld1bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4) +DO_LD1(sve_ld1bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, ) +DO_LD1(sve_ld1bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, ) + +DO_LD1(sve_ld1hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4) +DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4) +DO_LD1(sve_ld1hdu_r, cpu_lduw_data_ra, uint64_t, 
uint16_t, ) +DO_LD1(sve_ld1hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, ) + +DO_LD1(sve_ld1sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, ) +DO_LD1(sve_ld1sds_r, cpu_ldl_data_ra, uint64_t, int32_t, ) + +DO_LD1(sve_ld1bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1) +DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1) +DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1) +DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1) + +DO_LD1(sve_ld1hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2) +DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2) +DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2) +DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2) + +DO_LD1(sve_ld1ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4) +DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4) +DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4) +DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4) + +DO_LD1(sve_ld1dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) +DO_LD2(sve_ld2dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) +DO_LD3(sve_ld3dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) +DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) + +#undef DO_LD1 +#undef DO_LD2 +#undef DO_LD3 +#undef DO_LD4 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 226c97579c..3543daff48 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -42,6 +42,8 @@ typedef void gen_helper_gvec_flags_3(TCGv_i32, TCGv_ptr, TCGv_ptr, typedef void gen_helper_gvec_flags_4(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +typedef void gen_helper_gvec_mem(TCGv_env, TCGv_ptr, TCGv_i64, TCGv_i32); + /* * Helpers for extracting complex instruction fields. */ @@ -82,6 +84,15 @@ static inline int expand_imm_sh8u(int x) return (uint8_t)x << (x & 0x100 ? 8 : 0); } +/* Convert a 2-bit memory size (msz) to a 4-bit data type (dtype) + * with unsigned data. Cf. SVE Memory Contiguous Load Group. 
+ */ +static inline int msz_dtype(int msz) +{ + static const uint8_t dtype[4] = { 0, 5, 10, 15 }; + return dtype[msz]; +} + /* * Include the generated decoder. */ @@ -3526,3 +3537,113 @@ static bool trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn) } return true; } + +/* + *** SVE Memory - Contiguous Load Group + */ + +/* The memory mode of the dtype. */ +static const TCGMemOp dtype_mop[16] = { + MO_UB, MO_UB, MO_UB, MO_UB, + MO_SL, MO_UW, MO_UW, MO_UW, + MO_SW, MO_SW, MO_UL, MO_UL, + MO_SB, MO_SB, MO_SB, MO_Q +}; + +#define dtype_msz(x) (dtype_mop[x] & MO_SIZE) + +/* The vector element size of dtype. */ +static const uint8_t dtype_esz[16] = { + 0, 1, 2, 3, + 3, 1, 2, 3, + 3, 2, 2, 3, + 3, 2, 1, 3 +}; + +static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, + gen_helper_gvec_mem *fn) +{ + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr t_pg; + TCGv_i32 desc; + + /* For e.g. LD4, there are not enough arguments to pass all 4 + * registers as pointers, so encode the regno into the data field. + * For consistency, do this even for LD1. 
+ */ + desc = tcg_const_i32(simd_desc(vsz, vsz, zt)); + t_pg = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg)); + fn(cpu_env, t_pg, addr, desc); + + tcg_temp_free_ptr(t_pg); + tcg_temp_free_i32(desc); +} + +static void do_ld_zpa(DisasContext *s, int zt, int pg, + TCGv_i64 addr, int dtype, int nreg) +{ + static gen_helper_gvec_mem * const fns[16][4] = { + { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, + gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r }, + { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL }, + + { gen_helper_sve_ld1sds_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1hh_r, gen_helper_sve_ld2hh_r, + gen_helper_sve_ld3hh_r, gen_helper_sve_ld4hh_r }, + { gen_helper_sve_ld1hsu_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1hdu_r, NULL, NULL, NULL }, + + { gen_helper_sve_ld1hds_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1hss_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1ss_r, gen_helper_sve_ld2ss_r, + gen_helper_sve_ld3ss_r, gen_helper_sve_ld4ss_r }, + { gen_helper_sve_ld1sdu_r, NULL, NULL, NULL }, + + { gen_helper_sve_ld1bds_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1bss_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1dd_r, gen_helper_sve_ld2dd_r, + gen_helper_sve_ld3dd_r, gen_helper_sve_ld4dd_r }, + }; + gen_helper_gvec_mem *fn = fns[dtype][nreg]; + + /* While there are holes in the table, they are not + * accessible via the instruction encoding. 
+ */ + assert(fn != NULL); + do_mem_zpa(s, zt, pg, addr, fn); +} + +static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn) +{ + if (a->rm == 31) { + return false; + } + if (sve_access_check(s)) { + TCGv_i64 addr = new_tmp_a64(s); + tcg_gen_muli_i64(addr, cpu_reg(s, a->rm), + (a->nreg + 1) << dtype_msz(a->dtype)); + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn)); + do_ld_zpa(s, a->rd, a->pg, addr, a->dtype, a->nreg); + } + return true; +} + +static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn) +{ + if (sve_access_check(s)) { + int vsz = vec_full_reg_size(s); + int elements = vsz >> dtype_esz[a->dtype]; + TCGv_i64 addr = new_tmp_a64(s); + + tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), + (a->imm * elements * (a->nreg + 1)) + << dtype_msz(a->dtype)); + do_ld_zpa(s, a->rd, a->pg, addr, a->dtype, a->nreg); + } + return true; +} diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 6f436f9096..cfb12da639 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -45,6 +45,9 @@ # Unsigned 8-bit immediate, optionally shifted left by 8. %sh8_i8u 5:9 !function=expand_imm_sh8u +# Unsigned load of msz into esz=2, represented as a dtype. +%msz_dtype 23:2 !function=msz_dtype + # Either a copy of rd (at bit 0), or a different source # as propagated via the MOVPRFX instruction. %reg_movprfx 0:5 @@ -71,6 +74,8 @@ &incdec2_cnt rd rn pat esz imm d u &incdec_pred rd pg esz d u &incdec2_pred rd rn pg esz d u +&rprr_load rd pg rn rm dtype nreg +&rpri_load rd pg rn imm dtype nreg ########################################################################### # Named instruction formats. These are generally used to @@ -170,6 +175,15 @@ @incdec2_pred ........ esz:2 .... .. ..... .. pg:4 rd:5 \ &incdec2_pred rn=%reg_movprfx +# Loads; user must fill in NREG. +@rprr_load_dt ....... dtype:4 rm:5 ... pg:3 rn:5 rd:5 &rprr_load +@rpri_load_dt ....... dtype:4 . imm:s4 ... pg:3 rn:5 rd:5 &rpri_load + +@rprr_load_msz ....... .... rm:5 ... 
pg:3 rn:5 rd:5 \ + &rprr_load dtype=%msz_dtype +@rpri_load_msz ....... .... . imm:s4 ... pg:3 rn:5 rd:5 \ + &rpri_load dtype=%msz_dtype + ########################################################################### # Instruction patterns. Grouped according to the SVE encodingindex.xhtml. @@ -665,3 +679,23 @@ LDR_pri 10000101 10 ...... 000 ... ..... 0 .... @pd_rn_i9 # SVE load vector register LDR_zri 10000101 10 ...... 010 ... ..... ..... @rd_rn_i9 + +### SVE Memory Contiguous Load Group + +# SVE contiguous load (scalar plus scalar) +LD_zprr 1010010 .... ..... 010 ... ..... ..... @rprr_load_dt nreg=0 + +# SVE contiguous load (scalar plus immediate) +LD_zpri 1010010 .... 0.... 101 ... ..... ..... @rpri_load_dt nreg=0 + +# SVE contiguous non-temporal load (scalar plus scalar) +# LDNT1B, LDNT1H, LDNT1W, LDNT1D +# SVE load multiple structures (scalar plus scalar) +# LD2B, LD2H, LD2W, LD2D; etc. +LD_zprr 1010010 .. nreg:2 ..... 110 ... ..... ..... @rprr_load_msz + +# SVE contiguous non-temporal load (scalar plus immediate) +# LDNT1B, LDNT1H, LDNT1W, LDNT1D +# SVE load multiple structures (scalar plus immediate) +# LD2B, LD2H, LD2W, LD2D; etc. +LD_zpri 1010010 .. nreg:2 0.... 111 ... ..... ..... 
@rpri_load_msz From patchwork Wed Jun 27 04:32:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935262 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="aBDMJulC"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41FqpJ5T0hz9s1B for ; Wed, 27 Jun 2018 14:35:20 +1000 (AEST) Received: from localhost ([::1]:56478 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2Ao-0004wK-Ef for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:35:18 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60067) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29C-00043L-W2 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:43 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY299-0008Sy-M1 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:38 -0400 Received: from mail-pf0-x243.google.com ([2607:f8b0:400e:c00::243]:36055) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY299-0008Rn-Er for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:35 -0400 Received: by mail-pf0-x243.google.com with SMTP id u16-v6so389165pfh.3 for ; Tue, 26 Jun 2018 21:33:35 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=QYO2UbmsWye4KUl79/KJ4eUpBHEcbH/7GuW7WEWWVaY=; b=aBDMJulCJ/x4umZZSUg/n2XFC/4LozaCfk/iRZK9aJtQYUzzBlsANKXIQgrlBYGSe5 uD1zjh8DsoxBpQzcckOGs9j2qsKmrT1tRIbMBK66pxYsuwsPYPWswiFypy+/uHu9GhkM 2x7UAv9chfXWWhwl0PYXWvFlPKgv6pUlJCwTI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=QYO2UbmsWye4KUl79/KJ4eUpBHEcbH/7GuW7WEWWVaY=; b=oMx3w6+T8UQZUbi9EQbypBy7jGlqkZM9wz94EYcQdLacjGQKz0QFh76GrhL2wc09hx vwrVqY3OD3cD5CH3xBIRJ1hftYNar/1Pah7Lj6ZHr6tHoL7UELqEFItbA//pFKjEseLI d2LmrCqKT/I6ziSQUebKfZ1E7z54baCdJQAON+XS/VXCNelRToNLTCo/BDHqqcs+Urwi Qlb9zI1JRcrgeklrbidg3sxCw5PStuBQlccUcgNXykvYnwsn2otWW4oKEEfRn0nwDzrW 7VEIVVr80qu6fQO8c1IQNjOIqQMS8p5ylc0bOE4TuOgMwfy4n1799AWQuiFmBZ3YNoTA IcjA== X-Gm-Message-State: APt69E3WqimTzuXKs9XXoFjJSd73T9DPXhLv0e69Mj+UWOq8OUuxANH3 uB93h461ALbQWnhWGN68KeVA55A1YCY= X-Google-Smtp-Source: ADUXVKK63SMmIe8TsAQXeTtYUFS4Z00IPyVIJgACz82ufYBMGha+eZPSfKJBUsqsmm4zT7dgDjqWiQ== X-Received: by 2002:a63:8b44:: with SMTP id j65-v6mr3779624pge.248.1530074014178; Tue, 26 Jun 2018 21:33:34 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.32 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:33 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:32:55 -0700 Message-Id: <20180627043328.11531-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 2607:f8b0:400e:c00::243 Subject: [Qemu-devel] [PATCH v6 02/35] target/arm: Implement SVE Contiguous Load, first-fault and no-fault X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson Reviewed-by: Alex Bennée Tested-by: Alex Bennée --- v6: * Remove cold attribute from record_fault, add unlikely marker to the if that protects its call, which seems to be enough to prevent the function being inlined. * Fix the set of bits masked by record_fault. --- target/arm/helper-sve.h | 40 ++++++++++ target/arm/sve_helper.c | 157 +++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 69 ++++++++++++++++ target/arm/sve.decode | 6 ++ 4 files changed, 272 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index fcc9ba5f50..7338abbbcf 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -754,3 +754,43 @@ DEF_HELPER_FLAGS_4(sve_ld1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ldff1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1bdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ldff1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) 
+DEF_HELPER_FLAGS_4(sve_ldff1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ldff1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldff1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ldff1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ldnf1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1bdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ldnf1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ldnf1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ldnf1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ldnf1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 4e6ad282f9..0d22a57a22 100644 --- a/target/arm/sve_helper.c +++ 
b/target/arm/sve_helper.c @@ -2963,3 +2963,160 @@ DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) #undef DO_LD2 #undef DO_LD3 #undef DO_LD4 + +/* + * Load contiguous data, first-fault and no-fault. + */ + +#ifdef CONFIG_USER_ONLY + +/* Fault on byte I. All bits in FFR from I are cleared. The vector + * result from I is CONSTRAINED UNPREDICTABLE; we choose the MERGE + * option, which leaves subsequent data unchanged. + */ +static void record_fault(CPUARMState *env, uintptr_t i, uintptr_t oprsz) +{ + uint64_t *ffr = env->vfp.pregs[FFR_PRED_NUM].p; + + if (i & 63) { + ffr[i / 64] &= MAKE_64BIT_MASK(0, i & 63); + i = ROUND_UP(i, 64); + } + for (; i < oprsz; i += 64) { + ffr[i / 64] = 0; + } +} + +/* Hold the mmap lock during the operation so that there is no race + * between page_check_range and the load operation. We expect the + * usual case to have no faults at all, so we check the whole range + * first and if successful defer to the normal load operation. + * + * TODO: Change mmap_lock to a rwlock so that multiple readers + * can run simultaneously. This will probably help other uses + * within QEMU as well. 
+ */ +#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H) \ +static void do_sve_ldff1##PART(CPUARMState *env, void *vd, void *vg, \ + target_ulong addr, intptr_t oprsz, \ + bool first, uintptr_t ra) \ +{ \ + intptr_t i = 0; \ + do { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + TYPEM m = 0; \ + if (pg & 1) { \ + if (!first && \ + unlikely(page_check_range(addr, sizeof(TYPEM), \ + PAGE_READ))) { \ + record_fault(env, i, oprsz); \ + return; \ + } \ + m = FN(env, addr, ra); \ + first = false; \ + } \ + *(TYPEE *)(vd + H(i)) = m; \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += sizeof(TYPEM); \ + } while (i & 15); \ + } while (i < oprsz); \ +} \ +void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + unsigned rd = simd_data(desc); \ + void *vd = &env->vfp.zregs[rd]; \ + mmap_lock(); \ + if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) { \ + do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC()); \ + } else { \ + do_sve_ldff1##PART(env, vd, vg, addr, oprsz, true, GETPC()); \ + } \ + mmap_unlock(); \ +} + +/* No-fault loads are like first-fault loads without the + * first faulting special case. + */ +#define DO_LDNF1(PART) \ +void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + unsigned rd = simd_data(desc); \ + void *vd = &env->vfp.zregs[rd]; \ + mmap_lock(); \ + if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) { \ + do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC()); \ + } else { \ + do_sve_ldff1##PART(env, vd, vg, addr, oprsz, false, GETPC()); \ + } \ + mmap_unlock(); \ +} + +#else + +/* TODO: System mode is not yet supported. + * This would probably use tlb_vaddr_to_host. 
+ */ +#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H) \ +void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + g_assert_not_reached(); \ +} + +#define DO_LDNF1(PART) \ +void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + g_assert_not_reached(); \ +} + +#endif + +DO_LDFF1(bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1) +DO_LDFF1(bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2) +DO_LDFF1(bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2) +DO_LDFF1(bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4) +DO_LDFF1(bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4) +DO_LDFF1(bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, ) +DO_LDFF1(bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, ) + +DO_LDFF1(hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2) +DO_LDFF1(hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4) +DO_LDFF1(hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4) +DO_LDFF1(hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, ) +DO_LDFF1(hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, ) + +DO_LDFF1(ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4) +DO_LDFF1(sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, ) +DO_LDFF1(sds_r, cpu_ldl_data_ra, uint64_t, int32_t, ) + +DO_LDFF1(dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) + +#undef DO_LDFF1 + +DO_LDNF1(bb_r) +DO_LDNF1(bhu_r) +DO_LDNF1(bhs_r) +DO_LDNF1(bsu_r) +DO_LDNF1(bss_r) +DO_LDNF1(bdu_r) +DO_LDNF1(bds_r) + +DO_LDNF1(hh_r) +DO_LDNF1(hsu_r) +DO_LDNF1(hss_r) +DO_LDNF1(hdu_r) +DO_LDNF1(hds_r) + +DO_LDNF1(ss_r) +DO_LDNF1(sdu_r) +DO_LDNF1(sds_r) + +DO_LDNF1(dd_r) + +#undef DO_LDNF1 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 3543daff48..09f77b5405 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3647,3 +3647,72 @@ static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn) } return true; } + +static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn) +{ + static 
gen_helper_gvec_mem * const fns[16] = { + gen_helper_sve_ldff1bb_r, + gen_helper_sve_ldff1bhu_r, + gen_helper_sve_ldff1bsu_r, + gen_helper_sve_ldff1bdu_r, + + gen_helper_sve_ldff1sds_r, + gen_helper_sve_ldff1hh_r, + gen_helper_sve_ldff1hsu_r, + gen_helper_sve_ldff1hdu_r, + + gen_helper_sve_ldff1hds_r, + gen_helper_sve_ldff1hss_r, + gen_helper_sve_ldff1ss_r, + gen_helper_sve_ldff1sdu_r, + + gen_helper_sve_ldff1bds_r, + gen_helper_sve_ldff1bss_r, + gen_helper_sve_ldff1bhs_r, + gen_helper_sve_ldff1dd_r, + }; + + if (sve_access_check(s)) { + TCGv_i64 addr = new_tmp_a64(s); + tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype)); + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn)); + do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]); + } + return true; +} + +static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn) +{ + static gen_helper_gvec_mem * const fns[16] = { + gen_helper_sve_ldnf1bb_r, + gen_helper_sve_ldnf1bhu_r, + gen_helper_sve_ldnf1bsu_r, + gen_helper_sve_ldnf1bdu_r, + + gen_helper_sve_ldnf1sds_r, + gen_helper_sve_ldnf1hh_r, + gen_helper_sve_ldnf1hsu_r, + gen_helper_sve_ldnf1hdu_r, + + gen_helper_sve_ldnf1hds_r, + gen_helper_sve_ldnf1hss_r, + gen_helper_sve_ldnf1ss_r, + gen_helper_sve_ldnf1sdu_r, + + gen_helper_sve_ldnf1bds_r, + gen_helper_sve_ldnf1bss_r, + gen_helper_sve_ldnf1bhs_r, + gen_helper_sve_ldnf1dd_r, + }; + + if (sve_access_check(s)) { + int vsz = vec_full_reg_size(s); + int elements = vsz >> dtype_esz[a->dtype]; + int off = (a->imm * elements) << dtype_msz(a->dtype); + TCGv_i64 addr = new_tmp_a64(s); + + tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), off); + do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]); + } + return true; +} diff --git a/target/arm/sve.decode b/target/arm/sve.decode index cfb12da639..afbed57de1 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -685,9 +685,15 @@ LDR_zri 10000101 10 ...... 010 ... ..... ..... @rd_rn_i9 # SVE contiguous load (scalar plus scalar) LD_zprr 1010010 .... 
..... 010 ... ..... ..... @rprr_load_dt nreg=0 +# SVE contiguous first-fault load (scalar plus scalar) +LDFF1_zprr 1010010 .... ..... 011 ... ..... ..... @rprr_load_dt nreg=0 + # SVE contiguous load (scalar plus immediate) LD_zpri 1010010 .... 0.... 101 ... ..... ..... @rpri_load_dt nreg=0 +# SVE contiguous non-fault load (scalar plus immediate) +LDNF1_zpri 1010010 .... 1.... 101 ... ..... ..... @rpri_load_dt nreg=0 + # SVE contiguous non-temporal load (scalar plus scalar) # LDNT1B, LDNT1H, LDNT1W, LDNT1D # SVE load multiple structures (scalar plus scalar)
From patchwork Wed Jun 27 04:32:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935261 From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:32:56 -0700 Message-Id: <20180627043328.11531-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> Subject: [Qemu-devel] [PATCH v6 03/35] target/arm: Implement SVE Memory Contiguous Store Group Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson
--- target/arm/helper-sve.h | 29 +++++ target/arm/sve_helper.c | 211 +++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 65 ++++++++++++ target/arm/sve.decode | 38 +++++++ 4 files changed, 343 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 7338abbbcf..b768128951 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -794,3 +794,32 @@ DEF_HELPER_FLAGS_4(sve_ldnf1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ldnf1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ldnf1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4bb_r, 
TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st1hs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 0d22a57a22..bd874e6fa2 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3120,3 +3120,214 @@ DO_LDNF1(sds_r) DO_LDNF1(dd_r) #undef DO_LDNF1 + +/* + * Store contiguous data, protected by a governing predicate. 
+ */ +#define DO_ST1(NAME, FN, TYPEE, TYPEM, H) \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + intptr_t ra = GETPC(); \ + unsigned rd = simd_data(desc); \ + void *vd = &env->vfp.zregs[rd]; \ + for (i = 0; i < oprsz; ) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + if (pg & 1) { \ + TYPEM m = *(TYPEE *)(vd + H(i)); \ + FN(env, addr, m, ra); \ + } \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += sizeof(TYPEM); \ + } while (i & 15); \ + } \ +} + +#define DO_ST1_D(NAME, FN, TYPEM) \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc) / 8; \ + intptr_t ra = GETPC(); \ + unsigned rd = simd_data(desc); \ + uint64_t *d = &env->vfp.zregs[rd].d[0]; \ + uint8_t *pg = vg; \ + for (i = 0; i < oprsz; i += 1) { \ + if (pg[H1(i)] & 1) { \ + FN(env, addr, d[i], ra); \ + } \ + addr += sizeof(TYPEM); \ + } \ +} + +#define DO_ST2(NAME, FN, TYPEE, TYPEM, H) \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + intptr_t ra = GETPC(); \ + unsigned rd = simd_data(desc); \ + void *d1 = &env->vfp.zregs[rd]; \ + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ + for (i = 0; i < oprsz; ) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + if (pg & 1) { \ + TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ + TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ + FN(env, addr, m1, ra); \ + FN(env, addr + sizeof(TYPEM), m2, ra); \ + } \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += 2 * sizeof(TYPEM); \ + } while (i & 15); \ + } \ +} + +#define DO_ST3(NAME, FN, TYPEE, TYPEM, H) \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + intptr_t ra = GETPC(); \ + unsigned rd = simd_data(desc); \ + void *d1 = &env->vfp.zregs[rd]; \ + void *d2 = 
&env->vfp.zregs[(rd + 1) & 31]; \ + void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ + for (i = 0; i < oprsz; ) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + if (pg & 1) { \ + TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ + TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ + TYPEM m3 = *(TYPEE *)(d3 + H(i)); \ + FN(env, addr, m1, ra); \ + FN(env, addr + sizeof(TYPEM), m2, ra); \ + FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \ + } \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += 3 * sizeof(TYPEM); \ + } while (i & 15); \ + } \ +} + +#define DO_ST4(NAME, FN, TYPEE, TYPEM, H) \ +void HELPER(NAME)(CPUARMState *env, void *vg, \ + target_ulong addr, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + intptr_t ra = GETPC(); \ + unsigned rd = simd_data(desc); \ + void *d1 = &env->vfp.zregs[rd]; \ + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ + void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ + void *d4 = &env->vfp.zregs[(rd + 3) & 31]; \ + for (i = 0; i < oprsz; ) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + if (pg & 1) { \ + TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ + TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ + TYPEM m3 = *(TYPEE *)(d3 + H(i)); \ + TYPEM m4 = *(TYPEE *)(d4 + H(i)); \ + FN(env, addr, m1, ra); \ + FN(env, addr + sizeof(TYPEM), m2, ra); \ + FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \ + FN(env, addr + 3 * sizeof(TYPEM), m4, ra); \ + } \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + addr += 4 * sizeof(TYPEM); \ + } while (i & 15); \ + } \ +} + +DO_ST1(sve_st1bh_r, cpu_stb_data_ra, uint16_t, uint8_t, H1_2) +DO_ST1(sve_st1bs_r, cpu_stb_data_ra, uint32_t, uint8_t, H1_4) +DO_ST1_D(sve_st1bd_r, cpu_stb_data_ra, uint8_t) + +DO_ST1(sve_st1hs_r, cpu_stw_data_ra, uint32_t, uint16_t, H1_4) +DO_ST1_D(sve_st1hd_r, cpu_stw_data_ra, uint16_t) + +DO_ST1_D(sve_st1sd_r, cpu_stl_data_ra, uint32_t) + +DO_ST1(sve_st1bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) +DO_ST2(sve_st2bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) +DO_ST3(sve_st3bb_r, 
cpu_stb_data_ra, uint8_t, uint8_t, H1) +DO_ST4(sve_st4bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) + +DO_ST1(sve_st1hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) +DO_ST2(sve_st2hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) +DO_ST3(sve_st3hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) +DO_ST4(sve_st4hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) + +DO_ST1(sve_st1ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) +DO_ST2(sve_st2ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) +DO_ST3(sve_st3ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) +DO_ST4(sve_st4ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) + +DO_ST1_D(sve_st1dd_r, cpu_stq_data_ra, uint64_t) + +void HELPER(sve_st2dd_r)(CPUARMState *env, void *vg, + target_ulong addr, uint32_t desc) +{ + intptr_t i, oprsz = simd_oprsz(desc) / 8; + intptr_t ra = GETPC(); + unsigned rd = simd_data(desc); + uint64_t *d1 = &env->vfp.zregs[rd].d[0]; + uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0]; + uint8_t *pg = vg; + + for (i = 0; i < oprsz; i += 1) { + if (pg[H1(i)] & 1) { + cpu_stq_data_ra(env, addr, d1[i], ra); + cpu_stq_data_ra(env, addr + 8, d2[i], ra); + } + addr += 2 * 8; + } +} + +void HELPER(sve_st3dd_r)(CPUARMState *env, void *vg, + target_ulong addr, uint32_t desc) +{ + intptr_t i, oprsz = simd_oprsz(desc) / 8; + intptr_t ra = GETPC(); + unsigned rd = simd_data(desc); + uint64_t *d1 = &env->vfp.zregs[rd].d[0]; + uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0]; + uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0]; + uint8_t *pg = vg; + + for (i = 0; i < oprsz; i += 1) { + if (pg[H1(i)] & 1) { + cpu_stq_data_ra(env, addr, d1[i], ra); + cpu_stq_data_ra(env, addr + 8, d2[i], ra); + cpu_stq_data_ra(env, addr + 16, d3[i], ra); + } + addr += 3 * 8; + } +} + +void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg, + target_ulong addr, uint32_t desc) +{ + intptr_t i, oprsz = simd_oprsz(desc) / 8; + intptr_t ra = GETPC(); + unsigned rd = simd_data(desc); + uint64_t *d1 = &env->vfp.zregs[rd].d[0]; + uint64_t *d2 = 
&env->vfp.zregs[(rd + 1) & 31].d[0]; + uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0]; + uint64_t *d4 = &env->vfp.zregs[(rd + 3) & 31].d[0]; + uint8_t *pg = vg; + + for (i = 0; i < oprsz; i += 1) { + if (pg[H1(i)] & 1) { + cpu_stq_data_ra(env, addr, d1[i], ra); + cpu_stq_data_ra(env, addr + 8, d2[i], ra); + cpu_stq_data_ra(env, addr + 16, d3[i], ra); + cpu_stq_data_ra(env, addr + 24, d4[i], ra); + } + addr += 4 * 8; + } +} diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 09f77b5405..b25fe96b77 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3716,3 +3716,68 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn) } return true; } + +static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, + int msz, int esz, int nreg) +{ + static gen_helper_gvec_mem * const fn_single[4][4] = { + { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r, + gen_helper_sve_st1bs_r, gen_helper_sve_st1bd_r }, + { NULL, gen_helper_sve_st1hh_r, + gen_helper_sve_st1hs_r, gen_helper_sve_st1hd_r }, + { NULL, NULL, + gen_helper_sve_st1ss_r, gen_helper_sve_st1sd_r }, + { NULL, NULL, NULL, gen_helper_sve_st1dd_r }, + }; + static gen_helper_gvec_mem * const fn_multiple[3][4] = { + { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_r, + gen_helper_sve_st2ss_r, gen_helper_sve_st2dd_r }, + { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_r, + gen_helper_sve_st3ss_r, gen_helper_sve_st3dd_r }, + { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_r, + gen_helper_sve_st4ss_r, gen_helper_sve_st4dd_r }, + }; + gen_helper_gvec_mem *fn; + + if (nreg == 0) { + /* ST1 */ + fn = fn_single[msz][esz]; + } else { + /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */ + assert(msz == esz); + fn = fn_multiple[nreg - 1][msz]; + } + assert(fn != NULL); + do_mem_zpa(s, zt, pg, addr, fn); +} + +static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a, uint32_t insn) +{ + if (a->rm == 31 || a->msz > a->esz) { + return false; + } + if 
(sve_access_check(s)) { + TCGv_i64 addr = new_tmp_a64(s); + tcg_gen_muli_i64(addr, cpu_reg(s, a->rm), (a->nreg + 1) << a->msz); + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn)); + do_st_zpa(s, a->rd, a->pg, addr, a->msz, a->esz, a->nreg); + } + return true; +} + +static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a, uint32_t insn) +{ + if (a->msz > a->esz) { + return false; + } + if (sve_access_check(s)) { + int vsz = vec_full_reg_size(s); + int elements = vsz >> a->esz; + TCGv_i64 addr = new_tmp_a64(s); + + tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), + (a->imm * elements * (a->nreg + 1)) << a->msz); + do_st_zpa(s, a->rd, a->pg, addr, a->msz, a->esz, a->nreg); + } + return true; +} diff --git a/target/arm/sve.decode b/target/arm/sve.decode index afbed57de1..6e159faaec 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -27,6 +27,7 @@ %imm7_22_16 22:2 16:5 %imm8_16_10 16:5 10:3 %imm9_16_10 16:s6 10:3 +%size_23 23:2 # A combination of tsz:imm3 -- extract esize. %tszimm_esz 22:2 5:5 !function=tszimm_esz @@ -76,6 +77,8 @@ &incdec2_pred rd rn pg esz d u &rprr_load rd pg rn rm dtype nreg &rpri_load rd pg rn imm dtype nreg +&rprr_store rd pg rn rm msz esz nreg +&rpri_store rd pg rn imm msz esz nreg ########################################################################### # Named instruction formats. These are generally used to @@ -184,6 +187,12 @@ @rpri_load_msz ....... .... . imm:s4 ... pg:3 rn:5 rd:5 \ &rpri_load dtype=%msz_dtype +# Stores; user must fill in ESZ, MSZ, NREG as needed. +@rprr_store ....... .. .. rm:5 ... pg:3 rn:5 rd:5 &rprr_store +@rpri_store_msz ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store +@rprr_store_esz_n0 ....... .. esz:2 rm:5 ... pg:3 rn:5 rd:5 \ + &rprr_store nreg=0 + ########################################################################### # Instruction patterns. Grouped according to the SVE encodingindex.xhtml. @@ -705,3 +714,32 @@ LD_zprr 1010010 .. nreg:2 ..... 110 ... ..... ..... 
@rprr_load_msz # SVE load multiple structures (scalar plus immediate) # LD2B, LD2H, LD2W, LD2D; etc. LD_zpri 1010010 .. nreg:2 0.... 111 ... ..... ..... @rpri_load_msz + +### SVE Memory Store Group + +# SVE contiguous store (scalar plus immediate) +# ST1B, ST1H, ST1W, ST1D; require msz <= esz +ST_zpri 1110010 .. esz:2 0.... 111 ... ..... ..... \ + @rpri_store_msz nreg=0 + +# SVE contiguous store (scalar plus scalar) +# ST1B, ST1H, ST1W, ST1D; require msz <= esz +# Enumerate msz lest we conflict with STR_zri. +ST_zprr 1110010 00 .. ..... 010 ... ..... ..... \ + @rprr_store_esz_n0 msz=0 +ST_zprr 1110010 01 .. ..... 010 ... ..... ..... \ + @rprr_store_esz_n0 msz=1 +ST_zprr 1110010 10 .. ..... 010 ... ..... ..... \ + @rprr_store_esz_n0 msz=2 +ST_zprr 1110010 11 11 ..... 010 ... ..... ..... \ + @rprr_store msz=3 esz=3 nreg=0 + +# SVE contiguous non-temporal store (scalar plus immediate) (nreg == 0) +# SVE store multiple structures (scalar plus immediate) (nreg != 0) +ST_zpri 1110010 .. nreg:2 1.... 111 ... ..... ..... \ + @rpri_store_msz esz=%size_23 + +# SVE contiguous non-temporal store (scalar plus scalar) (nreg == 0) +# SVE store multiple structures (scalar plus scalar) (nreg != 0) +ST_zprr 1110010 msz:2 nreg:2 ..... 011 ... ..... ..... 
\ + @rprr_store esz=%size_23
From patchwork Wed Jun 27 04:32:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935264 From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:32:57 -0700 Message-Id: <20180627043328.11531-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> Subject: [Qemu-devel] [PATCH v6 04/35] target/arm: Implement SVE load and broadcast quadword Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson
--- target/arm/translate-sve.c | 52 ++++++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 9 +++++++ 2 files changed, 61 insertions(+) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index b25fe96b77..83de87ee0e 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3717,6 +3717,58 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn) return true; } +static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz) +{ + static gen_helper_gvec_mem * const fns[4] = { + gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_r, + gen_helper_sve_ld1ss_r, gen_helper_sve_ld1dd_r, + }; + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr t_pg; + TCGv_i32 desc; + + /* Load the first quadword using the normal predicated load helpers. */ + desc = tcg_const_i32(simd_desc(16, 16, zt)); + t_pg = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg)); + fns[msz](cpu_env, t_pg, addr, desc); + + tcg_temp_free_ptr(t_pg); + tcg_temp_free_i32(desc); + + /* Replicate that first quadword. 
*/ + if (vsz > 16) { + unsigned dofs = vec_full_reg_offset(s, zt); + tcg_gen_gvec_dup_mem(4, dofs + 16, dofs, vsz - 16, vsz - 16); + } +} + +static bool trans_LD1RQ_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn) +{ + if (a->rm == 31) { + return false; + } + if (sve_access_check(s)) { + int msz = dtype_msz(a->dtype); + TCGv_i64 addr = new_tmp_a64(s); + tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), msz); + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn)); + do_ldrq(s, a->rd, a->pg, addr, msz); + } + return true; +} + +static bool trans_LD1RQ_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn) +{ + if (sve_access_check(s)) { + TCGv_i64 addr = new_tmp_a64(s); + tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 16); + do_ldrq(s, a->rd, a->pg, addr, dtype_msz(a->dtype)); + } + return true; +} + static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz, int esz, int nreg) { diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 6e159faaec..606c4f623c 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -715,6 +715,15 @@ LD_zprr 1010010 .. nreg:2 ..... 110 ... ..... ..... @rprr_load_msz # LD2B, LD2H, LD2W, LD2D; etc. LD_zpri 1010010 .. nreg:2 0.... 111 ... ..... ..... @rpri_load_msz +# SVE load and broadcast quadword (scalar plus scalar) +LD1RQ_zprr 1010010 .. 00 ..... 000 ... ..... ..... \ + @rprr_load_msz nreg=0 + +# SVE load and broadcast quadword (scalar plus immediate) +# LD1RQB, LD1RQH, LD1RQW, LD1RQD +LD1RQ_zpri 1010010 .. 00 0.... 001 ... ..... ..... 
\ + @rpri_load_msz nreg=0 + ### SVE Memory Store Group # SVE contiguous store (scalar plus immediate)
From patchwork Wed Jun 27 04:32:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935274 From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:32:58 -0700 Message-Id: <20180627043328.11531-6-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> Subject: [Qemu-devel] [PATCH v6 05/35] target/arm: Implement SVE integer convert to floating-point Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson
--- target/arm/helper-sve.h | 30 +++++++++++++ target/arm/sve_helper.c | 38 ++++++++++++++++ target/arm/translate-sve.c | 90 ++++++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 22 ++++++++++ 4 files changed, 180 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index b768128951..185112e1d2 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -720,6 +720,36 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_scvt_dh, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_scvt_ss, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) 
+DEF_HELPER_FLAGS_5(sve_scvt_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_scvt_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_scvt_dd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_ucvt_hh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_sh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_dh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_ss, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_dd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index bd874e6fa2..031bec22df 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2811,6 +2811,44 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
     return predtest_ones(d, oprsz, esz_mask);
 }
 
+/* Fully general two-operand expander, controlled by a predicate,
+ * With the extra float_status parameter.
+ */
+#define DO_ZPZ_FP(NAME, TYPE, H, OP)                                \
+void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \
+{                                                                   \
+    intptr_t i = simd_oprsz(desc);                                  \
+    uint64_t *g = vg;                                               \
+    do {                                                            \
+        uint64_t pg = g[(i - 1) >> 6];                              \
+        do {                                                        \
+            i -= sizeof(TYPE);                                      \
+            if (likely((pg >> (i & 63)) & 1)) {                     \
+                TYPE nn = *(TYPE *)(vn + H(i));                     \
+                *(TYPE *)(vd + H(i)) = OP(nn, status);              \
+            }                                                       \
+        } while (i & 63);                                           \
+    } while (i != 0);                                               \
+}
+
+DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
+DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
+DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
+DO_ZPZ_FP(sve_scvt_sd, uint64_t,     , int32_to_float64)
+DO_ZPZ_FP(sve_scvt_dh, uint64_t,     , int64_to_float16)
+DO_ZPZ_FP(sve_scvt_ds, uint64_t,     , int64_to_float32)
+DO_ZPZ_FP(sve_scvt_dd, uint64_t,     , int64_to_float64)
+
+DO_ZPZ_FP(sve_ucvt_hh, uint16_t, H1_2, uint16_to_float16)
+DO_ZPZ_FP(sve_ucvt_sh, uint32_t, H1_4, uint32_to_float16)
+DO_ZPZ_FP(sve_ucvt_ss, uint32_t, H1_4, uint32_to_float32)
+DO_ZPZ_FP(sve_ucvt_sd, uint64_t,     , uint32_to_float64)
+DO_ZPZ_FP(sve_ucvt_dh, uint64_t,     , uint64_to_float16)
+DO_ZPZ_FP(sve_ucvt_ds, uint64_t,     , uint64_to_float32)
+DO_ZPZ_FP(sve_ucvt_dd, uint64_t,     , uint64_to_float64)
+
+#undef DO_ZPZ_FP
+
 /*
  * Load contiguous data, protected by a governing predicate.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 83de87ee0e..7639e589f5 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3425,6 +3425,96 @@ DO_FP3(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
+
+/*
+ *** SVE Floating Point Unary Operations Prediated Group
+ */
+
+static bool do_zpz_ptr(DisasContext *s, int rd, int rn, int pg,
+                       bool is_fp16, gen_helper_gvec_3_ptr *fn)
+{
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        TCGv_ptr status = get_fpstatus_ptr(is_fp16);
+        tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+                           vec_full_reg_offset(s, rn),
+                           pred_full_reg_offset(s, pg),
+                           status, vsz, vsz, 0, fn);
+        tcg_temp_free_ptr(status);
+    }
+    return true;
+}
+
+static bool trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh);
+}
+
+static bool trans_SCVTF_sh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_sh);
+}
+
+static bool trans_SCVTF_dh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_dh);
+}
+
+static bool trans_SCVTF_ss(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_scvt_ss);
+}
+
+static bool trans_SCVTF_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_scvt_ds);
+}
+
+static bool trans_SCVTF_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_scvt_sd);
+}
+
+static bool trans_SCVTF_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_scvt_dd);
+}
+
+static bool trans_UCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_ucvt_hh);
+}
+
+static bool trans_UCVTF_sh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_ucvt_sh);
+}
+
+static bool trans_UCVTF_dh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_ucvt_dh);
+}
+
+static bool trans_UCVTF_ss(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_ucvt_ss);
+}
+
+static bool trans_UCVTF_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_ucvt_ds);
+}
+
+static bool trans_UCVTF_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_ucvt_sd);
+}
+
+static bool trans_UCVTF_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_ucvt_dd);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 606c4f623c..3abdb87cf5 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -133,6 +133,9 @@
 @rd_pg_rn       ........ esz:2 ... ... ... pg:3 rn:5 rd:5       &rpr_esz
 @rd_pg4_pn      ........ esz:2 ... ... .. pg:4 . rn:4 rd:5      &rpr_esz
 
+# One register operand, with governing predicate, no vector element size
+@rd_pg_rn_e0    ........ .. ... ... ... pg:3 rn:5 rd:5          &rpr_esz esz=0
+
 # Two register operands with a 6-bit signed immediate.
 @rd_rn_i6       ........ ... rn:5 ..... imm:s6 rd:5             &rri
 
@@ -681,6 +684,25 @@ FTSMUL          01100101 .. 0 ..... 000 011 ..... .....         @rd_rn_rm
 FRECPS          01100101 .. 0 ..... 000 110 ..... .....         @rd_rn_rm
 FRSQRTS         01100101 .. 0 ..... 000 111 ..... .....         @rd_rn_rm
 
+### SVE FP Unary Operations Predicated Group
+
+# SVE integer convert to floating-point
+SCVTF_hh        01100101 01 010 01 0 101 ... ..... .....        @rd_pg_rn_e0
+SCVTF_sh        01100101 01 010 10 0 101 ... ..... .....        @rd_pg_rn_e0
+SCVTF_dh        01100101 01 010 11 0 101 ... ..... .....        @rd_pg_rn_e0
+SCVTF_ss        01100101 10 010 10 0 101 ... ..... .....        @rd_pg_rn_e0
+SCVTF_sd        01100101 11 010 00 0 101 ... ..... .....        @rd_pg_rn_e0
+SCVTF_ds        01100101 11 010 10 0 101 ... ..... .....        @rd_pg_rn_e0
+SCVTF_dd        01100101 11 010 11 0 101 ... ..... .....        @rd_pg_rn_e0
+
+UCVTF_hh        01100101 01 010 01 1 101 ... ..... .....        @rd_pg_rn_e0
+UCVTF_sh        01100101 01 010 10 1 101 ... ..... .....        @rd_pg_rn_e0
+UCVTF_dh        01100101 01 010 11 1 101 ... ..... .....        @rd_pg_rn_e0
+UCVTF_ss        01100101 10 010 10 1 101 ... ..... .....        @rd_pg_rn_e0
+UCVTF_sd        01100101 11 010 00 1 101 ... ..... .....        @rd_pg_rn_e0
+UCVTF_ds        01100101 11 010 10 1 101 ... ..... .....        @rd_pg_rn_e0
+UCVTF_dd        01100101 11 010 11 1 101 ... ..... .....        @rd_pg_rn_e0
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register

From patchwork Wed Jun 27 04:32:59 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935267
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org
Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="j4HrwqLP"; dkim-atps=neutral
Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fqtm4cwGz9s0w for ; Wed, 27 Jun 2018 14:39:12 +1000 (AEST)
Received: from localhost ([::1]:56503 helo=lists.gnu.org) by
lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2EX-00084K-Gy for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:39:09 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60197) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29I-00047r-Tr for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29F-0000BG-Ad for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:44 -0400 Received: from mail-pg0-x22a.google.com ([2607:f8b0:400e:c05::22a]:39001) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29E-00009t-Un for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:41 -0400 Received: by mail-pg0-x22a.google.com with SMTP id n2-v6so362136pgq.6 for ; Tue, 26 Jun 2018 21:33:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=S1V2am/qIvokLBP9WK4jWvopleanf//7wXvzpKLZj44=; b=j4HrwqLPSMc0eaaPJMnLNeWUpzUnPW5Me3DuOiAvWg349qRXNTuhNCEsk8YVTSaFDq FXYFngeYDxWQVYGGPcps/wNZeKOtI7Ma0k++BnEG+IJu2l2zVk8zEEXiTfhKVhvukVcR xVxWjinPhKNpYwNJDidVkhCWxDWi2Xu96Me5A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=S1V2am/qIvokLBP9WK4jWvopleanf//7wXvzpKLZj44=; b=C+aaLiCx8Chxhm3vE98PqEbXUXxitIk7DxkcCMqfpG/eZMEcCs47qudvQe85R9pR6v bl4g6JkbKWMZJ63BoiWiCQbjRvcLhZmrw8gVdHWclp5gABQKqy96rZF1J35jku99EkDt A9K/mYz9+E2oyIntKGZoC1uVVEjRTBVAwK/QhMuioh6viCDllsKUGW/aE1ZfIEeFtr86 U6jULXlOKk7SQ+FHgodWe9nxkX2PVZIMJQ9GYVZT1GaTHcanNjAfZJUy71Be+13Bah31 iKM0/mtfoMa9wLgPdeQt0hlRyDdk6iNo8dqzg7iFbMmkBoLH3PmcXzbBlgkdppl4GC+8 6vCA== X-Gm-Message-State: APt69E1b7Cq+Y4foyYbjiUykNqYcSDEbeJYVKBHoLqNMnWeg7whLp9rm 329EHrpojHeHIrkxUNRVAAz4uhuvUgU= X-Google-Smtp-Source: 
        ADUXVKJRbq6MPN1G3gRLGLAgBjPdGHjbf0rNzZdHvTQHqGN8FmEb0/ePf3vbFLGiInUgzTkWBH44PA==
X-Received: by 2002:a63:82c7:: with SMTP id w190-v6mr3745245pgd.253.1530074019611; Tue, 26 Jun 2018 21:33:39 -0700 (PDT)
Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.38 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:38 -0700 (PDT)
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:32:59 -0700
Message-Id: <20180627043328.11531-7-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized.
X-Received-From: 2607:f8b0:400e:c05::22a
Subject: [Qemu-devel] [PATCH v6 06/35] target/arm: Implement SVE floating-point arithmetic (predicated)
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
 target/arm/helper-sve.h    | 77 +++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 89 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 46 ++++++++++++++++++++
 target/arm/sve.decode      | 17 ++++++++
 4 files changed, 229 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 185112e1d2..4097b55f0e 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -720,6 +720,83 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(sve_fadd_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fadd_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fadd_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fsub_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fsub_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fsub_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmul_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmul_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmul_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fdiv_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fdiv_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fdiv_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmin_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmin_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmin_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmax_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmax_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmax_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fminnum_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fminnum_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fminnum_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmaxnum_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmaxnum_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmaxnum_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fabd_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fabd_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fabd_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fscalbn_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fscalbn_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fscalbn_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmulx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmulx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmulx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 031bec22df..3401662397 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2811,6 +2811,95 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
     return predtest_ones(d, oprsz, esz_mask);
 }
 
+/* Fully general three-operand expander, controlled by a predicate,
+ * With the extra float_status parameter.
+ */
+#define DO_ZPZZ_FP(NAME, TYPE, H, OP)                               \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,           \
+                  void *status, uint32_t desc)                      \
+{                                                                   \
+    intptr_t i = simd_oprsz(desc);                                  \
+    uint64_t *g = vg;                                               \
+    do {                                                            \
+        uint64_t pg = g[(i - 1) >> 6];                              \
+        do {                                                        \
+            i -= sizeof(TYPE);                                      \
+            if (likely((pg >> (i & 63)) & 1)) {                     \
+                TYPE nn = *(TYPE *)(vn + H(i));                     \
+                TYPE mm = *(TYPE *)(vm + H(i));                     \
+                *(TYPE *)(vd + H(i)) = OP(nn, mm, status);          \
+            }                                                       \
+        } while (i & 63);                                           \
+    } while (i != 0);                                               \
+}
+
+DO_ZPZZ_FP(sve_fadd_h, uint16_t, H1_2, float16_add)
+DO_ZPZZ_FP(sve_fadd_s, uint32_t, H1_4, float32_add)
+DO_ZPZZ_FP(sve_fadd_d, uint64_t,     , float64_add)
+
+DO_ZPZZ_FP(sve_fsub_h, uint16_t, H1_2, float16_sub)
+DO_ZPZZ_FP(sve_fsub_s, uint32_t, H1_4, float32_sub)
+DO_ZPZZ_FP(sve_fsub_d, uint64_t,     , float64_sub)
+
+DO_ZPZZ_FP(sve_fmul_h, uint16_t, H1_2, float16_mul)
+DO_ZPZZ_FP(sve_fmul_s, uint32_t, H1_4, float32_mul)
+DO_ZPZZ_FP(sve_fmul_d, uint64_t,     , float64_mul)
+
+DO_ZPZZ_FP(sve_fdiv_h, uint16_t, H1_2, float16_div)
+DO_ZPZZ_FP(sve_fdiv_s, uint32_t, H1_4, float32_div)
+DO_ZPZZ_FP(sve_fdiv_d, uint64_t,     , float64_div)
+
+DO_ZPZZ_FP(sve_fmin_h, uint16_t, H1_2, float16_min)
+DO_ZPZZ_FP(sve_fmin_s, uint32_t, H1_4, float32_min)
+DO_ZPZZ_FP(sve_fmin_d, uint64_t,     , float64_min)
+
+DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
+DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
+DO_ZPZZ_FP(sve_fmax_d, uint64_t,     , float64_max)
+
+DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
+DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
+DO_ZPZZ_FP(sve_fminnum_d, uint64_t,     , float64_minnum)
+
+DO_ZPZZ_FP(sve_fmaxnum_h, uint16_t, H1_2, float16_maxnum)
+DO_ZPZZ_FP(sve_fmaxnum_s, uint32_t, H1_4, float32_maxnum)
+DO_ZPZZ_FP(sve_fmaxnum_d, uint64_t,     , float64_maxnum)
+
+static inline float16 abd_h(float16 a, float16 b, float_status *s)
+{
+    return float16_abs(float16_sub(a, b, s));
+}
+
+static inline float32 abd_s(float32 a, float32 b, float_status *s)
+{
+    return float32_abs(float32_sub(a, b, s));
+}
+
+static inline float64 abd_d(float64 a, float64 b, float_status *s)
+{
+    return float64_abs(float64_sub(a, b, s));
+}
+
+DO_ZPZZ_FP(sve_fabd_h, uint16_t, H1_2, abd_h)
+DO_ZPZZ_FP(sve_fabd_s, uint32_t, H1_4, abd_s)
+DO_ZPZZ_FP(sve_fabd_d, uint64_t,     , abd_d)
+
+static inline float64 scalbn_d(float64 a, int64_t b, float_status *s)
+{
+    int b_int = MIN(MAX(b, INT_MIN), INT_MAX);
+    return float64_scalbn(a, b_int, s);
+}
+
+DO_ZPZZ_FP(sve_fscalbn_h, int16_t, H1_2, float16_scalbn)
+DO_ZPZZ_FP(sve_fscalbn_s, int32_t, H1_4, float32_scalbn)
+DO_ZPZZ_FP(sve_fscalbn_d, int64_t,     , scalbn_d)
+
+DO_ZPZZ_FP(sve_fmulx_h, uint16_t, H1_2, helper_advsimd_mulxh)
+DO_ZPZZ_FP(sve_fmulx_s, uint32_t, H1_4, helper_vfp_mulxs)
+DO_ZPZZ_FP(sve_fmulx_d, uint64_t,     , helper_vfp_mulxd)
+
+#undef DO_ZPZZ_FP
+
 /* Fully general two-operand expander, controlled by a predicate,
  * With the extra float_status parameter.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 7639e589f5..4df5360da9 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3425,6 +3425,52 @@ DO_FP3(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
+/*
+ *** SVE Floating Point Arithmetic - Predicated Group
+ */
+
+static bool do_zpzz_fp(DisasContext *s, arg_rprr_esz *a,
+                       gen_helper_gvec_4_ptr *fn)
+{
+    if (fn == NULL) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16);
+        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           pred_full_reg_offset(s, a->pg),
+                           status, vsz, vsz, 0, fn);
+        tcg_temp_free_ptr(status);
+    }
+    return true;
+}
+
+#define DO_FP3(NAME, name) \
+static bool trans_##NAME(DisasContext *s, arg_rprr_esz *a, uint32_t insn) \
+{                                                                   \
+    static gen_helper_gvec_4_ptr * const fns[4] = {                 \
+        NULL, gen_helper_sve_##name##_h,                            \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d        \
+    };                                                              \
+    return do_zpzz_fp(s, a, fns[a->esz]);                           \
+}
+
+DO_FP3(FADD_zpzz, fadd)
+DO_FP3(FSUB_zpzz, fsub)
+DO_FP3(FMUL_zpzz, fmul)
+DO_FP3(FMIN_zpzz, fmin)
+DO_FP3(FMAX_zpzz, fmax)
+DO_FP3(FMINNM_zpzz, fminnum)
+DO_FP3(FMAXNM_zpzz, fmaxnum)
+DO_FP3(FABD, fabd)
+DO_FP3(FSCALE, fscalbn)
+DO_FP3(FDIV, fdiv)
+DO_FP3(FMULX, fmulx)
+
+#undef DO_FP3
 
 /*
  *** SVE Floating Point Unary Operations Prediated Group
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 3abdb87cf5..636212a638 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -684,6 +684,23 @@ FTSMUL          01100101 .. 0 ..... 000 011 ..... .....         @rd_rn_rm
 FRECPS          01100101 .. 0 ..... 000 110 ..... .....         @rd_rn_rm
 FRSQRTS         01100101 .. 0 ..... 000 111 ..... .....         @rd_rn_rm
 
+### SVE FP Arithmetic Predicated Group
+
+# SVE floating-point arithmetic (predicated)
+FADD_zpzz       01100101 .. 00 0000 100 ... ..... .....         @rdn_pg_rm
+FSUB_zpzz       01100101 .. 00 0001 100 ... ..... .....         @rdn_pg_rm
+FMUL_zpzz       01100101 .. 00 0010 100 ... ..... .....         @rdn_pg_rm
+FSUB_zpzz       01100101 .. 00 0011 100 ... ..... .....         @rdm_pg_rn # FSUBR
+FMAXNM_zpzz     01100101 .. 00 0100 100 ... ..... .....         @rdn_pg_rm
+FMINNM_zpzz     01100101 .. 00 0101 100 ... ..... .....         @rdn_pg_rm
+FMAX_zpzz       01100101 .. 00 0110 100 ... ..... .....         @rdn_pg_rm
+FMIN_zpzz       01100101 .. 00 0111 100 ... ..... .....         @rdn_pg_rm
+FABD            01100101 .. 00 1000 100 ... ..... .....         @rdn_pg_rm
+FSCALE          01100101 .. 00 1001 100 ... ..... .....         @rdn_pg_rm
+FMULX           01100101 .. 00 1010 100 ... ..... .....         @rdn_pg_rm
+FDIV            01100101 .. 00 1100 100 ... ..... .....         @rdm_pg_rn # FDIVR
+FDIV            01100101 .. 00 1101 100 ... ..... .....         @rdn_pg_rm
+
 ### SVE FP Unary Operations Predicated Group
 
 # SVE integer convert to floating-point

From patchwork Wed Jun 27 04:33:00 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935263
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org
Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="dsQ1t9Po"; dkim-atps=neutral
Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41FqpN1jfLz9s1B for ; Wed, 27 Jun 2018 14:35:24 +1000 (AEST)
Received: from localhost ([::1]:56479 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2Ar-000550-PQ for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:35:21 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60244) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29K-00048P-Th for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29G-0000EJ-RU for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:46 -0400
Received: from mail-pf0-x242.google.com ([2607:f8b0:400e:c00::242]:38912) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29G-0000Co-Gj for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:42 -0400
Received: by
mail-pf0-x242.google.com with SMTP id s21-v6so384551pfm.6 for ; Tue, 26 Jun 2018 21:33:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=mZm2gkWkdnvvX+gOmWHDGixvgU5Aow6CSssLEBQ63CA=; b=dsQ1t9PoC5RLxZ0pbhkDiFKzixn7w+9nc+ibC6UCfY+1TSJr6uqVTPAf/fnIcsV7nt e+UuOxgsgsr8t3pcFNkc6NPckcFziT5+B9ucQZ28JDcBKUbIy+VISSrgtpgzLQs108Ga cYFZyXFiKN/n3+SdG0CD65CwdwajYyLxPg+6Y= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=mZm2gkWkdnvvX+gOmWHDGixvgU5Aow6CSssLEBQ63CA=; b=tSERyx7Pf9T29mDRM7KdvS27dVHndPONLfvNbq3JpAFTpaueC6iudP+doajnUnpCOd 5MFBz3vORYYHIStbtuCMJo+4cexnDeADC8xFl/2kEE/6ixkY0x4wtnxccgwdTl2QlMf9 IGCRHIniNbsnvO2wVEOdJ+swMZSlRouWdavt2srk0RqlAwYUPwuG9qU7AtK2YGaK+voN hnhnQOUZFLoJqZHfbyp3ikuwxBBEOIsd0N+FQLVOMy2XO65T19KUKoPAUK2aCMduAi9a ZUZi4hsKajR7dmbk8v2dQ5lOIHUhymWoJi9eLOuBmHf5nRFl+dybZ5Zsy52tDotZ12Z/ CDeg== X-Gm-Message-State: APt69E1m5BjrWg+E6pJMdWjU9yXJ72HBEcDHQbKqxgPHFN/S0cNkrj0P e/1hDfrCuPJzoltCFXreBeBy4DrVj6Q= X-Google-Smtp-Source: AAOMgpeT9+OMbLV+wHhKRAd/WYTC3egleGq80VY4YxmQZW7piy82O54ilMwx4s+Ug+9BAZvPZ/FxLg== X-Received: by 2002:a62:569c:: with SMTP id h28-v6mr53573pfj.201.1530074021129; Tue, 26 Jun 2018 21:33:41 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
        [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.39 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:40 -0700 (PDT)
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:00 -0700
Message-Id: <20180627043328.11531-8-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized.
X-Received-From: 2607:f8b0:400e:c00::242
Subject: [Qemu-devel] [PATCH v6 07/35] target/arm: Implement SVE FP Multiply-Add Group
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"

Signed-off-by: Richard Henderson
Reviewed-by: Peter Maydell
Reviewed-by: Alex Bennée
---
v6: Add some decode commentary.
---
 target/arm/helper-sve.h    |  16 ++++
 target/arm/sve_helper.c    | 158 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c |  49 ++++++++++++
 target/arm/sve.decode      |  18 +++++
 4 files changed, 241 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 4097b55f0e..eb0645dd43 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -827,6 +827,22 @@ DEF_HELPER_FLAGS_5(sve_ucvt_ds, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_ucvt_dd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 3401662397..2f416e5e28 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2938,6 +2938,164 @@ DO_ZPZ_FP(sve_ucvt_dd, uint64_t,     , uint64_to_float64)
 
 #undef DO_ZPZ_FP
 
+/* 4-operand predicated multiply-add.  This requires 7 operands to pass
+ * "properly", so we need to encode some of the registers into DESC.
+ */
+QEMU_BUILD_BUG_ON(SIMD_DATA_SHIFT + 20 > 32);
+
+static void do_fmla_zpzzz_h(CPUARMState *env, void *vg, uint32_t desc,
+                            uint16_t neg1, uint16_t neg3)
+{
+    intptr_t i = simd_oprsz(desc);
+    unsigned rd = extract32(desc, SIMD_DATA_SHIFT, 5);
+    unsigned rn = extract32(desc, SIMD_DATA_SHIFT + 5, 5);
+    unsigned rm = extract32(desc, SIMD_DATA_SHIFT + 10, 5);
+    unsigned ra = extract32(desc, SIMD_DATA_SHIFT + 15, 5);
+    void *vd = &env->vfp.zregs[rd];
+    void *vn = &env->vfp.zregs[rn];
+    void *vm = &env->vfp.zregs[rm];
+    void *va = &env->vfp.zregs[ra];
+    uint64_t *g = vg;
+
+    do {
+        uint64_t pg = g[(i - 1) >> 6];
+        do {
+            i -= 2;
+            if (likely((pg >> (i & 63)) & 1)) {
+                float16 e1, e2, e3, r;
+
+                e1 = *(uint16_t *)(vn + H1_2(i)) ^ neg1;
+                e2 = *(uint16_t *)(vm + H1_2(i));
+                e3 = *(uint16_t *)(va + H1_2(i)) ^ neg3;
+                r = float16_muladd(e1, e2, e3, 0, &env->vfp.fp_status);
+                *(uint16_t *)(vd + H1_2(i)) = r;
+            }
+        } while (i & 63);
+    } while (i != 0);
+}
+
+void HELPER(sve_fmla_zpzzz_h)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_h(env, vg, desc, 0, 0);
+}
+
+void HELPER(sve_fmls_zpzzz_h)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_h(env, vg, desc, 0x8000, 0);
+}
+
+void HELPER(sve_fnmla_zpzzz_h)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_h(env, vg, desc, 0x8000, 0x8000);
+}
+
+void HELPER(sve_fnmls_zpzzz_h)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_h(env, vg, desc, 0, 0x8000);
+}
+
+static void do_fmla_zpzzz_s(CPUARMState *env, void *vg, uint32_t desc,
+                            uint32_t neg1, uint32_t neg3)
+{
+    intptr_t i = simd_oprsz(desc);
+    unsigned rd = extract32(desc, SIMD_DATA_SHIFT, 5);
+    unsigned rn = extract32(desc, SIMD_DATA_SHIFT + 5, 5);
+    unsigned rm = extract32(desc, SIMD_DATA_SHIFT + 10, 5);
+    unsigned ra = extract32(desc, SIMD_DATA_SHIFT + 15, 5);
+    void *vd = &env->vfp.zregs[rd];
+    void *vn = &env->vfp.zregs[rn];
+    void *vm = &env->vfp.zregs[rm];
+    void *va = &env->vfp.zregs[ra];
+    uint64_t *g = vg;
+
+    do {
+        uint64_t pg = g[(i - 1) >> 6];
+        do {
+            i -= 4;
+            if (likely((pg >> (i & 63)) & 1)) {
+                float32 e1, e2, e3, r;
+
+                e1 = *(uint32_t *)(vn + H1_4(i)) ^ neg1;
+                e2 = *(uint32_t *)(vm + H1_4(i));
+                e3 = *(uint32_t *)(va + H1_4(i)) ^ neg3;
+                r = float32_muladd(e1, e2, e3, 0, &env->vfp.fp_status);
+                *(uint32_t *)(vd + H1_4(i)) = r;
+            }
+        } while (i & 63);
+    } while (i != 0);
+}
+
+void HELPER(sve_fmla_zpzzz_s)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_s(env, vg, desc, 0, 0);
+}
+
+void HELPER(sve_fmls_zpzzz_s)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_s(env, vg, desc, 0x80000000, 0);
+}
+
+void HELPER(sve_fnmla_zpzzz_s)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_s(env, vg, desc, 0x80000000, 0x80000000);
+}
+
+void HELPER(sve_fnmls_zpzzz_s)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_s(env, vg, desc, 0, 0x80000000);
+}
+
+static void do_fmla_zpzzz_d(CPUARMState *env, void *vg, uint32_t desc,
+                            uint64_t neg1, uint64_t neg3)
+{
+    intptr_t i = simd_oprsz(desc);
+    unsigned rd = extract32(desc, SIMD_DATA_SHIFT, 5);
+    unsigned rn = extract32(desc, SIMD_DATA_SHIFT + 5, 5);
+    unsigned rm = extract32(desc, SIMD_DATA_SHIFT + 10, 5);
+    unsigned ra = extract32(desc, SIMD_DATA_SHIFT + 15, 5);
+    void *vd = &env->vfp.zregs[rd];
+    void *vn = &env->vfp.zregs[rn];
+    void *vm = &env->vfp.zregs[rm];
+    void *va = &env->vfp.zregs[ra];
+    uint64_t *g = vg;
+
+    do {
+        uint64_t pg = g[(i - 1) >> 6];
+        do {
+            i -= 8;
+            if (likely((pg >> (i & 63)) & 1)) {
+                float64 e1, e2, e3, r;
+
+                e1 = *(uint64_t *)(vn + i) ^ neg1;
+                e2 = *(uint64_t *)(vm + i);
+                e3 = *(uint64_t *)(va + i) ^ neg3;
+                r = float64_muladd(e1, e2, e3, 0, &env->vfp.fp_status);
+                *(uint64_t *)(vd + i) = r;
+            }
+        } while (i & 63);
+    } while (i != 0);
+}
+
+void HELPER(sve_fmla_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_d(env, vg, desc, 0, 0);
+}
+
+void HELPER(sve_fmls_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_d(env, vg, desc, INT64_MIN, 0);
+}
+
+void HELPER(sve_fnmla_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_d(env, vg, desc, INT64_MIN, INT64_MIN);
+}
+
+void HELPER(sve_fnmls_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc)
+{
+    do_fmla_zpzzz_d(env, vg, desc, 0, INT64_MIN);
+}
+
 /*
  * Load contiguous data, protected by a governing predicate.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4df5360da9..acad6374ef 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3472,6 +3472,55 @@ DO_FP3(FMULX, fmulx)
 
 #undef DO_FP3
 
+typedef void gen_helper_sve_fmla(TCGv_env, TCGv_ptr, TCGv_i32);
+
+static bool do_fmla(DisasContext *s, arg_rprrr_esz *a, gen_helper_sve_fmla *fn)
+{
+    if (fn == NULL) {
+        return false;
+    }
+    if (!sve_access_check(s)) {
+        return true;
+    }
+
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned desc;
+    TCGv_i32 t_desc;
+    TCGv_ptr pg = tcg_temp_new_ptr();
+
+    /* We would need 7 operands to pass these arguments "properly".
+     * So we encode all the register numbers into the descriptor.
+ */ + desc = deposit32(a->rd, 5, 5, a->rn); + desc = deposit32(desc, 10, 5, a->rm); + desc = deposit32(desc, 15, 5, a->ra); + desc = simd_desc(vsz, vsz, desc); + + t_desc = tcg_const_i32(desc); + tcg_gen_addi_ptr(pg, cpu_env, pred_full_reg_offset(s, a->pg)); + fn(cpu_env, pg, t_desc); + tcg_temp_free_i32(t_desc); + tcg_temp_free_ptr(pg); + return true; +} + +#define DO_FMLA(NAME, name) \ +static bool trans_##NAME(DisasContext *s, arg_rprrr_esz *a, uint32_t insn) \ +{ \ + static gen_helper_sve_fmla * const fns[4] = { \ + NULL, gen_helper_sve_##name##_h, \ + gen_helper_sve_##name##_s, gen_helper_sve_##name##_d \ + }; \ + return do_fmla(s, a, fns[a->esz]); \ +} + +DO_FMLA(FMLA_zpzzz, fmla_zpzzz) +DO_FMLA(FMLS_zpzzz, fmls_zpzzz) +DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz) +DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz) + +#undef DO_FMLA + /* *** SVE Floating Point Unary Operations Prediated Group */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 636212a638..e8531e28cd 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -128,6 +128,8 @@ &rprrr_esz ra=%reg_movprfx @rdn_pg_ra_rm ........ esz:2 . rm:5 ... pg:3 ra:5 rd:5 \ &rprrr_esz rn=%reg_movprfx +@rdn_pg_rm_ra ........ esz:2 . ra:5 ... pg:3 rm:5 rd:5 \ + &rprrr_esz rn=%reg_movprfx # One register operand, with governing predicate, vector element size @rd_pg_rn ........ esz:2 ... ... ... pg:3 rn:5 rd:5 &rpr_esz @@ -701,6 +703,22 @@ FMULX 01100101 .. 00 1010 100 ... ..... ..... @rdn_pg_rm FDIV 01100101 .. 00 1100 100 ... ..... ..... @rdm_pg_rn # FDIVR FDIV 01100101 .. 00 1101 100 ... ..... ..... @rdn_pg_rm +### SVE FP Multiply-Add Group + +# SVE floating-point multiply-accumulate writing addend +FMLA_zpzzz 01100101 .. 1 ..... 000 ... ..... ..... @rda_pg_rn_rm +FMLS_zpzzz 01100101 .. 1 ..... 001 ... ..... ..... @rda_pg_rn_rm +FNMLA_zpzzz 01100101 .. 1 ..... 010 ... ..... ..... @rda_pg_rn_rm +FNMLS_zpzzz 01100101 .. 1 ..... 011 ... ..... ..... 
@rda_pg_rn_rm + +# SVE floating-point multiply-accumulate writing multiplicand +# Alter the operand extraction order and reuse the helpers from above. +# FMAD, FMSB, FNMAD, FNMS +FMLA_zpzzz 01100101 .. 1 ..... 100 ... ..... ..... @rdn_pg_rm_ra +FMLS_zpzzz 01100101 .. 1 ..... 101 ... ..... ..... @rdn_pg_rm_ra +FNMLA_zpzzz 01100101 .. 1 ..... 110 ... ..... ..... @rdn_pg_rm_ra +FNMLS_zpzzz 01100101 .. 1 ..... 111 ... ..... ..... @rdn_pg_rm_ra + ### SVE FP Unary Operations Predicated Group # SVE integer convert to floating-point From patchwork Wed Jun 27 04:33:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935278 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="JIOKqtY0"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fr0R2R4xz9s0w for ; Wed, 27 Jun 2018 14:44:07 +1000 (AEST) Received: from localhost ([::1]:56539 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2JI-0003sc-Me for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:44:04 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60269) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29L-00048p-7I for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:49 -0400 Received: 
from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29I-0000Gm-3O for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:47 -0400 Received: from mail-pg0-x236.google.com ([2607:f8b0:400e:c05::236]:32906) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29H-0000FA-Qg for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:43 -0400 Received: by mail-pg0-x236.google.com with SMTP id e11-v6so369956pgq.0 for ; Tue, 26 Jun 2018 21:33:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=iG8HT096URRb4CWAiRTwQ1cVUSLxVdPPA37onBhzXoI=; b=JIOKqtY0fKjngKNtOnlNhv3HvytEvRnFMUiFqNTwwkTmTNbQjz6LQEopDhUjljOdqQ vTlJTwrQVwyFMNdyd0eRza2CYn2mi/5Aut8UMz/tkd4Wm0dAhiMxkr5CNeU0KUGDi7wR qw/bsn3EIjk7zCXyZuTO84Xj0VsrUIESOY1/g= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=iG8HT096URRb4CWAiRTwQ1cVUSLxVdPPA37onBhzXoI=; b=sRQc1uKaZs2Ax7+Q1DTanFNhznrWaLyRMx4zO2l3bqUcbbU+96SxndZfIiCDd63C1e By8VjvptEBit1UkLmo0C3K2888GDwqLuJGSRFt70+CBJP4Hn3sW8KZm0nKTClUZi5yH6 bhRWp50CWbrwTQR0gV/27aWetl0yVsxcrbzuRrMQv7/j7C6QKV0SUfCgcZfZ7OJ2qa9t XHi2yF/fdn7mS5KZmEXvQm3dAsmMlwCQfEjpxH/QW6hmvhKwuSJqcOoTcVybChdyxbiG VhS2HgjsV158pLlE5+kukCUvy0Bt9d7AoYkp9k/2ct34194vDlINFdKCc3PCz8aGyeod rfow== X-Gm-Message-State: APt69E2GRTaoE9In5YpCP7w3t9PqQOecNRJU7ZtEFe0RV6JrH6BPDU1C pIKgUxWp1O0qUlPDvdlFZ6wNhTKy2WY= X-Google-Smtp-Source: AAOMgpcTHi4hvBYBUc2nqzK/lvdeYlmSIDOhe6Tv0ilBf9kgs+0SPKy1+ph7UdmbrFjMPM7K99pQ8g== X-Received: by 2002:a62:f705:: with SMTP id h5-v6mr4290259pfi.169.1530074022520; Tue, 26 Jun 2018 21:33:42 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.41 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:41 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:01 -0700 Message-Id: <20180627043328.11531-9-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::236 Subject: [Qemu-devel] [PATCH v6 08/35] target/arm: Implement SVE Floating Point Accumulating Reduction Group X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 7 +++++ target/arm/sve_helper.c | 56 ++++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 45 ++++++++++++++++++++++++++++++ target/arm/sve.decode | 5 ++++ 4 files changed, 113 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index eb0645dd43..68e55a8d03 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -720,6 +720,13 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG, + i64, i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG, + i64, i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fadda_d, TCG_CALL_NO_RWG, + i64, i64, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_6(sve_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) 
DEF_HELPER_FLAGS_6(sve_fadd_s, TCG_CALL_NO_RWG, diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 2f416e5e28..2d08b7dcd3 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2811,6 +2811,62 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc) return predtest_ones(d, oprsz, esz_mask); } +uint64_t HELPER(sve_fadda_h)(uint64_t nn, void *vm, void *vg, + void *status, uint32_t desc) +{ + intptr_t i = 0, opr_sz = simd_oprsz(desc); + float16 result = nn; + + do { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + float16 mm = *(float16 *)(vm + H1_2(i)); + result = float16_add(result, mm, status); + } + i += sizeof(float16), pg >>= sizeof(float16); + } while (i & 15); + } while (i < opr_sz); + + return result; +} + +uint64_t HELPER(sve_fadda_s)(uint64_t nn, void *vm, void *vg, + void *status, uint32_t desc) +{ + intptr_t i = 0, opr_sz = simd_oprsz(desc); + float32 result = nn; + + do { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + float32 mm = *(float32 *)(vm + H1_4(i)); + result = float32_add(result, mm, status); + } + i += sizeof(float32), pg >>= sizeof(float32); + } while (i & 15); + } while (i < opr_sz); + + return result; +} + +uint64_t HELPER(sve_fadda_d)(uint64_t nn, void *vm, void *vg, + void *status, uint32_t desc) +{ + intptr_t i = 0, opr_sz = simd_oprsz(desc) / 8; + uint64_t *m = vm; + uint8_t *pg = vg; + + for (i = 0; i < opr_sz; i++) { + if (pg[H1(i)] & 1) { + nn = float64_add(nn, m[i], status); + } + } + + return nn; +} + /* Fully general three-operand expander, controlled by a predicate, * With the extra float_status parameter. 
*/ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index acad6374ef..483ad33179 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3383,6 +3383,51 @@ DO_ZZI(UMIN, umin) #undef DO_ZZI +/* + *** SVE Floating Point Accumulating Reduction Group + */ + +static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a, uint32_t insn) +{ + typedef void fadda_fn(TCGv_i64, TCGv_i64, TCGv_ptr, + TCGv_ptr, TCGv_ptr, TCGv_i32); + static fadda_fn * const fns[3] = { + gen_helper_sve_fadda_h, + gen_helper_sve_fadda_s, + gen_helper_sve_fadda_d, + }; + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr t_rm, t_pg, t_fpst; + TCGv_i64 t_val; + TCGv_i32 t_desc; + + if (a->esz == 0) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + t_val = load_esz(cpu_env, vec_reg_offset(s, a->rn, 0, a->esz), a->esz); + t_rm = tcg_temp_new_ptr(); + t_pg = tcg_temp_new_ptr(); + tcg_gen_addi_ptr(t_rm, cpu_env, vec_full_reg_offset(s, a->rm)); + tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->pg)); + t_fpst = get_fpstatus_ptr(a->esz == MO_16); + t_desc = tcg_const_i32(simd_desc(vsz, vsz, 0)); + + fns[a->esz - 1](t_val, t_val, t_rm, t_pg, t_fpst, t_desc); + + tcg_temp_free_i32(t_desc); + tcg_temp_free_ptr(t_fpst); + tcg_temp_free_ptr(t_pg); + tcg_temp_free_ptr(t_rm); + + write_fp_dreg(s, a->rd, t_val); + tcg_temp_free_i64(t_val); + return true; +} + /* *** SVE Floating Point Arithmetic - Unpredicated Group */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index e8531e28cd..675b81aaa0 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -676,6 +676,11 @@ UMIN_zzi 00100101 .. 101 011 110 ........ ..... @rdn_i8u # SVE integer multiply immediate (unpredicated) MUL_zzi 00100101 .. 110 000 110 ........ ..... @rdn_i8s +### SVE FP Accumulating Reduction Group + +# SVE floating-point serial reduction (predicated) +FADDA 01100101 .. 011 000 001 ... ..... ..... 
@rdn_pg_rm + ### SVE Floating Point Arithmetic - Unpredicated Group # SVE floating-point arithmetic (unpredicated) From patchwork Wed Jun 27 04:33:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935266 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="iM8wVUdJ"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fqt540lTz9s0w for ; Wed, 27 Jun 2018 14:38:37 +1000 (AEST) Received: from localhost ([::1]:56500 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2Dz-0007e3-6L for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:38:35 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60298) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29M-00049p-6R for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:51 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29J-0000Ke-Ml for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:48 -0400 Received: from mail-pf0-x22f.google.com ([2607:f8b0:400e:c00::22f]:46870) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29J-0000IM-94 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:45 -0400 Received: by 
mail-pf0-x22f.google.com with SMTP id q1-v6so379241pff.13 for ; Tue, 26 Jun 2018 21:33:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=Z1YLZRLEbj7aorz8UjYdcjeWqwKLgRDQNDaDKpgC6UY=; b=iM8wVUdJnnf+peyqEnLTOTvNqikAjaigTyfg6IevPRdn0/rCpctrSgIXNqdzzBjYM1 l0J84ZRsgbQ4VoQ2KCHJ+9FbQ/nUhmbbcIZiJqvWQyQq/68zXu/9cIW4TKdNMpcRiK/u USv86+x4xqagAzQtavDkX2Yr+1Ez55aSV4zGs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=Z1YLZRLEbj7aorz8UjYdcjeWqwKLgRDQNDaDKpgC6UY=; b=NjRwu6jhDhHmcUCH6HyuEaTWZGEimMWxCP3ta9uN2a1doAmU27MR/PyTMSrMgyohl7 ZTVsnsSEM2te3W1tlW/tHv7OyPLPOvBLTRb1FgFV6Y/3byRmfcvGFo1DRkCVIQhIVOyU dKke9xHqTGhlJUxVeayHg+EXaxLTHe/q9+lFUFmPF56RA0zzhrLreVTKDFAGDHuSdnIu 3y29ozQTSHR/KgpKIhBtuqh6rRkovfx7ghkru0coXvfW4evrh+mD316jRhXJSwuj6y9S jLyEoC8uL8VPcAA3gUOHknnEvMq9L6d6d3qwGkgvsXD8JURwBY4XDQbwOyn6iNhE9bg/ Wscw== X-Gm-Message-State: APt69E1IWZXbE61OAd/68ytNhhZTjJijaqpCFBi5M5ldgzFVUMbFLxnO 86kygdJfmE++8dX3pYB5brg7P7XASis= X-Google-Smtp-Source: ADUXVKIFxu6EnRC7mTkskWdYBlHIrU1mECeAS+5vX7eVg6h2HnZq1tCn2YHjodKd9Xz9E9Nao9U1AA== X-Received: by 2002:a63:3190:: with SMTP id x138-v6mr3761884pgx.60.1530074023945; Tue, 26 Jun 2018 21:33:43 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.42 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:43 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:02 -0700 Message-Id: <20180627043328.11531-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::22f Subject: [Qemu-devel] [PATCH v6 09/35] target/arm: Implement SVE load and broadcast element X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- v6: Fix typo in comment. 
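For readers of this patch, a rough C model of what the LD1R translation is meant to compute may help. It is only a sketch under simplifying assumptions: one boolean flag per element stands in for the real predicate layout (one bit per vector byte), elements are fixed at 32 bits, and the names ld1r_model and NELEM are illustrative, not QEMU APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NELEM 4  /* hypothetical vector length, in 32-bit elements */

/*
 * Simplified model of "load and broadcast element" for 32-bit elements.
 * pred[i] is true when element i is active.  Returns true if memory
 * was read, false if the load was skipped.
 */
static bool ld1r_model(uint32_t zd[NELEM], const bool pred[NELEM],
                       const uint32_t *mem)
{
    bool any_active = false;
    size_t i;

    for (i = 0; i < NELEM; i++) {
        any_active |= pred[i];
    }

    /* If the governing predicate has no active elements, no load occurs. */
    if (any_active) {
        uint32_t val = *mem;

        /* Broadcast to all elements, active or not. */
        for (i = 0; i < NELEM; i++) {
            zd[i] = val;
        }
    }

    /* Zero the inactive elements; this also runs when the load is skipped. */
    for (i = 0; i < NELEM; i++) {
        if (!pred[i]) {
            zd[i] = 0;
        }
    }
    return any_active;
}
```

The final zeroing pass runs on both paths, which mirrors the do_movz_zpz call placed after the "over" label: with no active elements, every element counts as inactive and the destination ends up all zeros without touching memory.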
--- target/arm/helper-sve.h | 5 +++ target/arm/sve_helper.c | 41 +++++++++++++++++++++++++ target/arm/translate-sve.c | 62 ++++++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 5 +++ 4 files changed, 113 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 68e55a8d03..a5d3bb121c 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -274,6 +274,11 @@ DEF_HELPER_FLAGS_3(sve_clr_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_clr_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_clr_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_movz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_movz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_movz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_movz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_asr_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_asr_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_asr_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 2d08b7dcd3..93f2942590 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -995,6 +995,47 @@ void HELPER(sve_clr_d)(void *vd, void *vg, uint32_t desc) } } +/* Copy Zn into Zd, and store zero into inactive elements. 
*/ +void HELPER(sve_movz_b)(void *vd, void *vn, void *vg, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc) / 8; + uint64_t *d = vd, *n = vn; + uint8_t *pg = vg; + for (i = 0; i < opr_sz; i += 1) { + d[i] = n[i] & expand_pred_b(pg[H1(i)]); + } +} + +void HELPER(sve_movz_h)(void *vd, void *vn, void *vg, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc) / 8; + uint64_t *d = vd, *n = vn; + uint8_t *pg = vg; + for (i = 0; i < opr_sz; i += 1) { + d[i] = n[i] & expand_pred_h(pg[H1(i)]); + } +} + +void HELPER(sve_movz_s)(void *vd, void *vn, void *vg, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc) / 8; + uint64_t *d = vd, *n = vn; + uint8_t *pg = vg; + for (i = 0; i < opr_sz; i += 1) { + d[i] = n[i] & expand_pred_s(pg[H1(i)]); + } +} + +void HELPER(sve_movz_d)(void *vd, void *vn, void *vg, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc) / 8; + uint64_t *d = vd, *n = vn; + uint8_t *pg = vg; + for (i = 0; i < opr_sz; i += 1) { + d[i] = n[i] & -(uint64_t)(pg[H1(i)] & 1); + } +} + /* Three-operand expander, immediate operand, controlled by a predicate. */ #define DO_ZPZI(NAME, TYPE, H, OP) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 483ad33179..954d6653d3 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -606,6 +606,20 @@ static bool do_clr_zp(DisasContext *s, int rd, int pg, int esz) return true; } +/* Copy Zn into Zd, storing zeros into inactive elements. 
*/ +static void do_movz_zpz(DisasContext *s, int rd, int rn, int pg, int esz) +{ + static gen_helper_gvec_3 * const fns[4] = { + gen_helper_sve_movz_b, gen_helper_sve_movz_h, + gen_helper_sve_movz_s, gen_helper_sve_movz_d, + }; + unsigned vsz = vec_full_reg_size(s); + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + pred_full_reg_offset(s, pg), + vsz, vsz, 0, fns[esz]); +} + static bool do_zpzi_ool(DisasContext *s, arg_rpri_esz *a, gen_helper_gvec_3 *fn) { @@ -3999,6 +4013,54 @@ static bool trans_LD1RQ_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn) return true; } +/* Load and broadcast element. */ +static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn) +{ + if (!sve_access_check(s)) { + return true; + } + + unsigned vsz = vec_full_reg_size(s); + unsigned psz = pred_full_reg_size(s); + unsigned esz = dtype_esz[a->dtype]; + TCGLabel *over = gen_new_label(); + TCGv_i64 temp; + + /* If the guarding predicate has no bits set, no load occurs. */ + if (psz <= 8) { + /* Reduce the pred_esz_masks value simply to reduce the + * size of the code generated here. + */ + uint64_t psz_mask = MAKE_64BIT_MASK(0, psz * 8); + temp = tcg_temp_new_i64(); + tcg_gen_ld_i64(temp, cpu_env, pred_full_reg_offset(s, a->pg)); + tcg_gen_andi_i64(temp, temp, pred_esz_masks[esz] & psz_mask); + tcg_gen_brcondi_i64(TCG_COND_EQ, temp, 0, over); + tcg_temp_free_i64(temp); + } else { + TCGv_i32 t32 = tcg_temp_new_i32(); + find_last_active(s, t32, esz, a->pg); + tcg_gen_brcondi_i32(TCG_COND_LT, t32, 0, over); + tcg_temp_free_i32(t32); + } + + /* Load the data. */ + temp = tcg_temp_new_i64(); + tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << esz); + tcg_gen_qemu_ld_i64(temp, temp, get_mem_index(s), + s->be_data | dtype_mop[a->dtype]); + + /* Broadcast to *all* elements. */ + tcg_gen_gvec_dup_i64(esz, vec_full_reg_offset(s, a->rd), + vsz, vsz, temp); + tcg_temp_free_i64(temp); + + /* Zero the inactive elements. 
*/ + gen_set_label(over); + do_movz_zpz(s, a->rd, a->rd, a->pg, esz); + return true; +} + static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz, int esz, int nreg) { diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 675b81aaa0..765e7e479b 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -28,6 +28,7 @@ %imm8_16_10 16:5 10:3 %imm9_16_10 16:s6 10:3 %size_23 23:2 +%dtype_23_13 23:2 13:2 # A combination of tsz:imm3 -- extract esize. %tszimm_esz 22:2 5:5 !function=tszimm_esz @@ -751,6 +752,10 @@ LDR_pri 10000101 10 ...... 000 ... ..... 0 .... @pd_rn_i9 # SVE load vector register LDR_zri 10000101 10 ...... 010 ... ..... ..... @rd_rn_i9 +# SVE load and broadcast element +LD1R_zpri 1000010 .. 1 imm:6 1.. pg:3 rn:5 rd:5 \ + &rpri_load dtype=%dtype_23_13 nreg=0 + ### SVE Memory Contiguous Load Group # SVE contiguous load (scalar plus scalar) From patchwork Wed Jun 27 04:33:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935273 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="JJSLUwCO"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fqwl6FLMz9s0w for ; Wed, 27 Jun 2018 14:40:55 +1000 (AEST) Received: from localhost ([::1]:56515 
helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2GD-0001Dj-I6 for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:40:53 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60349) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29N-0004BP-RW for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:52 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29K-0000N7-VW for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:49 -0400 Received: from mail-pg0-x241.google.com ([2607:f8b0:400e:c05::241]:35736) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29K-0000LW-H7 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:46 -0400 Received: by mail-pg0-x241.google.com with SMTP id i7-v6so366597pgp.2 for ; Tue, 26 Jun 2018 21:33:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=praTdmP/ENPsluJAViKzy8vHypuywXyhhcVnFIAdZPE=; b=JJSLUwCOFoz+vZQxircoHKTfC+ODkoPWLJenHSlm1FA49/4lbhfypOB/0pl2dvAC/S Bjwo+1AoVZMSBRXMqfDCjDmlELQEtqyXv0DfydgitHBh5CtMPtnEAfEgoTVcum8QnG7x Mul4C04N1iY0VocFSmv6Ph9tQOWvyThWn8o6A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=praTdmP/ENPsluJAViKzy8vHypuywXyhhcVnFIAdZPE=; b=hicKDrlqz8v7DPHFjovh+I3662b4AWl8tiOiMwUn3c3nZB1N7uWd2X0QtYbkiJTgVY 9iOXpWrCBEqEXqAaNfKnWbqukK1hxiDmvuSMPJf03UUoWu90ESAbAwk3xHnvuJ4tkxO1 s52aS1Z8mftCHogzvQMpAVsTQGZpeATEPpfanZklWaMqciRDR2leSOuzITO8WguMU7i5 8RaCEd7sFn8KdGywlZAPHz/BkD3A86WW9BI+Zq7l7mhhjGkkeXiRgEDvbqb4ha3zw4TL H3V8VqT+lENvu1R8foLYI7ImYBUhqNAjmIrJm5caTEr+QpD/c2WMRYbUkcR3tdaeTiuS KLgQ== X-Gm-Message-State: APt69E0bYJwr0et59gjp7EifpZwrbxPa5eo16uFEcpkPXwiLTfqBFCu4 ZW5Yiy/A6yBpvZUq9FkaAhQwMhVS9Z8= X-Google-Smtp-Source: 
AAOMgpcoVe8gxIjgJ/yDvN/R1rQc7N6SqLaIdqz6AyUzetrWV9sWb0oQ6pqOw3GSjDy9Br77JA078Q== X-Received: by 2002:a62:3dc8:: with SMTP id x69-v6mr1254433pfj.182.1530074025203; Tue, 26 Jun 2018 21:33:45 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.44 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:44 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:03 -0700 Message-Id: <20180627043328.11531-11-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::241 Subject: [Qemu-devel] [PATCH v6 10/35] target/arm: Implement SVE store vector/predicate register X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- v6: Fix shift of data in 6 byte store. --- target/arm/translate-sve.c | 103 +++++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 6 +++ 2 files changed, 109 insertions(+) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 954d6653d3..4116fe9904 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3762,6 +3762,89 @@ static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len, tcg_temp_free_i64(t0); } +/* Similarly for stores. 
*/ +static void do_str(DisasContext *s, uint32_t vofs, uint32_t len, + int rn, int imm) +{ + uint32_t len_align = QEMU_ALIGN_DOWN(len, 8); + uint32_t len_remain = len % 8; + uint32_t nparts = len / 8 + ctpop8(len_remain); + int midx = get_mem_index(s); + TCGv_i64 addr, t0; + + addr = tcg_temp_new_i64(); + t0 = tcg_temp_new_i64(); + + /* Note that unpredicated load/store of vector/predicate registers + * are defined as a stream of bytes, which equates to little-endian + * operations on larger quantities. There is no nice way to force + * a little-endian store for aarch64_be-linux-user out of line. + * + * Attempt to keep code expansion to a minimum by limiting the + * amount of unrolling done. + */ + if (nparts <= 4) { + int i; + + for (i = 0; i < len_align; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, vofs + i); + tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + i); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEQ); + } + } else { + TCGLabel *loop = gen_new_label(); + TCGv_ptr t2, i = tcg_const_local_ptr(0); + + gen_set_label(loop); + + t2 = tcg_temp_new_ptr(); + tcg_gen_add_ptr(t2, cpu_env, i); + tcg_gen_ld_i64(t0, t2, vofs); + + /* Minimize the number of local temps that must be re-read from + * the stack each iteration. Instead, re-compute values other + * than the loop counter. + */ + tcg_gen_addi_ptr(t2, i, imm); + tcg_gen_extu_ptr_i64(addr, t2); + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, rn)); + tcg_temp_free_ptr(t2); + + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEQ); + + tcg_gen_addi_ptr(i, i, 8); + + tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); + tcg_temp_free_ptr(i); + } + + /* Predicate register stores can be any multiple of 2. 
*/ + if (len_remain) { + tcg_gen_ld_i64(t0, cpu_env, vofs + len_align); + tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + len_align); + + switch (len_remain) { + case 2: + case 4: + case 8: + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LE | ctz32(len_remain)); + break; + + case 6: + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUL); + tcg_gen_addi_i64(addr, addr, 4); + tcg_gen_shri_i64(t0, t0, 32); + tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUW); + break; + + default: + g_assert_not_reached(); + } + } + tcg_temp_free_i64(addr); + tcg_temp_free_i64(t0); +} + static bool trans_LDR_zri(DisasContext *s, arg_rri *a, uint32_t insn) { if (sve_access_check(s)) { @@ -3782,6 +3865,26 @@ static bool trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn) return true; } +static bool trans_STR_zri(DisasContext *s, arg_rri *a, uint32_t insn) +{ + if (sve_access_check(s)) { + int size = vec_full_reg_size(s); + int off = vec_full_reg_offset(s, a->rd); + do_str(s, off, size, a->rn, a->imm * size); + } + return true; +} + +static bool trans_STR_pri(DisasContext *s, arg_rri *a, uint32_t insn) +{ + if (sve_access_check(s)) { + int size = pred_full_reg_size(s); + int off = pred_full_reg_offset(s, a->rd); + do_str(s, off, size, a->rn, a->imm * size); + } + return true; +} + /* *** SVE Memory - Contiguous Load Group */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 765e7e479b..6a76010f51 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -793,6 +793,12 @@ LD1RQ_zpri 1010010 .. 00 0.... 001 ... ..... ..... \ ### SVE Memory Store Group +# SVE store predicate register +STR_pri 1110010 11 0. ..... 000 ... ..... 0 .... @pd_rn_i9 + +# SVE store vector register +STR_zri 1110010 11 0. ..... 010 ... ..... ..... @rd_rn_i9 + # SVE contiguous store (scalar plus immediate) # ST1B, ST1H, ST1W, ST1D; require msz <= esz ST_zpri 1110010 .. esz:2 0.... 111 ... ..... ..... 
\

From patchwork Wed Jun 27 04:33:04 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935276
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:04 -0700
Message-Id: <20180627043328.11531-12-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 11/35] target/arm: Implement SVE scatter stores
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
v6:
  * Rewrite to the usual two nested loops,
  * Add comment about XS=2.
---
 target/arm/helper-sve.h    | 41 +++++++++++++++++++++
 target/arm/sve_helper.c    | 61 +++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 75 ++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 39 ++++++++++++++++++++
 4 files changed, 216 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index a5d3bb121c..8880128f9c 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -958,3 +958,44 @@ DEF_HELPER_FLAGS_4(sve_st1hs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_sths_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_stbs_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_sths_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_stbd_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_sthd_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr,
ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stdd_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_stbd_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_sthd_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stsd_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stdd_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_stbd_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_sthd_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stsd_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stdd_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 93f2942590..7622bb2af0 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3713,3 +3713,64 @@ void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg, addr += 4 * 8; } } + +/* Stores with a vector index. 
*/ + +#define DO_ST1_ZPZ_S(NAME, TYPEI, FN) \ +void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \ + target_ulong base, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + unsigned scale = simd_data(desc); \ + uintptr_t ra = GETPC(); \ + for (i = 0; i < oprsz; ) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + if (likely(pg & 1)) { \ + target_ulong off = *(TYPEI *)(vm + H1_4(i)); \ + uint32_t d = *(uint32_t *)(vd + H1_4(i)); \ + FN(env, base + (off << scale), d, ra); \ + } \ + i += sizeof(uint32_t), pg >>= sizeof(uint32_t); \ + } while (i & 15); \ + } \ +} + +#define DO_ST1_ZPZ_D(NAME, TYPEI, FN) \ +void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \ + target_ulong base, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc) / 8; \ + unsigned scale = simd_data(desc); \ + uintptr_t ra = GETPC(); \ + uint64_t *d = vd, *m = vm; uint8_t *pg = vg; \ + for (i = 0; i < oprsz; i++) { \ + if (likely(pg[H1(i)] & 1)) { \ + target_ulong off = (target_ulong)(TYPEI)m[i] << scale; \ + FN(env, base + off, d[i], ra); \ + } \ + } \ +} + +DO_ST1_ZPZ_S(sve_stbs_zsu, uint32_t, cpu_stb_data_ra) +DO_ST1_ZPZ_S(sve_sths_zsu, uint32_t, cpu_stw_data_ra) +DO_ST1_ZPZ_S(sve_stss_zsu, uint32_t, cpu_stl_data_ra) + +DO_ST1_ZPZ_S(sve_stbs_zss, int32_t, cpu_stb_data_ra) +DO_ST1_ZPZ_S(sve_sths_zss, int32_t, cpu_stw_data_ra) +DO_ST1_ZPZ_S(sve_stss_zss, int32_t, cpu_stl_data_ra) + +DO_ST1_ZPZ_D(sve_stbd_zsu, uint32_t, cpu_stb_data_ra) +DO_ST1_ZPZ_D(sve_sthd_zsu, uint32_t, cpu_stw_data_ra) +DO_ST1_ZPZ_D(sve_stsd_zsu, uint32_t, cpu_stl_data_ra) +DO_ST1_ZPZ_D(sve_stdd_zsu, uint32_t, cpu_stq_data_ra) + +DO_ST1_ZPZ_D(sve_stbd_zss, int32_t, cpu_stb_data_ra) +DO_ST1_ZPZ_D(sve_sthd_zss, int32_t, cpu_stw_data_ra) +DO_ST1_ZPZ_D(sve_stsd_zss, int32_t, cpu_stl_data_ra) +DO_ST1_ZPZ_D(sve_stdd_zss, int32_t, cpu_stq_data_ra) + +DO_ST1_ZPZ_D(sve_stbd_zd, uint64_t, cpu_stb_data_ra) +DO_ST1_ZPZ_D(sve_sthd_zd, uint64_t, cpu_stw_data_ra) 
+DO_ST1_ZPZ_D(sve_stsd_zd, uint64_t, cpu_stl_data_ra) +DO_ST1_ZPZ_D(sve_stdd_zd, uint64_t, cpu_stq_data_ra) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 4116fe9904..65da3e633f 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -43,6 +43,8 @@ typedef void gen_helper_gvec_flags_4(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); typedef void gen_helper_gvec_mem(TCGv_env, TCGv_ptr, TCGv_i64, TCGv_i32); +typedef void gen_helper_gvec_mem_scatter(TCGv_env, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i64, TCGv_i32); /* * Helpers for extracting complex instruction fields. @@ -4228,3 +4230,76 @@ static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a, uint32_t insn) } return true; } + +/* + *** SVE gather loads / scatter stores + */ + +static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm, int scale, + TCGv_i64 scalar, gen_helper_gvec_mem_scatter *fn) +{ + unsigned vsz = vec_full_reg_size(s); + TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, scale)); + TCGv_ptr t_zm = tcg_temp_new_ptr(); + TCGv_ptr t_pg = tcg_temp_new_ptr(); + TCGv_ptr t_zt = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg)); + tcg_gen_addi_ptr(t_zm, cpu_env, vec_full_reg_offset(s, zm)); + tcg_gen_addi_ptr(t_zt, cpu_env, vec_full_reg_offset(s, zt)); + fn(cpu_env, t_zt, t_pg, t_zm, scalar, desc); + + tcg_temp_free_ptr(t_zt); + tcg_temp_free_ptr(t_zm); + tcg_temp_free_ptr(t_pg); + tcg_temp_free_i32(desc); +} + +static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn) +{ + /* Indexed by [xs][msz]. */ + static gen_helper_gvec_mem_scatter * const fn32[2][3] = { + { gen_helper_sve_stbs_zsu, + gen_helper_sve_sths_zsu, + gen_helper_sve_stss_zsu, }, + { gen_helper_sve_stbs_zss, + gen_helper_sve_sths_zss, + gen_helper_sve_stss_zss, }, + }; + /* Note that we overload xs=2 to indicate 64-bit offset. 
*/ + static gen_helper_gvec_mem_scatter * const fn64[3][4] = { + { gen_helper_sve_stbd_zsu, + gen_helper_sve_sthd_zsu, + gen_helper_sve_stsd_zsu, + gen_helper_sve_stdd_zsu, }, + { gen_helper_sve_stbd_zss, + gen_helper_sve_sthd_zss, + gen_helper_sve_stsd_zss, + gen_helper_sve_stdd_zss, }, + { gen_helper_sve_stbd_zd, + gen_helper_sve_sthd_zd, + gen_helper_sve_stsd_zd, + gen_helper_sve_stdd_zd, }, + }; + gen_helper_gvec_mem_scatter *fn; + + if (a->esz < a->msz || (a->msz == 0 && a->scale)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + switch (a->esz) { + case MO_32: + fn = fn32[a->xs][a->msz]; + break; + case MO_64: + fn = fn64[a->xs][a->msz]; + break; + default: + g_assert_not_reached(); + } + do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz, + cpu_reg_sp(s, a->rn), fn); + return true; +} diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 6a76010f51..7d24c2bdc4 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -80,6 +80,7 @@ &rpri_load rd pg rn imm dtype nreg &rprr_store rd pg rn rm msz esz nreg &rpri_store rd pg rn imm msz esz nreg +&rprr_scatter_store rd pg rn rm esz msz xs scale ########################################################################### # Named instruction formats. These are generally used to @@ -198,6 +199,8 @@ @rpri_store_msz ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store @rprr_store_esz_n0 ....... .. esz:2 rm:5 ... pg:3 rn:5 rd:5 \ &rprr_store nreg=0 +@rprr_scatter_store ....... msz:2 .. rm:5 ... pg:3 rn:5 rd:5 \ + &rprr_scatter_store ########################################################################### # Instruction patterns. Grouped according to the SVE encodingindex.xhtml. @@ -825,3 +828,39 @@ ST_zpri 1110010 .. nreg:2 1.... 111 ... ..... ..... \ # SVE store multiple structures (scalar plus scalar) (nreg != 0) ST_zprr 1110010 msz:2 nreg:2 ..... 011 ... ..... ..... 
\ @rprr_store esz=%size_23 + +# SVE 32-bit scatter store (scalar plus 32-bit scaled offsets) +# Require msz > 0 && msz <= esz. +ST1_zprz 1110010 .. 11 ..... 100 ... ..... ..... \ + @rprr_scatter_store xs=0 esz=2 scale=1 +ST1_zprz 1110010 .. 11 ..... 110 ... ..... ..... \ + @rprr_scatter_store xs=1 esz=2 scale=1 + +# SVE 32-bit scatter store (scalar plus 32-bit unscaled offsets) +# Require msz <= esz. +ST1_zprz 1110010 .. 10 ..... 100 ... ..... ..... \ + @rprr_scatter_store xs=0 esz=2 scale=0 +ST1_zprz 1110010 .. 10 ..... 110 ... ..... ..... \ + @rprr_scatter_store xs=1 esz=2 scale=0 + +# SVE 64-bit scatter store (scalar plus 64-bit scaled offset) +# Require msz > 0 +ST1_zprz 1110010 .. 01 ..... 101 ... ..... ..... \ + @rprr_scatter_store xs=2 esz=3 scale=1 + +# SVE 64-bit scatter store (scalar plus 64-bit unscaled offset) +ST1_zprz 1110010 .. 00 ..... 101 ... ..... ..... \ + @rprr_scatter_store xs=2 esz=3 scale=0 + +# SVE 64-bit scatter store (scalar plus unpacked 32-bit scaled offset) +# Require msz > 0 +ST1_zprz 1110010 .. 01 ..... 100 ... ..... ..... \ + @rprr_scatter_store xs=0 esz=3 scale=1 +ST1_zprz 1110010 .. 01 ..... 110 ... ..... ..... \ + @rprr_scatter_store xs=1 esz=3 scale=1 + +# SVE 64-bit scatter store (scalar plus unpacked 32-bit unscaled offset) +ST1_zprz 1110010 .. 00 ..... 100 ... ..... ..... \ + @rprr_scatter_store xs=0 esz=3 scale=0 +ST1_zprz 1110010 .. 00 ..... 110 ... ..... ..... 
\
+                @rprr_scatter_store xs=1 esz=3 scale=0

From patchwork Wed Jun 27 04:33:05 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935277
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:05 -0700
Message-Id: <20180627043328.11531-13-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 12/35] target/arm: Implement SVE prefetches
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
 target/arm/translate-sve.c | 21 +++++++++++++++++++++
 target/arm/sve.decode      | 23 +++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 65da3e633f..27854e0042 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4303,3 +4303,24 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
                cpu_reg_sp(s, a->rn), fn);
     return true;
 }
+
+/*
+ * Prefetches
+ */
+
+static bool trans_PRF(DisasContext *s, arg_PRF *a, uint32_t insn)
+{
+    /* Prefetch is a nop within QEMU. */
+    sve_access_check(s);
+    return true;
+}
+
+static bool trans_PRF_rr(DisasContext *s, arg_PRF_rr *a, uint32_t insn)
+{
+    if (a->rm == 31) {
+        return false;
+    }
+    /* Prefetch is a nop within QEMU. */
+    sve_access_check(s);
+    return true;
+}
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 7d24c2bdc4..80b955ff84 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -794,6 +794,29 @@ LD1RQ_zprr      1010010 .. 00 ..... 000 ... ..... ..... \
 LD1RQ_zpri      1010010 .. 00 0.... 001 ... ..... .....
\
                        @rpri_load_msz nreg=0
+# SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
+PRF             1000010 00 -1 ----- 0-- --- ----- 0 ----
+
+# SVE 32-bit gather prefetch (vector plus immediate)
+PRF             1000010 -- 00 ----- 111 --- ----- 0 ----
+
+# SVE contiguous prefetch (scalar plus immediate)
+PRF             1000010 11 1- ----- 0-- --- ----- 0 ----
+
+# SVE contiguous prefetch (scalar plus scalar)
+PRF_rr          1000010 -- 00 rm:5 110 --- ----- 0 ----
+
+### SVE Memory 64-bit Gather Group
+
+# SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets)
+PRF             1100010 00 11 ----- 1-- --- ----- 0 ----
+
+# SVE 64-bit gather prefetch (scalar plus unpacked 32-bit scaled offsets)
+PRF             1100010 00 -1 ----- 0-- --- ----- 0 ----
+
+# SVE 64-bit gather prefetch (vector plus immediate)
+PRF             1100010 -- 00 ----- 111 --- ----- 0 ----
+
 ### SVE Memory Store Group
 # SVE store predicate register

From patchwork Wed Jun 27 04:33:06 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935282
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:06 -0700
Message-Id: <20180627043328.11531-14-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 13/35] target/arm: Implement SVE gather loads
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Signed-off-by: Richard Henderson
Reviewed-by: Peter Maydell
Reviewed-by: Alex Bennée
---
v6:
  * Finish esz == msz && u==1 decode in sve.decode.
  * Remove duplicate decode in trans_ST1_zprz.
  * Add xs=2 comment.
  * Reformat tables to leave room for ff helpers.
--- target/arm/helper-sve.h | 67 +++++++++++++++++++++++++ target/arm/sve_helper.c | 77 ++++++++++++++++++++++++++++ target/arm/translate-sve.c | 100 +++++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 57 +++++++++++++++++++++ 4 files changed, 301 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 8880128f9c..aeb62afc34 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -959,6 +959,73 @@ DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhsu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldssu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldbss_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhss_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_ldbsu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhsu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldssu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldbss_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhss_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_ldbdu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhdu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldsdu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldddu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldbds_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhds_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, 
ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldsds_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_ldbdu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhdu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldsdu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldddu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldbds_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhds_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldsds_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_ldbdu_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhdu_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldsdu_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldddu_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldbds_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldhds_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldsds_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_sths_zsu, TCG_CALL_NO_WG, diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 7622bb2af0..24f75a32d3 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3714,6 +3714,83 @@ void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg, } } +/* Loads with a vector index. 
*/
+
+#define DO_LD1_ZPZ_S(NAME, TYPEI, TYPEM, FN) \
+void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
+                  target_ulong base, uint32_t desc) \
+{ \
+    intptr_t i, oprsz = simd_oprsz(desc); \
+    unsigned scale = simd_data(desc); \
+    uintptr_t ra = GETPC(); \
+    for (i = 0; i < oprsz; ) { \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
+        do { \
+            TYPEM m = 0; \
+            if (pg & 1) { \
+                target_ulong off = *(TYPEI *)(vm + H1_4(i)); \
+                m = FN(env, base + (off << scale), ra); \
+            } \
+            *(uint32_t *)(vd + H1_4(i)) = m; \
+            i += 4, pg >>= 4; \
+        } while (i & 15); \
+    } \
+}
+
+#define DO_LD1_ZPZ_D(NAME, TYPEI, TYPEM, FN) \
+void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
+                  target_ulong base, uint32_t desc) \
+{ \
+    intptr_t i, oprsz = simd_oprsz(desc) / 8; \
+    unsigned scale = simd_data(desc); \
+    uintptr_t ra = GETPC(); \
+    uint64_t *d = vd, *m = vm; uint8_t *pg = vg; \
+    for (i = 0; i < oprsz; i++) { \
+        TYPEM mm = 0; \
+        if (pg[H1(i)] & 1) { \
+            target_ulong off = (TYPEI)m[i]; \
+            mm = FN(env, base + (off << scale), ra); \
+        } \
+        d[i] = mm; \
+    } \
+}
+
+DO_LD1_ZPZ_S(sve_ldbsu_zsu, uint32_t, uint8_t, cpu_ldub_data_ra)
+DO_LD1_ZPZ_S(sve_ldhsu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
+DO_LD1_ZPZ_S(sve_ldssu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
+DO_LD1_ZPZ_S(sve_ldbss_zsu, uint32_t, int8_t, cpu_ldub_data_ra)
+DO_LD1_ZPZ_S(sve_ldhss_zsu, uint32_t, int16_t, cpu_lduw_data_ra)
+
+DO_LD1_ZPZ_S(sve_ldbsu_zss, int32_t, uint8_t, cpu_ldub_data_ra)
+DO_LD1_ZPZ_S(sve_ldhsu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
+DO_LD1_ZPZ_S(sve_ldssu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
+DO_LD1_ZPZ_S(sve_ldbss_zss, int32_t, int8_t, cpu_ldub_data_ra)
+DO_LD1_ZPZ_S(sve_ldhss_zss, int32_t, int16_t, cpu_lduw_data_ra)
+
+DO_LD1_ZPZ_D(sve_ldbdu_zsu, uint32_t, uint8_t, cpu_ldub_data_ra)
+DO_LD1_ZPZ_D(sve_ldhdu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
+DO_LD1_ZPZ_D(sve_ldsdu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
+DO_LD1_ZPZ_D(sve_ldddu_zsu,
uint32_t, uint64_t, cpu_ldq_data_ra) +DO_LD1_ZPZ_D(sve_ldbds_zsu, uint32_t, int8_t, cpu_ldub_data_ra) +DO_LD1_ZPZ_D(sve_ldhds_zsu, uint32_t, int16_t, cpu_lduw_data_ra) +DO_LD1_ZPZ_D(sve_ldsds_zsu, uint32_t, int32_t, cpu_ldl_data_ra) + +DO_LD1_ZPZ_D(sve_ldbdu_zss, int32_t, uint8_t, cpu_ldub_data_ra) +DO_LD1_ZPZ_D(sve_ldhdu_zss, int32_t, uint16_t, cpu_lduw_data_ra) +DO_LD1_ZPZ_D(sve_ldsdu_zss, int32_t, uint32_t, cpu_ldl_data_ra) +DO_LD1_ZPZ_D(sve_ldddu_zss, int32_t, uint64_t, cpu_ldq_data_ra) +DO_LD1_ZPZ_D(sve_ldbds_zss, int32_t, int8_t, cpu_ldub_data_ra) +DO_LD1_ZPZ_D(sve_ldhds_zss, int32_t, int16_t, cpu_lduw_data_ra) +DO_LD1_ZPZ_D(sve_ldsds_zss, int32_t, int32_t, cpu_ldl_data_ra) + +DO_LD1_ZPZ_D(sve_ldbdu_zd, uint64_t, uint8_t, cpu_ldub_data_ra) +DO_LD1_ZPZ_D(sve_ldhdu_zd, uint64_t, uint16_t, cpu_lduw_data_ra) +DO_LD1_ZPZ_D(sve_ldsdu_zd, uint64_t, uint32_t, cpu_ldl_data_ra) +DO_LD1_ZPZ_D(sve_ldddu_zd, uint64_t, uint64_t, cpu_ldq_data_ra) +DO_LD1_ZPZ_D(sve_ldbds_zd, uint64_t, int8_t, cpu_ldub_data_ra) +DO_LD1_ZPZ_D(sve_ldhds_zd, uint64_t, int16_t, cpu_lduw_data_ra) +DO_LD1_ZPZ_D(sve_ldsds_zd, uint64_t, int32_t, cpu_ldl_data_ra) + /* Stores with a vector index. */ #define DO_ST1_ZPZ_S(NAME, TYPEI, FN) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 27854e0042..33ffb217d0 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -4255,6 +4255,106 @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm, int scale, tcg_temp_free_i32(desc); } +/* Indexed by [ff][xs][u][msz]. 
*/ +static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][3] = { + { { { gen_helper_sve_ldbss_zsu, + gen_helper_sve_ldhss_zsu, + NULL, }, + { gen_helper_sve_ldbsu_zsu, + gen_helper_sve_ldhsu_zsu, + gen_helper_sve_ldssu_zsu, } }, + { { gen_helper_sve_ldbss_zss, + gen_helper_sve_ldhss_zss, + NULL, }, + { gen_helper_sve_ldbsu_zss, + gen_helper_sve_ldhsu_zss, + gen_helper_sve_ldssu_zss, } } }, + /* TODO fill in first-fault handlers */ +}; + +/* Note that we overload xs=2 to indicate 64-bit offset. */ +static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][3][2][4] = { + { { { gen_helper_sve_ldbds_zsu, + gen_helper_sve_ldhds_zsu, + gen_helper_sve_ldsds_zsu, + NULL, }, + { gen_helper_sve_ldbdu_zsu, + gen_helper_sve_ldhdu_zsu, + gen_helper_sve_ldsdu_zsu, + gen_helper_sve_ldddu_zsu, } }, + { { gen_helper_sve_ldbds_zss, + gen_helper_sve_ldhds_zss, + gen_helper_sve_ldsds_zss, + NULL, }, + { gen_helper_sve_ldbdu_zss, + gen_helper_sve_ldhdu_zss, + gen_helper_sve_ldsdu_zss, + gen_helper_sve_ldddu_zss, } }, + { { gen_helper_sve_ldbds_zd, + gen_helper_sve_ldhds_zd, + gen_helper_sve_ldsds_zd, + NULL, }, + { gen_helper_sve_ldbdu_zd, + gen_helper_sve_ldhdu_zd, + gen_helper_sve_ldsdu_zd, + gen_helper_sve_ldddu_zd, } } }, + /* TODO fill in first-fault handlers */ +}; + +static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn) +{ + gen_helper_gvec_mem_scatter *fn = NULL; + + if (!sve_access_check(s)) { + return true; + } + + switch (a->esz) { + case MO_32: + fn = gather_load_fn32[a->ff][a->xs][a->u][a->msz]; + break; + case MO_64: + fn = gather_load_fn64[a->ff][a->xs][a->u][a->msz]; + break; + } + assert(fn != NULL); + + do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz, + cpu_reg_sp(s, a->rn), fn); + return true; +} + +static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn) +{ + gen_helper_gvec_mem_scatter *fn = NULL; + TCGv_i64 imm; + + if (a->esz < a->msz || (a->esz == a->msz && !a->u)) { + return false; + } + if 
(!sve_access_check(s)) { + return true; + } + + switch (a->esz) { + case MO_32: + fn = gather_load_fn32[a->ff][0][a->u][a->msz]; + break; + case MO_64: + fn = gather_load_fn64[a->ff][2][a->u][a->msz]; + break; + } + assert(fn != NULL); + + /* Treat LD1_zpiz (zn[x] + imm) the same way as LD1_zprz (rn + zm[x]) + * by loading the immediate into the scalar parameter. + */ + imm = tcg_const_i64(a->imm << a->msz); + do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn); + tcg_temp_free_i64(imm); + return true; +} + static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn) { /* Indexed by [xs][msz]. */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 80b955ff84..45016c6042 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -80,6 +80,8 @@ &rpri_load rd pg rn imm dtype nreg &rprr_store rd pg rn rm msz esz nreg &rpri_store rd pg rn imm msz esz nreg +&rprr_gather_load rd pg rn rm esz msz u ff xs scale +&rpri_gather_load rd pg rn imm esz msz u ff &rprr_scatter_store rd pg rn rm esz msz xs scale ########################################################################### @@ -194,6 +196,22 @@ @rpri_load_msz ....... .... . imm:s4 ... pg:3 rn:5 rd:5 \ &rpri_load dtype=%msz_dtype +# Gather Loads. +@rprr_g_load_u ....... .. . . rm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \ + &rprr_gather_load xs=2 +@rprr_g_load_xs_u ....... .. xs:1 . rm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \ + &rprr_gather_load +@rprr_g_load_xs_u_sc ....... .. xs:1 scale:1 rm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \ + &rprr_gather_load +@rprr_g_load_xs_sc ....... .. xs:1 scale:1 rm:5 . . ff:1 pg:3 rn:5 rd:5 \ + &rprr_gather_load +@rprr_g_load_u_sc ....... .. . scale:1 rm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \ + &rprr_gather_load xs=2 +@rprr_g_load_sc ....... .. . scale:1 rm:5 . . ff:1 pg:3 rn:5 rd:5 \ + &rprr_gather_load xs=2 +@rpri_g_load ....... msz:2 .. imm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \ + &rpri_gather_load + # Stores; user must fill in ESZ, MSZ, NREG as needed. @rprr_store ....... .. .. rm:5 ... 
pg:3 rn:5 rd:5 &rprr_store @rpri_store_msz ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store @@ -759,6 +777,19 @@ LDR_zri 10000101 10 ...... 010 ... ..... ..... @rd_rn_i9 LD1R_zpri 1000010 .. 1 imm:6 1.. pg:3 rn:5 rd:5 \ &rpri_load dtype=%dtype_23_13 nreg=0 +# SVE 32-bit gather load (scalar plus 32-bit unscaled offsets) +# SVE 32-bit gather load (scalar plus 32-bit scaled offsets) +LD1_zprz 1000010 00 .0 ..... 0.. ... ..... ..... \ + @rprr_g_load_xs_u esz=2 msz=0 scale=0 +LD1_zprz 1000010 01 .. ..... 0.. ... ..... ..... \ + @rprr_g_load_xs_u_sc esz=2 msz=1 +LD1_zprz 1000010 10 .. ..... 01. ... ..... ..... \ + @rprr_g_load_xs_sc esz=2 msz=2 u=1 + +# SVE 32-bit gather load (vector plus immediate) +LD1_zpiz 1000010 .. 01 ..... 1.. ... ..... ..... \ + @rpri_g_load esz=2 + ### SVE Memory Contiguous Load Group # SVE contiguous load (scalar plus scalar) @@ -808,6 +839,32 @@ PRF_rr 1000010 -- 00 rm:5 110 --- ----- 0 ---- ### SVE Memory 64-bit Gather Group +# SVE 64-bit gather load (scalar plus 32-bit unpacked unscaled offsets) +# SVE 64-bit gather load (scalar plus 32-bit unpacked scaled offsets) +LD1_zprz 1100010 00 .0 ..... 0.. ... ..... ..... \ + @rprr_g_load_xs_u esz=3 msz=0 scale=0 +LD1_zprz 1100010 01 .. ..... 0.. ... ..... ..... \ + @rprr_g_load_xs_u_sc esz=3 msz=1 +LD1_zprz 1100010 10 .. ..... 0.. ... ..... ..... \ + @rprr_g_load_xs_u_sc esz=3 msz=2 +LD1_zprz 1100010 11 .. ..... 01. ... ..... ..... \ + @rprr_g_load_xs_sc esz=3 msz=3 u=1 + +# SVE 64-bit gather load (scalar plus 64-bit unscaled offsets) +# SVE 64-bit gather load (scalar plus 64-bit scaled offsets) +LD1_zprz 1100010 00 10 ..... 1.. ... ..... ..... \ + @rprr_g_load_u esz=3 msz=0 scale=0 +LD1_zprz 1100010 01 1. ..... 1.. ... ..... ..... \ + @rprr_g_load_u_sc esz=3 msz=1 +LD1_zprz 1100010 10 1. ..... 1.. ... ..... ..... \ + @rprr_g_load_u_sc esz=3 msz=2 +LD1_zprz 1100010 11 1. ..... 11. ... ..... ..... 
\ + @rprr_g_load_sc esz=3 msz=3 u=1 + +# SVE 64-bit gather load (vector plus immediate) +LD1_zpiz 1100010 .. 01 ..... 1.. ... ..... ..... \ + @rpri_g_load esz=3 + # SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets) PRF 1100010 00 11 ----- 1-- --- ----- 0 ---- From patchwork Wed Jun 27 04:33:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935280 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="BpKgVB6h"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fr2w4QTgz9s0w for ; Wed, 27 Jun 2018 14:46:16 +1000 (AEST) Received: from localhost ([::1]:56555 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2LO-0005xC-57 for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:46:14 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60526) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29T-0004IT-NK for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:58 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29Q-0000YH-HG for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:55 -0400 Received: from mail-pg0-x22e.google.com ([2607:f8b0:400e:c05::22e]:35602) by eggs.gnu.org with 
esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29Q-0000WY-4q for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:52 -0400 Received: by mail-pg0-x22e.google.com with SMTP id i7-v6so366689pgp.2 for ; Tue, 26 Jun 2018 21:33:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=5vv4EiTE9BZ96vf89TrhRWBBhF9m5gudG6jjB7iTbjw=; b=BpKgVB6hmk40q4nqbaaKwzHxRko6wqTAYkRyhB5/LDOaBu12phWb9rYnY3J563gch2 rDxnmyZcmFGJex+A5q+ycl8WFJlWO6vohQCYSstR9RAY1stc9kJ5hWgjQpKFZrZQ4WTc 3pu0x4hiX4KP125Gl0AIvSOqBHJerchK7xk5Y= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=5vv4EiTE9BZ96vf89TrhRWBBhF9m5gudG6jjB7iTbjw=; b=VsWUqOew1/mvXLopfsUH+qo6AzP5xS7EeE3H5A9+SI+SN2bY7Za6ks72ZZyK/QUh1R D7rvZei+P4IqcEv2NxaFLnoUnFGwWtTaX2k4hiurgpg0DlCNjd+gXVCaviK3IoumGO6g 9DeC0aMYUp/LAWVTbMn5hhf0LBi3/AY5hZEXrejjTPVMGtCerGRVZsV1cBqsQgTmoVcT Gs8tFDnFaMmfVBF4xa1s8K5AvjqKz+0PpdAdOQz0719jhdFyDtuZuDbpnP0eQ7rqLG14 56oCA5pkn6qdVASbcEf+3fuBIBJI0MXBK9TSpgNWgZK4+1ksBb4YHF7Cg+h2UisYSuWp i4hA== X-Gm-Message-State: APt69E2sQD9EEzkKlQDZ0U9OnE2DVz++IX40jdObdYTk2MmhJNVxYu6a yIgLIbW82mjt0J8XqD+MHgH2KnY1eB8= X-Google-Smtp-Source: ADUXVKIRMO6vRqFkODqr9EV4purQ9EJXhPPM6ppWoQy+ZmUwzdfIOZbSQtCryCxW6tinV9tDpqUrLw== X-Received: by 2002:a63:5e45:: with SMTP id s66-v6mr3712536pgb.151.1530074030801; Tue, 26 Jun 2018 21:33:50 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.49 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:49 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:07 -0700 Message-Id: <20180627043328.11531-15-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::22e Subject: [Qemu-devel] [PATCH v6 14/35] target/arm: Implement SVE first-fault gather loads X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 67 +++++++++++++++++++++++++++++ target/arm/sve_helper.c | 88 ++++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 40 ++++++++++++++++- 3 files changed, 193 insertions(+), 2 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index aeb62afc34..55e8a908d4 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1026,6 +1026,73 @@ DEF_HELPER_FLAGS_6(sve_ldhds_zd, TCG_CALL_NO_WG, DEF_HELPER_FLAGS_6(sve_ldsds_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffbsu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhsu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffssu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffbss_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, 
i32) +DEF_HELPER_FLAGS_6(sve_ldffhss_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_ldffbsu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhsu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffssu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffbss_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhss_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_ldffbdu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhdu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffsdu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffddu_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffbds_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhds_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffsds_zsu, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_ldffbdu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhdu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffsdu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffddu_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffbds_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhds_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffsds_zss, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + +DEF_HELPER_FLAGS_6(sve_ldffbdu_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhdu_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) 
+DEF_HELPER_FLAGS_6(sve_ldffsdu_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffddu_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffbds_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffhds_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldffsds_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) + DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_sths_zsu, TCG_CALL_NO_WG, diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 24f75a32d3..81fc968087 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3791,6 +3791,94 @@ DO_LD1_ZPZ_D(sve_ldbds_zd, uint64_t, int8_t, cpu_ldub_data_ra) DO_LD1_ZPZ_D(sve_ldhds_zd, uint64_t, int16_t, cpu_lduw_data_ra) DO_LD1_ZPZ_D(sve_ldsds_zd, uint64_t, int32_t, cpu_ldl_data_ra) +/* First fault loads with a vector index. */ + +#ifdef CONFIG_USER_ONLY + +#define DO_LDFF1_ZPZ(NAME, TYPEE, TYPEI, TYPEM, FN, H) \ +void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \ + target_ulong base, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + unsigned scale = simd_data(desc); \ + uintptr_t ra = GETPC(); \ + bool first = true; \ + mmap_lock(); \ + for (i = 0; i < oprsz; i++) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + TYPEM m = 0; \ + if (pg & 1) { \ + target_ulong off = *(TYPEI *)(vm + H(i)); \ + target_ulong addr = base + (off << scale); \ + if (!first && \ + page_check_range(addr, sizeof(TYPEM), PAGE_READ)) { \ + record_fault(env, i, oprsz); \ + goto exit; \ + } \ + m = FN(env, addr, ra); \ + first = false; \ + } \ + *(TYPEE *)(vd + H(i)) = m; \ + i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ + } while (i & 15); \ + } \ + exit: \ + mmap_unlock(); \ +} + +#else + +#define DO_LDFF1_ZPZ(NAME, TYPEE, TYPEI, TYPEM, FN, H) \ +void HELPER(NAME)(CPUARMState *env, void 
*vd, void *vg, void *vm, \ + target_ulong base, uint32_t desc) \ +{ \ + g_assert_not_reached(); \ +} + +#endif + +#define DO_LDFF1_ZPZ_S(NAME, TYPEI, TYPEM, FN) \ + DO_LDFF1_ZPZ(NAME, uint32_t, TYPEI, TYPEM, FN, H1_4) +#define DO_LDFF1_ZPZ_D(NAME, TYPEI, TYPEM, FN) \ + DO_LDFF1_ZPZ(NAME, uint64_t, TYPEI, TYPEM, FN, ) + +DO_LDFF1_ZPZ_S(sve_ldffbsu_zsu, uint32_t, uint8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_S(sve_ldffhsu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra) +DO_LDFF1_ZPZ_S(sve_ldffssu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra) +DO_LDFF1_ZPZ_S(sve_ldffbss_zsu, uint32_t, int8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_S(sve_ldffhss_zsu, uint32_t, int16_t, cpu_lduw_data_ra) + +DO_LDFF1_ZPZ_S(sve_ldffbsu_zss, int32_t, uint8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_S(sve_ldffhsu_zss, int32_t, uint16_t, cpu_lduw_data_ra) +DO_LDFF1_ZPZ_S(sve_ldffssu_zss, int32_t, uint32_t, cpu_ldl_data_ra) +DO_LDFF1_ZPZ_S(sve_ldffbss_zss, int32_t, int8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_S(sve_ldffhss_zss, int32_t, int16_t, cpu_lduw_data_ra) + +DO_LDFF1_ZPZ_D(sve_ldffbdu_zsu, uint32_t, uint8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffhdu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffsdu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffddu_zsu, uint32_t, uint64_t, cpu_ldq_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffbds_zsu, uint32_t, int8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffhds_zsu, uint32_t, int16_t, cpu_lduw_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffsds_zsu, uint32_t, int32_t, cpu_ldl_data_ra) + +DO_LDFF1_ZPZ_D(sve_ldffbdu_zss, int32_t, uint8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffhdu_zss, int32_t, uint16_t, cpu_lduw_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffsdu_zss, int32_t, uint32_t, cpu_ldl_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffddu_zss, int32_t, uint64_t, cpu_ldq_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffbds_zss, int32_t, int8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffhds_zss, int32_t, int16_t, cpu_lduw_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffsds_zss, int32_t, int32_t, cpu_ldl_data_ra) + 
+DO_LDFF1_ZPZ_D(sve_ldffbdu_zd, uint64_t, uint8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffhdu_zd, uint64_t, uint16_t, cpu_lduw_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffsdu_zd, uint64_t, uint32_t, cpu_ldl_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffddu_zd, uint64_t, uint64_t, cpu_ldq_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffbds_zd, uint64_t, int8_t, cpu_ldub_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffhds_zd, uint64_t, int16_t, cpu_lduw_data_ra) +DO_LDFF1_ZPZ_D(sve_ldffsds_zd, uint64_t, int32_t, cpu_ldl_data_ra) + /* Stores with a vector index. */ #define DO_ST1_ZPZ_S(NAME, TYPEI, FN) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 33ffb217d0..ea4407b746 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -4269,7 +4269,19 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][3] = { { gen_helper_sve_ldbsu_zss, gen_helper_sve_ldhsu_zss, gen_helper_sve_ldssu_zss, } } }, - /* TODO fill in first-fault handlers */ + + { { { gen_helper_sve_ldffbss_zsu, + gen_helper_sve_ldffhss_zsu, + NULL, }, + { gen_helper_sve_ldffbsu_zsu, + gen_helper_sve_ldffhsu_zsu, + gen_helper_sve_ldffssu_zsu, } }, + { { gen_helper_sve_ldffbss_zss, + gen_helper_sve_ldffhss_zss, + NULL, }, + { gen_helper_sve_ldffbsu_zss, + gen_helper_sve_ldffhsu_zss, + gen_helper_sve_ldffssu_zss, } } } }; /* Note that we overload xs=2 to indicate 64-bit offset. 
*/ @@ -4298,7 +4310,31 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][3][2][4] = { gen_helper_sve_ldhdu_zd, gen_helper_sve_ldsdu_zd, gen_helper_sve_ldddu_zd, } } }, - /* TODO fill in first-fault handlers */ + + { { { gen_helper_sve_ldffbds_zsu, + gen_helper_sve_ldffhds_zsu, + gen_helper_sve_ldffsds_zsu, + NULL, }, + { gen_helper_sve_ldffbdu_zsu, + gen_helper_sve_ldffhdu_zsu, + gen_helper_sve_ldffsdu_zsu, + gen_helper_sve_ldffddu_zsu, } }, + { { gen_helper_sve_ldffbds_zss, + gen_helper_sve_ldffhds_zss, + gen_helper_sve_ldffsds_zss, + NULL, }, + { gen_helper_sve_ldffbdu_zss, + gen_helper_sve_ldffhdu_zss, + gen_helper_sve_ldffsdu_zss, + gen_helper_sve_ldffddu_zss, } }, + { { gen_helper_sve_ldffbds_zd, + gen_helper_sve_ldffhds_zd, + gen_helper_sve_ldffsds_zd, + NULL, }, + { gen_helper_sve_ldffbdu_zd, + gen_helper_sve_ldffhdu_zd, + gen_helper_sve_ldffsdu_zd, + gen_helper_sve_ldffddu_zd, } } } }; static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn) From patchwork Wed Jun 27 04:33:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935275 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="OTrEyJvG"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 
41Fqy34t2nz9s29 for ; Wed, 27 Jun 2018 14:42:03 +1000 (AEST) Received: from localhost ([::1]:56527 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2HJ-0002C9-1n for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:42:01 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60558) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29U-0004JV-RU for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29R-0000aZ-Rc for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:56 -0400 Received: from mail-pg0-x236.google.com ([2607:f8b0:400e:c05::236]:38669) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29R-0000Z5-GQ for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:53 -0400 Received: by mail-pg0-x236.google.com with SMTP id c9-v6so365769pgf.5 for ; Tue, 26 Jun 2018 21:33:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=dhB8ii/GaM/P1LrIoAgJ5HoWXCaUKMDijHadhOvMSUc=; b=OTrEyJvGApnT5jQ64CAc0Q3i0rfOuVYF4Ou1BBEtyCAS43+bORzUUSmUjkvxDlR/Yw BZf2iQSD8anGq3u6fMWfv7LGJahtn4hoFFBC6L+7j97PZzrUQmMHNMLOAXNn0g9225Ut Q9bCXT4SxiYsHCbH+JSwxHP6dRNyoskLlMm9Y= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=dhB8ii/GaM/P1LrIoAgJ5HoWXCaUKMDijHadhOvMSUc=; b=W1wdQE59UAy7yeMUGmTd6kGLmkkIKtCMpze2JWT+xwsXgihkoix2EqIpNb0x30hwbk S9iGyfi2+7Iuya7nGnICEJcrV58PAoLo+rJ1zRM8choq4My5GTKNRPScn7Pc+UBARLD8 /PW11Jh0JXC8NfGT8SUlJHYlU8afqPdoQC2MXslpbdzbyEzss6RFvDB8d43oK+d2M+sF t2VxoV2850sjw43vfR5BUNzeGlHGI8MncTyQ3ataGvqIEQTrd6rq5o4nPUfv8pCykLq1 psL5zaKihJy8ruEy+QL2GO2Gf4NV+UV0D4KROKHDm8A58Ajy1i/hhC3mutjcufPgidol bGcw== X-Gm-Message-State: APt69E1+Wr9838NOVMDMbzjOGdf5A1pw5+Y10G5y6v6XDAoSHsF8JS08 
oSk+Pgd8VtU4zonFJu5+Q9n6N35uW68= X-Google-Smtp-Source: ADUXVKKWQNGpwRUUJusuLeQuqPyYJxqvgt3UgE7KcuRfYRpis+k9kDBl0ChEq6CiAQgWjenl8KNfoA== X-Received: by 2002:a63:6501:: with SMTP id z1-v6mr3747234pgb.452.1530074032199; Tue, 26 Jun 2018 21:33:52 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.50 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:51 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:08 -0700 Message-Id: <20180627043328.11531-16-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::236 Subject: [Qemu-devel] [PATCH v6 15/35] target/arm: Implement SVE scatter store vector immediate X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/translate-sve.c | 85 ++++++++++++++++++++++++++------------ target/arm/sve.decode | 11 +++++ 2 files changed, 70 insertions(+), 26 deletions(-) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index ea4407b746..9eb2530d3b 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -4391,32 +4391,34 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn) return true; } +/* Indexed by [xs][msz]. 
*/ +static gen_helper_gvec_mem_scatter * const scatter_store_fn32[2][3] = { + { gen_helper_sve_stbs_zsu, + gen_helper_sve_sths_zsu, + gen_helper_sve_stss_zsu, }, + { gen_helper_sve_stbs_zss, + gen_helper_sve_sths_zss, + gen_helper_sve_stss_zss, }, +}; + +/* Note that we overload xs=2 to indicate 64-bit offset. */ +static gen_helper_gvec_mem_scatter * const scatter_store_fn64[3][4] = { + { gen_helper_sve_stbd_zsu, + gen_helper_sve_sthd_zsu, + gen_helper_sve_stsd_zsu, + gen_helper_sve_stdd_zsu, }, + { gen_helper_sve_stbd_zss, + gen_helper_sve_sthd_zss, + gen_helper_sve_stsd_zss, + gen_helper_sve_stdd_zss, }, + { gen_helper_sve_stbd_zd, + gen_helper_sve_sthd_zd, + gen_helper_sve_stsd_zd, + gen_helper_sve_stdd_zd, }, +}; + static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn) { - /* Indexed by [xs][msz]. */ - static gen_helper_gvec_mem_scatter * const fn32[2][3] = { - { gen_helper_sve_stbs_zsu, - gen_helper_sve_sths_zsu, - gen_helper_sve_stss_zsu, }, - { gen_helper_sve_stbs_zss, - gen_helper_sve_sths_zss, - gen_helper_sve_stss_zss, }, - }; - /* Note that we overload xs=2 to indicate 64-bit offset. 
*/ - static gen_helper_gvec_mem_scatter * const fn64[3][4] = { - { gen_helper_sve_stbd_zsu, - gen_helper_sve_sthd_zsu, - gen_helper_sve_stsd_zsu, - gen_helper_sve_stdd_zsu, }, - { gen_helper_sve_stbd_zss, - gen_helper_sve_sthd_zss, - gen_helper_sve_stsd_zss, - gen_helper_sve_stdd_zss, }, - { gen_helper_sve_stbd_zd, - gen_helper_sve_sthd_zd, - gen_helper_sve_stsd_zd, - gen_helper_sve_stdd_zd, }, - }; gen_helper_gvec_mem_scatter *fn; if (a->esz < a->msz || (a->msz == 0 && a->scale)) { @@ -4427,10 +4429,10 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn) } switch (a->esz) { case MO_32: - fn = fn32[a->xs][a->msz]; + fn = scatter_store_fn32[a->xs][a->msz]; break; case MO_64: - fn = fn64[a->xs][a->msz]; + fn = scatter_store_fn64[a->xs][a->msz]; break; default: g_assert_not_reached(); @@ -4440,6 +4442,37 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn) return true; } +static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn) +{ + gen_helper_gvec_mem_scatter *fn = NULL; + TCGv_i64 imm; + + if (a->esz < a->msz) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + switch (a->esz) { + case MO_32: + fn = scatter_store_fn32[0][a->msz]; + break; + case MO_64: + fn = scatter_store_fn64[2][a->msz]; + break; + } + assert(fn != NULL); + + /* Treat ST1_zpiz (zn[x] + imm) the same way as ST1_zprz (rn + zm[x]) + * by loading the immediate into the scalar parameter. 
+ */ + imm = tcg_const_i64(a->imm << a->msz); + do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn); + tcg_temp_free_i64(imm); + return true; +} + /* * Prefetches */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 45016c6042..75133ce659 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -83,6 +83,7 @@ &rprr_gather_load rd pg rn rm esz msz u ff xs scale &rpri_gather_load rd pg rn imm esz msz u ff &rprr_scatter_store rd pg rn rm esz msz xs scale +&rpri_scatter_store rd pg rn imm esz msz ########################################################################### # Named instruction formats. These are generally used to @@ -219,6 +220,8 @@ &rprr_store nreg=0 @rprr_scatter_store ....... msz:2 .. rm:5 ... pg:3 rn:5 rd:5 \ &rprr_scatter_store +@rpri_scatter_store ....... msz:2 .. imm:5 ... pg:3 rn:5 rd:5 \ + &rpri_scatter_store ########################################################################### # Instruction patterns. Grouped according to the SVE encodingindex.xhtml. @@ -932,6 +935,14 @@ ST1_zprz 1110010 .. 01 ..... 101 ... ..... ..... \ ST1_zprz 1110010 .. 00 ..... 101 ... ..... ..... \ @rprr_scatter_store xs=2 esz=3 scale=0 +# SVE 64-bit scatter store (vector plus immediate) +ST1_zpiz 1110010 .. 10 ..... 101 ... ..... ..... \ + @rpri_scatter_store esz=3 + +# SVE 32-bit scatter store (vector plus immediate) +ST1_zpiz 1110010 .. 11 ..... 101 ... ..... ..... \ + @rpri_scatter_store esz=2 + # SVE 64-bit scatter store (scalar plus unpacked 32-bit scaled offset) # Require msz > 0 ST1_zprz 1110010 .. 01 ..... 100 ... ..... ..... 
\ From patchwork Wed Jun 27 04:33:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935285 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="LzOZAX2n"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fr8B5qydz9s0w for ; Wed, 27 Jun 2018 14:50:50 +1000 (AEST) Received: from localhost ([::1]:56584 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2Po-0000wQ-BJ for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:50:48 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60570) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29V-0004Jz-9z for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29T-0000cy-4V for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:57 -0400 Received: from mail-pg0-x230.google.com ([2607:f8b0:400e:c05::230]:40253) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29S-0000bZ-RT for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:55 -0400 Received: by mail-pg0-x230.google.com with SMTP id w8-v6so362709pgp.7 for ; Tue, 26 Jun 2018 21:33:54 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=k+61EXcPbnvOR/jsDDYTANncnlq/tWc+99vEb14N6Ow=; b=LzOZAX2nyFXzjPIP0d/7zxvG/wsynE5xtPzuj6S3cDcZ2KMEpNYmJphdssYdQIEXtR 2BDX4I1Wr3bc2ZvrpzDK2Xv4ZQkUyMaYgVzEq6HwYz0r0ZcTUMyXXNDpkGlXea9jh0rA 07B65TDgrpM4DCPusrIYIRpZucKulcRAz4LBc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=k+61EXcPbnvOR/jsDDYTANncnlq/tWc+99vEb14N6Ow=; b=TVGrZ4FsBelpZJMGLkXb7DwhWDbRI7a06NoWbExnbfal2ZNjbvWp6OOTsfXTLYpzj/ jAu+yNYwyH1YhnOo1SCi14sU0FIOHGKcn7NEPydnJa3+jgsMuGuBoOLTfHC4CY/wAda/ 7LEwXwizlsqHx/uEssR6+SDeetc/+y3oq+RnfNFtCusbOzflP/K71PxSkQyQpVSYv67k e1GKDnWpayc8J2rtwhY/vH82xVeZEuro4J+66l/dUwYjnFCDzwV9MNMmXIJmlRIvhwuB lyXNywl8FzH+XBfFQjjU9eQ93P0LqhF1BviblQ6TgcB2S3WWF2m5GZTNCLz2hjwfkuIE 7ySQ== X-Gm-Message-State: APt69E2z5DVNGgAEQekiYd2Ap9KeuK1HF6PYNqPCq4F96umWillnQ5X9 R04re9KOSKpujGXLCGqTFyiQHfQ48lU= X-Google-Smtp-Source: AAOMgpcQPqppsn4GdP91hPds7yY952eE6zVvowL3TBNuw9UHBPYNWKO3Rz0H7EIb6hs1d3mylO+LMw== X-Received: by 2002:a63:2ac4:: with SMTP id q187-v6mr2283055pgq.333.1530074033622; Tue, 26 Jun 2018 21:33:53 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.52 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:52 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:09 -0700 Message-Id: <20180627043328.11531-17-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 2607:f8b0:400e:c05::230 Subject: [Qemu-devel] [PATCH v6 16/35] target/arm: Implement SVE floating-point compare vectors X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 49 ++++++++++++++++++++++++++++++ target/arm/sve_helper.c | 62 ++++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 40 ++++++++++++++++++++++++ target/arm/sve.decode | 11 +++++++ 4 files changed, 162 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 55e8a908d4..6089b3a53f 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -839,6 +839,55 @@ DEF_HELPER_FLAGS_5(sve_ucvt_ds, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve_ucvt_dd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmge_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmge_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmge_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fcmgt_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmgt_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmgt_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fcmeq_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmeq_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmeq_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fcmne_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmne_s, 
TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmne_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fcmuo_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmuo_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcmuo_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_facge_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_facge_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_facge_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_facgt_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_facgt_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_facgt_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32) DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32) DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 81fc968087..41d8ce6b54 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3193,6 +3193,68 @@ void HELPER(sve_fnmls_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc) do_fmla_zpzzz_d(env, vg, desc, 0, INT64_MIN); } +/* Two operand floating-point comparison controlled by a predicate. + * Unlike the integer version, we are not allowed to optimistically + * compare operands, since the comparison may have side effects wrt + * the FPSR. 
+ */ +#define DO_FPCMP_PPZZ(NAME, TYPE, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, \ + void *status, uint32_t desc) \ +{ \ + intptr_t i = simd_oprsz(desc), j = (i - 1) >> 6; \ + uint64_t *d = vd, *g = vg; \ + do { \ + uint64_t out = 0, pg = g[j]; \ + do { \ + i -= sizeof(TYPE), out <<= sizeof(TYPE); \ + if (likely((pg >> (i & 63)) & 1)) { \ + TYPE nn = *(TYPE *)(vn + H(i)); \ + TYPE mm = *(TYPE *)(vm + H(i)); \ + out |= OP(TYPE, nn, mm, status); \ + } \ + } while (i & 63); \ + d[j--] = out; \ + } while (i > 0); \ +} + +#define DO_FPCMP_PPZZ_H(NAME, OP) \ + DO_FPCMP_PPZZ(NAME##_h, float16, H1_2, OP) +#define DO_FPCMP_PPZZ_S(NAME, OP) \ + DO_FPCMP_PPZZ(NAME##_s, float32, H1_4, OP) +#define DO_FPCMP_PPZZ_D(NAME, OP) \ + DO_FPCMP_PPZZ(NAME##_d, float64, , OP) + +#define DO_FPCMP_PPZZ_ALL(NAME, OP) \ + DO_FPCMP_PPZZ_H(NAME, OP) \ + DO_FPCMP_PPZZ_S(NAME, OP) \ + DO_FPCMP_PPZZ_D(NAME, OP) + +#define DO_FCMGE(TYPE, X, Y, ST) TYPE##_compare(Y, X, ST) <= 0 +#define DO_FCMGT(TYPE, X, Y, ST) TYPE##_compare(Y, X, ST) < 0 +#define DO_FCMEQ(TYPE, X, Y, ST) TYPE##_compare_quiet(X, Y, ST) == 0 +#define DO_FCMNE(TYPE, X, Y, ST) TYPE##_compare_quiet(X, Y, ST) != 0 +#define DO_FCMUO(TYPE, X, Y, ST) \ + TYPE##_compare_quiet(X, Y, ST) == float_relation_unordered +#define DO_FACGE(TYPE, X, Y, ST) \ + TYPE##_compare(TYPE##_abs(Y), TYPE##_abs(X), ST) <= 0 +#define DO_FACGT(TYPE, X, Y, ST) \ + TYPE##_compare(TYPE##_abs(Y), TYPE##_abs(X), ST) < 0 + +DO_FPCMP_PPZZ_ALL(sve_fcmge, DO_FCMGE) +DO_FPCMP_PPZZ_ALL(sve_fcmgt, DO_FCMGT) +DO_FPCMP_PPZZ_ALL(sve_fcmeq, DO_FCMEQ) +DO_FPCMP_PPZZ_ALL(sve_fcmne, DO_FCMNE) +DO_FPCMP_PPZZ_ALL(sve_fcmuo, DO_FCMUO) +DO_FPCMP_PPZZ_ALL(sve_facge, DO_FACGE) +DO_FPCMP_PPZZ_ALL(sve_facgt, DO_FACGT) + +#undef DO_FPCMP_PPZZ_ALL +#undef DO_FPCMP_PPZZ_D +#undef DO_FPCMP_PPZZ_S +#undef DO_FPCMP_PPZZ_H +#undef DO_FPCMP_PPZZ + /* * Load contiguous data, protected by a governing predicate. 
*/ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 9eb2530d3b..b028a034fd 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3533,6 +3533,46 @@ DO_FP3(FMULX, fmulx) #undef DO_FP3 +static bool do_fp_cmp(DisasContext *s, arg_rprr_esz *a, + gen_helper_gvec_4_ptr *fn) +{ + if (fn == NULL) { + return false; + } + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + tcg_gen_gvec_4_ptr(pred_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + pred_full_reg_offset(s, a->pg), + status, vsz, vsz, 0, fn); + tcg_temp_free_ptr(status); + } + return true; +} + +#define DO_FPCMP(NAME, name) \ +static bool trans_##NAME##_ppzz(DisasContext *s, arg_rprr_esz *a, \ + uint32_t insn) \ +{ \ + static gen_helper_gvec_4_ptr * const fns[4] = { \ + NULL, gen_helper_sve_##name##_h, \ + gen_helper_sve_##name##_s, gen_helper_sve_##name##_d \ + }; \ + return do_fp_cmp(s, a, fns[a->esz]); \ +} + +DO_FPCMP(FCMGE, fcmge) +DO_FPCMP(FCMGT, fcmgt) +DO_FPCMP(FCMEQ, fcmeq) +DO_FPCMP(FCMNE, fcmne) +DO_FPCMP(FCMUO, fcmuo) +DO_FPCMP(FACGE, facge) +DO_FPCMP(FACGT, facgt) + +#undef DO_FPCMP + typedef void gen_helper_sve_fmla(TCGv_env, TCGv_ptr, TCGv_i32); static bool do_fmla(DisasContext *s, arg_rprrr_esz *a, gen_helper_sve_fmla *fn) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 75133ce659..a1bc6cb395 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -324,6 +324,17 @@ UXTH 00000100 .. 010 011 101 ... ..... ..... @rd_pg_rn SXTW 00000100 .. 010 100 101 ... ..... ..... @rd_pg_rn UXTW 00000100 .. 010 101 101 ... ..... ..... @rd_pg_rn +### SVE Floating Point Compare - Vectors Group + +# SVE floating-point compare vectors +FCMGE_ppzz 01100101 .. 0 ..... 010 ... ..... 0 .... @pd_pg_rn_rm +FCMGT_ppzz 01100101 .. 0 ..... 010 ... ..... 1 .... @pd_pg_rn_rm +FCMEQ_ppzz 01100101 .. 0 ..... 011 ... ..... 0 .... 
@pd_pg_rn_rm +FCMNE_ppzz 01100101 .. 0 ..... 011 ... ..... 1 .... @pd_pg_rn_rm +FCMUO_ppzz 01100101 .. 0 ..... 110 ... ..... 0 .... @pd_pg_rn_rm +FACGE_ppzz 01100101 .. 0 ..... 110 ... ..... 1 .... @pd_pg_rn_rm +FACGT_ppzz 01100101 .. 0 ..... 111 ... ..... 1 .... @pd_pg_rn_rm + ### SVE Integer Multiply-Add Group # SVE integer multiply-add writing addend (predicated) From patchwork Wed Jun 27 04:33:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935284 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="Jsqa/K89"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fr6p34rjz9s0w for ; Wed, 27 Jun 2018 14:49:38 +1000 (AEST) Received: from localhost ([::1]:56579 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2Oe-0008I7-1J for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:49:36 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60638) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29X-0004Lz-BY for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29U-0000fF-JM for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:59 -0400 
Received: from mail-pl0-x231.google.com ([2607:f8b0:400e:c01::231]:39500) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29U-0000eA-By for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:56 -0400 Received: by mail-pl0-x231.google.com with SMTP id s24-v6so413446plq.6 for ; Tue, 26 Jun 2018 21:33:56 -0700 (PDT) X-Received: by 2002:a17:902:3e3:: with SMTP id d90-v6mr4501169pld.12.1530074035018; Tue, 26 Jun 2018 21:33:55 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.53 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:54 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:10 -0700 Message-Id: <20180627043328.11531-18-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::231 Subject: [Qemu-devel] [PATCH v6 17/35] target/arm: Implement SVE floating-point arithmetic with immediate X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 56 ++++++++++++++++++++++++++++ target/arm/sve_helper.c | 69 +++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 75 ++++++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 14 +++++++ 4 files changed, 214 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 6089b3a53f..087819ec2b 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -809,6 +809,62 @@ DEF_HELPER_FLAGS_6(sve_fmulx_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(sve_fmulx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fadds_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fadds_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fadds_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fsubs_h, 
TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fsubs_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fsubs_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fmuls_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fmuls_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fmuls_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fsubrs_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fsubrs_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fsubrs_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fmaxnms_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fmaxnms_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fmaxnms_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fminnms_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fminnms_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fminnms_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fmaxs_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fmaxs_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fmaxs_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) + +DEF_HELPER_FLAGS_6(sve_fmins_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fmins_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fmins_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i64, ptr, i32) + DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG, diff --git 
a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 41d8ce6b54..bc23c66221 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2997,6 +2997,75 @@ DO_ZPZZ_FP(sve_fmulx_d, uint64_t, , helper_vfp_mulxd) #undef DO_ZPZZ_FP +/* Three-operand expander, with one scalar operand, controlled by + * a predicate, with the extra float_status parameter. + */ +#define DO_ZPZS_FP(NAME, TYPE, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vg, uint64_t scalar, \ + void *status, uint32_t desc) \ +{ \ + intptr_t i = simd_oprsz(desc); \ + uint64_t *g = vg; \ + TYPE mm = scalar; \ + do { \ + uint64_t pg = g[(i - 1) >> 6]; \ + do { \ + i -= sizeof(TYPE); \ + if (likely((pg >> (i & 63)) & 1)) { \ + TYPE nn = *(TYPE *)(vn + H(i)); \ + *(TYPE *)(vd + H(i)) = OP(nn, mm, status); \ + } \ + } while (i & 63); \ + } while (i != 0); \ +} + +DO_ZPZS_FP(sve_fadds_h, float16, H1_2, float16_add) +DO_ZPZS_FP(sve_fadds_s, float32, H1_4, float32_add) +DO_ZPZS_FP(sve_fadds_d, float64, , float64_add) + +DO_ZPZS_FP(sve_fsubs_h, float16, H1_2, float16_sub) +DO_ZPZS_FP(sve_fsubs_s, float32, H1_4, float32_sub) +DO_ZPZS_FP(sve_fsubs_d, float64, , float64_sub) + +DO_ZPZS_FP(sve_fmuls_h, float16, H1_2, float16_mul) +DO_ZPZS_FP(sve_fmuls_s, float32, H1_4, float32_mul) +DO_ZPZS_FP(sve_fmuls_d, float64, , float64_mul) + +static inline float16 subr_h(float16 a, float16 b, float_status *s) +{ + return float16_sub(b, a, s); +} + +static inline float32 subr_s(float32 a, float32 b, float_status *s) +{ + return float32_sub(b, a, s); +} + +static inline float64 subr_d(float64 a, float64 b, float_status *s) +{ + return float64_sub(b, a, s); +} + +DO_ZPZS_FP(sve_fsubrs_h, float16, H1_2, subr_h) +DO_ZPZS_FP(sve_fsubrs_s, float32, H1_4, subr_s) +DO_ZPZS_FP(sve_fsubrs_d, float64, , subr_d) + +DO_ZPZS_FP(sve_fmaxnms_h, float16, H1_2, float16_maxnum) +DO_ZPZS_FP(sve_fmaxnms_s, float32, H1_4, float32_maxnum) +DO_ZPZS_FP(sve_fmaxnms_d, float64, , float64_maxnum) + +DO_ZPZS_FP(sve_fminnms_h, float16, 
H1_2, float16_minnum) +DO_ZPZS_FP(sve_fminnms_s, float32, H1_4, float32_minnum) +DO_ZPZS_FP(sve_fminnms_d, float64, , float64_minnum) + +DO_ZPZS_FP(sve_fmaxs_h, float16, H1_2, float16_max) +DO_ZPZS_FP(sve_fmaxs_s, float32, H1_4, float32_max) +DO_ZPZS_FP(sve_fmaxs_d, float64, , float64_max) + +DO_ZPZS_FP(sve_fmins_h, float16, H1_2, float16_min) +DO_ZPZS_FP(sve_fmins_s, float32, H1_4, float32_min) +DO_ZPZS_FP(sve_fmins_d, float64, , float64_min) + /* Fully general two-operand expander, controlled by a predicate, * With the extra float_status parameter. */ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index b028a034fd..499252deff 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -32,6 +32,7 @@ #include "exec/log.h" #include "trace-tcg.h" #include "translate-a64.h" +#include "fpu/softfloat.h" typedef void GVecGen2sFn(unsigned, uint32_t, uint32_t, @@ -3533,6 +3534,80 @@ DO_FP3(FMULX, fmulx) #undef DO_FP3 +typedef void gen_helper_sve_fp2scalar(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_i64, TCGv_ptr, TCGv_i32); + +static void do_fp_scalar(DisasContext *s, int zd, int zn, int pg, bool is_fp16, + TCGv_i64 scalar, gen_helper_sve_fp2scalar *fn) +{ + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr t_zd, t_zn, t_pg, status; + TCGv_i32 desc; + + t_zd = tcg_temp_new_ptr(); + t_zn = tcg_temp_new_ptr(); + t_pg = tcg_temp_new_ptr(); + tcg_gen_addi_ptr(t_zd, cpu_env, vec_full_reg_offset(s, zd)); + tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, zn)); + tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg)); + + status = get_fpstatus_ptr(is_fp16); + desc = tcg_const_i32(simd_desc(vsz, vsz, 0)); + fn(t_zd, t_zn, t_pg, scalar, status, desc); + + tcg_temp_free_i32(desc); + tcg_temp_free_ptr(status); + tcg_temp_free_ptr(t_pg); + tcg_temp_free_ptr(t_zn); + tcg_temp_free_ptr(t_zd); +} + +static void do_fp_imm(DisasContext *s, arg_rpri_esz *a, uint64_t imm, + gen_helper_sve_fp2scalar *fn) +{ + TCGv_i64 temp = tcg_const_i64(imm); + 
do_fp_scalar(s, a->rd, a->rn, a->pg, a->esz == MO_16, temp, fn); + tcg_temp_free_i64(temp); +} + +#define DO_FP_IMM(NAME, name, const0, const1) \ +static bool trans_##NAME##_zpzi(DisasContext *s, arg_rpri_esz *a, \ + uint32_t insn) \ +{ \ + static gen_helper_sve_fp2scalar * const fns[3] = { \ + gen_helper_sve_##name##_h, \ + gen_helper_sve_##name##_s, \ + gen_helper_sve_##name##_d \ + }; \ + static uint64_t const val[3][2] = { \ + { float16_##const0, float16_##const1 }, \ + { float32_##const0, float32_##const1 }, \ + { float64_##const0, float64_##const1 }, \ + }; \ + if (a->esz == 0) { \ + return false; \ + } \ + if (sve_access_check(s)) { \ + do_fp_imm(s, a, val[a->esz - 1][a->imm], fns[a->esz - 1]); \ + } \ + return true; \ +} + +#define float16_two make_float16(0x4000) +#define float32_two make_float32(0x40000000) +#define float64_two make_float64(0x4000000000000000ULL) + +DO_FP_IMM(FADD, fadds, half, one) +DO_FP_IMM(FSUB, fsubs, half, one) +DO_FP_IMM(FMUL, fmuls, half, two) +DO_FP_IMM(FSUBR, fsubrs, half, one) +DO_FP_IMM(FMAXNM, fmaxnms, zero, one) +DO_FP_IMM(FMINNM, fminnms, zero, one) +DO_FP_IMM(FMAX, fmaxs, zero, one) +DO_FP_IMM(FMIN, fmins, zero, one) + +#undef DO_FP_IMM + static bool do_fp_cmp(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4_ptr *fn) { diff --git a/target/arm/sve.decode b/target/arm/sve.decode index a1bc6cb395..267eb2dcfc 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -160,6 +160,10 @@ @rdn_pg4 ........ esz:2 .. pg:4 ... ........ rd:5 \ &rpri_esz rn=%reg_movprfx +# Two register operand, one one-bit floating-point operand. +@rdn_i1 ........ esz:2 ......... pg:3 .... imm:1 rd:5 \ + &rpri_esz rn=%reg_movprfx + # Two register operand, one encoded bitmask. @rdn_dbm ........ .. .... dbm:13 rd:5 \ &rr_dbm rn=%reg_movprfx @@ -744,6 +748,16 @@ FMULX 01100101 .. 00 1010 100 ... ..... ..... @rdn_pg_rm FDIV 01100101 .. 00 1100 100 ... ..... ..... @rdm_pg_rn # FDIVR FDIV 01100101 .. 00 1101 100 ... ..... ..... 
@rdn_pg_rm +# SVE floating-point arithmetic with immediate (predicated) +FADD_zpzi 01100101 .. 011 000 100 ... 0000 . ..... @rdn_i1 +FSUB_zpzi 01100101 .. 011 001 100 ... 0000 . ..... @rdn_i1 +FMUL_zpzi 01100101 .. 011 010 100 ... 0000 . ..... @rdn_i1 +FSUBR_zpzi 01100101 .. 011 011 100 ... 0000 . ..... @rdn_i1 +FMAXNM_zpzi 01100101 .. 011 100 100 ... 0000 . ..... @rdn_i1 +FMINNM_zpzi 01100101 .. 011 101 100 ... 0000 . ..... @rdn_i1 +FMAX_zpzi 01100101 .. 011 110 100 ... 0000 . ..... @rdn_i1 +FMIN_zpzi 01100101 .. 011 111 100 ... 0000 . ..... @rdn_i1 + ### SVE FP Multiply-Add Group # SVE floating-point multiply-accumulate writing addend From patchwork Wed Jun 27 04:33:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935279 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="NVQkaJBR"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fr1S1Tc4z9s0w for ; Wed, 27 Jun 2018 14:45:00 +1000 (AEST) Received: from localhost ([::1]:56543 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2K9-0004e3-OZ for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:44:57 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60652) by lists.gnu.org with esmtp 
(Exim 4.71) (envelope-from ) id 1fY29X-0004Md-Ql for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29V-0000h6-VJ for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:59 -0400 Received: from mail-pl0-x236.google.com ([2607:f8b0:400e:c01::236]:46480) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29V-0000gB-Jr for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:57 -0400 Received: by mail-pl0-x236.google.com with SMTP id 30-v6so405963pld.13 for ; Tue, 26 Jun 2018 21:33:57 -0700 (PDT) X-Received: by 2002:a17:902:b08a:: with SMTP id p10-v6mr4571907plr.0.1530074036300; Tue, 26 Jun 2018 21:33:56 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.55 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:55 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:11 -0700 Message-Id: <20180627043328.11531-19-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::236 Subject: [Qemu-devel] [PATCH v6 18/35] target/arm: Implement SVE Floating Point Multiply Indexed Group X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper.h | 14 +++++++++++ target/arm/translate-sve.c | 50 ++++++++++++++++++++++++++++++++++++++ target/arm/vec_helper.c | 48 ++++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 19 +++++++++++++++ 4 files changed, 131 insertions(+) diff --git a/target/arm/helper.h b/target/arm/helper.h index 879a7229e9..56439ac1e4 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -620,6 +620,20 @@ DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_ftsmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_fmul_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_fmul_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_fmul_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(gvec_fmla_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, 
ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 499252deff..b60d47af2c 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3400,6 +3400,56 @@ DO_ZZI(UMIN, umin) #undef DO_ZZI +/* + *** SVE Floating Point Multiply-Add Indexed Group + */ + +static bool trans_FMLA_zzxz(DisasContext *s, arg_FMLA_zzxz *a, uint32_t insn) +{ + static gen_helper_gvec_4_ptr * const fns[3] = { + gen_helper_gvec_fmla_idx_h, + gen_helper_gvec_fmla_idx_s, + gen_helper_gvec_fmla_idx_d, + }; + + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + vec_full_reg_offset(s, a->ra), + status, vsz, vsz, (a->index << 1) | a->sub, + fns[a->esz - 1]); + tcg_temp_free_ptr(status); + } + return true; +} + +/* + *** SVE Floating Point Multiply Indexed Group + */ + +static bool trans_FMUL_zzx(DisasContext *s, arg_FMUL_zzx *a, uint32_t insn) +{ + static gen_helper_gvec_3_ptr * const fns[3] = { + gen_helper_gvec_fmul_idx_h, + gen_helper_gvec_fmul_idx_s, + gen_helper_gvec_fmul_idx_d, + }; + + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + status, vsz, vsz, a->index, fns[a->esz - 1]); + tcg_temp_free_ptr(status); + } + return true; +} + /* *** SVE Floating Point Accumulating Reduction Group */ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index f504dd53c8..97af75a61b 100644 --- 
a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -495,3 +495,51 @@ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64) #endif #undef DO_3OP + +/* For the indexed ops, SVE applies the index per 128-bit vector segment. + * For AdvSIMD, there is of course only one such vector segment. + */ + +#define DO_MUL_IDX(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \ +{ \ + intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \ + intptr_t idx = simd_data(desc); \ + TYPE *d = vd, *n = vn, *m = vm; \ + for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \ + TYPE mm = m[H(i + idx)]; \ + for (j = 0; j < segment; j++) { \ + d[i + j] = TYPE##_mul(n[i + j], mm, stat); \ + } \ + } \ +} + +DO_MUL_IDX(gvec_fmul_idx_h, float16, H2) +DO_MUL_IDX(gvec_fmul_idx_s, float32, H4) +DO_MUL_IDX(gvec_fmul_idx_d, float64, ) + +#undef DO_MUL_IDX + +#define DO_FMLA_IDX(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, \ + void *stat, uint32_t desc) \ +{ \ + intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \ + TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1); \ + intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1); \ + TYPE *d = vd, *n = vn, *m = vm, *a = va; \ + op1_neg <<= (8 * sizeof(TYPE) - 1); \ + for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \ + TYPE mm = m[H(i + idx)]; \ + for (j = 0; j < segment; j++) { \ + d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg, \ + mm, a[i + j], 0, stat); \ + } \ + } \ +} + +DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2) +DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4) +DO_FMLA_IDX(gvec_fmla_idx_d, float64, ) + +#undef DO_FMLA_IDX diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 267eb2dcfc..15fa790d5b 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -29,6 +29,7 @@ %imm9_16_10 16:s6 10:3 %size_23 23:2 %dtype_23_13 23:2 13:2 +%index3_22_19 22:1 19:2 # A combination of tsz:imm3 -- extract esize. 
%tszimm_esz 22:2 5:5 !function=tszimm_esz @@ -716,6 +717,24 @@ UMIN_zzi 00100101 .. 101 011 110 ........ ..... @rdn_i8u # SVE integer multiply immediate (unpredicated) MUL_zzi 00100101 .. 110 000 110 ........ ..... @rdn_i8s +### SVE FP Multiply-Add Indexed Group + +# SVE floating-point multiply-add (indexed) +FMLA_zzxz 01100100 0.1 .. rm:3 00000 sub:1 rn:5 rd:5 \ + ra=%reg_movprfx index=%index3_22_19 esz=1 +FMLA_zzxz 01100100 101 index:2 rm:3 00000 sub:1 rn:5 rd:5 \ + ra=%reg_movprfx esz=2 +FMLA_zzxz 01100100 111 index:1 rm:4 00000 sub:1 rn:5 rd:5 \ + ra=%reg_movprfx esz=3 + +### SVE FP Multiply Indexed Group + +# SVE floating-point multiply (indexed) +FMUL_zzx 01100100 0.1 .. rm:3 001000 rn:5 rd:5 \ + index=%index3_22_19 esz=1 +FMUL_zzx 01100100 101 index:2 rm:3 001000 rn:5 rd:5 esz=2 +FMUL_zzx 01100100 111 index:1 rm:4 001000 rn:5 rd:5 esz=3 + ### SVE FP Accumulating Reduction Group # SVE floating-point serial reduction (predicated) From patchwork Wed Jun 27 04:33:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935292 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="U1Kz1CG4"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41FrDC3vL5z9s0w for ; Wed, 27 Jun 2018 14:54:19 
+1000 (AEST) Received: from localhost ([::1]:56606 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2TB-0003ZP-01 for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:54:17 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60693) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29Z-0004Oa-9y for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29X-0000jB-IN for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:01 -0400 Received: from mail-pl0-x22b.google.com ([2607:f8b0:400e:c01::22b]:46470) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29X-0000iB-8C for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:33:59 -0400 Received: by mail-pl0-x22b.google.com with SMTP id 30-v6so406006pld.13 for ; Tue, 26 Jun 2018 21:33:59 -0700 (PDT) 
X-Google-Smtp-Source: ADUXVKIpPm1NTdXEBO80dGGsexu64uWc/U7EuNbi9w0zwcVZI+0tjZjPoL11fHGl/Dy+V+xI7LQM4g== X-Received: by 2002:a17:902:b68c:: with SMTP id c12-v6mr4512421pls.114.1530074038007; Tue, 26 Jun 2018 21:33:58 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.33.56 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:33:56 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:12 -0700 Message-Id: <20180627043328.11531-20-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::22b Subject: [Qemu-devel] [PATCH v6 19/35] target/arm: Implement SVE FP Fast Reduction Group X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 35 ++++++++++++++++++++++ target/arm/sve_helper.c | 61 ++++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 57 +++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 8 +++++ 4 files changed, 161 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 087819ec2b..ff69d143a0 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -725,6 +725,41 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_faddv_h, 
TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_faddv_d, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve_fmaxnmv_h, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_fmaxnmv_s, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_fmaxnmv_d, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve_fminnmv_h, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_fminnmv_s, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_fminnmv_d, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve_fmaxv_h, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_fmaxv_s, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_fmaxv_d, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve_fminv_h, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_fminv_s, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve_fminv_d, TCG_CALL_NO_RWG, + i64, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG, i64, i64, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG, diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index bc23c66221..4c44d52a23 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2852,6 +2852,67 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc) return predtest_ones(d, oprsz, esz_mask); } +/* Recursive reduction on a function; + * C.f. the ARM ARM function ReducePredicated. + * + * While it would be possible to write this without the DATA temporary, + * it is much simpler to process the predicate register this way. + * The recursion is bounded to depth 7 (128 fp16 elements), so there's + * little to gain with a more complex non-recursive form. 
+ */ +#define DO_REDUCE(NAME, TYPE, H, FUNC, IDENT) \ +static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \ +{ \ + if (n == 1) { \ + return *data; \ + } else { \ + uintptr_t half = n / 2; \ + TYPE lo = NAME##_reduce(data, status, half); \ + TYPE hi = NAME##_reduce(data + half, status, half); \ + return TYPE##_##FUNC(lo, hi, status); \ + } \ +} \ +uint64_t HELPER(NAME)(void *vn, void *vg, void *vs, uint32_t desc) \ +{ \ + uintptr_t i, oprsz = simd_oprsz(desc), maxsz = simd_maxsz(desc); \ + TYPE data[sizeof(ARMVectorReg) / sizeof(TYPE)]; \ + for (i = 0; i < oprsz; ) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ + do { \ + TYPE nn = *(TYPE *)(vn + H(i)); \ + *(TYPE *)((void *)data + i) = (pg & 1 ? nn : IDENT); \ + i += sizeof(TYPE), pg >>= sizeof(TYPE); \ + } while (i & 15); \ + } \ + for (; i < maxsz; i += sizeof(TYPE)) { \ + *(TYPE *)((void *)data + i) = IDENT; \ + } \ + return NAME##_reduce(data, vs, maxsz / sizeof(TYPE)); \ +} + +DO_REDUCE(sve_faddv_h, float16, H1_2, add, float16_zero) +DO_REDUCE(sve_faddv_s, float32, H1_4, add, float32_zero) +DO_REDUCE(sve_faddv_d, float64, , add, float64_zero) + +/* Identity is floatN_default_nan, without the function call. 
*/ +DO_REDUCE(sve_fminnmv_h, float16, H1_2, minnum, 0x7E00) +DO_REDUCE(sve_fminnmv_s, float32, H1_4, minnum, 0x7FC00000) +DO_REDUCE(sve_fminnmv_d, float64, , minnum, 0x7FF8000000000000ULL) + +DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, maxnum, 0x7E00) +DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, maxnum, 0x7FC00000) +DO_REDUCE(sve_fmaxnmv_d, float64, , maxnum, 0x7FF8000000000000ULL) + +DO_REDUCE(sve_fminv_h, float16, H1_2, min, float16_infinity) +DO_REDUCE(sve_fminv_s, float32, H1_4, min, float32_infinity) +DO_REDUCE(sve_fminv_d, float64, , min, float64_infinity) + +DO_REDUCE(sve_fmaxv_h, float16, H1_2, max, float16_chs(float16_infinity)) +DO_REDUCE(sve_fmaxv_s, float32, H1_4, max, float32_chs(float32_infinity)) +DO_REDUCE(sve_fmaxv_d, float64, , max, float64_chs(float64_infinity)) + +#undef DO_REDUCE + uint64_t HELPER(sve_fadda_h)(uint64_t nn, void *vm, void *vg, void *status, uint32_t desc) { diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index b60d47af2c..3b009193a9 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3450,6 +3450,63 @@ static bool trans_FMUL_zzx(DisasContext *s, arg_FMUL_zzx *a, uint32_t insn) return true; } +/* + *** SVE Floating Point Fast Reduction Group + */ + +typedef void gen_helper_fp_reduce(TCGv_i64, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); + +static void do_reduce(DisasContext *s, arg_rpr_esz *a, + gen_helper_fp_reduce *fn) +{ + unsigned vsz = vec_full_reg_size(s); + unsigned p2vsz = pow2ceil(vsz); + TCGv_i32 t_desc = tcg_const_i32(simd_desc(vsz, p2vsz, 0)); + TCGv_ptr t_zn, t_pg, status; + TCGv_i64 temp; + + temp = tcg_temp_new_i64(); + t_zn = tcg_temp_new_ptr(); + t_pg = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, a->rn)); + tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->pg)); + status = get_fpstatus_ptr(a->esz == MO_16); + + fn(temp, t_zn, t_pg, status, t_desc); + tcg_temp_free_ptr(t_zn); + tcg_temp_free_ptr(t_pg); + tcg_temp_free_ptr(status); + 
tcg_temp_free_i32(t_desc); + + write_fp_dreg(s, a->rd, temp); + tcg_temp_free_i64(temp); +} + +#define DO_VPZ(NAME, name) \ +static bool trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \ +{ \ + static gen_helper_fp_reduce * const fns[3] = { \ + gen_helper_sve_##name##_h, \ + gen_helper_sve_##name##_s, \ + gen_helper_sve_##name##_d, \ + }; \ + if (a->esz == 0) { \ + return false; \ + } \ + if (sve_access_check(s)) { \ + do_reduce(s, a, fns[a->esz - 1]); \ + } \ + return true; \ +} + +DO_VPZ(FADDV, faddv) +DO_VPZ(FMINNMV, fminnmv) +DO_VPZ(FMAXNMV, fmaxnmv) +DO_VPZ(FMINV, fminv) +DO_VPZ(FMAXV, fmaxv) + /* *** SVE Floating Point Accumulating Reduction Group */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 15fa790d5b..66b0fd0cc4 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -735,6 +735,14 @@ FMUL_zzx 01100100 0.1 .. rm:3 001000 rn:5 rd:5 \ FMUL_zzx 01100100 101 index:2 rm:3 001000 rn:5 rd:5 esz=2 FMUL_zzx 01100100 111 index:1 rm:4 001000 rn:5 rd:5 esz=3 +### SVE FP Fast Reduction Group + +FADDV 01100101 .. 000 000 001 ... ..... ..... @rd_pg_rn +FMAXNMV 01100101 .. 000 100 001 ... ..... ..... @rd_pg_rn +FMINNMV 01100101 .. 000 101 001 ... ..... ..... @rd_pg_rn +FMAXV 01100101 .. 000 110 001 ... ..... ..... @rd_pg_rn +FMINV 01100101 .. 000 111 001 ... ..... ..... 
@rd_pg_rn + ### SVE FP Accumulating Reduction Group # SVE floating-point serial reduction (predicated)
From patchwork Wed Jun 27 04:33:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935288 From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:13 -0700 Message-Id: <20180627043328.11531-21-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> Subject: [Qemu-devel] [PATCH v6 20/35] target/arm: Implement SVE Floating Point Unary Operations - Unpredicated Group Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org
Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper.h | 8 +++++++ target/arm/translate-sve.c | 47 ++++++++++++++++++++++++++++++++++++++ target/arm/vec_helper.c | 20 ++++++++++++++++ target/arm/sve.decode | 5 ++++ 4 files changed, 80 insertions(+) diff --git a/target/arm/helper.h b/target/arm/helper.h index 56439ac1e4..ad9cb6c7d5 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -601,6 +601,14 @@ DEF_HELPER_FLAGS_5(gvec_fcmlas_idx, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_frsqrte_s,
TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 3b009193a9..1dcc2d38c9 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3507,6 +3507,53 @@ DO_VPZ(FMAXNMV, fmaxnmv) DO_VPZ(FMINV, fminv) DO_VPZ(FMAXV, fmaxv) +/* + *** SVE Floating Point Unary Operations - Unpredicated Group + */ + +static void do_zz_fp(DisasContext *s, arg_rr_esz *a, gen_helper_gvec_2_ptr *fn) +{ + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + + tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + status, vsz, vsz, 0, fn); + tcg_temp_free_ptr(status); +} + +static bool trans_FRECPE(DisasContext *s, arg_rr_esz *a, uint32_t insn) +{ + static gen_helper_gvec_2_ptr * const fns[3] = { + gen_helper_gvec_frecpe_h, + gen_helper_gvec_frecpe_s, + gen_helper_gvec_frecpe_d, + }; + if (a->esz == 0) { + return false; + } + if (sve_access_check(s)) { + do_zz_fp(s, a, fns[a->esz - 1]); + } + return true; +} + +static bool trans_FRSQRTE(DisasContext *s, arg_rr_esz *a, uint32_t insn) +{ + static gen_helper_gvec_2_ptr * const fns[3] = { + gen_helper_gvec_frsqrte_h, + gen_helper_gvec_frsqrte_s, + gen_helper_gvec_frsqrte_d, + }; + if (a->esz == 0) { + return false; + } + if (sve_access_check(s)) { + do_zz_fp(s, a, fns[a->esz - 1]); + } + return true; +} + /* *** SVE Floating Point Accumulating Reduction Group */ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 97af75a61b..073e5c58e7 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -427,6 +427,26 @@ void HELPER(gvec_fcmlad)(void *vd, void 
*vn, void *vm, clear_tail(d, opr_sz, simd_maxsz(desc)); } +#define DO_2OP(NAME, FUNC, TYPE) \ +void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc) \ +{ \ + intptr_t i, oprsz = simd_oprsz(desc); \ + TYPE *d = vd, *n = vn; \ + for (i = 0; i < oprsz / sizeof(TYPE); i++) { \ + d[i] = FUNC(n[i], stat); \ + } \ +} + +DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16) +DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32) +DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64) + +DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16) +DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32) +DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64) + +#undef DO_2OP + /* Floating-point trigonometric starting value. * See the ARM ARM pseudocode function FPTrigSMul. */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 66b0fd0cc4..ca93bdb2b3 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -743,6 +743,11 @@ FMINNMV 01100101 .. 000 101 001 ... ..... ..... @rd_pg_rn FMAXV 01100101 .. 000 110 001 ... ..... ..... @rd_pg_rn FMINV 01100101 .. 000 111 001 ... ..... ..... @rd_pg_rn +## SVE Floating Point Unary Operations - Unpredicated Group + +FRECPE 01100101 .. 001 110 001100 ..... ..... @rd_rn +FRSQRTE 01100101 .. 001 111 001100 ..... ..... 
@rd_rn + ### SVE FP Accumulating Reduction Group # SVE floating-point serial reduction (predicated)
From patchwork Wed Jun 27 04:33:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935295 From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:14 -0700 Message-Id: <20180627043328.11531-22-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> Subject: [Qemu-devel] [PATCH v6 21/35] target/arm: Implement SVE FP Compare with Zero Group Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org
Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 42 +++++++++++++++++++++++++++++++++++++ target/arm/sve_helper.c | 43 ++++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 43 ++++++++++++++++++++++++++++++++++++++ target/arm/sve.decode | 10 +++++++++ 4 files changed, 138 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index ff69d143a0..44a98440c9 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -767,6 +767,48 @@ DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve_fadda_d, TCG_CALL_NO_RWG, i64, i64, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmge0_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmge0_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmge0_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve_fcmgt0_h, TCG_CALL_NO_RWG, + void,
ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmgt0_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmgt0_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve_fcmlt0_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmlt0_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmlt0_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve_fcmle0_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmle0_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmle0_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve_fcmeq0_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmeq0_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmeq0_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve_fcmne0_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmne0_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fcmne0_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_6(sve_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sve_fadd_s, TCG_CALL_NO_RWG, diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 4c44d52a23..0486cb1e5e 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3362,6 +3362,8 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, \ #define DO_FCMGE(TYPE, X, Y, ST) TYPE##_compare(Y, X, ST) <= 0 #define DO_FCMGT(TYPE, X, Y, ST) TYPE##_compare(Y, X, ST) < 0 +#define DO_FCMLE(TYPE, X, Y, ST) TYPE##_compare(X, Y, ST) <= 0 +#define DO_FCMLT(TYPE, X, Y, ST) TYPE##_compare(X, Y, ST) < 0 #define DO_FCMEQ(TYPE, X, Y, ST) TYPE##_compare_quiet(X, Y, ST) == 0 #define DO_FCMNE(TYPE, X, Y, ST) TYPE##_compare_quiet(X, Y, ST) != 0 #define DO_FCMUO(TYPE, X, 
Y, ST) \ @@ -3385,6 +3387,47 @@ DO_FPCMP_PPZZ_ALL(sve_facgt, DO_FACGT) #undef DO_FPCMP_PPZZ_H #undef DO_FPCMP_PPZZ +/* One operand floating-point comparison against zero, controlled + * by a predicate. + */ +#define DO_FPCMP_PPZ0(NAME, TYPE, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vg, \ + void *status, uint32_t desc) \ +{ \ + intptr_t i = simd_oprsz(desc), j = (i - 1) >> 6; \ + uint64_t *d = vd, *g = vg; \ + do { \ + uint64_t out = 0, pg = g[j]; \ + do { \ + i -= sizeof(TYPE), out <<= sizeof(TYPE); \ + if ((pg >> (i & 63)) & 1) { \ + TYPE nn = *(TYPE *)(vn + H(i)); \ + out |= OP(TYPE, nn, 0, status); \ + } \ + } while (i & 63); \ + d[j--] = out; \ + } while (i > 0); \ +} + +#define DO_FPCMP_PPZ0_H(NAME, OP) \ + DO_FPCMP_PPZ0(NAME##_h, float16, H1_2, OP) +#define DO_FPCMP_PPZ0_S(NAME, OP) \ + DO_FPCMP_PPZ0(NAME##_s, float32, H1_4, OP) +#define DO_FPCMP_PPZ0_D(NAME, OP) \ + DO_FPCMP_PPZ0(NAME##_d, float64, , OP) + +#define DO_FPCMP_PPZ0_ALL(NAME, OP) \ + DO_FPCMP_PPZ0_H(NAME, OP) \ + DO_FPCMP_PPZ0_S(NAME, OP) \ + DO_FPCMP_PPZ0_D(NAME, OP) + +DO_FPCMP_PPZ0_ALL(sve_fcmge0, DO_FCMGE) +DO_FPCMP_PPZ0_ALL(sve_fcmgt0, DO_FCMGT) +DO_FPCMP_PPZ0_ALL(sve_fcmle0, DO_FCMLE) +DO_FPCMP_PPZ0_ALL(sve_fcmlt0, DO_FCMLT) +DO_FPCMP_PPZ0_ALL(sve_fcmeq0, DO_FCMEQ) +DO_FPCMP_PPZ0_ALL(sve_fcmne0, DO_FCMNE) + /* * Load contiguous data, protected by a governing predicate. 
*/ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 1dcc2d38c9..cfee256be9 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3554,6 +3554,49 @@ static bool trans_FRSQRTE(DisasContext *s, arg_rr_esz *a, uint32_t insn) return true; } +/* + *** SVE Floating Point Compare with Zero Group + */ + +static void do_ppz_fp(DisasContext *s, arg_rpr_esz *a, + gen_helper_gvec_3_ptr *fn) +{ + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + + tcg_gen_gvec_3_ptr(pred_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + pred_full_reg_offset(s, a->pg), + status, vsz, vsz, 0, fn); + tcg_temp_free_ptr(status); +} + +#define DO_PPZ(NAME, name) \ +static bool trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \ +{ \ + static gen_helper_gvec_3_ptr * const fns[3] = { \ + gen_helper_sve_##name##_h, \ + gen_helper_sve_##name##_s, \ + gen_helper_sve_##name##_d, \ + }; \ + if (a->esz == 0) { \ + return false; \ + } \ + if (sve_access_check(s)) { \ + do_ppz_fp(s, a, fns[a->esz - 1]); \ + } \ + return true; \ +} + +DO_PPZ(FCMGE_ppz0, fcmge0) +DO_PPZ(FCMGT_ppz0, fcmgt0) +DO_PPZ(FCMLE_ppz0, fcmle0) +DO_PPZ(FCMLT_ppz0, fcmlt0) +DO_PPZ(FCMEQ_ppz0, fcmeq0) +DO_PPZ(FCMNE_ppz0, fcmne0) + +#undef DO_PPZ + /* *** SVE Floating Point Accumulating Reduction Group */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index ca93bdb2b3..a774becd6c 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -140,6 +140,7 @@ # One register operand, with governing predicate, vector element size @rd_pg_rn ........ esz:2 ... ... ... pg:3 rn:5 rd:5 &rpr_esz @rd_pg4_pn ........ esz:2 ... ... .. pg:4 . rn:4 rd:5 &rpr_esz +@pd_pg_rn ........ esz:2 ... ... ... pg:3 rn:5 . rd:4 &rpr_esz # One register operand, with governing predicate, no vector element size @rd_pg_rn_e0 ........ .. ... ... ... pg:3 rn:5 rd:5 &rpr_esz esz=0 @@ -748,6 +749,15 @@ FMINV 01100101 .. 000 111 001 ... ..... ..... 
@rd_pg_rn FRECPE 01100101 .. 001 110 001100 ..... ..... @rd_rn FRSQRTE 01100101 .. 001 111 001100 ..... ..... @rd_rn +### SVE FP Compare with Zero Group + +FCMGE_ppz0 01100101 .. 0100 00 001 ... ..... 0 .... @pd_pg_rn +FCMGT_ppz0 01100101 .. 0100 00 001 ... ..... 1 .... @pd_pg_rn +FCMLT_ppz0 01100101 .. 0100 01 001 ... ..... 0 .... @pd_pg_rn +FCMLE_ppz0 01100101 .. 0100 01 001 ... ..... 1 .... @pd_pg_rn +FCMEQ_ppz0 01100101 .. 0100 10 001 ... ..... 0 .... @pd_pg_rn +FCMNE_ppz0 01100101 .. 0100 11 001 ... ..... 0 .... @pd_pg_rn + ### SVE FP Accumulating Reduction Group # SVE floating-point serial reduction (predicated)
From patchwork Wed Jun 27 04:33:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935281 From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:15 -0700 Message-Id: <20180627043328.11531-23-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> Subject: [Qemu-devel] [PATCH v6 22/35] target/arm: Implement SVE floating-point trig multiply-add coefficient Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org
Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 4 +++ target/arm/sve_helper.c | 70 ++++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 27 +++++++++++++++ target/arm/sve.decode | 3 ++ 4 files changed, 104 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 44a98440c9..aca137fc37 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1037,6 +1037,10 @@ DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32) DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32) DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32) +DEF_HELPER_FLAGS_5(sve_ftmad_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_ftmad_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_ftmad_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +
DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 0486cb1e5e..79358c804b 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3428,6 +3428,76 @@ DO_FPCMP_PPZ0_ALL(sve_fcmlt0, DO_FCMLT) DO_FPCMP_PPZ0_ALL(sve_fcmeq0, DO_FCMEQ) DO_FPCMP_PPZ0_ALL(sve_fcmne0, DO_FCMNE) +/* FP Trig Multiply-Add. */ + +void HELPER(sve_ftmad_h)(void *vd, void *vn, void *vm, void *vs, uint32_t desc) +{ + static const float16 coeff[16] = { + 0x3c00, 0xb155, 0x2030, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, + 0x3c00, 0xb800, 0x293a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, + }; + intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float16); + intptr_t x = simd_data(desc); + float16 *d = vd, *n = vn, *m = vm; + for (i = 0; i < opr_sz; i++) { + float16 mm = m[i]; + intptr_t xx = x; + if (float16_is_neg(mm)) { + mm = float16_abs(mm); + xx += 8; + } + d[i] = float16_muladd(n[i], mm, coeff[xx], 0, vs); + } +} + +void HELPER(sve_ftmad_s)(void *vd, void *vn, void *vm, void *vs, uint32_t desc) +{ + static const float32 coeff[16] = { + 0x3f800000, 0xbe2aaaab, 0x3c088886, 0xb95008b9, + 0x36369d6d, 0x00000000, 0x00000000, 0x00000000, + 0x3f800000, 0xbf000000, 0x3d2aaaa6, 0xbab60705, + 0x37cd37cc, 0x00000000, 0x00000000, 0x00000000, + }; + intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float32); + intptr_t x = simd_data(desc); + float32 *d = vd, *n = vn, *m = vm; + for (i = 0; i < opr_sz; i++) { + float32 mm = m[i]; + intptr_t xx = x; + if (float32_is_neg(mm)) { + mm = float32_abs(mm); + xx += 8; + } + d[i] = float32_muladd(n[i], mm, coeff[xx], 0, vs); + } +} + +void HELPER(sve_ftmad_d)(void *vd, void *vn, void *vm, void *vs, uint32_t desc) +{ + static const float64 coeff[16] = { + 0x3ff0000000000000ull, 0xbfc5555555555543ull, + 0x3f8111111110f30cull, 
0xbf2a01a019b92fc6ull, + 0x3ec71de351f3d22bull, 0xbe5ae5e2b60f7b91ull, + 0x3de5d8408868552full, 0x0000000000000000ull, + 0x3ff0000000000000ull, 0xbfe0000000000000ull, + 0x3fa5555555555536ull, 0xbf56c16c16c13a0bull, + 0x3efa01a019b1e8d8ull, 0xbe927e4f7282f468ull, + 0x3e21ee96d2641b13ull, 0xbda8f76380fbb401ull, + }; + intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float64); + intptr_t x = simd_data(desc); + float64 *d = vd, *n = vn, *m = vm; + for (i = 0; i < opr_sz; i++) { + float64 mm = m[i]; + intptr_t xx = x; + if (float64_is_neg(mm)) { + mm = float64_abs(mm); + xx += 8; + } + d[i] = float64_muladd(n[i], mm, coeff[xx], 0, vs); + } +} + /* * Load contiguous data, protected by a governing predicate. */ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index cfee256be9..a86ebc0a91 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3597,6 +3597,33 @@ DO_PPZ(FCMNE_ppz0, fcmne0) #undef DO_PPZ +/* + *** SVE floating-point trig multiply-add coefficient + */ + +static bool trans_FTMAD(DisasContext *s, arg_FTMAD *a, uint32_t insn) +{ + static gen_helper_gvec_3_ptr * const fns[3] = { + gen_helper_sve_ftmad_h, + gen_helper_sve_ftmad_s, + gen_helper_sve_ftmad_d, + }; + + if (a->esz == 0) { + return false; + } + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + status, vsz, vsz, a->imm, fns[a->esz - 1]); + tcg_temp_free_ptr(status); + } + return true; +} + /* *** SVE Floating Point Accumulating Reduction Group */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index a774becd6c..fdcc252eaa 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -800,6 +800,9 @@ FMINNM_zpzi 01100101 .. 011 101 100 ... 0000 . ..... @rdn_i1 FMAX_zpzi 01100101 .. 011 110 100 ... 0000 . ..... @rdn_i1 FMIN_zpzi 01100101 .. 011 111 100 ... 0000 . 
..... @rdn_i1 +# SVE floating-point trig multiply-add coefficient +FTMAD 01100101 esz:2 010 imm:3 100000 rm:5 rd:5 rn=%reg_movprfx + ### SVE FP Multiply-Add Group # SVE floating-point multiply-accumulate writing addend From patchwork Wed Jun 27 04:33:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935287 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="Jtn+ygzI"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41Fr9M54Hhz9s0w for ; Wed, 27 Jun 2018 14:51:51 +1000 (AEST) Received: from localhost ([::1]:56594 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2Qn-0001lt-6P for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:51:49 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60830) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29e-0004X0-Co for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:10 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29d-0000qg-1P for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:06 -0400 Received: from mail-pf0-x230.google.com ([2607:f8b0:400e:c00::230]:40028) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) 
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:16 -0700
Message-Id: <20180627043328.11531-24-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 23/35] target/arm: Implement SVE floating-point convert precision
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
v6: Squish fz16 a-la vfp_fcvt_f16_to_f32
---
 target/arm/helper-sve.h    | 13 +++++++++
 target/arm/sve_helper.c    | 55 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 30 +++++++++++++++++++++
 target/arm/sve.decode      |  8 ++++++
 4 files changed, 106 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index aca137fc37..4c379dbb05 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -942,6 +942,19 @@ DEF_HELPER_FLAGS_6(sve_fmins_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fmins_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, i64, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_fcvt_sh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_dh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_hs, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 79358c804b..4b36c1eecf 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3147,6 +3147,61 @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \
     } while (i != 0); \
 }
 
+/* SVE fp16 conversions always use IEEE mode.  Like AdvSIMD, they ignore
+ * FZ16.  When converting from fp16, this affects flushing input denormals;
+ * when converting to fp16, this affects flushing output denormals.
+ */
+static inline float32 sve_f16_to_f32(float16 f, float_status *fpst)
+{
+    flag save = get_flush_inputs_to_zero(fpst);
+    float32 ret;
+
+    set_flush_inputs_to_zero(false, fpst);
+    ret = float16_to_float32(f, true, fpst);
+    set_flush_inputs_to_zero(save, fpst);
+    return ret;
+}
+
+static inline float64 sve_f16_to_f64(float16 f, float_status *fpst)
+{
+    flag save = get_flush_inputs_to_zero(fpst);
+    float64 ret;
+
+    set_flush_inputs_to_zero(false, fpst);
+    ret = float16_to_float64(f, true, fpst);
+    set_flush_inputs_to_zero(save, fpst);
+    return ret;
+}
+
+static inline float16 sve_f32_to_f16(float32 f, float_status *fpst)
+{
+    flag save = get_flush_to_zero(fpst);
+    float16 ret;
+
+    set_flush_to_zero(false, fpst);
+    ret = float32_to_float16(f, true, fpst);
+    set_flush_to_zero(save, fpst);
+    return ret;
+}
+
+static inline float16 sve_f64_to_f16(float64 f, float_status *fpst)
+{
+    flag save = get_flush_to_zero(fpst);
+    float16 ret;
+
+    set_flush_to_zero(false, fpst);
+    ret = float64_to_float16(f, true, fpst);
+    set_flush_to_zero(save, fpst);
+    return ret;
+}
+
+DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
+DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
+DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
+DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
+DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
+DO_ZPZ_FP(sve_fcvt_sd, uint64_t,     , float32_to_float64)
+
 DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
 DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
 DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index a86ebc0a91..37ad1c9459 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3940,6 +3940,36 @@ static bool do_zpz_ptr(DisasContext *s, int rd, int rn, int pg,
     return true;
 }
 
+static bool trans_FCVT_sh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvt_sh);
+}
+
+static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs);
+}
+
+static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvt_dh);
+}
+
+static bool trans_FCVT_hd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hd);
+}
+
+static bool trans_FCVT_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_ds);
+}
+
+static bool trans_FCVT_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_sd);
+}
+
 static bool trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 {
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh);
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index fdcc252eaa..18c174e92d 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -821,6 +821,14 @@ FNMLS_zpzzz 01100101 .. 1 ..... 111 ... ..... ..... @rdn_pg_rm_ra
 
 ### SVE FP Unary Operations Predicated Group
 
+# SVE floating-point convert precision
+FCVT_sh         01100101 10 0010 00 101 ... ..... .....  @rd_pg_rn_e0
+FCVT_hs         01100101 10 0010 01 101 ... ..... .....  @rd_pg_rn_e0
+FCVT_dh         01100101 11 0010 00 101 ... ..... .....  @rd_pg_rn_e0
+FCVT_hd         01100101 11 0010 01 101 ... ..... .....  @rd_pg_rn_e0
+FCVT_ds         01100101 11 0010 10 101 ... ..... .....  @rd_pg_rn_e0
+FCVT_sd         01100101 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
+
 # SVE integer convert to floating-point
 SCVTF_hh        01100101 01 010 01 0 101 ... ..... .....  @rd_pg_rn_e0
 SCVTF_sh        01100101 01 010 10 0 101 ... ..... .....  @rd_pg_rn_e0

From patchwork Wed Jun 27 04:33:17 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935298
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:17 -0700
Message-Id: <20180627043328.11531-25-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 24/35] target/arm: Implement SVE floating-point convert to integer
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
 target/arm/helper-sve.h    | 30 +++++++++++++
 target/arm/helper.h        | 12 +++---
 target/arm/helper.c        |  2 +-
 target/arm/sve_helper.c    | 88 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 70 ++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 16 +++++++
 6 files changed, 211 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 4c379dbb05..37fa9eb9bb 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -955,6 +955,36 @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_hs, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_ss, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_hd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_dd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_fcvtzu_hh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_hs, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_ss, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_hd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_dd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/helper.h b/target/arm/helper.h
index ad9cb6c7d5..8607077dda 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -134,12 +134,12 @@ DEF_HELPER_2(vfp_touid, i32, f64, ptr)
 DEF_HELPER_2(vfp_touizh, i32, f16, ptr)
 DEF_HELPER_2(vfp_touizs, i32, f32, ptr)
 DEF_HELPER_2(vfp_touizd, i32, f64, ptr)
-DEF_HELPER_2(vfp_tosih, i32, f16, ptr)
-DEF_HELPER_2(vfp_tosis, i32, f32, ptr)
-DEF_HELPER_2(vfp_tosid, i32, f64, ptr)
-DEF_HELPER_2(vfp_tosizh, i32, f16, ptr)
-DEF_HELPER_2(vfp_tosizs, i32, f32, ptr)
-DEF_HELPER_2(vfp_tosizd, i32, f64, ptr)
+DEF_HELPER_2(vfp_tosih, s32, f16, ptr)
+DEF_HELPER_2(vfp_tosis, s32, f32, ptr)
+DEF_HELPER_2(vfp_tosid, s32, f64, ptr)
+DEF_HELPER_2(vfp_tosizh, s32, f16, ptr)
+DEF_HELPER_2(vfp_tosizs, s32, f32, ptr)
+DEF_HELPER_2(vfp_tosizd, s32, f64, ptr)
 
 DEF_HELPER_3(vfp_toshs_round_to_zero, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosls_round_to_zero, i32, f32, i32, ptr)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 1248d84e6f..a36f5b1899 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11360,7 +11360,7 @@ ftype HELPER(name)(uint32_t x, void *fpstp) \
 }
 
 #define CONV_FTOI(name, ftype, fsz, sign, round) \
-uint32_t HELPER(name)(ftype x, void *fpstp) \
+sign##int32_t HELPER(name)(ftype x, void *fpstp) \
 { \
     float_status *fpst = fpstp; \
     if (float##fsz##_is_any_nan(x)) { \
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4b36c1eecf..b6421ec19c 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3195,6 +3195,78 @@ static inline float16 sve_f64_to_f16(float64 f, float_status *fpst)
     return ret;
 }
 
+static inline int16_t vfp_float16_to_int16_rtz(float16 f, float_status *s)
+{
+    if (float16_is_any_nan(f)) {
+        float_raise(float_flag_invalid, s);
+        return 0;
+    }
+    return float16_to_int16_round_to_zero(f, s);
+}
+
+static inline int64_t vfp_float16_to_int64_rtz(float16 f, float_status *s)
+{
+    if (float16_is_any_nan(f)) {
+        float_raise(float_flag_invalid, s);
+        return 0;
+    }
+    return float16_to_int64_round_to_zero(f, s);
+}
+
+static inline int64_t vfp_float32_to_int64_rtz(float32 f, float_status *s)
+{
+    if (float32_is_any_nan(f)) {
+        float_raise(float_flag_invalid, s);
+        return 0;
+    }
+    return float32_to_int64_round_to_zero(f, s);
+}
+
+static inline int64_t vfp_float64_to_int64_rtz(float64 f, float_status *s)
+{
+    if (float64_is_any_nan(f)) {
+        float_raise(float_flag_invalid, s);
+        return 0;
+    }
+    return float64_to_int64_round_to_zero(f, s);
+}
+
+static inline uint16_t vfp_float16_to_uint16_rtz(float16 f, float_status *s)
+{
+    if (float16_is_any_nan(f)) {
+        float_raise(float_flag_invalid, s);
+        return 0;
+    }
+    return float16_to_uint16_round_to_zero(f, s);
+}
+
+static inline uint64_t vfp_float16_to_uint64_rtz(float16 f, float_status *s)
+{
+    if (float16_is_any_nan(f)) {
+        float_raise(float_flag_invalid, s);
+        return 0;
+    }
+    return float16_to_uint64_round_to_zero(f, s);
+}
+
+static inline uint64_t vfp_float32_to_uint64_rtz(float32 f, float_status *s)
+{
+    if (float32_is_any_nan(f)) {
+        float_raise(float_flag_invalid, s);
+        return 0;
+    }
+    return float32_to_uint64_round_to_zero(f, s);
+}
+
+static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
+{
+    if (float64_is_any_nan(f)) {
+        float_raise(float_flag_invalid, s);
+        return 0;
+    }
+    return float64_to_uint64_round_to_zero(f, s);
+}
+
 DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
 DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
 DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
@@ -3202,6 +3274,22 @@ DO_ZPZ_FP(sve_fcvt_hd, uint64_t, , sve_f16_to_f64)
 DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
 DO_ZPZ_FP(sve_fcvt_sd, uint64_t,     , float32_to_float64)
 
+DO_ZPZ_FP(sve_fcvtzs_hh, uint16_t, H1_2, vfp_float16_to_int16_rtz)
+DO_ZPZ_FP(sve_fcvtzs_hs, uint32_t, H1_4, helper_vfp_tosizh)
+DO_ZPZ_FP(sve_fcvtzs_ss, uint32_t, H1_4, helper_vfp_tosizs)
+DO_ZPZ_FP(sve_fcvtzs_hd, uint64_t,     , vfp_float16_to_int64_rtz)
+DO_ZPZ_FP(sve_fcvtzs_sd, uint64_t,     , vfp_float32_to_int64_rtz)
+DO_ZPZ_FP(sve_fcvtzs_ds, uint64_t,     , helper_vfp_tosizd)
+DO_ZPZ_FP(sve_fcvtzs_dd, uint64_t,     , vfp_float64_to_int64_rtz)
+
+DO_ZPZ_FP(sve_fcvtzu_hh, uint16_t, H1_2, vfp_float16_to_uint16_rtz)
+DO_ZPZ_FP(sve_fcvtzu_hs, uint32_t, H1_4, helper_vfp_touizh)
+DO_ZPZ_FP(sve_fcvtzu_ss, uint32_t, H1_4, helper_vfp_touizs)
+DO_ZPZ_FP(sve_fcvtzu_hd, uint64_t,     , vfp_float16_to_uint64_rtz)
+DO_ZPZ_FP(sve_fcvtzu_sd, uint64_t,     , vfp_float32_to_uint64_rtz)
+DO_ZPZ_FP(sve_fcvtzu_ds, uint64_t,     , helper_vfp_touizd)
+DO_ZPZ_FP(sve_fcvtzu_dd, uint64_t,     , vfp_float64_to_uint64_rtz)
+
 DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
 DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
 DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 37ad1c9459..be589a1cf2 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3970,6 +3970,76 @@ static bool trans_FCVT_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_sd);
 }
 
+static bool trans_FCVTZS_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzs_hh);
+}
+
+static bool trans_FCVTZU_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzu_hh);
+}
+
+static bool trans_FCVTZS_hs(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzs_hs);
+}
+
+static bool trans_FCVTZU_hs(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzu_hs);
+}
+
+static bool trans_FCVTZS_hd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzs_hd);
+}
+
+static bool trans_FCVTZU_hd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzu_hd);
+}
+
+static bool trans_FCVTZS_ss(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzs_ss);
+}
+
+static bool trans_FCVTZU_ss(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_ss);
+}
+
+static bool trans_FCVTZS_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzs_sd);
+}
+
+static bool trans_FCVTZU_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_sd);
+}
+
+static bool trans_FCVTZS_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzs_ds);
+}
+
+static bool trans_FCVTZU_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_ds);
+}
+
+static bool trans_FCVTZS_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzs_dd);
+}
+
+static bool trans_FCVTZU_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_dd);
+}
+
 static bool trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 {
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh);
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 18c174e92d..ddfb5316c9 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -829,6 +829,22 @@ FCVT_hd 01100101 11 0010 01 101 ... ..... ..... @rd_pg_rn_e0
 FCVT_ds         01100101 11 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVT_sd         01100101 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
 
+# SVE floating-point convert to integer
+FCVTZS_hh       01100101 01 011 01 0 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZU_hh       01100101 01 011 01 1 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZS_hs       01100101 01 011 10 0 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZU_hs       01100101 01 011 10 1 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZS_hd       01100101 01 011 11 0 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZU_hd       01100101 01 011 11 1 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZS_ss       01100101 10 011 10 0 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZU_ss       01100101 10 011 10 1 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZS_ds       01100101 11 011 00 0 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZU_ds       01100101 11 011 00 1 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZS_sd       01100101 11 011 10 0 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZU_sd       01100101 11 011 10 1 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZS_dd       01100101 11 011 11 0 101 ... ..... .....  @rd_pg_rn_e0
+FCVTZU_dd       01100101 11 011 11 1 101 ... ..... .....  @rd_pg_rn_e0
+
 # SVE integer convert to floating-point
 SCVTF_hh        01100101 01 010 01 0 101 ... ..... .....  @rd_pg_rn_e0
 SCVTF_sh        01100101 01 010 10 0 101 ... ..... .....  @rd_pg_rn_e0

From patchwork Wed Jun 27 04:33:18 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935283
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:18 -0700
Message-Id: <20180627043328.11531-26-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 25/35] target/arm: Implement SVE floating-point round to integral value
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
 target/arm/helper-sve.h    | 14 +++++++
 target/arm/sve_helper.c    |  8 ++++
 target/arm/translate-sve.c | 77 ++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  9 +++++
 4 files changed, 108 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 37fa9eb9bb..36168c5bb2 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -985,6 +985,20 @@ DEF_HELPER_FLAGS_5(sve_fcvtzu_sd, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_fcvtzu_dd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_frint_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frint_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frint_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_frintx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frintx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frintx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b6421ec19c..af8221c714 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3290,6 +3290,14 @@ DO_ZPZ_FP(sve_fcvtzu_sd, uint64_t, , vfp_float32_to_uint64_rtz)
 DO_ZPZ_FP(sve_fcvtzu_ds, uint64_t,     , helper_vfp_touizd)
 DO_ZPZ_FP(sve_fcvtzu_dd, uint64_t,     , vfp_float64_to_uint64_rtz)
 
+DO_ZPZ_FP(sve_frint_h, uint16_t, H1_2, helper_advsimd_rinth)
+DO_ZPZ_FP(sve_frint_s, uint32_t, H1_4, helper_rints)
+DO_ZPZ_FP(sve_frint_d, uint64_t,     , helper_rintd)
+
+DO_ZPZ_FP(sve_frintx_h, uint16_t, H1_2, float16_round_to_int)
+DO_ZPZ_FP(sve_frintx_s, uint32_t, H1_4, float32_round_to_int)
+DO_ZPZ_FP(sve_frintx_d, uint64_t,     , float64_round_to_int)
+
 DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
 DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
 DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index be589a1cf2..270bf9101b 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4040,6 +4040,83 @@ static bool trans_FCVTZU_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_dd);
 }
 
+static gen_helper_gvec_3_ptr * const frint_fns[3] = {
+    gen_helper_sve_frint_h,
+    gen_helper_sve_frint_s,
+    gen_helper_sve_frint_d
+};
+
+static bool trans_FRINTI(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    if (a->esz == 0) {
+        return false;
+    }
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16,
+                      frint_fns[a->esz - 1]);
+}
+
+static bool trans_FRINTX(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3_ptr *
const fns[3] = { + gen_helper_sve_frintx_h, + gen_helper_sve_frintx_s, + gen_helper_sve_frintx_d + }; + if (a->esz == 0) { + return false; + } + return do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16, fns[a->esz - 1]); +} + +static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a, int mode) +{ + if (a->esz == 0) { + return false; + } + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + TCGv_i32 tmode = tcg_const_i32(mode); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + + gen_helper_set_rmode(tmode, tmode, status); + + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + pred_full_reg_offset(s, a->pg), + status, vsz, vsz, 0, frint_fns[a->esz - 1]); + + gen_helper_set_rmode(tmode, tmode, status); + tcg_temp_free_i32(tmode); + tcg_temp_free_ptr(status); + } + return true; +} + +static bool trans_FRINTN(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + return do_frint_mode(s, a, float_round_nearest_even); +} + +static bool trans_FRINTP(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + return do_frint_mode(s, a, float_round_up); +} + +static bool trans_FRINTM(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + return do_frint_mode(s, a, float_round_down); +} + +static bool trans_FRINTZ(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + return do_frint_mode(s, a, float_round_to_zero); +} + +static bool trans_FRINTA(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + return do_frint_mode(s, a, float_round_ties_away); +} + static bool trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn) { return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh); diff --git a/target/arm/sve.decode b/target/arm/sve.decode index ddfb5316c9..e45faaec3a 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -845,6 +845,15 @@ FCVTZU_sd 01100101 11 011 10 1 101 ... ..... ..... @rd_pg_rn_e0 FCVTZS_dd 01100101 11 011 11 0 101 ... ..... ..... 
@rd_pg_rn_e0 FCVTZU_dd 01100101 11 011 11 1 101 ... ..... ..... @rd_pg_rn_e0 +# SVE floating-point round to integral value +FRINTN 01100101 .. 000 000 101 ... ..... ..... @rd_pg_rn +FRINTP 01100101 .. 000 001 101 ... ..... ..... @rd_pg_rn +FRINTM 01100101 .. 000 010 101 ... ..... ..... @rd_pg_rn +FRINTZ 01100101 .. 000 011 101 ... ..... ..... @rd_pg_rn +FRINTA 01100101 .. 000 100 101 ... ..... ..... @rd_pg_rn +FRINTX 01100101 .. 000 110 101 ... ..... ..... @rd_pg_rn +FRINTI 01100101 .. 000 111 101 ... ..... ..... @rd_pg_rn + # SVE integer convert to floating-point SCVTF_hh 01100101 01 010 01 0 101 ... ..... ..... @rd_pg_rn_e0 SCVTF_sh 01100101 01 010 10 0 101 ... ..... ..... @rd_pg_rn_e0 From patchwork Wed Jun 27 04:33:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935297 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="Sl35h3Rw"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41FrK00MHqz9s0w for ; Wed, 27 Jun 2018 14:58:28 +1000 (AEST) Received: from localhost ([::1]:56636 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2XB-0007B7-HM for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:58:25 -0400 Received: from eggs.gnu.org 
([2001:4830:134:3::10]:60947) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29i-0004Xf-NI for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:11 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29h-0000w3-8h for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:10 -0400 Received: from mail-pf0-x22c.google.com ([2607:f8b0:400e:c00::22c]:37612) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29h-0000v4-1x for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:09 -0400 Received: by mail-pf0-x22c.google.com with SMTP id y5-v6so388814pfn.4 for ; Tue, 26 Jun 2018 21:34:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=aQtfq69suyZFnSQntfxBl6/IxAKwudRlVXWZxIK1/oo=; b=Sl35h3RwZXpLhYT1l20GWAfBzUBVT78xnEEjlQ7iDgAUUoC1KA/1saARwmaIoSakAE ZJ9j1rnaKvYHNMwgkA7ulADf/a+y+o4p4AIuqiZnZU93JPIKkuh3PmLbpR4Ds7a1qdrz hUAAtU6Sc6gyXwQT5CEOstcL1xhMsvdT8wfpQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=aQtfq69suyZFnSQntfxBl6/IxAKwudRlVXWZxIK1/oo=; b=Ubb4Fitwp+Zks7DdUTkDrt0WWx56CldNkZ6Ojy6agUASaOHiNXwsLkgQn+S0Tuoui4 e7QkFjHYtekivmFKQLOw391wHg4R2HhDvGD9pV2rHHgNILmgG0UACfMU2uJrDTXDw2bz RNeLAXxps2KYS/x2kumKZKjdhkOTjeF5A93KQO+anSz49SaHP6DO5YlgSOPjMp1jYP+4 AC08duEKrJ/SCwUBfXPqJ/5rEapLhbKILSx9mpWvMdXqy43V1l86FrLprKoZOEoZ4CgC gri6CRVK3wUkfk3ssYOeBaWzY7EuJNELvcAtcjikhIOGo+SP6n4VVscNN/gfMqXEdU5m n1Nw== X-Gm-Message-State: APt69E0nxzRvLP2a7S1iZ4OT7ij5WuKOtSuPIctOEr5bvOhCgBvgvdL5 f+1vyPMPftDJG7CZoam7XKlfYKvLNQ0= X-Google-Smtp-Source: ADUXVKKbktaGT2gUxICXayYJVhk4buctczTVsxPJgSmj0vB5XPGYS7pySQlD/1mqrmZyT7ZEueke+Q== X-Received: by 2002:a65:6491:: with SMTP id e17-v6mr3776006pgv.44.1530074047823; Tue, 26 Jun 2018 21:34:07 -0700 (PDT) Received: from cloudburst.twiddle.net 
(97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.34.06 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:34:06 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:19 -0700 Message-Id: <20180627043328.11531-27-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::22c Subject: [Qemu-devel] [PATCH v6 26/35] target/arm: Implement SVE floating-point unary operations X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 14 ++++++++++++++ target/arm/sve_helper.c | 8 ++++++++ target/arm/translate-sve.c | 26 ++++++++++++++++++++++++++ target/arm/sve.decode | 4 ++++ 4 files changed, 52 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 36168c5bb2..891346a5ac 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -999,6 +999,20 @@ DEF_HELPER_FLAGS_5(sve_frintx_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve_frintx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_frecpx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_frecpx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_frecpx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve_fsqrt_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) 
+DEF_HELPER_FLAGS_5(sve_fsqrt_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve_fsqrt_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG, diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index af8221c714..83bd8c4269 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3298,6 +3298,14 @@ DO_ZPZ_FP(sve_frintx_h, uint16_t, H1_2, float16_round_to_int) DO_ZPZ_FP(sve_frintx_s, uint32_t, H1_4, float32_round_to_int) DO_ZPZ_FP(sve_frintx_d, uint64_t, , float64_round_to_int) +DO_ZPZ_FP(sve_frecpx_h, uint16_t, H1_2, helper_frecpx_f16) +DO_ZPZ_FP(sve_frecpx_s, uint32_t, H1_4, helper_frecpx_f32) +DO_ZPZ_FP(sve_frecpx_d, uint64_t, , helper_frecpx_f64) + +DO_ZPZ_FP(sve_fsqrt_h, uint16_t, H1_2, float16_sqrt) +DO_ZPZ_FP(sve_fsqrt_s, uint32_t, H1_4, float32_sqrt) +DO_ZPZ_FP(sve_fsqrt_d, uint64_t, , float64_sqrt) + DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16) DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16) DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 270bf9101b..ff8ae67e2b 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -4117,6 +4117,32 @@ static bool trans_FRINTA(DisasContext *s, arg_rpr_esz *a, uint32_t insn) return do_frint_mode(s, a, float_round_ties_away); } +static bool trans_FRECPX(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + static gen_helper_gvec_3_ptr * const fns[3] = { + gen_helper_sve_frecpx_h, + gen_helper_sve_frecpx_s, + gen_helper_sve_frecpx_d + }; + if (a->esz == 0) { + return false; + } + return do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16, fns[a->esz - 1]); +} + +static bool trans_FSQRT(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + static gen_helper_gvec_3_ptr * const fns[3] = { + gen_helper_sve_fsqrt_h, + 
gen_helper_sve_fsqrt_s, + gen_helper_sve_fsqrt_d + }; + if (a->esz == 0) { + return false; + } + return do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16, fns[a->esz - 1]); +} + static bool trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn) { return do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh); diff --git a/target/arm/sve.decode b/target/arm/sve.decode index e45faaec3a..2aca9f0bb0 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -854,6 +854,10 @@ FRINTA 01100101 .. 000 100 101 ... ..... ..... @rd_pg_rn FRINTX 01100101 .. 000 110 101 ... ..... ..... @rd_pg_rn FRINTI 01100101 .. 000 111 101 ... ..... ..... @rd_pg_rn +# SVE floating-point unary operations +FRECPX 01100101 .. 001 100 101 ... ..... ..... @rd_pg_rn +FSQRT 01100101 .. 001 101 101 ... ..... ..... @rd_pg_rn + # SVE integer convert to floating-point SCVTF_hh 01100101 01 010 01 0 101 ... ..... ..... @rd_pg_rn_e0 SCVTF_sh 01100101 01 010 10 0 101 ... ..... ..... @rd_pg_rn_e0 From patchwork Wed Jun 27 04:33:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935286 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="GhdOHjFY"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 
41Fr8G0bH6z9s0w for ; Wed, 27 Jun 2018 14:50:54 +1000 (AEST) Received: from localhost ([::1]:56586 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2Pr-0000zQ-NX for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:50:51 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60969) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29j-0004Z6-NW for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:14 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29i-0000xu-L7 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:11 -0400 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:38542) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29i-0000wq-GX for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:10 -0400 Received: by mail-pf0-x241.google.com with SMTP id j17-v6so387475pfn.5 for ; Tue, 26 Jun 2018 21:34:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=jLi5kdXWigewkTKucYfYsu8zvwTxz4aJuC7qXtngMOQ=; b=GhdOHjFYVgRUMkS6M7sRvhqp3XKkXgR70riE4PCAKKRi2qW6NQdd7XtsUQb4eAunCu bwKcdZuSn0VyUqRF7O6VjoiLsemR/6HF7RUGeKj9shZsrZzggNVFgZYqVnK5320jVY59 L5i2SCZThddkhIe/qMnaAluXspX6XW1deNV0Y= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=jLi5kdXWigewkTKucYfYsu8zvwTxz4aJuC7qXtngMOQ=; b=C5p0o4aLr9e0+FsveJO5US4a9JhWchl0LiclBN2RH6MM6w7F2sBfeFOtys50pLVwRb pWUxLaZxWF9tfbkIRZ7ffZ6Rq+hnlZGOM/FEYynNJW4gXAwjX0l/1JNoTkebhH4olXFX oX9liJVddZ4Sh6548yysyG3uXLcKLQkhF5J5LUg5Mwe7T3OrppzhpYbDT0HuDEObfQv9 E6Po3RrEQ2Q1ROv96aA6Q0x4bDa//hHMzoOPXXLonAsUoQuIo4FLUoIwybkKfesHwMav RX5WsuLJmkNMKIW/RETMPmpCpCVG3VpZx2r9MtKYMXgLyoIvSEx0kJvzfdmhqa7hC6N4 tduQ== X-Gm-Message-State: APt69E28w/Nrqnuz94ZWjd909op6rqp0uTYgqU0O54ed+4s09BZjbU8y 
g4vsN7uS9BRI4gcxIl0Wzx1XQhIsCTg= X-Google-Smtp-Source: ADUXVKIUTNm2VxT/lm/W6tB1ngZt7T5/R5pzN2Pgib8A644aRxjvqGjMZnf9tXfg91jzr5JJbA/2KA== X-Received: by 2002:a65:6689:: with SMTP id b9-v6mr3747646pgw.326.1530074049246; Tue, 26 Jun 2018 21:34:09 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.34.08 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:34:08 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:20 -0700 Message-Id: <20180627043328.11531-28-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::241 Subject: [Qemu-devel] [PATCH v6 27/35] target/arm: Implement SVE MOVPRFX X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- v6: Fix comment typos --- target/arm/translate-sve.c | 60 +++++++++++++++++++++++++++++++++++++- target/arm/sve.decode | 7 +++++ 2 files changed, 66 insertions(+), 1 deletion(-) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index ff8ae67e2b..4883de3fab 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -351,6 +351,23 @@ static bool do_zpzz_ool(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4 *fn) return true; } +/* Select active elements from Zn and inactive elements from Zm, + * storing the result in Zd.
+ */ +static void do_sel_z(DisasContext *s, int rd, int rn, int rm, int pg, int esz) +{ + static gen_helper_gvec_4 * const fns[4] = { + gen_helper_sve_sel_zpzz_b, gen_helper_sve_sel_zpzz_h, + gen_helper_sve_sel_zpzz_s, gen_helper_sve_sel_zpzz_d + }; + unsigned vsz = vec_full_reg_size(s); + tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + pred_full_reg_offset(s, pg), + vsz, vsz, 0, fns[esz]); +} + #define DO_ZPZZ(NAME, name) \ static bool trans_##NAME##_zpzz(DisasContext *s, arg_rprr_esz *a, \ uint32_t insn) \ @@ -401,7 +418,13 @@ static bool trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn) return do_zpzz_ool(s, a, fns[a->esz]); } -DO_ZPZZ(SEL, sel) +static bool trans_SEL_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn) +{ + if (sve_access_check(s)) { + do_sel_z(s, a->rd, a->rn, a->rm, a->pg, a->esz); + } + return true; +} #undef DO_ZPZZ @@ -5035,3 +5058,38 @@ static bool trans_PRF_rr(DisasContext *s, arg_PRF_rr *a, uint32_t insn) sve_access_check(s); return true; } + +/* + * Move Prefix + * + * TODO: The implementation so far could handle predicated merging movprfx. + * The helper functions as written take an extra source register to + * use in the operation, but the result is only written when predication + * succeeds. For unpredicated movprfx, we need to rearrange the helpers + * to allow the final write back to the destination to be unconditional. + * For predicated zeroing movprfx, we need to rearrange the helpers to + * allow the final write back to zero inactives. + * + * In the meantime, just emit the moves. 
+ */ + +static bool trans_MOVPRFX(DisasContext *s, arg_MOVPRFX *a, uint32_t insn) +{ + return do_mov_z(s, a->rd, a->rn); +} + +static bool trans_MOVPRFX_m(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + if (sve_access_check(s)) { + do_sel_z(s, a->rd, a->rn, a->rd, a->pg, a->esz); + } + return true; +} + +static bool trans_MOVPRFX_z(DisasContext *s, arg_rpr_esz *a, uint32_t insn) +{ + if (sve_access_check(s)) { + do_movz_zpz(s, a->rd, a->rn, a->pg, a->esz); + } + return true; +} diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 2aca9f0bb0..c725ee2584 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -270,6 +270,10 @@ ORV 00000100 .. 011 000 001 ... ..... ..... @rd_pg_rn EORV 00000100 .. 011 001 001 ... ..... ..... @rd_pg_rn ANDV 00000100 .. 011 010 001 ... ..... ..... @rd_pg_rn +# SVE constructive prefix (predicated) +MOVPRFX_z 00000100 .. 010 000 001 ... ..... ..... @rd_pg_rn +MOVPRFX_m 00000100 .. 010 001 001 ... ..... ..... @rd_pg_rn + # SVE integer add reduction (predicated) # Note that saddv requires size != 3. UADDV 00000100 .. 000 001 001 ... ..... ..... @rd_pg_rn @@ -418,6 +422,9 @@ ADR_p64 00000100 11 1 ..... 1010 .. ..... ..... @rd_rn_msz_rm ### SVE Integer Misc - Unpredicated Group +# SVE constructive prefix (unpredicated) +MOVPRFX 00000100 00 1 00000 101111 rn:5 rd:5 + # SVE floating-point exponential accelerator # Note esz != 0 FEXPA 00000100 .. 1 00000 101110 ..... ..... 
@rd_rn From patchwork Wed Jun 27 04:33:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935301 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="C03UqQks"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41FrPW0DNxz9s0w for ; Wed, 27 Jun 2018 15:02:23 +1000 (AEST) Received: from localhost ([::1]:56679 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2ay-0002NK-M1 for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 01:02:20 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:32794) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29m-0004dI-Pc for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29j-0000ye-VZ for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:14 -0400 Received: from mail-pf0-x236.google.com ([2607:f8b0:400e:c00::236]:43393) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29j-0000yA-Of for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:11 -0400 Received: by mail-pf0-x236.google.com with SMTP id y8-v6so382645pfm.10 for ; Tue, 26 Jun 2018 21:34:11 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=51qnQVLdHtI56j1vCLTEgJ6dbb/yME4EFqLsFYPoNTk=; b=C03UqQks117ENZNnF30MpCLAG87vlsifSRCsTZkn10WzIkxHXnLTjG369cYOG/WExa 6KXRhdlA7UBzWbAOd/23yoNEFfE554mSJeU1MBD58lT5Snx3k7ECwO+lAmQ84TTTD3x6 6pPvekeNFApjCU8rM6kWo/OZijP2ecSlYRduM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=51qnQVLdHtI56j1vCLTEgJ6dbb/yME4EFqLsFYPoNTk=; b=mkqiu0gRWiIiRleGlROSEBnQpExZ4p69IquuqsPPB+8eP5LQrwsI/TPnghjBm15jDz LyQSc4KTWa24gJGK6Ep6aYNZ/JKQIwFEN3llELdDwN1mJTJFWtcOgzG36v5ElSKMUJsS hyyeCsgZndthFeX/cmMRriQytNcg+ZeWPqHbUpm3jWl3ZkAXfKdCqysNCFqxjsxWf0bm DbzpioDiAtHUL2H7ip2j4sUaHmmfy5W8dWLzL7RqfJGVnDT60WTgnJgJuzgpWS7RGTDu Rkb0SCTp+LbOfo3urnz9bYW1wPpzscY+Fne5KDiZXigjtRU81WiknFpavOF5bGuIoXeh w+ig== X-Gm-Message-State: APt69E2RUy3ZuuuE8xehqMqU2oDAuTpmywdbZO+oP4XsFQYam13V5gL/ Z9XHQ9FEZPY3nJjbgQE9qvSZLQAEUbw= X-Google-Smtp-Source: ADUXVKKf4LmjIlZWVjaSpGN3pecfVl+7M71hJwGG3qvt/YEvVFvbPJd1WGPfDcbfrq+hLEQP8PE0IQ== X-Received: by 2002:a63:3190:: with SMTP id x138-v6mr3762791pgx.60.1530074050533; Tue, 26 Jun 2018 21:34:10 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. [97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.34.09 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:34:09 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:21 -0700 Message-Id: <20180627043328.11531-29-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 2607:f8b0:400e:c00::236 Subject: [Qemu-devel] [PATCH v6 28/35] target/arm: Implement SVE floating-point complex add X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 7 +++ target/arm/sve_helper.c | 100 +++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 24 +++++++++ target/arm/sve.decode | 4 ++ 4 files changed, 135 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 891346a5ac..0bd9fe2f28 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1092,6 +1092,13 @@ DEF_HELPER_FLAGS_6(sve_facgt_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(sve_facgt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcadd_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcadd_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve_fcadd_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32) DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32) DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 83bd8c4269..bdb7565779 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3657,6 +3657,106 @@ void HELPER(sve_ftmad_d)(void *vd, void *vn, void *vm, void *vs, uint32_t desc) } } +/* + * FP Complex Add + */ + +void HELPER(sve_fcadd_h)(void *vd, void *vn, void *vm, void *vg, + void *vs, uint32_t desc) +{ + intptr_t j, i = simd_oprsz(desc); + uint64_t *g = vg; + float16 neg_imag = 
float16_set_sign(0, simd_data(desc)); + float16 neg_real = float16_chs(neg_imag); + + do { + uint64_t pg = g[(i - 1) >> 6]; + do { + float16 e0, e1, e2, e3; + + /* I holds the real index; J holds the imag index. */ + j = i - sizeof(float16); + i -= 2 * sizeof(float16); + + e0 = *(float16 *)(vn + H1_2(i)); + e1 = *(float16 *)(vm + H1_2(j)) ^ neg_real; + e2 = *(float16 *)(vn + H1_2(j)); + e3 = *(float16 *)(vm + H1_2(i)) ^ neg_imag; + + if (likely((pg >> (i & 63)) & 1)) { + *(float16 *)(vd + H1_2(i)) = float16_add(e0, e1, vs); + } + if (likely((pg >> (j & 63)) & 1)) { + *(float16 *)(vd + H1_2(j)) = float16_add(e2, e3, vs); + } + } while (i & 63); + } while (i != 0); +} + +void HELPER(sve_fcadd_s)(void *vd, void *vn, void *vm, void *vg, + void *vs, uint32_t desc) +{ + intptr_t j, i = simd_oprsz(desc); + uint64_t *g = vg; + float32 neg_imag = float32_set_sign(0, simd_data(desc)); + float32 neg_real = float32_chs(neg_imag); + + do { + uint64_t pg = g[(i - 1) >> 6]; + do { + float32 e0, e1, e2, e3; + + /* I holds the real index; J holds the imag index. */ + j = i - sizeof(float32); + i -= 2 * sizeof(float32); + + e0 = *(float32 *)(vn + H1_4(i)); + e1 = *(float32 *)(vm + H1_4(j)) ^ neg_real; + e2 = *(float32 *)(vn + H1_4(j)); + e3 = *(float32 *)(vm + H1_4(i)) ^ neg_imag; + + if (likely((pg >> (i & 63)) & 1)) { + *(float32 *)(vd + H1_4(i)) = float32_add(e0, e1, vs); + } + if (likely((pg >> (j & 63)) & 1)) { + *(float32 *)(vd + H1_4(j)) = float32_add(e2, e3, vs); + } + } while (i & 63); + } while (i != 0); +} + +void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg, + void *vs, uint32_t desc) +{ + intptr_t j, i = simd_oprsz(desc); + uint64_t *g = vg; + float64 neg_imag = float64_set_sign(0, simd_data(desc)); + float64 neg_real = float64_chs(neg_imag); + + do { + uint64_t pg = g[(i - 1) >> 6]; + do { + float64 e0, e1, e2, e3; + + /* I holds the real index; J holds the imag index.
*/ + j = i - sizeof(float64); + i -= 2 * sizeof(float64); + + e0 = *(float64 *)(vn + i); + e1 = *(float64 *)(vm + j) ^ neg_real; + e2 = *(float64 *)(vn + j); + e3 = *(float64 *)(vm + i) ^ neg_imag; + + if (likely((pg >> (i & 63)) & 1)) { + *(float64 *)(vd + i) = float64_add(e0, e1, vs); + } + if (likely((pg >> (j & 63)) & 1)) { + *(float64 *)(vd + j) = float64_add(e2, e3, vs); + } + } while (i & 63); + } while (i != 0); +} + /* * Load contiguous data, protected by a governing predicate. */ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 4883de3fab..b1764f099b 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3895,6 +3895,30 @@ DO_FPCMP(FACGT, facgt) #undef DO_FPCMP +static bool trans_FCADD(DisasContext *s, arg_FCADD *a, uint32_t insn) +{ + static gen_helper_gvec_4_ptr * const fns[3] = { + gen_helper_sve_fcadd_h, + gen_helper_sve_fcadd_s, + gen_helper_sve_fcadd_d + }; + + if (a->esz == 0) { + return false; + } + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + pred_full_reg_offset(s, a->pg), + status, vsz, vsz, a->rot, fns[a->esz - 1]); + tcg_temp_free_ptr(status); + } + return true; +} + typedef void gen_helper_sve_fmla(TCGv_env, TCGv_ptr, TCGv_i32); static bool do_fmla(DisasContext *s, arg_rprrr_esz *a, gen_helper_sve_fmla *fn) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index c725ee2584..e5f8f43254 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -725,6 +725,10 @@ UMIN_zzi 00100101 .. 101 011 110 ........ ..... @rdn_i8u # SVE integer multiply immediate (unpredicated) MUL_zzi 00100101 .. 110 000 110 ........ .....
@rdn_i8s +# SVE floating-point complex add (predicated) +FCADD 01100100 esz:2 00000 rot:1 100 pg:3 rm:5 rd:5 \ + rn=%reg_movprfx + ### SVE FP Multiply-Add Indexed Group # SVE floating-point multiply-add (indexed) From patchwork Wed Jun 27 04:33:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935302 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="Bvdz6coe"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41FrS26VYGz9s0w for ; Wed, 27 Jun 2018 15:04:34 +1000 (AEST) Received: from localhost ([::1]:56685 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2d6-0003ay-GY for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 01:04:32 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:32804) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29n-0004dr-8f for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:18 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29l-00010b-El for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:15 -0400 Received: from mail-pl0-x241.google.com ([2607:f8b0:400e:c01::241]:42165) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) 
(envelope-from ) id 1fY29l-0000zp-72 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:13 -0400 Received: by mail-pl0-x241.google.com with SMTP id w17-v6so411779pll.9 for ; Tue, 26 Jun 2018 21:34:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=jAT3MbivOAuh2cXOphZuTzwQ/j70NHuupQD49/hoOCQ=; b=Bvdz6coePgMV9aY0yCHnfbjN+wjvXlOGI3HX+feoE4GZPLoGgdCgmDHC2S5bVlZmJN e/f65YmpnBwtuAhIH0iNM3N5FCAFNjup4Aol4lDKdADuJyXCAvS1ZlGo8LrJwC1UZ4Al mYNPkwsBgMKoZbB439NY62H2T9EIKKt9Ibg4Q= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=jAT3MbivOAuh2cXOphZuTzwQ/j70NHuupQD49/hoOCQ=; b=s68GPBNB/AuleelkGgS3o1vr7tbLXD84bEbNUCM/wvsMOtC9JTDSSZUPqyRcivsYVT ix6M3ykGqzC4ZvtNv4qIzapPRelO94zAeQ0tfwd/q4Sops3bRvlGSUNkTGi5B/mWJWmR mu4oWombBs+tevsWu9xcB1kiz1ntgVKvj/TeL5iGw+DTPl+FfQ/7dCiUH9YGbxqV3mgw nK9M5qd47e6Ir2Zq+otDeAlfL8AXZgKw8n5phHJTkwHKP65xxUclejIlaR2HF871WHlw rhUj2dY3iqJF3fQ2h8myPDsTrOWuKbuZxDkuf+h2GBeJvdzjdKHoTvb69iBwfmYyxL5q F1cA== X-Gm-Message-State: APt69E0VsUbVXJsd3mYzhpYfISSX3y+tZwz2Q3uiIlKp3vgKxMHum8Ff H8hnN0clgx6oiJM1HqTZR58tfNrlQyU= X-Google-Smtp-Source: ADUXVKKXw0TYBplr5d1PZYnwk68imk0K8U0/3vTbR63jb73SLYbRKvvfALHqX4d4RXS0Xst7amx82A== X-Received: by 2002:a17:902:3c5:: with SMTP id d63-v6mr4518297pld.163.1530074051944; Tue, 26 Jun 2018 21:34:11 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.34.10 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:34:11 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:22 -0700 Message-Id: <20180627043328.11531-30-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::241 Subject: [Qemu-devel] [PATCH v6 29/35] target/arm: Implement SVE fp complex multiply add X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/helper-sve.h | 4 + target/arm/sve_helper.c | 162 +++++++++++++++++++++++++++++++++++++ target/arm/translate-sve.c | 37 +++++++++ target/arm/sve.decode | 4 + 4 files changed, 207 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 0bd9fe2f28..023952a9a4 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1115,6 +1115,10 @@ DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32) DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32) DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32) +DEF_HELPER_FLAGS_3(sve_fcmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32) +DEF_HELPER_FLAGS_3(sve_fcmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32) +DEF_HELPER_FLAGS_3(sve_fcmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32) + DEF_HELPER_FLAGS_5(sve_ftmad_h, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_ftmad_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_ftmad_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index bdb7565779..790cbacd14 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3757,6 +3757,168 @@ void HELPER(sve_fcadd_d)(void *vd, void *vn, void *vm, void *vg, } while (i != 0); } +/* + * FP Complex Multiply + */ + +QEMU_BUILD_BUG_ON(SIMD_DATA_SHIFT + 22 > 32); + +void HELPER(sve_fcmla_zpzzz_h)(CPUARMState *env, void *vg, uint32_t desc) +{ + intptr_t j, i = simd_oprsz(desc); + unsigned rd = extract32(desc, SIMD_DATA_SHIFT, 5); + unsigned rn = extract32(desc, SIMD_DATA_SHIFT + 5, 5); + unsigned rm = extract32(desc, SIMD_DATA_SHIFT + 10, 5); + unsigned ra = extract32(desc, SIMD_DATA_SHIFT + 15, 5); + unsigned rot = extract32(desc, SIMD_DATA_SHIFT + 20, 2); + bool flip = rot & 1; + float16 neg_imag, neg_real; + void *vd = &env->vfp.zregs[rd]; + void *vn = &env->vfp.zregs[rn]; + void *vm = &env->vfp.zregs[rm]; + void *va = &env->vfp.zregs[ra]; + uint64_t *g = vg; + + neg_imag = float16_set_sign(0, (rot & 2) != 0); + neg_real = float16_set_sign(0, rot == 1 || rot == 2); + + do { + uint64_t pg = g[(i - 1) >> 6]; + do { + float16 e1, e2, e3, e4, nr, ni, mr, mi, d; + + /* I holds the real index; J holds the imag index. */ + j = i - sizeof(float16); + i -= 2 * sizeof(float16); + + nr = *(float16 *)(vn + H1_2(i)); + ni = *(float16 *)(vn + H1_2(j)); + mr = *(float16 *)(vm + H1_2(i)); + mi = *(float16 *)(vm + H1_2(j)); + + e2 = (flip ? ni : nr); + e1 = (flip ? mi : mr) ^ neg_real; + e4 = e2; + e3 = (flip ? 
mr : mi) ^ neg_imag; + + if (likely((pg >> (i & 63)) & 1)) { + d = *(float16 *)(va + H1_2(i)); + d = float16_muladd(e2, e1, d, 0, &env->vfp.fp_status_f16); + *(float16 *)(vd + H1_2(i)) = d; + } + if (likely((pg >> (j & 63)) & 1)) { + d = *(float16 *)(va + H1_2(j)); + d = float16_muladd(e4, e3, d, 0, &env->vfp.fp_status_f16); + *(float16 *)(vd + H1_2(j)) = d; + } + } while (i & 63); + } while (i != 0); +} + +void HELPER(sve_fcmla_zpzzz_s)(CPUARMState *env, void *vg, uint32_t desc) +{ + intptr_t j, i = simd_oprsz(desc); + unsigned rd = extract32(desc, SIMD_DATA_SHIFT, 5); + unsigned rn = extract32(desc, SIMD_DATA_SHIFT + 5, 5); + unsigned rm = extract32(desc, SIMD_DATA_SHIFT + 10, 5); + unsigned ra = extract32(desc, SIMD_DATA_SHIFT + 15, 5); + unsigned rot = extract32(desc, SIMD_DATA_SHIFT + 20, 2); + bool flip = rot & 1; + float32 neg_imag, neg_real; + void *vd = &env->vfp.zregs[rd]; + void *vn = &env->vfp.zregs[rn]; + void *vm = &env->vfp.zregs[rm]; + void *va = &env->vfp.zregs[ra]; + uint64_t *g = vg; + + neg_imag = float32_set_sign(0, (rot & 2) != 0); + neg_real = float32_set_sign(0, rot == 1 || rot == 2); + + do { + uint64_t pg = g[(i - 1) >> 6]; + do { + float32 e1, e2, e3, e4, nr, ni, mr, mi, d; + + /* I holds the real index; J holds the imag index. */ + j = i - sizeof(float32); + i -= 2 * sizeof(float32); + + nr = *(float32 *)(vn + H1_2(i)); + ni = *(float32 *)(vn + H1_2(j)); + mr = *(float32 *)(vm + H1_2(i)); + mi = *(float32 *)(vm + H1_2(j)); + + e2 = (flip ? ni : nr); + e1 = (flip ? mi : mr) ^ neg_real; + e4 = e2; + e3 = (flip ? 
mr : mi) ^ neg_imag; + + if (likely((pg >> (i & 63)) & 1)) { + d = *(float32 *)(va + H1_2(i)); + d = float32_muladd(e2, e1, d, 0, &env->vfp.fp_status); + *(float32 *)(vd + H1_2(i)) = d; + } + if (likely((pg >> (j & 63)) & 1)) { + d = *(float32 *)(va + H1_2(j)); + d = float32_muladd(e4, e3, d, 0, &env->vfp.fp_status); + *(float32 *)(vd + H1_2(j)) = d; + } + } while (i & 63); + } while (i != 0); +} + +void HELPER(sve_fcmla_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc) +{ + intptr_t j, i = simd_oprsz(desc); + unsigned rd = extract32(desc, SIMD_DATA_SHIFT, 5); + unsigned rn = extract32(desc, SIMD_DATA_SHIFT + 5, 5); + unsigned rm = extract32(desc, SIMD_DATA_SHIFT + 10, 5); + unsigned ra = extract32(desc, SIMD_DATA_SHIFT + 15, 5); + unsigned rot = extract32(desc, SIMD_DATA_SHIFT + 20, 2); + bool flip = rot & 1; + float64 neg_imag, neg_real; + void *vd = &env->vfp.zregs[rd]; + void *vn = &env->vfp.zregs[rn]; + void *vm = &env->vfp.zregs[rm]; + void *va = &env->vfp.zregs[ra]; + uint64_t *g = vg; + + neg_imag = float64_set_sign(0, (rot & 2) != 0); + neg_real = float64_set_sign(0, rot == 1 || rot == 2); + + do { + uint64_t pg = g[(i - 1) >> 6]; + do { + float64 e1, e2, e3, e4, nr, ni, mr, mi, d; + + /* I holds the real index; J holds the imag index. */ + j = i - sizeof(float64); + i -= 2 * sizeof(float64); + + nr = *(float64 *)(vn + H1_2(i)); + ni = *(float64 *)(vn + H1_2(j)); + mr = *(float64 *)(vm + H1_2(i)); + mi = *(float64 *)(vm + H1_2(j)); + + e2 = (flip ? ni : nr); + e1 = (flip ? mi : mr) ^ neg_real; + e4 = e2; + e3 = (flip ? 
mr : mi) ^ neg_imag; + + if (likely((pg >> (i & 63)) & 1)) { + d = *(float64 *)(va + H1_2(i)); + d = float64_muladd(e2, e1, d, 0, &env->vfp.fp_status); + *(float64 *)(vd + H1_2(i)) = d; + } + if (likely((pg >> (j & 63)) & 1)) { + d = *(float64 *)(va + H1_2(j)); + d = float64_muladd(e4, e3, d, 0, &env->vfp.fp_status); + *(float64 *)(vd + H1_2(j)) = d; + } + } while (i & 63); + } while (i != 0); +} + /* * Load contiguous data, protected by a governing predicate. */ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index b1764f099b..7ce3222158 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3968,6 +3968,43 @@ DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz) #undef DO_FMLA +static bool trans_FCMLA_zpzzz(DisasContext *s, + arg_FCMLA_zpzzz *a, uint32_t insn) +{ + static gen_helper_sve_fmla * const fns[3] = { + gen_helper_sve_fcmla_zpzzz_h, + gen_helper_sve_fcmla_zpzzz_s, + gen_helper_sve_fcmla_zpzzz_d, + }; + + if (a->esz == 0) { + return false; + } + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + unsigned desc; + TCGv_i32 t_desc; + TCGv_ptr pg = tcg_temp_new_ptr(); + + /* We would need 7 operands to pass these arguments "properly". + * So we encode all the register numbers into the descriptor. + */ + desc = deposit32(a->rd, 5, 5, a->rn); + desc = deposit32(desc, 10, 5, a->rm); + desc = deposit32(desc, 15, 5, a->ra); + desc = deposit32(desc, 20, 2, a->rot); + desc = sextract32(desc, 0, 22); + desc = simd_desc(vsz, vsz, desc); + + t_desc = tcg_const_i32(desc); + tcg_gen_addi_ptr(pg, cpu_env, pred_full_reg_offset(s, a->pg)); + fns[a->esz - 1](cpu_env, pg, t_desc); + tcg_temp_free_i32(t_desc); + tcg_temp_free_ptr(pg); + } + return true; +} + /* *** SVE Floating Point Unary Operations Prediated Group */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index e5f8f43254..e342cfdf14 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -729,6 +729,10 @@ MUL_zzi 00100101 .. 110 000 110 ........ ..... 
@rdn_i8s FCADD 01100100 esz:2 00000 rot:1 100 pg:3 rm:5 rd:5 \ rn=%reg_movprfx +# SVE floating-point complex multiply-add (predicated) +FCMLA_zpzzz 01100100 esz:2 0 rm:5 0 rot:2 pg:3 rn:5 rd:5 \ + ra=%reg_movprfx + ### SVE FP Multiply-Add Indexed Group # SVE floating-point multiply-add (indexed) From patchwork Wed Jun 27 04:33:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935290 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="idJ00xcq"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41FrD93S0Yz9s0w for ; Wed, 27 Jun 2018 14:54:17 +1000 (AEST) Received: from localhost ([::1]:56605 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2T9-0003XP-4E for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:54:15 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:32837) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29o-0004fK-F8 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:18 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29m-000120-Mm for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:16 -0400 Received: from mail-pl0-x233.google.com ([2607:f8b0:400e:c01::233]:37558) 
by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29m-000119-Ei for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:14 -0400 Received: by mail-pl0-x233.google.com with SMTP id 31-v6so416411plc.4 for ; Tue, 26 Jun 2018 21:34:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=r45jIobnF+wtoBKzXFQCmSBJ8FmsuLpFIk2TKAlXpxU=; b=idJ00xcqkfipev5FaBl0Bv4pSnx6QzXpkhpn+XRsROXKHe2ZRz2DEYqFDkZkaUbloQ c+ThOR9mczsV6bJVUbiu+170XU74PENysndxc7jdnHByNyN5o9HYpcR+L+r2MCMvGqCC 2mETfakPAe3o7MIy0SfghYcppOVsG+xGfUQIY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=r45jIobnF+wtoBKzXFQCmSBJ8FmsuLpFIk2TKAlXpxU=; b=TOuux+F/ZJbcQwimSl9w+MgYg6bktC/dzXQ9DcO2o6CqiZmf42TEh/njD+8kZ6i7tT On4fzSjhiGWORgfgYtdiSODOInEAIFW535OIZT3Gp/AXhZmo7zUC+tK3wOGEj7i84i8Y VDr/fnpEsSSwGgTb8xr4C4MatV5QQJWh6hLc1VESwqlmNzw42EvjKFRHLI8wPEDZDXe3 2Omb9ib4y9oe3kEBfjkbltXD68cXUQWCX63VRKQ/Uda/9dgmL6o+4XrsuLs1rYbqGW8B pv/FEn2iUl0W9/Dpe4upDsajo5pkUUq3ELiKKHpq1lOLljz/P2OsSl8ScQv8n+oenYuP Fq0g== X-Gm-Message-State: APt69E1QEi2dxVz6099MwrwLB7XAKYyPh3Af+F+IxbveXfKdAOmWy7bp jvka1FQ8k8SsSE3aFs7Rq5nN7QoZeLQ= X-Google-Smtp-Source: ADUXVKJdY9iAO06h2HpEazVTalkBBNjo8LuKEuWZIMuzTqLNrVRA8JeP0LRUhBdMExYUsOipO6Zp0w== X-Received: by 2002:a17:902:722:: with SMTP id 31-v6mr4541650pli.3.1530074053168; Tue, 26 Jun 2018 21:34:13 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.34.11 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:34:12 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:23 -0700 Message-Id: <20180627043328.11531-31-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::233 Subject: [Qemu-devel] [PATCH v6 30/35] target/arm: Pass index to AdvSIMD FCMLA (indexed) X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" For aa64 advsimd, we had been passing the pre-indexed vector. However, sve applies the index to each 128-bit segment, so we need to pass in the index separately. For aa32 advsimd, the fp32 operation always has index 0, but we failed to interpret the fp16 index correctly. Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell Reviewed-by: Alex Bennée --- v6: * Fix double-indexing in translate-a64.c * Fix non-indexing of fp16 in translate.c. 
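Note for readers: the index now travels in the descriptor's data field next to the rotation, as data = (index << 2) | rot, and the helper reads flip from bit 0, neg_imag from bit 1 and the index from bits 2-3. A minimal sketch of that layout follows. The names pack_fcmla_data, unpack_fcmla_data and struct fcmla_fields are hypothetical and only illustrate the bit positions; the patch itself uses simd_desc() and extract32() with SIMD_DATA_SHIFT.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not QEMU functions. */
struct fcmla_fields {
    uint32_t flip;      /* bit 0 of rot: swap real/imag inputs */
    uint32_t neg_imag;  /* bit 1 of rot */
    uint32_t neg_real;  /* derived: flip ^ neg_imag */
    uint32_t index;     /* which complex pair of Vm to use */
};

/* Mirrors "data = (index << 2) | rot" from the translators. */
static uint32_t pack_fcmla_data(uint32_t index, uint32_t rot)
{
    return (index << 2) | (rot & 3);
}

/* Mirrors the field extraction done at the top of the _idx helpers. */
static struct fcmla_fields unpack_fcmla_data(uint32_t data)
{
    struct fcmla_fields f;

    f.flip = data & 1;
    f.neg_imag = (data >> 1) & 1;
    f.neg_real = f.flip ^ f.neg_imag;
    f.index = (data >> 2) & 3;
    return f;
}
```

Under this layout rot = 1 (the #90 form) decodes to flip = 1, neg_imag = 0, neg_real = 1, matching the "rot == 1 || rot == 2" test used by the SVE predicated helper.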
--- target/arm/translate-a64.c | 21 ++++++++++++--------- target/arm/translate.c | 32 +++++++++++++++++++++++--------- target/arm/vec_helper.c | 10 ++++++---- 3 files changed, 41 insertions(+), 22 deletions(-) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 8d8a4cecb0..eb3a4ab2f0 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -12669,15 +12669,18 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) case 0x13: /* FCMLA #90 */ case 0x15: /* FCMLA #180 */ case 0x17: /* FCMLA #270 */ - tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd), - vec_full_reg_offset(s, rn), - vec_reg_offset(s, rm, index, size), fpst, - is_q ? 16 : 8, vec_full_reg_size(s), - extract32(insn, 13, 2), /* rot */ - size == MO_64 - ? gen_helper_gvec_fcmlas_idx - : gen_helper_gvec_fcmlah_idx); - tcg_temp_free_ptr(fpst); + { + int rot = extract32(insn, 13, 2); + int data = (index << 2) | rot; + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), fpst, + is_q ? 16 : 8, vec_full_reg_size(s), data, + size == MO_64 + ? gen_helper_gvec_fcmlas_idx + : gen_helper_gvec_fcmlah_idx); + tcg_temp_free_ptr(fpst); + } return; } diff --git a/target/arm/translate.c b/target/arm/translate.c index 2a3e4f5d4c..a7a980b1f2 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -7826,26 +7826,42 @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn) static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn) { - int rd, rn, rm, rot, size, opr_sz; + gen_helper_gvec_3_ptr *fn_gvec_ptr; + int rd, rn, rm, opr_sz, data; TCGv_ptr fpst; bool q; q = extract32(insn, 6, 1); VFP_DREG_D(rd, insn); VFP_DREG_N(rn, insn); - VFP_DREG_M(rm, insn); if ((rd | rn) & q) { return 1; } if ((insn & 0xff000f10) == 0xfe000800) { /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... 
*/ - rot = extract32(insn, 20, 2); - size = extract32(insn, 23, 1); - if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA) - || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) { + int rot = extract32(insn, 20, 2); + int size = extract32(insn, 23, 1); + int index; + + if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) { return 1; } + if (size == 0) { + if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { + return 1; + } + /* For fp16, rm is just Vm, and index is M. */ + rm = extract32(insn, 0, 4); + index = extract32(insn, 5, 1); + } else { + /* For fp32, rm is the usual M:Vm, and index is 0. */ + VFP_DREG_M(rm, insn); + index = 0; + } + data = (index << 2) | rot; + fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx + : gen_helper_gvec_fcmlah_idx); } else { return 1; } @@ -7864,9 +7880,7 @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn) tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), vfp_reg_offset(1, rn), vfp_reg_offset(1, rm), fpst, - opr_sz, opr_sz, rot, - size ? gen_helper_gvec_fcmlas_idx - : gen_helper_gvec_fcmlah_idx); + opr_sz, opr_sz, data, fn_gvec_ptr); tcg_temp_free_ptr(fpst); return 0; } diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 073e5c58e7..8f2dc4b989 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -317,10 +317,11 @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, float_status *fpst = vfpst; intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1); uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1); + intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2); uint32_t neg_real = flip ^ neg_imag; uintptr_t i; - float16 e1 = m[H2(flip)]; - float16 e3 = m[H2(1 - flip)]; + float16 e1 = m[H2(2 * index + flip)]; + float16 e3 = m[H2(2 * index + 1 - flip)]; /* Shift boolean to the sign bit so we can xor to negate. 
*/ neg_real <<= 15; @@ -377,10 +378,11 @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, float_status *fpst = vfpst; intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1); uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1); + intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2); uint32_t neg_real = flip ^ neg_imag; uintptr_t i; - float32 e1 = m[H4(flip)]; - float32 e3 = m[H4(1 - flip)]; + float32 e1 = m[H4(2 * index + flip)]; + float32 e3 = m[H4(2 * index + 1 - flip)]; /* Shift boolean to the sign bit so we can xor to negate. */ neg_real <<= 31; From patchwork Wed Jun 27 04:33:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935296 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="baKByzqK"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41FrHf071Lz9s0w for ; Wed, 27 Jun 2018 14:57:17 +1000 (AEST) Received: from localhost ([::1]:56630 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2W3-0006F0-Dj for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 00:57:15 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:32880) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29p-0004fh-Sh for 
qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29o-00015Q-63 for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:17 -0400 Received: from mail-pl0-x233.google.com ([2607:f8b0:400e:c01::233]:40867) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29n-00013g-Su for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:16 -0400 Received: by mail-pl0-x233.google.com with SMTP id t6-v6so413524plo.7 for ; Tue, 26 Jun 2018 21:34:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=uDnlhaTXRgEX+y5t8YAQfJ33Fc348zNrhQ6d3KrL1HE=; b=baKByzqKrKlNjAT5LqXXr2gDTA3uiKhlK96mJgFfygFT5KCpkpKataljpgdIWOuUst 53gpjg/lUYh46bpgvKVqw5+Tj90nCFvHKiytF5+ZjMiNQnqjWGho8f2U/JFi8zDI3qgJ 52fyTrGXWnUkE6ZPkDVjtJ8/ProVq8f+X1DPE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=uDnlhaTXRgEX+y5t8YAQfJ33Fc348zNrhQ6d3KrL1HE=; b=G0p4MpVHO17M3RqS3kmn+IlK5zDuLXbZiCBcJLKGxubUIMKsiCzb01yLiBWu3r0uOQ IMD6qEwoLs18BFHvhTMggw9Y5O/peC62TgBzbSFwLuYA5ASBqFweXVTGFmPnTDKSni1+ lKjSDwh0BvJ73ilT39dIBu4PxmYrr7SlefPPZ+wL55n4OPzQoacEXKH95XKPO4w72CNc DQX9hgCNfstXwKgBItl67kdShrhoHPuC4NC7tAzO9F5MFuErh8olAuR2IguWRI9KY9vt PlDNnwWsecWZ7Owc3uJ1EGYbnKfL8RRuv6g94EdGJjVsJEbLnX29udbgsEErLR/nCkJx f5yg== X-Gm-Message-State: APt69E3dhXpdk6dekfJx5VKiIL11Voox9mvKKUIXYLc+Tvw3gPARP/tD RUh6FYLqNv+xZVW5VY2vPhdV5XB8e9E= X-Google-Smtp-Source: ADUXVKK3MfioeumX2L9X+xJ2SnpZJj8shxx+qY994ekm9+8LStmifbttwwxhJ5iAjQYTDUa7iP137g== X-Received: by 2002:a17:902:822:: with SMTP id 31-v6mr4420602plk.172.1530074054617; Tue, 26 Jun 2018 21:34:14 -0700 (PDT) Received: from cloudburst.twiddle.net (97-126-112-211.tukw.qwest.net. 
[97.126.112.211]) by smtp.gmail.com with ESMTPSA id p20-v6sm4577638pff.90.2018.06.26.21.34.13 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 26 Jun 2018 21:34:13 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Tue, 26 Jun 2018 21:33:24 -0700 Message-Id: <20180627043328.11531-32-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org> References: <20180627043328.11531-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::233 Subject: [Qemu-devel] [PATCH v6 31/35] target/arm: Implement SVE fp complex multiply add (indexed) X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Enhance the existing helpers to support SVE, which takes the index from each 128-bit segment. The change has no effect for AdvSIMD, since there is only one such segment. 
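As a rough illustration of "takes the index from each 128-bit segment", the sketch below re-reads the indexed complex pair of m at the start of every segment. It is a simplification under stated assumptions: plain C floats rather than softfloat, an unfused multiply then add rather than float32_muladd, little-endian element order (no H4 adjustment), and rotation #0 only (no flip, no negation). The name fcmla_idx_rot0 is hypothetical.

```c
#include <stddef.h>

/* Illustrative only; not the QEMU helper. */
static void fcmla_idx_rot0(float *d, const float *n, const float *m,
                           size_t elements, size_t index)
{
    /* Four floats, i.e. two complex pairs, per 128-bit segment. */
    const size_t per_seg = 16 / sizeof(float);

    for (size_t i = 0; i < elements; i += per_seg) {
        /* The indexed pair is selected relative to this segment. */
        float mr = m[i + 2 * index];
        float mi = m[i + 2 * index + 1];

        for (size_t j = i; j < i + per_seg; j += 2) {
            float nr = n[j];        /* rotation #0 uses the real part of n */

            d[j] += nr * mr;        /* real accumulator */
            d[j + 1] += nr * mi;    /* imaginary accumulator */
        }
    }
}
```

With a single 16-byte segment (the AdvSIMD case) the outer loop runs once, so the result is the same as indexing the whole vector.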
Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell Reviewed-by: Alex Bennée --- target/arm/translate-sve.c | 23 ++++++++++++++++++ target/arm/vec_helper.c | 50 +++++++++++++++++++++++--------------- target/arm/sve.decode | 6 +++++ 3 files changed, 59 insertions(+), 20 deletions(-) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 7ce3222158..4f2152fb70 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -4005,6 +4005,29 @@ static bool trans_FCMLA_zpzzz(DisasContext *s, return true; } +static bool trans_FCMLA_zzxz(DisasContext *s, arg_FCMLA_zzxz *a, uint32_t insn) +{ + static gen_helper_gvec_3_ptr * const fns[2] = { + gen_helper_gvec_fcmlah_idx, + gen_helper_gvec_fcmlas_idx, + }; + + tcg_debug_assert(a->esz == 1 || a->esz == 2); + tcg_debug_assert(a->rd == a->ra); + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16); + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + status, vsz, vsz, + a->index * 4 + a->rot, + fns[a->esz - 1]); + tcg_temp_free_ptr(status); + } + return true; +} + /* *** SVE Floating Point Unary Operations Prediated Group */ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 8f2dc4b989..db5aeb9f24 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -319,22 +319,27 @@ void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm, uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1); intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2); uint32_t neg_real = flip ^ neg_imag; - uintptr_t i; - float16 e1 = m[H2(2 * index + flip)]; - float16 e3 = m[H2(2 * index + 1 - flip)]; + intptr_t elements = opr_sz / sizeof(float16); + intptr_t eltspersegment = 16 / sizeof(float16); + intptr_t i, j; /* Shift boolean to the sign bit so we can xor to negate. 
*/ neg_real <<= 15; neg_imag <<= 15; - e1 ^= neg_real; - e3 ^= neg_imag; - for (i = 0; i < opr_sz / 2; i += 2) { - float16 e2 = n[H2(i + flip)]; - float16 e4 = e2; + for (i = 0; i < elements; i += eltspersegment) { + float16 mr = m[H2(i + 2 * index + 0)]; + float16 mi = m[H2(i + 2 * index + 1)]; + float16 e1 = neg_real ^ (flip ? mi : mr); + float16 e3 = neg_imag ^ (flip ? mr : mi); - d[H2(i)] = float16_muladd(e2, e1, d[H2(i)], 0, fpst); - d[H2(i + 1)] = float16_muladd(e4, e3, d[H2(i + 1)], 0, fpst); + for (j = i; j < i + eltspersegment; j += 2) { + float16 e2 = n[H2(j + flip)]; + float16 e4 = e2; + + d[H2(j)] = float16_muladd(e2, e1, d[H2(j)], 0, fpst); + d[H2(j + 1)] = float16_muladd(e4, e3, d[H2(j + 1)], 0, fpst); + } } clear_tail(d, opr_sz, simd_maxsz(desc)); } @@ -380,22 +385,27 @@ void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm, uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1); intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 2, 2); uint32_t neg_real = flip ^ neg_imag; - uintptr_t i; - float32 e1 = m[H4(2 * index + flip)]; - float32 e3 = m[H4(2 * index + 1 - flip)]; + intptr_t elements = opr_sz / sizeof(float32); + intptr_t eltspersegment = 16 / sizeof(float32); + intptr_t i, j; /* Shift boolean to the sign bit so we can xor to negate. */ neg_real <<= 31; neg_imag <<= 31; - e1 ^= neg_real; - e3 ^= neg_imag; - for (i = 0; i < opr_sz / 4; i += 2) { - float32 e2 = n[H4(i + flip)]; - float32 e4 = e2; + for (i = 0; i < elements; i += eltspersegment) { + float32 mr = m[H4(i + 2 * index + 0)]; + float32 mi = m[H4(i + 2 * index + 1)]; + float32 e1 = neg_real ^ (flip ? mi : mr); + float32 e3 = neg_imag ^ (flip ? 
mr : mi); - d[H4(i)] = float32_muladd(e2, e1, d[H4(i)], 0, fpst); - d[H4(i + 1)] = float32_muladd(e4, e3, d[H4(i + 1)], 0, fpst); + for (j = i; j < i + eltspersegment; j += 2) { + float32 e2 = n[H4(j + flip)]; + float32 e4 = e2; + + d[H4(j)] = float32_muladd(e2, e1, d[H4(j)], 0, fpst); + d[H4(j + 1)] = float32_muladd(e4, e3, d[H4(j + 1)], 0, fpst); + } } clear_tail(d, opr_sz, simd_maxsz(desc)); } diff --git a/target/arm/sve.decode b/target/arm/sve.decode index e342cfdf14..62365ed90f 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -733,6 +733,12 @@ FCADD 01100100 esz:2 00000 rot:1 100 pg:3 rm:5 rd:5 \ FCMLA_zpzzz 01100100 esz:2 0 rm:5 0 rot:2 pg:3 rn:5 rd:5 \ ra=%reg_movprfx +# SVE floating-point complex multiply-add (indexed) +FCMLA_zzxz 01100100 10 1 index:2 rm:3 0001 rot:2 rn:5 rd:5 \ + ra=%reg_movprfx esz=1 +FCMLA_zzxz 01100100 11 1 index:1 rm:4 0001 rot:2 rn:5 rd:5 \ + ra=%reg_movprfx esz=2 + ### SVE FP Multiply-Add Indexed Group # SVE floating-point multiply-add (indexed) From patchwork Wed Jun 27 04:33:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 935303 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.b="CdrHTbeh"; dkim-atps=neutral Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) 
with ESMTPS id 41FrVJ4KvNz9s0w for ; Wed, 27 Jun 2018 15:06:32 +1000 (AEST) Received: from localhost ([::1]:56699 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY2f0-00051M-6R for incoming@patchwork.ozlabs.org; Wed, 27 Jun 2018 01:06:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:32909) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fY29q-0004fs-Ox for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:20 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fY29p-00017o-IB for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:18 -0400 Received: from mail-pg0-x235.google.com ([2607:f8b0:400e:c05::235]:45291) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fY29p-00016V-Ah for qemu-devel@nongnu.org; Wed, 27 Jun 2018 00:34:17 -0400 Received: by mail-pg0-x235.google.com with SMTP id z1-v6so357110pgv.12 for ; Tue, 26 Jun 2018 21:34:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=wmvBZcu08nQNVj+xJr8ERwUWjrrrFWxOlfD78cTaOXQ=; b=CdrHTbehnXlfkMpSCMvX5mI7eOPVh/Ne1bybOWPinnAwG9qawCjjwQ5Sys1rQ2ngsU i60S55DlULd3VQk/frnUKFyTkH7+kDByuVyQJc021x/b2Zj9mHb5c4FO69KTOqbZxJXq /Wj/hxAufaJIhavDR3YiXCCaIWPVN6rby7zhA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=wmvBZcu08nQNVj+xJr8ERwUWjrrrFWxOlfD78cTaOXQ=; b=PXD/Q4wfoa1Jll+dNg6PYHkB1XnbxxF248WAHPLSdpUYi7vqXgctaX48Yvy7BBKAEK GQ6rEhrrPkPJBIzJHB7mkoayjI3lEktMd2ew8TB2ei2rQU28DaJn1/CITSxeKfGMGUwD l6PDLinR23ZLfX0IKvT5MACD/OCgP1DbynRS5p3nob0kGRXs1aHTRkhmcxQOIoqEDn7j 2t4E9EPQw2uniuQiic8wJbXyKv4Gq/Wn4VTUoEP5Bd0oL3UZHBdlnmS+lo/A2Sj25lu6 WIAILMTTlXyNgYDFqxvKBUf1m+YAh5XtykayUt/HDZX6CbnRYBS9bTZr4XF7Ii9raLtK qScg== X-Gm-Message-State: 
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:25 -0700
Message-Id: <20180627043328.11531-33-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 32/35] target/arm: Implement SVE dot product (vectors)
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
 target/arm/helper.h        |  5 +++
 target/arm/translate-sve.c | 17 ++++++++++
 target/arm/vec_helper.c    | 67 ++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  3 ++
 4 files changed, 92 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 8607077dda..e23ce7ff19 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -583,6 +583,11 @@ DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)

+DEF_HELPER_FLAGS_4(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4f2152fb70..8a2bd1f8c5 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3423,6 +3423,23 @@ DO_ZZI(UMIN, umin)

 #undef DO_ZZI

+static bool trans_DOT_zzz(DisasContext *s, arg_DOT_zzz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[2][2] = {
+        { gen_helper_gvec_sdot_b, gen_helper_gvec_sdot_h },
+        { gen_helper_gvec_udot_b, gen_helper_gvec_udot_h }
+    };
+
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vsz, vsz, 0, fns[a->u][a->sz]);
+    }
+    return true;
+}
+
 /*
  *** SVE Floating Point Multiply-Add Indexed Group
  */
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index db5aeb9f24..c16a30c3b5 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -194,6 +194,73 @@ void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }

+/* Integer 8 and 16-bit dot-product.
+ *
+ * Note that for the loops herein, host endianness does not matter
+ * with respect to the ordering of data within the 64-bit lanes.
+ * All elements are treated equally, no matter where they are.
+ */
+
+void HELPER(gvec_sdot_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint32_t *d = vd;
+    int8_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] += n[i * 4 + 0] * m[i * 4 + 0]
+              + n[i * 4 + 1] * m[i * 4 + 1]
+              + n[i * 4 + 2] * m[i * 4 + 2]
+              + n[i * 4 + 3] * m[i * 4 + 3];
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_udot_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint32_t *d = vd;
+    uint8_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] += n[i * 4 + 0] * m[i * 4 + 0]
+              + n[i * 4 + 1] * m[i * 4 + 1]
+              + n[i * 4 + 2] * m[i * 4 + 2]
+              + n[i * 4 + 3] * m[i * 4 + 3];
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sdot_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint64_t *d = vd;
+    int16_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 8; ++i) {
+        d[i] += (int64_t)n[i * 4 + 0] * m[i * 4 + 0]
+              + (int64_t)n[i * 4 + 1] * m[i * 4 + 1]
+              + (int64_t)n[i * 4 + 2] * m[i * 4 + 2]
+              + (int64_t)n[i * 4 + 3] * m[i * 4 + 3];
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_udot_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint64_t *d = vd;
+    uint16_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 8; ++i) {
+        d[i] += (uint64_t)n[i * 4 + 0] * m[i * 4 + 0]
+              + (uint64_t)n[i * 4 + 1] * m[i * 4 + 1]
+              + (uint64_t)n[i * 4 + 2] * m[i * 4 + 2]
+              + (uint64_t)n[i * 4 + 3] * m[i * 4 + 3];
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
 void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
                          void *vfpst, uint32_t desc)
 {
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 62365ed90f..35415bfb6c 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -725,6 +725,9 @@ UMIN_zzi 00100101 .. 101 011 110 ........ ..... @rdn_i8u
 # SVE integer multiply immediate (unpredicated)
 MUL_zzi 00100101 .. 110 000 110 ........ ..... @rdn_i8s

+# SVE integer dot product (unpredicated)
+DOT_zzz 01000100 1 sz:1 0 rm:5 00000 u:1 rn:5 rd:5
+
 # SVE floating-point complex add (predicated)
 FCADD 01100100 esz:2 00000 rot:1 100 pg:3 rm:5 rd:5 \
         rn=%reg_movprfx

From patchwork Wed Jun 27 04:33:26 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935300
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:26 -0700
Message-Id: <20180627043328.11531-34-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 33/35] target/arm: Implement SVE dot product (indexed)
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Signed-off-by: Richard Henderson
Reviewed-by: Peter Maydell
Reviewed-by: Alex Bennée
---
v6: Rearrange the loops.  The compiler does well with this form
    and hopefully they are also easier to read.
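For readers following the loop rearrangement, here is a minimal standalone
sketch of what the indexed byte form is expected to compute, written in the
naive per-segment order rather than the form used in the patch.  The name
sdot_idx_b_ref and its signature are made up for illustration and are not
part of QEMU; the sketch assumes opr_sz is a multiple of 16 and models
neither the opr_sz == 8 AdvSIMD case nor clear_tail().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference model (not part of the patch): signed byte
 * dot product, indexed form.  Each 16-byte segment holds four 32-bit
 * accumulators.  Within a segment, every accumulator is combined with
 * the same group of four bytes of m, selected by index.
 */
static void sdot_idx_b_ref(uint32_t *d, const int8_t *n, const int8_t *m,
                           size_t opr_sz, unsigned index)
{
    size_t seg, e, k;

    /* Assumption of this sketch: whole 16-byte segments, index in 0..3. */
    assert(opr_sz % 16 == 0 && index < 4);

    for (seg = 0; seg < opr_sz / 16; seg++) {
        /* The four multiplier bytes shared by this segment. */
        const int8_t *mg = m + seg * 16 + index * 4;

        for (e = 0; e < 4; e++) {
            size_t i = seg * 4 + e;     /* 32-bit element number */
            int32_t sum = 0;

            for (k = 0; k < 4; k++) {
                sum += n[i * 4 + k] * mg[k];
            }
            /* Accumulate modulo 2^32, as the uint32_t destination does. */
            d[i] += (uint32_t)sum;
        }
    }
}
```

Comparing a helper's output against a model like this on random inputs is
one way to check that a loop rearrangement did not change results.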
---
 target/arm/helper.h        |   5 ++
 target/arm/translate-sve.c |  18 ++++++
 target/arm/vec_helper.c    | 124 +++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |   8 ++-
 4 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index e23ce7ff19..59e8c3bd1b 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -588,6 +588,11 @@ DEF_HELPER_FLAGS_4(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)

+DEF_HELPER_FLAGS_4(gvec_sdot_idx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_udot_idx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_udot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 8a2bd1f8c5..3cff71cae8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3440,6 +3440,24 @@ static bool trans_DOT_zzz(DisasContext *s, arg_DOT_zzz *a, uint32_t insn)
     return true;
 }

+static bool trans_DOT_zzx(DisasContext *s, arg_DOT_zzx *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[2][2] = {
+        { gen_helper_gvec_sdot_idx_b, gen_helper_gvec_sdot_idx_h },
+        { gen_helper_gvec_udot_idx_b, gen_helper_gvec_udot_idx_h }
+    };
+
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vsz, vsz, a->index, fns[a->u][a->sz]);
+    }
+    return true;
+}
+
+
 /*
  *** SVE Floating Point Multiply-Add Indexed Group
  */
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index c16a30c3b5..37f338732e 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -261,6 +261,130 @@ void HELPER(gvec_udot_h)(void *vd, void *vn, void *vm, uint32_t desc)
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }

+void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, segend, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4;
+    intptr_t index = simd_data(desc);
+    uint32_t *d = vd;
+    int8_t *n = vn;
+    int8_t *m_indexed = (int8_t *)vm + index * 4;
+
+    /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
+     * Otherwise opr_sz is a multiple of 16.
+     */
+    segend = MIN(4, opr_sz_4);
+    i = 0;
+    do {
+        int8_t m0 = m_indexed[i * 4 + 0];
+        int8_t m1 = m_indexed[i * 4 + 1];
+        int8_t m2 = m_indexed[i * 4 + 2];
+        int8_t m3 = m_indexed[i * 4 + 3];
+
+        do {
+            d[i] += n[i * 4 + 0] * m0
+                  + n[i * 4 + 1] * m1
+                  + n[i * 4 + 2] * m2
+                  + n[i * 4 + 3] * m3;
+        } while (++i < segend);
+        segend = i + 4;
+    } while (i < opr_sz_4);
+
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, segend, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4;
+    intptr_t index = simd_data(desc);
+    uint32_t *d = vd;
+    uint8_t *n = vn;
+    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+
+    /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
+     * Otherwise opr_sz is a multiple of 16.
+     */
+    segend = MIN(4, opr_sz_4);
+    i = 0;
+    do {
+        uint8_t m0 = m_indexed[i * 4 + 0];
+        uint8_t m1 = m_indexed[i * 4 + 1];
+        uint8_t m2 = m_indexed[i * 4 + 2];
+        uint8_t m3 = m_indexed[i * 4 + 3];
+
+        do {
+            d[i] += n[i * 4 + 0] * m0
+                  + n[i * 4 + 1] * m1
+                  + n[i * 4 + 2] * m2
+                  + n[i * 4 + 3] * m3;
+        } while (++i < segend);
+        segend = i + 4;
+    } while (i < opr_sz_4);
+
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sdot_idx_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc), opr_sz_8 = opr_sz / 8;
+    intptr_t index = simd_data(desc);
+    uint64_t *d = vd;
+    int16_t *n = vn;
+    int16_t *m_indexed = (int16_t *)vm + index * 4;
+
+    /* This is supported by SVE only, so opr_sz is always a multiple of 16.
+     * Process the entire segment all at once, writing back the results
+     * only after we've consumed all of the inputs.
+     */
+    for (i = 0; i < opr_sz_8 ; i += 2) {
+        uint64_t d0, d1;
+
+        d0 = n[i * 4 + 0] * (int64_t)m_indexed[i * 4 + 0];
+        d0 += n[i * 4 + 1] * (int64_t)m_indexed[i * 4 + 1];
+        d0 += n[i * 4 + 2] * (int64_t)m_indexed[i * 4 + 2];
+        d0 += n[i * 4 + 3] * (int64_t)m_indexed[i * 4 + 3];
+        d1 = n[i * 4 + 4] * (int64_t)m_indexed[i * 4 + 0];
+        d1 += n[i * 4 + 5] * (int64_t)m_indexed[i * 4 + 1];
+        d1 += n[i * 4 + 6] * (int64_t)m_indexed[i * 4 + 2];
+        d1 += n[i * 4 + 7] * (int64_t)m_indexed[i * 4 + 3];
+
+        d[i + 0] += d0;
+        d[i + 1] += d1;
+    }
+
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_udot_idx_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc), opr_sz_8 = opr_sz / 8;
+    intptr_t index = simd_data(desc);
+    uint64_t *d = vd;
+    uint16_t *n = vn;
+    uint16_t *m_indexed = (uint16_t *)vm + index * 4;
+
+    /* This is supported by SVE only, so opr_sz is always a multiple of 16.
+     * Process the entire segment all at once, writing back the results
+     * only after we've consumed all of the inputs.
+     */
+    for (i = 0; i < opr_sz_8 ; i += 2) {
+        uint64_t d0, d1;
+
+        d0 = n[i * 4 + 0] * (uint64_t)m_indexed[i * 4 + 0];
+        d0 += n[i * 4 + 1] * (uint64_t)m_indexed[i * 4 + 1];
+        d0 += n[i * 4 + 2] * (uint64_t)m_indexed[i * 4 + 2];
+        d0 += n[i * 4 + 3] * (uint64_t)m_indexed[i * 4 + 3];
+        d1 = n[i * 4 + 4] * (uint64_t)m_indexed[i * 4 + 0];
+        d1 += n[i * 4 + 5] * (uint64_t)m_indexed[i * 4 + 1];
+        d1 += n[i * 4 + 6] * (uint64_t)m_indexed[i * 4 + 2];
+        d1 += n[i * 4 + 7] * (uint64_t)m_indexed[i * 4 + 3];
+
+        d[i + 0] += d0;
+        d[i + 1] += d1;
+    }
+
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
 void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
                          void *vfpst, uint32_t desc)
 {
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 35415bfb6c..e10b689454 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -726,7 +726,13 @@ UMIN_zzi 00100101 .. 101 011 110 ........ ..... @rdn_i8u
 MUL_zzi 00100101 .. 110 000 110 ........ ..... @rdn_i8s

 # SVE integer dot product (unpredicated)
-DOT_zzz 01000100 1 sz:1 0 rm:5 00000 u:1 rn:5 rd:5
+DOT_zzz 01000100 1 sz:1 0 rm:5 00000 u:1 rn:5 rd:5 ra=%reg_movprfx
+
+# SVE integer dot product (indexed)
+DOT_zzx 01000100 101 index:2 rm:3 00000 u:1 rn:5 rd:5 \
+        sz=0 ra=%reg_movprfx
+DOT_zzx 01000100 111 index:1 rm:4 00000 u:1 rn:5 rd:5 \
+        sz=1 ra=%reg_movprfx

 # SVE floating-point complex add (predicated)
 FCADD 01100100 esz:2 00000 rot:1 100 pg:3 rm:5 rd:5 \

From patchwork Wed Jun 27 04:33:27 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935294
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:27 -0700
Message-Id: <20180627043328.11531-35-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Subject: [Qemu-devel] [PATCH v6 34/35] target/arm: Enable SVE for aarch64-linux-user
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Enable ARM_FEATURE_SVE for the generic "max" cpu.

Tested-by: Alex Bennée
Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
v6: Set ARM_HWCAP_A64_SVE.
---
 linux-user/elfload.c | 1 +
 target/arm/cpu.c     | 7 +++++++
 target/arm/cpu64.c   | 1 +
 3 files changed, 9 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 13bc78d0c8..d1231ad07a 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -584,6 +584,7 @@ static uint32_t get_elf_hwcap(void)
     GET_FEATURE(ARM_FEATURE_V8_ATOMICS, ARM_HWCAP_A64_ATOMICS);
     GET_FEATURE(ARM_FEATURE_V8_RDM, ARM_HWCAP_A64_ASIMDRDM);
     GET_FEATURE(ARM_FEATURE_V8_FCMA, ARM_HWCAP_A64_FCMA);
+    GET_FEATURE(ARM_FEATURE_SVE, ARM_HWCAP_A64_SVE);
 #undef GET_FEATURE

     return hwcaps;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 2ae4fffafb..6dcc552e14 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -164,6 +164,13 @@ static void arm_cpu_reset(CPUState *s)
         env->cp15.sctlr_el[1] |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
         /* and to the FP/Neon instructions */
         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
+        /* and to the SVE instructions */
+        env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
+        env->cp15.cptr_el[3] |= CPTR_EZ;
+        /* with maximum vector length */
+        env->vfp.zcr_el[1] = ARM_MAX_VQ - 1;
+        env->vfp.zcr_el[2] = ARM_MAX_VQ - 1;
+        env->vfp.zcr_el[3] = ARM_MAX_VQ - 1;
 #else
         /* Reset into the highest available EL */
         if (arm_feature(env, ARM_FEATURE_EL3)) {
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index c50dcd4077..0360d7efc5 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -252,6 +252,7 @@ static void aarch64_max_initfn(Object *obj)
         set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
         set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
         set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
+        set_feature(&cpu->env, ARM_FEATURE_SVE);
         /* For usermode -cpu max we can use a larger and more efficient DCZ
          * blocksize since we don't have to follow what the hardware does.
          */

From patchwork Wed Jun 27 04:33:28 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 935299
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:28 -0700
Message-Id: <20180627043328.11531-36-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 35/35] target/arm: Implement ARMv8.2-DotProd
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

We've already added the helpers with an SVE patch, all that remains
is to wire up the aa64 and aa32 translators.  Enable the feature
within -cpu max for CONFIG_USER_ONLY.

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
v6: Fix aa32 index form.
---
 target/arm/cpu.h           |  1 +
 linux-user/elfload.c       |  1 +
 target/arm/cpu.c           |  1 +
 target/arm/cpu64.c         |  1 +
 target/arm/translate-a64.c | 36 +++++++++++++++++++
 target/arm/translate.c     | 74 +++++++++++++++++++++++++++-----------
 6 files changed, 93 insertions(+), 21 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a4507a2d6f..6a8441c2dd 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1480,6 +1480,7 @@ enum arm_features {
     ARM_FEATURE_V8_SM4, /* implements SM4 part of v8 Crypto Extensions */
     ARM_FEATURE_V8_ATOMICS, /* ARMv8.1-Atomics feature */
     ARM_FEATURE_V8_RDM, /* implements v8.1 simd round multiply */
+    ARM_FEATURE_V8_DOTPROD, /* implements v8.2 simd dot product */
     ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
     ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions.  */
     ARM_FEATURE_M_MAIN, /* M profile Main Extension */
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index d1231ad07a..942a1b661f 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -583,6 +583,7 @@ static uint32_t get_elf_hwcap(void)
                 ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP);
     GET_FEATURE(ARM_FEATURE_V8_ATOMICS, ARM_HWCAP_A64_ATOMICS);
     GET_FEATURE(ARM_FEATURE_V8_RDM, ARM_HWCAP_A64_ASIMDRDM);
+    GET_FEATURE(ARM_FEATURE_V8_DOTPROD, ARM_HWCAP_A64_ASIMDDP);
     GET_FEATURE(ARM_FEATURE_V8_FCMA, ARM_HWCAP_A64_FCMA);
     GET_FEATURE(ARM_FEATURE_SVE, ARM_HWCAP_A64_SVE);
 #undef GET_FEATURE
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 6dcc552e14..aa62315cea 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1805,6 +1805,7 @@ static void arm_max_initfn(Object *obj)
         set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
         set_feature(&cpu->env, ARM_FEATURE_CRC);
         set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+        set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
         set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
 #endif
     }
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 0360d7efc5..3b4bc73ffa 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -250,6 +250,7 @@ static void aarch64_max_initfn(Object *obj)
         set_feature(&cpu->env, ARM_FEATURE_CRC);
         set_feature(&cpu->env, ARM_FEATURE_V8_ATOMICS);
         set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+        set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
         set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
         set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
         set_feature(&cpu->env, ARM_FEATURE_SVE);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index eb3a4ab2f0..f986340832 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -640,6 +640,16 @@ static void gen_gvec_op3(DisasContext *s, bool is_q, int rd,
                    vec_full_reg_size(s), gvec_op);
 }

+/* Expand a 3-operand operation using an out-of-line helper.  */
+static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
+                             int rn, int rm, int data, gen_helper_gvec_3 *fn)
+{
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm),
+                       is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
+}
+
 /* Expand a 3-operand + env pointer operation using
  * an out-of-line helper.
  */
@@ -11336,6 +11346,14 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = ARM_FEATURE_V8_RDM;
         break;
+    case 0x02: /* SDOT (vector) */
+    case 0x12: /* UDOT (vector) */
+        if (size != MO_32) {
+            unallocated_encoding(s);
+            return;
+        }
+        feature = ARM_FEATURE_V8_DOTPROD;
+        break;
     case 0x8: /* FCMLA, #0 */
     case 0x9: /* FCMLA, #90 */
     case 0xa: /* FCMLA, #180 */
@@ -11389,6 +11407,11 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         return;

+    case 0x2: /* SDOT / UDOT */
+        gen_gvec_op3_ool(s, is_q, rd, rn, rm, 0,
+                         u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b);
+        return;
+
     case 0x8: /* FCMLA, #0 */
     case 0x9: /* FCMLA, #90 */
     case 0xa: /* FCMLA, #180 */
@@ -12568,6 +12591,13 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             return;
         }
         break;
+    case 0x0e: /* SDOT */
+    case 0x1e: /* UDOT */
+        if (size != MO_32 || !arm_dc_feature(s, ARM_FEATURE_V8_DOTPROD)) {
+            unallocated_encoding(s);
+            return;
+        }
+        break;
     case 0x11: /* FCMLA #0 */
     case 0x13: /* FCMLA #90 */
     case 0x15: /* FCMLA #180 */
@@ -12665,6 +12695,12 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
     }

     switch (16 * u + opcode) {
+    case 0x0e: /* SDOT */
+    case 0x1e: /* UDOT */
+        gen_gvec_op3_ool(s, is_q, rd, rn, rm, index,
+                         u ? gen_helper_gvec_udot_idx_b
+                         : gen_helper_gvec_sdot_idx_b);
+        return;
     case 0x11: /* FCMLA #0 */
     case 0x13: /* FCMLA #90 */
     case 0x15: /* FCMLA #180 */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a7a980b1f2..f845da7c63 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7762,9 +7762,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
  */
 static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
 {
-    gen_helper_gvec_3_ptr *fn_gvec_ptr;
-    int rd, rn, rm, rot, size, opr_sz;
-    TCGv_ptr fpst;
+    gen_helper_gvec_3 *fn_gvec = NULL;
+    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
+    int rd, rn, rm, opr_sz;
+    int data = 0;
     bool q;

     q = extract32(insn, 6, 1);
@@ -7777,8 +7778,8 @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)

     if ((insn & 0xfe200f10) == 0xfc200800) {
         /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
-        size = extract32(insn, 20, 1);
-        rot = extract32(insn, 23, 2);
+        int size = extract32(insn, 20, 1);
+        data = extract32(insn, 23, 2); /* rot */
         if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)
             || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
             return 1;
@@ -7786,13 +7787,20 @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
         fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
     } else if ((insn & 0xfea00f10) == 0xfc800800) {
         /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
-        size = extract32(insn, 20, 1);
-        rot = extract32(insn, 24, 1);
+        int size = extract32(insn, 20, 1);
+        data = extract32(insn, 24, 1); /* rot */
         if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)
             || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
             return 1;
         }
         fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
+    } else if ((insn & 0xfeb00f00) == 0xfc200d00) {
+        /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
+        bool u = extract32(insn, 4, 1);
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_DOTPROD)) {
+            return 1;
+        }
+        fn_gvec = u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
     } else {
         return 1;
     }
@@ -7807,12 +7815,19 @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     }

     opr_sz = (1 + q) * 8;
-    fpst = get_fpstatus_ptr(1);
-    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
-                       vfp_reg_offset(1, rn),
-                       vfp_reg_offset(1, rm), fpst,
-                       opr_sz, opr_sz, rot, fn_gvec_ptr);
-    tcg_temp_free_ptr(fpst);
+    if (fn_gvec_ptr) {
+        TCGv_ptr fpst = get_fpstatus_ptr(1);
+        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
+                           vfp_reg_offset(1, rn),
+                           vfp_reg_offset(1, rm), fpst,
+                           opr_sz, opr_sz, data, fn_gvec_ptr);
+        tcg_temp_free_ptr(fpst);
+    } else {
+        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd),
+                           vfp_reg_offset(1, rn),
+                           vfp_reg_offset(1, rm),
+                           opr_sz, opr_sz, data, fn_gvec);
+    }
     return 0;
 }

@@ -7826,9 +7841,9 @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)

 static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
 {
-    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+    gen_helper_gvec_3 *fn_gvec = NULL;
+    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
     int rd, rn, rm, opr_sz, data;
-    TCGv_ptr fpst;
     bool q;

     q = extract32(insn, 6, 1);
@@ -7862,6 +7877,16 @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
         data = (index << 2) | rot;
         fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx
                        : gen_helper_gvec_fcmlah_idx);
+    } else if ((insn & 0xffb00f00) == 0xfe200d00) {
+        /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
+        int u = extract32(insn, 4, 1);
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_DOTPROD)) {
+            return 1;
+        }
+        fn_gvec = u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
+        /* rm is just Vm, and index is M.
*/ + data = extract32(insn, 5, 1); /* index */ + rm = extract32(insn, 0, 4); } else { return 1; } @@ -7876,12 +7901,19 @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn) } opr_sz = (1 + q) * 8; - fpst = get_fpstatus_ptr(1); - tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), - vfp_reg_offset(1, rn), - vfp_reg_offset(1, rm), fpst, - opr_sz, opr_sz, data, fn_gvec_ptr); - tcg_temp_free_ptr(fpst); + if (fn_gvec_ptr) { + TCGv_ptr fpst = get_fpstatus_ptr(1); + tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), + vfp_reg_offset(1, rn), + vfp_reg_offset(1, rm), fpst, + opr_sz, opr_sz, data, fn_gvec_ptr); + tcg_temp_free_ptr(fpst); + } else { + tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), + vfp_reg_offset(1, rn), + vfp_reg_offset(1, rm), + opr_sz, opr_sz, data, fn_gvec); + } return 0; }
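
For readers of the translator hunks above, here is a rough reference model of what the out-of-line SDOT/UDOT helpers are expected to compute. It is only a sketch and not the helper code from this series: the names sdot_ref, udot_ref and sdot_idx_ref are hypothetical, the vectors are taken as plain little-endian byte arrays (host byte-order handling is ignored), and the indexed form assumes a single segment of at most 16 bytes, which matches the opr_sz values of 8 and 16 passed by the AdvSIMD decoders. Each 32-bit lane of the destination accumulates the sum of four byte products, and the indexed form reuses one 4-byte group of the second operand, selected by index, for every lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of SDOT (vector): for each 32-bit lane i,
 * d[i] += sum over j = 0..3 of n[4i+j] * m[4i+j], with signed bytes.
 * Accumulation is done in uint32_t so that wraparound is well defined.
 */
static void sdot_ref(uint32_t *d, const int8_t *n, const int8_t *m,
                     size_t opr_sz)
{
    assert(opr_sz % 4 == 0);
    for (size_t i = 0; i < opr_sz / 4; i++) {
        for (size_t j = 0; j < 4; j++) {
            d[i] += (uint32_t)((int32_t)n[4 * i + j] * m[4 * i + j]);
        }
    }
}

/* Hypothetical reference model of UDOT (vector): same as above, but the
 * source bytes are unsigned.
 */
static void udot_ref(uint32_t *d, const uint8_t *n, const uint8_t *m,
                     size_t opr_sz)
{
    assert(opr_sz % 4 == 0);
    for (size_t i = 0; i < opr_sz / 4; i++) {
        for (size_t j = 0; j < 4; j++) {
            d[i] += (uint32_t)n[4 * i + j] * m[4 * i + j];
        }
    }
}

/* Hypothetical reference model of SDOT (by element): every lane of n is
 * multiplied against the same 4-byte group of m, chosen by index.
 * Assumes one segment only (opr_sz is 8 or 16), as in AdvSIMD.
 */
static void sdot_idx_ref(uint32_t *d, const int8_t *n, const int8_t *m,
                         size_t opr_sz, unsigned index)
{
    assert(opr_sz == 8 || opr_sz == 16);
    assert(4 * (size_t)index + 4 <= 16);
    const int8_t *mi = m + 4 * index;   /* the selected 4-byte group */
    for (size_t i = 0; i < opr_sz / 4; i++) {
        for (size_t j = 0; j < 4; j++) {
            d[i] += (uint32_t)((int32_t)n[4 * i + j] * mi[j]);
        }
    }
}
```

In this model opr_sz plays the role of the "is_q ? 16 : 8" and "(1 + q) * 8" arguments in the diff, and index corresponds to the data value handed to tcg_gen_gvec_3_ool for the indexed forms. The plain vector forms pass 0 there and do not use it.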