
[RFC,v3,26/71] target/riscv: rvv-1.0: update vext_max_elems() for load/store insns

Message ID 20200806104709.13235-27-frank.chang@sifive.com
State New
Series target/riscv: support vector extension v1.0

Commit Message

Frank Chang Aug. 6, 2020, 10:46 a.m. UTC
From: Frank Chang <frank.chang@sifive.com>

Unlike other vector instructions, vector load/store instructions compute
the maximum number of elements with EMUL.
For other vector instructions, VLMAX is returned as the maximum number of
elements.

Signed-off-by: Frank Chang <frank.chang@sifive.com>
---
 target/riscv/insn_trans/trans_rvv.inc.c | 10 +++-
 target/riscv/internals.h                |  5 +-
 target/riscv/vector_helper.c            | 61 ++++++++++++++++++-------
 3 files changed, 56 insertions(+), 20 deletions(-)

Comments

Richard Henderson Aug. 7, 2020, 12:03 a.m. UTC | #1
On 8/6/20 3:46 AM, frank.chang@sifive.com wrote:
> +static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz, bool is_ldst)
>  {
> -    return simd_maxsz(desc) << vext_lmul(desc);
> +    /*
> +     * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
> +     * so vlen in bytes (vlenb) is encoded as maxsz.
> +     */
> +    uint32_t vlenb = simd_maxsz(desc);
> +
> +    if (is_ldst) {
> +        /*
> +         * Vector load/store instructions have the EEW encoded
> +         * directly in the instructions. The maximum vector size is
> +         * calculated with EMUL rather than LMUL.
> +         */
> +        uint32_t eew = ctzl(esz);
> +        uint32_t sew = vext_sew(desc);
> +        uint32_t lmul = vext_lmul(desc);
> +        int32_t emul = eew - sew + lmul;
> +        uint32_t emul_r = emul < 0 ? 0 : emul;
> +        return 1 << (ctzl(vlenb) + emul_r - ctzl(esz));

As I said before, the is_ldst instructions should put the EEW and EMUL values
into the SEW and LMUL desc fields, so that this does not need to be
special-cased at all.
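
(Roughly speaking — a sketch only, with hypothetical eew/emul variables and
assuming the existing VDATA/FIELD_DP32 machinery — the translator would encode
the effective values in the slots the helper already reads:)

  /* In the load/store translation path: encode EEW/EMUL where SEW/LMUL go. */
  data = FIELD_DP32(data, VDATA, SEW, eew);    /* eew: effective element width */
  data = FIELD_DP32(data, VDATA, LMUL, emul);  /* emul: eew - sew + lmul       */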

> +        /* Return VLMAX */
> +        return 1 << (ctzl(vlenb) + vext_lmul(desc) - ctzl(esz));

This is overly complicated.

(1) 1 << ctzl(vlenb) == vlenb.
(2) I'm not sure why esz is not already a log2 number.

This ought to look more like

  int scale = lmul - esz;
  return (scale < 0
          ? vlenb >> -scale
          : vlenb << scale);


r~
Frank Chang Aug. 14, 2020, 2:48 a.m. UTC | #2
On Fri, Aug 7, 2020 at 8:04 AM Richard Henderson <richard.henderson@linaro.org> wrote:

> On 8/6/20 3:46 AM, frank.chang@sifive.com wrote:
> > +static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz, bool is_ldst)
> >  {
> > -    return simd_maxsz(desc) << vext_lmul(desc);
> > +    /*
> > +     * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
> > +     * so vlen in bytes (vlenb) is encoded as maxsz.
> > +     */
> > +    uint32_t vlenb = simd_maxsz(desc);
> > +
> > +    if (is_ldst) {
> > +        /*
> > +         * Vector load/store instructions have the EEW encoded
> > +         * directly in the instructions. The maximum vector size is
> > +         * calculated with EMUL rather than LMUL.
> > +         */
> > +        uint32_t eew = ctzl(esz);
> > +        uint32_t sew = vext_sew(desc);
> > +        uint32_t lmul = vext_lmul(desc);
> > +        int32_t emul = eew - sew + lmul;
> > +        uint32_t emul_r = emul < 0 ? 0 : emul;
> > +        return 1 << (ctzl(vlenb) + emul_r - ctzl(esz));
>
> As I said before, the is_ldst instructions should put the EEW and EMUL values
> into the SEW and LMUL desc fields, so that this does not need to be
> special-cased at all.
>

I've added a vext_get_emul() helper function in trans_rvv.inc.c:

> static uint8_t vext_get_emul(DisasContext *s, uint8_t eew)
> {
>     int8_t lmul = sextract32(s->lmul, 0, 3);
>     int8_t emul = ctzl(eew) - (s->sew + 3) + lmul;  // may remove ctzl() if eew is already log2(eew)
>     return emul < 0 ? 0 : emul;
> }

and passed emul as the LMUL field in VDATA so that it can be
reused in vector_helper.c's vext_max_elems():

> uint8_t emul = vext_get_emul(s, eew);
> data = FIELD_DP32(data, VDATA, LMUL, emul);

I also removed the code that passes the SEW field in VDATA, as I think SEW
is no longer required in the updated vext_max_elems() (see below).


>
> > +        /* Return VLMAX */
> > +        return 1 << (ctzl(vlenb) + vext_lmul(desc) - ctzl(esz));
>
> This is overly complicated.
>
> (1) 1 << ctzl(vlenb) == vlenb.
> (2) I'm not sure why esz is not already a log2 number.
>

esz is passed from e.g. GEN_VEXT_LD_STRIDE() macro:

> #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)                    \
> void HELPER(NAME)(void *vd, void * v0, target_ulong base,           \
>                   target_ulong stride, CPURISCVState *env,          \
>                   uint32_t desc)                                     \
> {                                                                    \
>     uint32_t vm = vext_vm(desc);                                     \
>     vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN,   \
>                      sizeof(ETYPE), GETPC(), MMU_DATA_LOAD);         \
> }
>
> GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b)

which is calculated by sizeof(ETYPE), so the results would be: 1, 2, 4, 8.
and vext_max_elems() is called by e.g. vext_ldst_stride():

> uint32_t max_elems = vext_max_elems(desc, esz);

I can add another parameter to the macro and pass the hard-coded log2(esz) number
if it's the better way instead of using ctzl().
Or if there's another approach to get the log2(esz) number more elegantly?


>
> This ought to look more like
>
>   int scale = lmul - esz;
>   return (scale < 0
>           ? vlenb >> -scale
>           : vlenb << scale);
>
>
Thanks for the detailed pointers.
I've changed the code as you suggested:

> static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz)
> {
>     /*
>      * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
>      * so vlen in bytes (vlenb) is encoded as maxsz.
>      */
>     uint32_t vlenb = simd_maxsz(desc);
>
>     /* Return VLMAX */
>     int scale = vext_lmul(desc) - ctzl(esz);  // may remove ctzl() if esz is already log2(esz)
>     return scale < 0 ? vlenb >> -scale : vlenb << scale;
> }
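
(As a quick sanity check with illustrative numbers — not values taken from the
patch — the log2 form gives the expected VLMAX:)

  /*
   * Example: VLEN = 128 bits -> vlenb = 16; LMUL = 2 -> vext_lmul(desc) = 1;
   * SEW = 32 bits -> esz = 4, so ctzl(esz) = 2.
   *   scale     = 1 - 2 = -1
   *   max_elems = vlenb >> 1 = 8
   * which matches VLMAX = VLEN / SEW * LMUL = 128 / 32 * 2 = 8.
   */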


>
> r~
>

Thanks for the review.
Frank Chang
Richard Henderson Aug. 14, 2020, 6:36 p.m. UTC | #3
On 8/13/20 7:48 PM, Frank Chang wrote:
> esz is passed from e.g. GEN_VEXT_LD_STRIDE() macro:
> 
>> #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)        \
>> void HELPER(NAME)(void *vd, void * v0, target_ulong base,  \
>>                   target_ulong stride, CPURISCVState *env, \
>>                   uint32_t desc)                           \
>> {                                                          \
>>     uint32_t vm = vext_vm(desc);                           \
>>     vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN, \
>>                      sizeof(ETYPE), GETPC(), MMU_DATA_LOAD);       \
>> }
>>
>> GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b)
> 
> which is calculated by sizeof(ETYPE), so the results would be: 1, 2, 4, 8.
> and vext_max_elems() is called by e.g. vext_ldst_stride():

Ah, yes.

>> uint32_t max_elems = vext_max_elems(desc, esz);
> 
> I can add another parameter to the macro and pass the hard-coded log2(esz) number
> if it's the better way instead of using ctzl().
> Or if there's another approach to get the log2(esz) number more elegantly?

Using ctzl(sizeof(type)) in the GEN_VEXT_LD_STRIDE macro will work well.  This
will be constant folded by the compiler.
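
(A minimal sketch of that suggestion applied to the macro quoted above — not
the final patch, just the idea; the helper's signature is assumed unchanged
apart from taking the log2 element size:)

  #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)                  \
  void HELPER(NAME)(void *vd, void *v0, target_ulong base,          \
                    target_ulong stride, CPURISCVState *env,        \
                    uint32_t desc)                                   \
  {                                                                  \
      uint32_t vm = vext_vm(desc);                                   \
      /* ctzl() of a constant sizeof() is folded at compile time. */ \
      vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN, \
                       ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_LOAD); \
  }

  GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b)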


r~
Frank Chang Aug. 15, 2020, 2:25 a.m. UTC | #4
On Sat, Aug 15, 2020 at 2:36 AM Richard Henderson <richard.henderson@linaro.org> wrote:

> On 8/13/20 7:48 PM, Frank Chang wrote:
> > esz is passed from e.g. GEN_VEXT_LD_STRIDE() macro:
> >
> >> #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)        \
> >> void HELPER(NAME)(void *vd, void * v0, target_ulong base,  \
> >>                   target_ulong stride, CPURISCVState *env, \
> >>                   uint32_t desc)                           \
> >> {                                                          \
> >>     uint32_t vm = vext_vm(desc);                           \
> >>     vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN, \
> >>                      sizeof(ETYPE), GETPC(), MMU_DATA_LOAD);       \
> >> }
> >>
> >> GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b)
> >
> > which is calculated by sizeof(ETYPE), so the results would be: 1, 2, 4, 8.
> > and vext_max_elems() is called by e.g. vext_ldst_stride():
>
> Ah, yes.
>
> >> uint32_t max_elems = vext_max_elems(desc, esz);
> >
> > I can add another parameter to the macro and pass the hard-coded log2(esz) number
> > if it's the better way instead of using ctzl().
> > Or if there's another approach to get the log2(esz) number more elegantly?
>
> Using ctzl(sizeof(type)) in the GEN_VEXT_LD_STRIDE macro will work well.
> This will be constant folded by the compiler.
>
>
> r~
>

Nice, I hadn't thought of the compiler optimization.
I'll fix the code and send out a new version of the patchset.
Thanks for the tips.

Frank Chang
Frank Chang Aug. 15, 2020, 2:52 a.m. UTC | #5
On Sat, Aug 15, 2020 at 2:36 AM Richard Henderson <richard.henderson@linaro.org> wrote:

> On 8/13/20 7:48 PM, Frank Chang wrote:
> > esz is passed from e.g. GEN_VEXT_LD_STRIDE() macro:
> >
> >> #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)        \
> >> void HELPER(NAME)(void *vd, void * v0, target_ulong base,  \
> >>                   target_ulong stride, CPURISCVState *env, \
> >>                   uint32_t desc)                           \
> >> {                                                          \
> >>     uint32_t vm = vext_vm(desc);                           \
> >>     vext_ldst_stride(vd, v0, base, stride, env, desc, vm, LOAD_FN, \
> >>                      sizeof(ETYPE), GETPC(), MMU_DATA_LOAD);       \
> >> }
> >>
> >> GEN_VEXT_LD_STRIDE(vlse8_v,  int8_t,  lde_b)
> >
> > which is calculated by sizeof(ETYPE), so the results would be: 1, 2, 4, 8.
> > and vext_max_elems() is called by e.g. vext_ldst_stride():
>
> Ah, yes.
>
> >> uint32_t max_elems = vext_max_elems(desc, esz);
> >
> > I can add another parameter to the macro and pass the hard-coded log2(esz) number
> > if it's the better way instead of using ctzl().
> > Or if there's another approach to get the log2(esz) number more elegantly?
>
> Using ctzl(sizeof(type)) in the GEN_VEXT_LD_STRIDE macro will work well.
> This will be constant folded by the compiler.
>
>
> r~
>

I checked the code again:
GEN_VEXT_LD_STRIDE() will eventually call vext_ldst_stride() and pass esz
as the parameter.
However, esz is used not only in vext_max_elems() but also in other
calculations, e.g.:

    probe_pages(env, base + stride * i, nf * esz, ra, access_type);
and
    target_ulong addr = base + stride * i + k * esz;

If we pass ctzl(sizeof(type)) in GEN_VEXT_LD_STRIDE(),
I would still have to do: (1 << esz) to get the correct element size in the
above calculations.
Would it eliminate the performance gain we have in vext_max_elems() instead?

Frank Chang
Richard Henderson Aug. 15, 2020, 5:29 a.m. UTC | #6
On 8/14/20 7:52 PM, Frank Chang wrote:
>     probe_pages(env, base + stride * i, nf * esz, ra, access_type);
> and
>     target_ulong addr = base + stride * i + k * esz;
> 
> If we pass ctzl(sizeof(type)) in GEN_VEXT_LD_STRIDE(),
> I would still have to do: (1 << esz) to get the correct element size in the
> above calculations.
> Would it eliminate the performance gain we have in vext_max_elems() instead?

Well, no, it will improve performance, because you'll write

  addr = base + stride * i + (k << esz)

I.e. strength-reduce the multiply to a shift.
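
(Concretely, inside vext_ldst_stride() — a sketch that keeps the existing
parameter name esz, now holding log2 of the element size — the byte counts and
addresses become shifts:)

  /* esz now carries log2(element size); the multiplies become shifts. */
  probe_pages(env, base + stride * i, nf << esz, ra, access_type);
  /* ... */
  target_ulong addr = base + stride * i + (k << esz);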


r~
Frank Chang Aug. 15, 2020, 9:59 p.m. UTC | #7
On Sat, Aug 15, 2020 at 1:29 PM Richard Henderson <richard.henderson@linaro.org> wrote:

> On 8/14/20 7:52 PM, Frank Chang wrote:
> >     probe_pages(env, base + stride * i, nf * esz, ra, access_type);
> > and
> >     target_ulong addr = base + stride * i + k * esz;
> >
> > If we pass ctzl(sizeof(type)) in GEN_VEXT_LD_STRIDE(),
> > I would still have to do: (1 << esz) to get the correct element size in the
> > above calculations.
> > Would it eliminate the performance gain we have in vext_max_elems() instead?
>
> Well, no, it will improve performance, because you'll write
>
>   addr = base + stride * i + (k << esz)
>
> I.e. strength-reduce the multiply to a shift.
>
>
This works like a charm.
Thanks for the advice.

Frank Chang



> r~
>
>

Patch

diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index c7e094b6e5b..725f36fcfcc 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -609,7 +609,7 @@  static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
 
     /*
      * As simd_desc supports at most 256 bytes, and in this implementation,
-     * the max vector group length is 2048 bytes. So split it into two parts.
+     * the max vector group length is 1024 bytes. So split it into two parts.
      *
      * The first part is vlen in bytes, encoded in maxsz of simd_desc.
      * The second part is lmul, encoded in data of simd_desc.
@@ -653,6 +653,7 @@  static bool ld_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
 
     data = FIELD_DP32(data, VDATA, VM, a->vm);
     data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, SEW, s->sew);
     data = FIELD_DP32(data, VDATA, NF, a->nf);
     return ldst_us_trans(a->rd, a->rs1, data, fn, s, false);
 }
@@ -689,6 +690,7 @@  static bool st_us_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
 
     data = FIELD_DP32(data, VDATA, VM, a->vm);
     data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, SEW, s->sew);
     data = FIELD_DP32(data, VDATA, NF, a->nf);
     return ldst_us_trans(a->rd, a->rs1, data, fn, s, true);
 }
@@ -763,6 +765,7 @@  static bool ld_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
 
     data = FIELD_DP32(data, VDATA, VM, a->vm);
     data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, SEW, s->sew);
     data = FIELD_DP32(data, VDATA, NF, a->nf);
     return ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s, false);
 }
@@ -791,6 +794,7 @@  static bool st_stride_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
 
     data = FIELD_DP32(data, VDATA, VM, a->vm);
     data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, SEW, s->sew);
     data = FIELD_DP32(data, VDATA, NF, a->nf);
     fn = fns[seq];
     if (fn == NULL) {
@@ -889,6 +893,7 @@  static bool ld_index_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
 
     data = FIELD_DP32(data, VDATA, VM, a->vm);
     data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, SEW, s->sew);
     data = FIELD_DP32(data, VDATA, NF, a->nf);
     return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s, false);
 }
@@ -940,6 +945,7 @@  static bool st_index_op(DisasContext *s, arg_rnfvm *a, uint8_t seq)
 
     data = FIELD_DP32(data, VDATA, VM, a->vm);
     data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, SEW, s->sew);
     data = FIELD_DP32(data, VDATA, NF, a->nf);
     return ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s, true);
 }
@@ -1005,6 +1011,7 @@  static bool ldff_op(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
 
     data = FIELD_DP32(data, VDATA, VM, a->vm);
     data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, SEW, s->sew);
     data = FIELD_DP32(data, VDATA, NF, a->nf);
     return ldff_trans(a->rd, a->rs1, data, fn, s);
 }
@@ -1189,6 +1196,7 @@  static bool amo_op(DisasContext *s, arg_rwdvm *a, uint8_t seq)
 
     data = FIELD_DP32(data, VDATA, VM, a->vm);
     data = FIELD_DP32(data, VDATA, LMUL, s->lmul);
+    data = FIELD_DP32(data, VDATA, SEW, s->sew);
     data = FIELD_DP32(data, VDATA, WD, a->wd);
     return amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
 }
diff --git a/target/riscv/internals.h b/target/riscv/internals.h
index bca48297dab..4fb683a7399 100644
--- a/target/riscv/internals.h
+++ b/target/riscv/internals.h
@@ -24,8 +24,9 @@ 
 /* share data between vector helpers and decode code */
 FIELD(VDATA, VM, 0, 1)
 FIELD(VDATA, LMUL, 1, 3)
-FIELD(VDATA, NF, 4, 4)
-FIELD(VDATA, WD, 4, 1)
+FIELD(VDATA, SEW, 4, 3)
+FIELD(VDATA, NF, 7, 4)
+FIELD(VDATA, WD, 7, 1)
 
 /* float point classify helpers */
 target_ulong fclass_h(uint64_t frs1);
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index a58051ded93..77f62c86e02 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -17,6 +17,7 @@ 
  */
 
 #include "qemu/osdep.h"
+#include "qemu/host-utils.h"
 #include "cpu.h"
 #include "exec/memop.h"
 #include "exec/exec-all.h"
@@ -98,6 +99,11 @@  static inline uint32_t vext_vm(uint32_t desc)
     return FIELD_EX32(simd_data(desc), VDATA, VM);
 }
 
+static inline uint32_t vext_sew(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, SEW);
+}
+
 /*
  * Encode LMUL to lmul as following:
  *     LMUL    vlmul    lmul
@@ -122,14 +128,35 @@  static uint32_t vext_wd(uint32_t desc)
 }
 
 /*
- * Get vector group length in bytes. Its range is [64, 2048].
+ * Get the maximum number of elements can be operated.
  *
- * As simd_desc support at most 256, the max vlen is 512 bits.
- * So vlen in bytes is encoded as maxsz.
+ * Use ctzl() to get log2(esz) and log2(vlenb)
+ * so that we can use shifts for all arithmetics.
  */
-static inline uint32_t vext_maxsz(uint32_t desc)
+static inline uint32_t vext_max_elems(uint32_t desc, uint32_t esz, bool is_ldst)
 {
-    return simd_maxsz(desc) << vext_lmul(desc);
+    /*
+     * As simd_desc support at most 256 bytes, the max vlen is 256 bits.
+     * so vlen in bytes (vlenb) is encoded as maxsz.
+     */
+    uint32_t vlenb = simd_maxsz(desc);
+
+    if (is_ldst) {
+        /*
+         * Vector load/store instructions have the EEW encoded
+         * directly in the instructions. The maximum vector size is
+         * calculated with EMUL rather than LMUL.
+         */
+        uint32_t eew = ctzl(esz);
+        uint32_t sew = vext_sew(desc);
+        uint32_t lmul = vext_lmul(desc);
+        int32_t emul = eew - sew + lmul;
+        uint32_t emul_r = emul < 0 ? 0 : emul;
+        return 1 << (ctzl(vlenb) + emul_r - ctzl(esz));
+    } else {
+        /* Return VLMAX */
+        return 1 << (ctzl(vlenb) + vext_lmul(desc) - ctzl(esz));
+    }
 }
 
 /*
@@ -224,7 +251,7 @@  vext_ldst_stride(void *vd, void *v0, target_ulong base,
 {
     uint32_t i, k;
     uint32_t nf = vext_nf(desc);
-    uint32_t vlmax = vext_maxsz(desc) / esz;
+    uint32_t max_elems = vext_max_elems(desc, esz, true);
 
     /* probe every access*/
     for (i = 0; i < env->vl; i++) {
@@ -241,7 +268,7 @@  vext_ldst_stride(void *vd, void *v0, target_ulong base,
         }
         while (k < nf) {
             target_ulong addr = base + stride * i + k * esz;
-            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            ldst_elem(env, addr, i + k * max_elems, vd, ra);
             k++;
         }
     }
@@ -289,7 +316,7 @@  vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
 {
     uint32_t i, k;
     uint32_t nf = vext_nf(desc);
-    uint32_t vlmax = vext_maxsz(desc) / esz;
+    uint32_t max_elems = vext_max_elems(desc, esz, true);
 
     /* probe every access */
     probe_pages(env, base, env->vl * nf * esz, ra, access_type);
@@ -298,7 +325,7 @@  vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
         k = 0;
         while (k < nf) {
             target_ulong addr = base + (i * nf + k) * esz;
-            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            ldst_elem(env, addr, i + k * max_elems, vd, ra);
             k++;
         }
     }
@@ -379,7 +406,7 @@  vext_ldst_index(void *vd, void *v0, target_ulong base,
     uint32_t i, k;
     uint32_t nf = vext_nf(desc);
     uint32_t vm = vext_vm(desc);
-    uint32_t vlmax = vext_maxsz(desc) / esz;
+    uint32_t max_elems = vext_max_elems(desc, esz, true);
 
     /* probe every access*/
     for (i = 0; i < env->vl; i++) {
@@ -397,7 +424,7 @@  vext_ldst_index(void *vd, void *v0, target_ulong base,
         }
         while (k < nf) {
             abi_ptr addr = get_index_addr(base, i, vs2) + k * esz;
-            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            ldst_elem(env, addr, i + k * max_elems, vd, ra);
             k++;
         }
     }
@@ -467,7 +494,7 @@  vext_ldff(void *vd, void *v0, target_ulong base,
     uint32_t i, k, vl = 0;
     uint32_t nf = vext_nf(desc);
     uint32_t vm = vext_vm(desc);
-    uint32_t vlmax = vext_maxsz(desc) / esz;
+    uint32_t max_elems = vext_max_elems(desc, esz, true);
     target_ulong addr, offset, remain;
 
     /* probe every access*/
@@ -518,7 +545,7 @@  ProbeSuccess:
         }
         while (k < nf) {
             target_ulong addr = base + (i * nf + k) * esz;
-            ldst_elem(env, addr, i + k * vlmax, vd, ra);
+            ldst_elem(env, addr, i + k * max_elems, vd, ra);
             k++;
         }
     }
@@ -1226,7 +1253,7 @@  void HELPER(NAME)(void *vd, void *v0, target_ulong s1,          \
                   void *vs2, CPURISCVState *env, uint32_t desc) \
 {                                                               \
     uint32_t vl = env->vl;                                      \
-    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);          \
+    uint32_t vlmax = vext_max_elems(desc, sizeof(ETYPE), false);\
     uint32_t i;                                                 \
                                                                 \
     for (i = 0; i < vl; i++) {                                  \
@@ -3887,7 +3914,7 @@  void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void *vs2,       \
 {                                                                   \
     uint32_t vm = vext_vm(desc);                                    \
     uint32_t vl = env->vl;                                          \
-    uint32_t vlmax = vext_maxsz(desc) / sizeof(ETYPE);              \
+    uint32_t vlmax = vext_max_elems(desc, sizeof(ETYPE), false);    \
     uint32_t i;                                                     \
                                                                     \
     for (i = 0; i < vl; i++) {                                      \
@@ -4692,7 +4719,7 @@  GEN_VEXT_VSLIDE1DOWN_VX(vslide1down_vx_d, uint64_t, H8)
 void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2,               \
                   CPURISCVState *env, uint32_t desc)                      \
 {                                                                         \
-    uint32_t vlmax = env_archcpu(env)->cfg.vlen;                          \
+    uint32_t vlmax = vext_max_elems(desc, sizeof(ETYPE), false);          \
     uint32_t vm = vext_vm(desc);                                          \
     uint32_t vl = env->vl;                                                \
     uint32_t index, i;                                                    \
@@ -4720,7 +4747,7 @@  GEN_VEXT_VRGATHER_VV(vrgather_vv_d, uint64_t, H8)
 void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,         \
                   CPURISCVState *env, uint32_t desc)                      \
 {                                                                         \
-    uint32_t vlmax = env_archcpu(env)->cfg.vlen;                          \
+    uint32_t vlmax = vext_max_elems(desc, sizeof(ETYPE), false);          \
     uint32_t vm = vext_vm(desc);                                          \
     uint32_t vl = env->vl;                                                \
     uint32_t index = s1, i;                                               \