Message ID | 20190411100836.646-10-david@redhat.com |
---|---|
State | New |
Series | s390x/tcg: Vector Instruction Support Part 2 |

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +    read_vec_element_i32(sum, get_field(s->fields, v3), 1, ES_32);
> +    for (i = 0; i < 4; i++) {
> +        read_vec_element_i32(tmp, get_field(s->fields, v2), i, ES_32);
> +        tcg_gen_add_i32(sum, sum, tmp);
> +        tcg_gen_setcond_i32(TCG_COND_LTU, tmp, sum, tmp);
> +        tcg_gen_add_i32(sum, sum, tmp);
> +    }
> +    zero_vec(get_field(s->fields, v1));
> +    write_vec_element_i32(sum, get_field(s->fields, v1), 1, ES_32);

It seems like it should be possible to implement this with i64, and fold the
carry around at the end -- 2 insns instead of 12 for managing carry.  But I
can't quite tell if that produces the same results.

You could use

    tcg_gen_add2_i32(sum, tmp, sum, zero, tmp, zero);
    tcg_gen_add_i32(sum, sum, tmp);

instead of computing carry manually with setcond.

That said, your code exactly matches the language in the manual, so

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
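
The two carry-handling variants discussed above can be compared outside of QEMU. The following is a minimal plain-C sketch that models the TCG ops with ordinary integer arithmetic; it does not call any QEMU API. The names step_setcond, step_add2 and check_variants are made up for this illustration, and the sample values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the patch: add, detect wrap-around with an unsigned compare,
 * then add the carry back in. */
static uint32_t step_setcond(uint32_t sum, uint32_t tmp)
{
    sum += tmp;
    tmp = sum < tmp;            /* models setcond LTU: 1 if the add wrapped */
    return sum + tmp;
}

/* Model of the suggested variant: a double-word add with zero high halves
 * yields the carry as the high result, which is then added to the low one. */
static uint32_t step_add2(uint32_t sum, uint32_t tmp)
{
    uint64_t wide = (uint64_t)sum + tmp;
    sum = (uint32_t)wide;           /* low result */
    tmp = (uint32_t)(wide >> 32);   /* high result: carry, 0 or 1 */
    return sum + tmp;
}

/* Spot check on a handful of edge values (not a proof). */
static void check_variants(void)
{
    static const uint32_t v[] = {
        0, 1, 0x7fffffffu, 0x80000000u, 0xfffffffeu, 0xffffffffu
    };
    const size_t n = sizeof(v) / sizeof(v[0]);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            assert(step_setcond(v[i], v[j]) == step_add2(v[i], v[j]));
        }
    }
}
```

Calling check_variants() from a small main() aborts if the two models ever disagree on the sampled values.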

On 13.04.19 01:01, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +    read_vec_element_i32(sum, get_field(s->fields, v3), 1, ES_32);
>> +    for (i = 0; i < 4; i++) {
>> +        read_vec_element_i32(tmp, get_field(s->fields, v2), i, ES_32);
>> +        tcg_gen_add_i32(sum, sum, tmp);
>> +        tcg_gen_setcond_i32(TCG_COND_LTU, tmp, sum, tmp);
>> +        tcg_gen_add_i32(sum, sum, tmp);
>> +    }
>> +    zero_vec(get_field(s->fields, v1));
>> +    write_vec_element_i32(sum, get_field(s->fields, v1), 1, ES_32);
>
> It seems like it should be possible to implement this with i64, and fold the
> carry around at the end -- 2 insns instead of 12 for managing carry.  But I
> can't quite tell if that produces the same results.

I had the same in mind but also wasn't sure if it would produce the exact
same result. Feels like it should.

>
> You could use
>
>     tcg_gen_add2_i32(sum, tmp, sum, zero, tmp, zero);
>     tcg_gen_add_i32(sum, sum, tmp);

That makes perfect sense, I will use that for now, thanks!

>
> instead of computing carry manually with setcond.
>
> That said, your code exactly matches the language in the manual, so
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~
>
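
The open question of whether accumulating in 64 bits and folding the carry at the end matches the per-element carry can at least be spot-checked. Below is a rough plain-C sketch under stated assumptions: cksum_stepwise models the loop from the patch, cksum_fold64 models the i64 idea, and both names (as well as check_fold and the sample inputs) are hypothetical. The fold is applied twice here because the first fold can itself carry out of bit 31. Agreement on these samples is not a proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-element end-around carry, modelling the loop in the patch. */
static uint32_t cksum_stepwise(const uint32_t v2[4], uint32_t init)
{
    uint32_t sum = init;

    for (int i = 0; i < 4; i++) {
        uint64_t wide = (uint64_t)sum + v2[i];
        sum = (uint32_t)wide + (uint32_t)(wide >> 32);
    }
    return sum;
}

/* Accumulate in 64 bits and fold the carries back in at the end. */
static uint32_t cksum_fold64(const uint32_t v2[4], uint32_t init)
{
    uint64_t acc = init;

    for (int i = 0; i < 4; i++) {
        acc += v2[i];
    }
    /* The first fold may carry out of bit 31 again, so fold twice. */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Spot check: the first four words model v2, the fifth models word 1 of v3. */
static void check_fold(void)
{
    static const uint32_t samples[][5] = {
        { 1, 2, 3, 4, 5 },
        { 0xffffffffu, 1, 0, 0, 0 },
        { 0x80000000u, 0x80000000u, 0x80000000u, 0x80000000u, 0 },
        { 0xffffffffu, 0xffffffffu, 0xffffffffu, 0xffffffffu, 0xffffffffu },
    };

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(cksum_stepwise(samples[i], samples[i][4]) ==
               cksum_fold64(samples[i], samples[i][4]));
    }
}
```

The sketch only covers the arithmetic; zeroing the destination vector and writing the result to element 1 are left out.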

On 4/15/19 10:58 PM, David Hildenbrand wrote:
>> You could use
>>
>>     tcg_gen_add2_i32(sum, tmp, sum, zero, tmp, zero);
>>     tcg_gen_add_i32(sum, sum, tmp);
> That makes perfect sense, I will use that for now, thanks!
>

Here's a funny one.  We can do this in one operation:

    tcg_gen_add2_i32(tmp, sum, sum, sum, tmp, tmp);

The lower (sum+tmp) carries into the upper (sum+tmp).
We take the upper result and discard the lower.


r~

On 16.04.19 11:08, Richard Henderson wrote:
> On 4/15/19 10:58 PM, David Hildenbrand wrote:
>>> You could use
>>>
>>>     tcg_gen_add2_i32(sum, tmp, sum, zero, tmp, zero);
>>>     tcg_gen_add_i32(sum, sum, tmp);
>> That makes perfect sense, I will use that for now, thanks!
>>
>
> Here's a funny one.  We can do this in one operation:
>
>     tcg_gen_add2_i32(tmp, sum, sum, sum, tmp, tmp);

:D I had to look at it 10 times. Very nice trick.

>
> The lower (sum+tmp) carries into the upper (sum+tmp).
> We take the upper result and discard the lower.
>
>
> r~
>
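
For readers who also need to look at the single-operation trick more than once, here is a small plain-C model of it. It assumes the add2 operation behaves like a 64-bit addition of two (high, low) pairs; the function names step_single_add2, step_end_around and check_single_add2 are invented for this sketch and are not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the single add2: both operands are placed in the low AND the
 * high half, so the carry out of the low (sum + tmp) lands in the high
 * (sum + tmp). Only the high half of the result is kept. */
static uint32_t step_single_add2(uint32_t sum, uint32_t tmp)
{
    uint64_t a = ((uint64_t)sum << 32) | sum;
    uint64_t b = ((uint64_t)tmp << 32) | tmp;
    uint64_t r = a + b;             /* wraps modulo 2^64, like a 64-bit add */

    return (uint32_t)(r >> 32);     /* keep upper result, discard lower */
}

/* Reference: explicit add followed by end-around carry. */
static uint32_t step_end_around(uint32_t sum, uint32_t tmp)
{
    uint64_t wide = (uint64_t)sum + tmp;

    return (uint32_t)wide + (uint32_t)(wide >> 32);
}

/* Spot check on a few edge values (not a proof). */
static void check_single_add2(void)
{
    static const uint32_t v[] = {
        0, 1, 0x7fffffffu, 0x80000000u, 0xfffffffeu, 0xffffffffu
    };
    const size_t n = sizeof(v) / sizeof(v[0]);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            assert(step_single_add2(v[i], v[j]) == step_end_around(v[i], v[j]));
        }
    }
}
```

For instance, with sum = 0xffffffff and tmp = 1 the low halves add up to 0 with a carry, and the high halves add up to 0 plus that carry, so the model returns 1.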

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 9889dc0b01..64459465c5 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1072,6 +1072,8 @@
     F(0xe7f2, VAVG, VRR_c, V, 0, 0, 0, 0, vavg, 0, IF_VEC)
 /* VECTOR AVERAGE LOGICAL */
     F(0xe7f0, VAVGL, VRR_c, V, 0, 0, 0, 0, vavgl, 0, IF_VEC)
+/* VECTOR CHECKSUM */
+    F(0xe766, VCKSM, VRR_c, V, 0, 0, 0, 0, vcksm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a190ac57ee..7a7e185d43 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -90,6 +90,33 @@ static void read_vec_element_i64(TCGv_i64 dst, uint8_t reg, uint8_t enr,
     }
 }
 
+static void read_vec_element_i32(TCGv_i32 dst, uint8_t reg, uint8_t enr,
+                                 TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case ES_8:
+        tcg_gen_ld8u_i32(dst, cpu_env, offs);
+        break;
+    case ES_16:
+        tcg_gen_ld16u_i32(dst, cpu_env, offs);
+        break;
+    case ES_8 | MO_SIGN:
+        tcg_gen_ld8s_i32(dst, cpu_env, offs);
+        break;
+    case ES_16 | MO_SIGN:
+        tcg_gen_ld16s_i32(dst, cpu_env, offs);
+        break;
+    case ES_32:
+    case ES_32 | MO_SIGN:
+        tcg_gen_ld_i32(dst, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_vec_element_i64(TCGv_i64 src, int reg, uint8_t enr,
                                   TCGMemOp memop)
 {
@@ -113,6 +140,25 @@ static void write_vec_element_i64(TCGv_i64 src, int reg, uint8_t enr,
     }
 }
 
+static void write_vec_element_i32(TCGv_i32 src, int reg, uint8_t enr,
+                                  TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case ES_8:
+        tcg_gen_st8_i32(src, cpu_env, offs);
+        break;
+    case ES_16:
+        tcg_gen_st16_i32(src, cpu_env, offs);
+        break;
+    case ES_32:
+        tcg_gen_st_i32(src, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
 
 static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg,
                                     TCGv_i64 enr, uint8_t es)
@@ -1260,3 +1306,24 @@ static DisasJumpType op_vavgl(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vcksm(DisasContext *s, DisasOps *o)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i32 sum = tcg_temp_new_i32();
+    int i;
+
+    read_vec_element_i32(sum, get_field(s->fields, v3), 1, ES_32);
+    for (i = 0; i < 4; i++) {
+        read_vec_element_i32(tmp, get_field(s->fields, v2), i, ES_32);
+        tcg_gen_add_i32(sum, sum, tmp);
+        tcg_gen_setcond_i32(TCG_COND_LTU, tmp, sum, tmp);
+        tcg_gen_add_i32(sum, sum, tmp);
+    }
+    zero_vec(get_field(s->fields, v1));
+    write_vec_element_i32(sum, get_field(s->fields, v1), 1, ES_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(sum);
+    return DISAS_NEXT;
+}

Time to introduce read_vec_element_i32 and write_vec_element_i32.
Take care to add the carry properly.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 +
 target/s390x/translate_vx.inc.c | 67 +++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)