[v1,03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY

Message ID 20190411100836.646-4-david@redhat.com
State New
Series s390x/tcg: Vector Instruction Support Part 2

Commit Message

David Hildenbrand April 11, 2019, 10:07 a.m. UTC
Only 64-bit handling is really easy. 128-bit handling is performed
via an out-of-line (ool) helper, introducing s390_vec_add(), which will
be reused later.

8/16/32-bit handling is black magic inspired by gen_addv_mask(). If
a bug is ever detected in there, throw it away and simply use ool
helpers for 8/16-bit handling and something like the 64-bit handling
for 32-bit handling.
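
For illustration (not part of the patch), here is the same mask trick as a
stand-alone C function; acc16() is a hypothetical plain-C rendering of
gen_acc() for 16-bit lanes packed into a uint64_t:

    #include <stdint.h>

    static uint64_t acc16(uint64_t a, uint64_t b)
    {
        const uint64_t msb_mask = 0x8000800080008000ull;
        /* Per-lane sum with the MSBs cleared: the carry into each MSB
           lands in the MSB position without crossing lane boundaries. */
        uint64_t t1 = (a & ~msb_mask) + (b & ~msb_mask);
        /* MSB of each lane as it would be without any carry into it. */
        uint64_t t3 = a ^ b;
        /* Carry out of each MSB: both MSBs set, or one MSB set and a
           carry coming into it. */
        uint64_t carry = (a & b) | (t1 & t3);
        /* Isolate the carries and shift them down to bit 0 of each lane. */
        return (carry & msb_mask) >> 15;
    }

E.g. acc16(0xffff000000000000ull, 0x0001000000000000ull) returns
0x0001000000000000ull: only the topmost lane carries.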

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/Makefile.objs      |  2 +-
 target/s390x/helper.h           |  3 ++
 target/s390x/insn-data.def      |  2 +
 target/s390x/translate_vx.inc.c | 74 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 47 +++++++++++++++++++++
 5 files changed, 127 insertions(+), 1 deletion(-)
 create mode 100644 target/s390x/vec_int_helper.c

Comments

Richard Henderson April 12, 2019, 9:05 p.m. UTC | #1
On 4/11/19 12:07 AM, David Hildenbrand wrote:
> +    static const GVecGen3 g[5] = {
> +        { .fni8 = gen_acc8_i64, },
> +        { .fni8 = gen_acc16_i64, },
> +        { .fni8 = gen_acc32_i64, },
> +        { .fni8 = gen_acc_i64, },
> +        { .fno = gen_helper_gvec_vacc128, },
> +    };

Vector versions of the first four are fairly simple too.

static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
{
    TCGv_vec t = tcg_temp_new_vec_matching(d);

    tcg_gen_add_vec(vece, t, a, b);
    tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, t, a);  /* produces -1 for carry */
    tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
    tcg_temp_free_vec(t);
}

  { .fni8 = gen_acc8_i64,
    .fniv = gen_acc_vec,
    .opc = INDEX_op_cmp_vec,
    .vece = MO_8 },
  ...


I'm surprised that you're expanding the 128-bit addition out-of-line.
One possible expansion is

  tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
  tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
  tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
  /* carry out in th */


r~
David Hildenbrand April 16, 2019, 8:01 a.m. UTC | #2
On 12.04.19 23:05, Richard Henderson wrote:
> On 4/11/19 12:07 AM, David Hildenbrand wrote:
>> +    static const GVecGen3 g[5] = {
>> +        { .fni8 = gen_acc8_i64, },
>> +        { .fni8 = gen_acc16_i64, },
>> +        { .fni8 = gen_acc32_i64, },
>> +        { .fni8 = gen_acc_i64, },
>> +        { .fno = gen_helper_gvec_vacc128, },
>> +    };
> 
> Vector versions of the first four are fairly simple too.
> 
> static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
> {
>     TCGv_vec t = tcg_temp_new_vec_matching(d);
> 
>     tcg_gen_add_vec(vece, t, a, b);
>     tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, t, a);  /* produces -1 for carry */
>     tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
>     tcg_temp_free_vec(t);
> }
> 
>   { .fni8 = gen_acc8_i64,
>     .fniv = gen_acc_vec,
>     .opc = INDEX_op_cmp_vec,
>     .vece = MO_8 },
>   ...
> 

Indeed, I haven't really explored vector operations yet. This is more
compact than I expected :)

> 
> I'm surprised that you're expanding the 128-bit addition out-of-line.
> One possible expansion is
> 
>   tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
>   tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
>   tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
>   /* carry out in th */

Nice trick. Just so I get it right, the third line should actually be

tcg_gen_add2_i64(tl, th, tl, th, bh, zero);

right? Thanks!
Richard Henderson April 16, 2019, 8:17 a.m. UTC | #3
On 4/15/19 10:01 PM, David Hildenbrand wrote:
> On 12.04.19 23:05, Richard Henderson wrote:
>> On 4/11/19 12:07 AM, David Hildenbrand wrote:
>>> +    static const GVecGen3 g[5] = {
>>> +        { .fni8 = gen_acc8_i64, },
>>> +        { .fni8 = gen_acc16_i64, },
>>> +        { .fni8 = gen_acc32_i64, },
>>> +        { .fni8 = gen_acc_i64, },
>>> +        { .fno = gen_helper_gvec_vacc128, },
>>> +    };
>>
>> Vector versions of the first four are fairly simple too.
>>
>> static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
>> {
>>     TCGv_vec t = tcg_temp_new_vec_matching(d);
>>
>>     tcg_gen_add_vec(vece, t, a, b);
>>     tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, t, a);  /* produces -1 for carry */
>>     tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
>>     tcg_temp_free_vec(t);
>> }
>>
>>   { .fni8 = gen_acc8_i64,
>>     .fniv = gen_acc_vec,
>>     .opc = INDEX_op_cmp_vec,
>>     .vece = MO_8 },
>>   ...
>>
> 
> Indeed, I haven't really explored vector operations yet. This is more
> compact than I expected :)

:-)

That said, in implementing vector variable shifts today,
I've run into a representational problem here.
You may want to hold off on these until I can address them.


>>   tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
>>   tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
>>   tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
>>   /* carry out in th */
> 
> Nice trick. Just so I get it right, the third line should actually be
> 
> tcg_gen_add2_i64(tl, th, tl, th, bh, zero);
> 
> right? Thanks!

Oops, yes, typo indeed.
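
For the record, the corrected expansion in full (a sketch; tl, th, al, ah,
bl, bh and zero are TCGv_i64 values as above, with zero holding the
constant 0):

  tcg_gen_add2_i64(tl, th, al, zero, bl, zero);  /* th = carry of al + bl */
  tcg_gen_add2_i64(tl, th, th, zero, ah, zero);  /* tl = ah + low carry */
  tcg_gen_add2_i64(tl, th, tl, th, bh, zero);    /* add bh, accumulate carry */
  /* carry out of the 128-bit addition in th */

Note that the low doubleword of the sum (al + bl) is not kept; that is fine
for VACC, which only returns the carry for the 128-bit element size.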


r~
David Hildenbrand April 16, 2019, 8:33 a.m. UTC | #4
>>
>> Indeed, I haven't really explored vector operations yet. This is more
>> compact than I expected :)
> 
> :-)
> 
> That said, in implementing vector variable shifts today,
> I've run into a representational problem here.
> You may want to hold off on these until I can address them.

I assume you mean vector helpers *in general* in this file. Yes, we can
add them later.

What exact problem are you dealing with?
Richard Henderson April 16, 2019, 8:43 a.m. UTC | #5
On 4/15/19 10:33 PM, David Hildenbrand wrote:
>>>
>>> Indeed, I haven't really explored vector operations yet. This is more
>>> compact than I expected :)
>>
>> :-)
>>
>> That said, in implementing vector variable shifts today,
>> I've run into a representational problem here.
>> You may want to hold off on these until I can address them.
> 
> I assume you mean vector helpers *in general* in this file. Yes, we can
> add them later.
> 
> What exact problem are you dealing with?

The .opc field lets you specify only one opcode that is optional in the
backend on which you depend.  In writing support for AArch64 USHL, I found
that I needed 3 optional opcodes.

I'm thinking of a mass change whereby .opc becomes a pointer to an array with
a terminator.  But then, for debugging purposes, I think I need to validate
that array, so that it's not missing things that ought to be specified, but
which happen to be supported by the current host.
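
As a sketch of the idea (illustrative only; the .opt_opc name and the
zero-terminated list below are assumptions, not an existing API):

  static const TCGOpcode vecop_list_acc[] = {
      INDEX_op_add_vec, INDEX_op_cmp_vec, INDEX_op_neg_vec, 0
  };

  static const GVecGen3 g = {
      .fni8 = gen_acc8_i64,
      .fniv = gen_acc_vec,
      .opt_opc = vecop_list_acc,  /* was: .opc = INDEX_op_cmp_vec */
      .vece = MO_8,
  };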


r~
David Hildenbrand April 16, 2019, 8:46 a.m. UTC | #6
On 16.04.19 10:43, Richard Henderson wrote:
> On 4/15/19 10:33 PM, David Hildenbrand wrote:
>>>>
>>>> Indeed, I haven't really explored vector operations yet. This is more
>>>> compact than I expected :)
>>>
>>> :-)
>>>
>>> That said, in implementing vector variable shifts today,
>>> I've run into a representational problem here.
>>> You may want to hold off on these until I can address them.
>>
>> I assume you mean vector helpers *in general* in this file. Yes, we can
>> add them later.
>>
>> What exact problem are you dealing with?
> 
> The .opc field lets you specify only one opcode that is optional in the
> backend on which you depend.  In writing support for AArch64 USHL, I found
> that I needed 3 optional opcodes.

I was asking myself exactly this when looking at the .opc field in the
example you gave ("which opcode is one supposed to indicate here?") :)

> 
> I'm thinking of a mass change whereby .opc becomes a pointer to an array with
> a terminator.  But then, for debugging purposes, I think I need to validate
> that array, so that it's not missing things that ought to be specified, but
> which happen to be supported by the current host.

Makes sense!

Patch

diff --git a/target/s390x/Makefile.objs b/target/s390x/Makefile.objs
index 68eeee3d2f..993ac93ed6 100644
--- a/target/s390x/Makefile.objs
+++ b/target/s390x/Makefile.objs
@@ -1,7 +1,7 @@ 
 obj-y += cpu.o cpu_models.o cpu_features.o gdbstub.o interrupt.o helper.o
 obj-$(CONFIG_TCG) += translate.o cc_helper.o excp_helper.o fpu_helper.o
 obj-$(CONFIG_TCG) += int_helper.o mem_helper.o misc_helper.o crypto_helper.o
-obj-$(CONFIG_TCG) += vec_helper.o
+obj-$(CONFIG_TCG) += vec_helper.o vec_int_helper.o
 obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o mmu_helper.o diag.o
 obj-$(CONFIG_SOFTMMU) += sigp.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 0b494a2fd2..2c1b223248 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -145,6 +145,9 @@  DEF_HELPER_5(gvec_vpkls_cc64, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vperm, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(vstl, TCG_CALL_NO_WG, void, env, cptr, i64, i64)
 
+/* === Vector Integer Instructions === */
+DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
 DEF_HELPER_4(diag, void, env, i32, i32, i32)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 74a0ccc770..f0e62b9aa8 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1058,6 +1058,8 @@ 
 
 /* VECTOR ADD */
     F(0xe7f3, VA,      VRR_c, V,   0, 0, 0, 0, va, 0, IF_VEC)
+/* VECTOR ADD COMPUTE CARRY */
+    F(0xe7f1, VACC,    VRR_c, V,   0, 0, 0, 0, vacc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 2f84ea0511..c3bc47f1a9 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -136,6 +136,9 @@  static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
     tcg_temp_free_i64(tmp);
 }
 
+#define gen_gvec_3(v1, v2, v3, gen) \
+    tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                   vec_full_reg_offset(v3), 16, 16, gen)
 #define gen_gvec_3_ool(v1, v2, v3, data, fn) \
     tcg_gen_gvec_3_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                        vec_full_reg_offset(v3), 16, 16, data, fn)
@@ -985,3 +988,74 @@  static DisasJumpType op_va(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static void gen_acc(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, uint8_t es)
+{
+    const uint8_t msb_bit_nr = NUM_VEC_ELEMENT_BITS(es) - 1;
+    TCGv_i64 msb_mask = tcg_const_i64(dup_const(es, 1ull << msb_bit_nr));
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    /* Calculate the carry into the MSB, ignoring the old MSBs */
+    tcg_gen_andc_i64(t1, a, msb_mask);
+    tcg_gen_andc_i64(t2, b, msb_mask);
+    tcg_gen_add_i64(t1, t1, t2);
+    /* Calculate the MSB without any carry into it */
+    tcg_gen_xor_i64(t3, a, b);
+    /* Calculate the carry out of the MSB in the MSB bit position */
+    tcg_gen_and_i64(d, a, b);
+    tcg_gen_and_i64(t1, t1, t3);
+    tcg_gen_or_i64(d, d, t1);
+    /* Isolate and shift the carry into position */
+    tcg_gen_and_i64(d, d, msb_mask);
+    tcg_gen_shri_i64(d, d, msb_bit_nr);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+static void gen_acc8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_8);
+}
+
+static void gen_acc16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_16);
+}
+
+static void gen_acc32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_32);
+}
+
+static void gen_acc_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    tcg_gen_add_i64(t, a, b);
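+    /* A carry out occurred iff the wrapped sum is smaller than b. */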
+    tcg_gen_setcond_i64(TCG_COND_LTU, d, t, b);
+    tcg_temp_free_i64(t);
+}
+
+static DisasJumpType op_vacc(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[5] = {
+        { .fni8 = gen_acc8_i64, },
+        { .fni8 = gen_acc16_i64, },
+        { .fni8 = gen_acc32_i64, },
+        { .fni8 = gen_acc_i64, },
+        { .fno = gen_helper_gvec_vacc128, },
+    };
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
new file mode 100644
index 0000000000..0b232571bc
--- /dev/null
+++ b/target/s390x/vec_int_helper.c
@@ -0,0 +1,47 @@ 
+/*
+ * QEMU TCG support -- s390x vector integer instruction support
+ *
+ * Copyright (C) 2019 Red Hat Inc
+ *
+ * Authors:
+ *   David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "vec.h"
+#include "exec/helper-proto.h"
+
+/*
+ * Add two 128 bit vectors, returning the carry.
+ */
+static bool s390_vec_add(S390Vector *d, const S390Vector *a,
+                         const S390Vector *b)
+{
+    bool low_carry = false, high_carry = false;
+
+    if (a->doubleword[0] + b->doubleword[0] < a->doubleword[0]) {
+        high_carry = true;
+    }
+    if (a->doubleword[1] + b->doubleword[1] < a->doubleword[1]) {
+        low_carry = true;
+        /* the low carry ripples through iff the high parts sum to all 1s */
+        if (a->doubleword[0] == ~b->doubleword[0]) {
+            high_carry = true;
+        }
+    }
+    d->doubleword[0] = a->doubleword[0] + b->doubleword[0] + low_carry;
+    d->doubleword[1] = a->doubleword[1] + b->doubleword[1];
+    return high_carry;
+}
+
+void HELPER(gvec_vacc128)(void *v1, const void *v2, const void *v3,
+                          uint32_t desc)
+{
+    S390Vector tmp, *dst = v1;
+
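+    /* The 128-bit sum computed into tmp is discarded; only the carry is kept. */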
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = s390_vec_add(&tmp, v2, v3);
+}