diff mbox series

[PULL,16/20] target/arm: Use vector infrastructure for aa64 compares

Message ID 20180207225540.31698-17-richard.henderson@linaro.org
State New
Headers show
Series [PULL,01/20] tcg: Allow multiple word entries into the constant pool | expand

Commit Message

Richard Henderson Feb. 7, 2018, 10:55 p.m. UTC
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 93 +++++++++++++++++++++++++++++-----------------
 1 file changed, 59 insertions(+), 34 deletions(-)

Comments

Emilio Cota April 4, 2018, 4:49 p.m. UTC | #1
On Wed, Feb 07, 2018 at 14:55:36 -0800, Richard Henderson wrote:
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Just bisected a segfault for aarch64 nbench on a PowerPC host
to this commit (79d61de6bdc398). I've also tested on x86_64 and
aarch64 hosts, and these do not seem to be affected.

To reproduce:
1. grab this binary:
  http://cs.columbia.edu/~cota/qemu/nbench-aarch64
2. run it on a PowerPC host with:
  $ aarch64-linux-user/qemu-aarch64 nbench-aarch64 -V

Note: the "-V" (or "-v") flag is important! Without it, there's no segfault.

I can reproduce this consistently on a Power8 host -- I'm using gcc112
from the gcc compile farm.

Thanks,

		Emilio
Alex Bennée April 4, 2018, 6:24 p.m. UTC | #2
Emilio G. Cota <cota@braap.org> writes:

> On Wed, Feb 07, 2018 at 14:55:36 -0800, Richard Henderson wrote:
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
> Just bisected a segfault for aarch64 nbench on a PowerPC host
> to this commit (79d61de6bdc398). I've also tested on x86_64 and
> aarch64 hosts, and these do not seem to be affected.

I wonder about other hosts without TCG_vec backend support (powerpc is
one)? I suspect it must be the fall-back for the vector expanders is
wrong.

>
> To reproduce:
> 1. grab this binary:
>   http://cs.columbia.edu/~cota/qemu/nbench-aarch64
> 2. run it on a PowerPC host with:
>   $ aarch64-linux-user/qemu-aarch64 nbench-aarch64 -V
>
> Note: the "-V" (or "-v") flag is important! Without it, there's no segfault.
>
> I can reproduce this consistently on a Power8 host -- I'm using gcc112
> from the gcc compile farm.
>
> Thanks,
>
> 		Emilio


--
Alex Bennée
Emilio Cota April 4, 2018, 9:39 p.m. UTC | #3
On Wed, Apr 04, 2018 at 19:24:52 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
> 
> > On Wed, Feb 07, 2018 at 14:55:36 -0800, Richard Henderson wrote:
> >> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> >> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >
> > Just bisected a segfault for aarch64 nbench on a PowerPC host
> > to this commit (79d61de6bdc398). I've also tested on x86_64 and
> > aarch64 hosts, and these do not seem to be affected.
> 
> I wonder about other hosts without TCG_vec backend support (powerpc is
> one)? I suspect it must be the fall-back for the vector expanders is
> wrong.

I just tried on two additional machines:

- Reproduced on a different PowerPC host with different endianness
  (gcc110, which is power7 and therefore big endian)

- Could not reproduce on a mips64 host (gcc23)

So it might be a Power-only thing.

Thanks,

		Emilio
Richard Henderson April 5, 2018, 12:07 a.m. UTC | #4
On 04/05/2018 02:49 AM, Emilio G. Cota wrote:
> 1. grab this binary:
>   http://cs.columbia.edu/~cota/qemu/nbench-aarch64
> 2. run it on a PowerPC host with:
>   $ aarch64-linux-user/qemu-aarch64 nbench-aarch64 -V
> 
> Note: the "-V" (or "-v") flag is important! Without it, there's no segfault.

How quickly?  I did not see one up until it exited for lack of NNET.DAT.

I will note that I am using gcc 7.2 on gcc112 (/opt/cfarm/gcc-latest/bin/gcc).

Are you using gcc 4.8.5, and is commit 74912f6dad in your tree?
That commit might have made a difference...


r~
Richard Henderson April 5, 2018, 12:54 a.m. UTC | #5
On 04/05/2018 10:07 AM, Richard Henderson wrote:
> On 04/05/2018 02:49 AM, Emilio G. Cota wrote:
>> 1. grab this binary:
>>   http://cs.columbia.edu/~cota/qemu/nbench-aarch64
>> 2. run it on a PowerPC host with:
>>   $ aarch64-linux-user/qemu-aarch64 nbench-aarch64 -V
>>
>> Note: the "-V" (or "-v") flag is important! Without it, there's no segfault.
> 
> How quickly?  I did not see one up until it exited for lack of NNET.DAT.
> 
> I will note that I am using gcc 7.2 on gcc112 (/opt/cfarm/gcc-latest/bin/gcc).
> 
> Are you using gcc 4.8.5, and is commit 74912f6dad in your tree?
> That commit might have made a difference...

Bah.  I confirm that it doesn't, and that the test still fails.

Since qemu does work on the same host when built with gcc 7.2, I can only blame
this failure on a compiler bug wrt gcc 4.8 on ppc64.

I have not tracked down exactly what's going wrong, and probably won't unless
someone feels that it's worthwhile.  This only begs the question of whether we
should blacklist this compiler entirely.  Certainly it's old enough that it's
probably not worth fixing.


r~
Emilio Cota April 5, 2018, 4:45 a.m. UTC | #6
On Thu, Apr 05, 2018 at 10:54:46 +1000, Richard Henderson wrote:
> On 04/05/2018 10:07 AM, Richard Henderson wrote:
> > On 04/05/2018 02:49 AM, Emilio G. Cota wrote:
> >> 1. grab this binary:
> >>   http://cs.columbia.edu/~cota/qemu/nbench-aarch64
> >> 2. run it on a PowerPC host with:
> >>   $ aarch64-linux-user/qemu-aarch64 nbench-aarch64 -V
> >>
> >> Note: the "-V" (or "-v") flag is important! Without it, there's no segfault.
> > 
> > How quickly?  I did not see one up until it exited for lack of NNET.DAT.
> > 
> > I will note that I am using gcc 7.2 on gcc112 (/opt/cfarm/gcc-latest/bin/gcc).
> > 
> > Are you using gcc 4.8.5, and is commit 74912f6dad in your tree?
> > That commit might have made a difference...
> 
> Bah.  I confirm that it doesn't, and that the test still fails.
> 
> Since qemu does work on the same host when built with gcc 7.2, I can only blame
> this failure on a compiler bug wrt gcc 4.8 on ppc64.

Thanks! I confirm it as well that it works with the more recent gcc.

> I have not tracked down exactly what's going wrong, and probably won't unless
> someone feels that it's worthwhile.  This only begs the question of whether we
> should blacklist this compiler entirely.  Certainly it's old enough that it's
> probably not worth fixing.

Agreed.

Thanks,

		Emilio
Thomas Huth April 5, 2018, 5:06 a.m. UTC | #7
On 05.04.2018 02:54, Richard Henderson wrote:
> On 04/05/2018 10:07 AM, Richard Henderson wrote:
>> On 04/05/2018 02:49 AM, Emilio G. Cota wrote:
>>> 1. grab this binary:
>>>   http://cs.columbia.edu/~cota/qemu/nbench-aarch64
>>> 2. run it on a PowerPC host with:
>>>   $ aarch64-linux-user/qemu-aarch64 nbench-aarch64 -V
>>>
>>> Note: the "-V" (or "-v") flag is important! Without it, there's no segfault.
>>
>> How quickly?  I did not see one up until it exited for lack of NNET.DAT.
>>
>> I will note that I am using gcc 7.2 on gcc112 (/opt/cfarm/gcc-latest/bin/gcc).
>>
>> Are you using gcc 4.8.5, and is commit 74912f6dad in your tree?
>> That commit might have made a difference...
> 
> Bah.  I confirm that it doesn't, and that the test still fails.
> 
> Since qemu does work on the same host when built with gcc 7.2, I can only blame
> this failure on a compiler bug wrt gcc 4.8 on ppc64.
> 
> I have not tracked down exactly what's going wrong, and probably won't unless
> someone feels that it's worthwhile.  This only begs the question of whether we
> should blacklist this compiler entirely.  Certainly it's old enough that it's
> probably not worth fixing.

GCC 4.8 is still used in enterprise distros like RHEL7 and AFAIK also in
SLES 12 ... so blacklisting this compiler does not sound like a good
idea to me.

 Thomas
Richard Henderson April 6, 2018, 12:56 a.m. UTC | #8
On 04/05/2018 03:06 PM, Thomas Huth wrote:
> On 05.04.2018 02:54, Richard Henderson wrote:
>> On 04/05/2018 10:07 AM, Richard Henderson wrote:
>>> On 04/05/2018 02:49 AM, Emilio G. Cota wrote:
>>>> 1. grab this binary:
>>>>   http://cs.columbia.edu/~cota/qemu/nbench-aarch64
>>>> 2. run it on a PowerPC host with:
>>>>   $ aarch64-linux-user/qemu-aarch64 nbench-aarch64 -V
>>>>
>>>> Note: the "-V" (or "-v") flag is important! Without it, there's no segfault.
>>>
>>> How quickly?  I did not see one up until it exited for lack of NNET.DAT.
>>>
>>> I will note that I am using gcc 7.2 on gcc112 (/opt/cfarm/gcc-latest/bin/gcc).
>>>
>>> Are you using gcc 4.8.5, and is commit 74912f6dad in your tree?
>>> That commit might have made a difference...
>>
>> Bah.  I confirm that it doesn't, and that the test still fails.
>>
>> Since qemu does work on the same host when built with gcc 7.2, I can only blame
>> this failure on a compiler bug wrt gcc 4.8 on ppc64.
>>
>> I have not tracked down exactly what's going wrong, and probably won't unless
>> someone feels that it's worthwhile.  This only begs the question of whether we
>> should blacklist this compiler entirely.  Certainly it's old enough that it's
>> probably not worth fixing.
> 
> GCC 4.8 is still used in enterprise distros like RHEL7 and AFAIK also in
> SLES 12 ... so blacklisting this compiler does not sound like a good
> idea to me.

I have now tracked down this failure and it turns out not to be a compiler bug
at all; patch to follow.


r~
diff mbox series

Patch

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 5717e85def..2ac297fc14 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -7209,6 +7209,28 @@  static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
     }
 }
 
+/* CMTST : test is "if (X & Y != 0)". */
+static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    tcg_gen_and_i32(d, a, b);
+    tcg_gen_setcondi_i32(TCG_COND_NE, d, d, 0);
+    tcg_gen_neg_i32(d, d);
+}
+
+static void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    tcg_gen_and_i64(d, a, b);
+    tcg_gen_setcondi_i64(TCG_COND_NE, d, d, 0);
+    tcg_gen_neg_i64(d, d);
+}
+
+static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_and_vec(vece, d, a, b);
+    tcg_gen_dupi_vec(vece, a, 0);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, d, d, a);
+}
+
 static void handle_3same_64(DisasContext *s, int opcode, bool u,
                             TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 tcg_rm)
 {
@@ -7252,10 +7274,7 @@  static void handle_3same_64(DisasContext *s, int opcode, bool u,
             cond = TCG_COND_EQ;
             goto do_cmop;
         }
-        /* CMTST : test is "if (X & Y != 0)". */
-        tcg_gen_and_i64(tcg_rd, tcg_rn, tcg_rm);
-        tcg_gen_setcondi_i64(TCG_COND_NE, tcg_rd, tcg_rd, 0);
-        tcg_gen_neg_i64(tcg_rd, tcg_rd);
+        gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
         break;
     case 0x8: /* SSHL, USHL */
         if (u) {
@@ -9733,6 +9752,22 @@  static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
 /* Integer op subgroup of C3.6.16. */
 static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 {
+    static const GVecGen3 cmtst_op[4] = {
+        { .fni4 = gen_helper_neon_tst_u8,
+          .fniv = gen_cmtst_vec,
+          .vece = MO_8 },
+        { .fni4 = gen_helper_neon_tst_u16,
+          .fniv = gen_cmtst_vec,
+          .vece = MO_16 },
+        { .fni4 = gen_cmtst_i32,
+          .fniv = gen_cmtst_vec,
+          .vece = MO_32 },
+        { .fni8 = gen_cmtst_i64,
+          .fniv = gen_cmtst_vec,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64 },
+    };
+
     int is_q = extract32(insn, 30, 1);
     int u = extract32(insn, 29, 1);
     int size = extract32(insn, 22, 2);
@@ -9741,6 +9776,7 @@  static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
     int pass;
+    TCGCond cond;
 
     switch (opcode) {
     case 0x13: /* MUL, PMUL */
@@ -9788,6 +9824,25 @@  static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_add, size);
         }
         return;
+    case 0x11:
+        if (!u) { /* CMTST */
+            gen_gvec_op3(s, is_q, rd, rn, rm, &cmtst_op[size]);
+            return;
+        }
+        /* else CMEQ */
+        cond = TCG_COND_EQ;
+        goto do_gvec_cmp;
+    case 0x06: /* CMGT, CMHI */
+        cond = u ? TCG_COND_GTU : TCG_COND_GT;
+        goto do_gvec_cmp;
+    case 0x07: /* CMGE, CMHS */
+        cond = u ? TCG_COND_GEU : TCG_COND_GE;
+    do_gvec_cmp:
+        tcg_gen_gvec_cmp(cond, size, vec_full_reg_offset(s, rd),
+                         vec_full_reg_offset(s, rn),
+                         vec_full_reg_offset(s, rm),
+                         is_q ? 16 : 8, vec_full_reg_size(s));
+        return;
     }
 
     if (size == 3) {
@@ -9870,26 +9925,6 @@  static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genenvfn = fns[size][u];
                 break;
             }
-            case 0x6: /* CMGT, CMHI */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_cgt_s8, gen_helper_neon_cgt_u8 },
-                    { gen_helper_neon_cgt_s16, gen_helper_neon_cgt_u16 },
-                    { gen_helper_neon_cgt_s32, gen_helper_neon_cgt_u32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
-            case 0x7: /* CMGE, CMHS */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_cge_s8, gen_helper_neon_cge_u8 },
-                    { gen_helper_neon_cge_s16, gen_helper_neon_cge_u16 },
-                    { gen_helper_neon_cge_s32, gen_helper_neon_cge_u32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
             case 0x8: /* SSHL, USHL */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
@@ -9962,16 +9997,6 @@  static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
-            case 0x11: /* CMTST, CMEQ */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_tst_u8, gen_helper_neon_ceq_u8 },
-                    { gen_helper_neon_tst_u16, gen_helper_neon_ceq_u16 },
-                    { gen_helper_neon_tst_u32, gen_helper_neon_ceq_u32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
             case 0x13: /* MUL, PMUL */
                 if (u) {
                     /* PMUL */