
[ARM,2/2] Vectorise lroundf, lfloorf, lceilf using the new ARMv8-A vcvt* instructions

Message ID 5405E39C.30100@arm.com
State New

Commit Message

Kyrylo Tkachov Sept. 2, 2014, 3:34 p.m. UTC
Hi all,

In continuation of patch [1/2]...
We can use the vector forms of the vcvt{a,p,m} instructions to vectorise 
the l{round, ceil, floor}f functions.
Builtins are added and the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
implementation is updated to wire up the vectorised forms of these 
functions to the midend.
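For reference, each lane of the three conversions rounds the way the scalar function it replaces does. vcvta rounds to nearest with ties away from zero (lroundf), vcvtp rounds toward plus infinity (lceilf), and vcvtm rounds toward minus infinity (lfloorf). Below is a minimal scalar model in C. It assumes finite inputs whose result fits in int32_t. The helper names cvt_a, cvt_p, cvt_m and in_range are hypothetical, and the model does not cover the instructions' handling of NaN and out-of-range inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical precondition check: X is finite and its rounded value fits
   in int32_t.  NaN fails both comparisons, so it is rejected too.  */
static int
in_range (float x)
{
  return x > -2147483648.0f && x < 2147483648.0f;
}

/* One lane of vcvtm: round toward minus infinity, like (long) floorf (x).  */
static int32_t
cvt_m (float x)
{
  assert (in_range (x));
  int32_t t = (int32_t) x;           /* C casts truncate toward zero.  */
  return (x < (float) t) ? t - 1 : t;
}

/* One lane of vcvtp: round toward plus infinity, like (long) ceilf (x).  */
static int32_t
cvt_p (float x)
{
  assert (in_range (x));
  int32_t t = (int32_t) x;
  return (x > (float) t) ? t + 1 : t;
}

/* One lane of vcvta: round to nearest, ties away from zero, like lroundf.  */
static int32_t
cvt_a (float x)
{
  assert (in_range (x));
  int32_t t = (int32_t) x;
  float frac = x - (float) t;        /* Exact for float inputs.  */
  if (frac >= 0.5f)
    return t + 1;
  if (frac <= -0.5f)
    return t - 1;
  return t;
}
```

For example, cvt_a (-2.5f) gives -3. That matches lroundf's ties-away-from-zero rule rather than the round-to-nearest-even rule of vcvtn.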

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2014-09-02  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/62275
     * config/arm/neon.md
  (neon_vcvt<NEON_VCVT:nvrint_variant><su_optab><VCVTF:mode>
     <v_cmp_result>): New pattern.
     * config/arm/iterators.md (NEON_VCVT): New int iterator.
     * config/arm/arm_neon_builtins.def (vcvtav2sf, vcvtav4sf, vcvtauv2sf,
     vcvtauv4sf, vcvtpv2sf, vcvtpv4sf, vcvtpuv2sf, vcvtpuv4sf, vcvtmv2sf,
     vcvtmv4sf, vcvtmuv2sf, vcvtmuv4sf): New builtin definitions.
     * config/arm/arm.c (arm_builtin_vectorized_function): Handle
     BUILT_IN_LROUNDF, BUILT_IN_LFLOORF, BUILT_IN_LCEILF.

2014-09-02  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/62275
     * gcc.target/arm/vect-lceilf_1.c: New test.
     * gcc.target/arm/vect-lfloorf_1.c: Likewise.
     * gcc.target/arm/vect-lroundf_1.c: Likewise.
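In the arm.c hunk of the patch, ARM_FIND_VCVT_VARIANT and ARM_FIND_VCVTU_VARIANT choose among the twelve new builtins purely by token pasting. Each name is built from four parts:

- the instruction stem,
- an optional `u` when TYPE_UNSIGNED (type_out) is true,
- the V2SF or V4SF input mode,
- the matching V2SI or V4SI result mode.

Any other lane count yields NULL_TREE. Below is a hypothetical standalone model of that naming scheme. enum lround_kind and vcvt_builtin_name are illustrative and not GCC code. The model builds strings instead of tree decls and assumes the SFmode-to-SImode mode check has already passed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for BUILT_IN_LROUNDF, BUILT_IN_LCEILF, BUILT_IN_LFLOORF.  */
enum lround_kind { K_LROUNDF, K_LCEILF, K_LFLOORF };

/* Write the builtin name selected for (KIND, OUT_UNSIGNED_P, LANES) into BUF
   and return true, or return false where arm.c would return NULL_TREE.  */
static bool
vcvt_builtin_name (enum lround_kind kind, bool out_unsigned_p, int lanes,
                   char *buf, size_t len)
{
  const char *stem;
  switch (kind)
    {
    case K_LROUNDF: stem = "vcvta"; break;  /* ties away from zero */
    case K_LCEILF:  stem = "vcvtp"; break;  /* toward plus infinity */
    case K_LFLOORF: stem = "vcvtm"; break;  /* toward minus infinity */
    default: return false;
    }
  /* Only the D-register (2 x SF) and Q-register (4 x SF) forms exist.  */
  if (lanes != 2 && lanes != 4)
    return false;
  /* Same order as the pasted tokens: N ## [u] ## v<n>sf ## v<n>si.  */
  int n = snprintf (buf, len, "%s%sv%dsfv%dsi", stem,
                    out_unsigned_p ? "u" : "", lanes, lanes);
  return n > 0 && (size_t) n < len;
}
```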

Comments

Ramana Radhakrishnan Sept. 2, 2014, 3:48 p.m. UTC | #1
On 02/09/14 16:34, Kyrill Tkachov wrote:
> Hi all,
>
> In continuation of patch [1/2]...
> We can use the vector forms of the vcvt{a,p,m} instructions to vectorise
> the l{round, ceil, floor}f functions.
> Builtins are added and the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
> implementation is updated to wire up the vectorised forms of these
> functions to the midend.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf.
>
> Ok for trunk?

Ok - thanks.

Ramana
Christophe Lyon Sept. 3, 2014, 7:42 a.m. UTC | #2
Hi Kyrill,

I've noticed that the tests you added with this patch fail
(scan-tree-dump-times) for the armeb-none-linux-gnueabihf target.
Not sure if you want to fix your patch or the tests?

Christophe.


Kyrylo Tkachov Sept. 3, 2014, 9:29 a.m. UTC | #3
On 03/09/14 08:42, Christophe Lyon wrote:
> Hi Kyrill,
>
> I've noticed that the tests you added with this patch fail
> (scan-tree-dump-times) for the armeb-none-linux-gnueabihf target.
> Not sure if you want to fix your patch or the tests?

Hi Christophe,

Ah, I reproduced it on armeb-none-eabi. The problem is that our NEON
movmisalign pattern is disabled for big-endian, so the vectoriser refuses
to perform the unaligned load from the input pointer:
vect-lceilf_1.c:13:3: note: Setting misalignment to -1.
vect-lceilf_1.c:13:3: note: not vectorized: unsupported unaligned load.*_9
vect-lceilf_1.c:13:3: note: bad data alignment.

Seems like that's deliberate:
(define_expand "movmisalign<mode>"
   [(set (match_operand:VDQX 0 "neon_perm_struct_or_reg_operand")
     (unspec:VDQX [(match_operand:VDQX 1 "neon_perm_struct_or_reg_operand")]
              UNSPEC_MISALIGNED_ACCESS))]
   "TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access"

I can also see the following tests fail on big-endian:
FAIL: gcc.target/arm/vect-rounding-btruncf.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.target/arm/vect-rounding-ceilf.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.target/arm/vect-rounding-floorf.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.target/arm/vect-rounding-roundf.c scan-tree-dump-times vect "vectorized 1 loops" 1

presumably for the same reason.
I guess the way to fix this is to make the input and output arrays 
global variables and force them to align to 128 bits so we don't have to 
use misaligned accesses.
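The proposed fix relies on the vectoriser being able to establish the arrays' alignment. It cannot do that through the `float *input` parameter, which is why the dump reports "Setting misalignment to -1". Below is a minimal hosted sketch of 128-bit-aligned global arrays. It assumes GCC's `aligned` variable attribute; C11 `_Alignas (16)` should be equivalent. `misalignment_of` is a hypothetical helper for checking the layout and is not part of the testsuite.

```c
#include <stddef.h>
#include <stdint.h>

#define N 32

/* Global arrays forced to a 16-byte (128-bit) boundary, as proposed.  */
float input[N] __attribute__ ((aligned (16)));
int output[N] __attribute__ ((aligned (16)));

/* Hypothetical helper: byte offset of P past the previous VEC_BYTES
   boundary.  0 means a full-vector access starting at P is aligned.  */
static size_t
misalignment_of (const void *p, size_t vec_bytes)
{
  return (size_t) ((uintptr_t) p % vec_bytes);
}
```

In the tests themselves, foo would then index input and output directly instead of receiving them as pointer arguments.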

I'll fix the tests up.

Thanks,
Kyrill



Patch

commit 3854d95bace665f6d9d8c007702b6d26f6fe07c2
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri Aug 22 17:23:20 2014 +0100

    [ARM] Vectorise lroundf, lfloorf, lceilf on ARMv8-A

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ff66c60..c3b8518 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29945,6 +29945,7 @@  arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 {
   enum machine_mode in_mode, out_mode;
   int in_n, out_n;
+  bool out_unsigned_p = TYPE_UNSIGNED (type_out);
 
   if (TREE_CODE (type_out) != VECTOR_TYPE
       || TREE_CODE (type_in) != VECTOR_TYPE)
@@ -29990,6 +29991,36 @@  arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
             return ARM_FIND_VRINT_VARIANT (vrintz);
           case BUILT_IN_ROUNDF:
             return ARM_FIND_VRINT_VARIANT (vrinta);
+#undef ARM_CHECK_BUILTIN_MODE_1
+#define ARM_CHECK_BUILTIN_MODE_1(C) \
+  (out_mode == SImode && out_n == C \
+   && in_mode == SFmode && in_n == C)
+
+#define ARM_FIND_VCVT_VARIANT(N) \
+  (ARM_CHECK_BUILTIN_MODE_1 (2) \
+   ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v2sfv2si, false) \
+   : (ARM_CHECK_BUILTIN_MODE_1 (4) \
+     ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v4sfv4si, false) \
+     : NULL_TREE))
+
+#define ARM_FIND_VCVTU_VARIANT(N) \
+  (ARM_CHECK_BUILTIN_MODE_1 (2) \
+   ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##uv2sfv2si, false) \
+   : (ARM_CHECK_BUILTIN_MODE_1 (4) \
+     ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##uv4sfv4si, false) \
+     : NULL_TREE))
+          case BUILT_IN_LROUNDF:
+            return out_unsigned_p
+                     ? ARM_FIND_VCVTU_VARIANT (vcvta)
+                     : ARM_FIND_VCVT_VARIANT (vcvta);
+          case BUILT_IN_LCEILF:
+            return out_unsigned_p
+                     ? ARM_FIND_VCVTU_VARIANT (vcvtp)
+                     : ARM_FIND_VCVT_VARIANT (vcvtp);
+          case BUILT_IN_LFLOORF:
+            return out_unsigned_p
+                     ? ARM_FIND_VCVTU_VARIANT (vcvtm)
+                     : ARM_FIND_VCVT_VARIANT (vcvtm);
 #undef ARM_CHECK_BUILTIN_MODE
 #define ARM_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Imode && out_n == C \
@@ -30020,9 +30051,12 @@  arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
     }
   return NULL_TREE;
 }
+#undef ARM_FIND_VCVT_VARIANT
+#undef ARM_FIND_VCVTU_VARIANT
 #undef ARM_CHECK_BUILTIN_MODE
 #undef ARM_FIND_VRINT_VARIANT
 
+
 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
 static HOST_WIDE_INT
 arm_vector_alignment (const_tree type)
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index f4531f3..efe5bda 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -141,6 +141,18 @@  VAR2 (RINT, vrintp, v2sf, v4sf),
 VAR2 (RINT, vrintm, v2sf, v4sf),
 VAR2 (RINT, vrintz, v2sf, v4sf),
 VAR2 (RINT, vrintx, v2sf, v4sf),
+VAR1 (RINT, vcvtav2sf, v2si),
+VAR1 (RINT, vcvtav4sf, v4si),
+VAR1 (RINT, vcvtauv2sf, v2si),
+VAR1 (RINT, vcvtauv4sf, v4si),
+VAR1 (RINT, vcvtpv2sf, v2si),
+VAR1 (RINT, vcvtpv4sf, v4si),
+VAR1 (RINT, vcvtpuv2sf, v2si),
+VAR1 (RINT, vcvtpuv4sf, v4si),
+VAR1 (RINT, vcvtmv2sf, v2si),
+VAR1 (RINT, vcvtmv4sf, v4si),
+VAR1 (RINT, vcvtmuv2sf, v2si),
+VAR1 (RINT, vcvtmuv4sf, v4si),
 VAR1 (VTBL, vtbl1, v8qi),
 VAR1 (VTBL, vtbl2, v8qi),
 VAR1 (VTBL, vtbl3, v8qi),
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index f7e0e14..021372a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -223,6 +223,8 @@  (define_int_iterator VCVT [UNSPEC_VRINTP UNSPEC_VRINTM UNSPEC_VRINTA])
 (define_int_iterator NEON_VRINT [UNSPEC_NVRINTP UNSPEC_NVRINTZ UNSPEC_NVRINTM
                               UNSPEC_NVRINTX UNSPEC_NVRINTA UNSPEC_NVRINTN])
 
+(define_int_iterator NEON_VCVT [UNSPEC_NVRINTP UNSPEC_NVRINTM UNSPEC_NVRINTA])
+
 (define_int_iterator CRC [UNSPEC_CRC32B UNSPEC_CRC32H UNSPEC_CRC32W
                           UNSPEC_CRC32CB UNSPEC_CRC32CH UNSPEC_CRC32CW])
 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index dc364ee..354a105 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -629,6 +629,17 @@  (define_insn "neon_vrint<NEON_VRINT:nvrint_variant><VCVTF:mode>"
   [(set_attr "type" "neon_fp_round_<V_elem_ch><q>")]
 )
 
+(define_insn "neon_vcvt<NEON_VCVT:nvrint_variant><su_optab><VCVTF:mode><v_cmp_result>"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand" "=w")
+	(FIXUORS:<V_cmp_result> (unspec:VCVTF
+			       [(match_operand:VCVTF 1 "register_operand" "w")]
+			       NEON_VCVT)))]
+  "TARGET_NEON && TARGET_FPU_ARMV8"
+  "vcvt<nvrint_variant>.<su>32.f32\\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<V_elem_ch><q>")
+   (set_attr "predicable" "no")]
+)
+
 (define_insn "ior<mode>3"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w,w")
 	(ior:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0")
diff --git a/gcc/testsuite/gcc.target/arm/vect-lceilf_1.c b/gcc/testsuite/gcc.target/arm/vect-lceilf_1.c
new file mode 100644
index 0000000..75705ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-lceilf_1.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -fdump-tree-vect-all" } */
+/* { dg-add-options arm_v8_neon } */
+
+#define N 32
+
+void
+foo (int *output, float *input)
+{
+  int i = 0;
+  /* Vectorizable.  */
+  for (i = 0; i < N; i++)
+    output[i] = __builtin_lceilf (input[i]);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/vect-lfloorf_1.c b/gcc/testsuite/gcc.target/arm/vect-lfloorf_1.c
new file mode 100644
index 0000000..298d54e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-lfloorf_1.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -fdump-tree-vect-all" } */
+/* { dg-add-options arm_v8_neon } */
+
+#define N 32
+
+void
+foo (int *output, float *input)
+{
+  int i = 0;
+  /* Vectorizable.  */
+  for (i = 0; i < N; i++)
+    output[i] = __builtin_lfloorf (input[i]);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/arm/vect-lroundf_1.c b/gcc/testsuite/gcc.target/arm/vect-lroundf_1.c
new file mode 100644
index 0000000..6443821
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-lroundf_1.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-options "-O2 -ffast-math -ftree-vectorize -fdump-tree-vect-all" } */
+/* { dg-add-options arm_v8_neon } */
+
+#define N 32
+
+void
+foo (int *output, float *input)
+{
+  int i = 0;
+  /* Vectorizable.  */
+  for (i = 0; i < N; i++)
+    output[i] = __builtin_lroundf (input[i]);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */