[57/62] AVX512FP16: Add expander for fmahf4

Message ID	20210701061648.9447-58-hongtao.liu@intel.com
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3F9C13858034 To: gcc-patches@gcc.gnu.org Subject: [PATCH 57/62] AVX512FP16: Add expander for fmahf4 Date: Thu, 1 Jul 2021 14:16:43 +0800 Message-Id: <20210701061648.9447-58-hongtao.liu@intel.com> In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com> References: <20210701061648.9447-1-hongtao.liu@intel.com> Precedence: list From: liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: liuhongt <hongtao.liu@intel.com> Cc: jakub@redhat.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
Series	Support all AVX512FP16 intrinsics. \| expand [00/62] Support all AVX512FP16 intrinsics. [01/62] AVX512FP16: Support vector init/broadcast for FP16. [02/62] AVX512FP16: Add testcase for vector init and broadcast intrinsics. [03/62] AVX512FP16: Fix HF vector passing in variable arguments. [04/62] AVX512FP16: Add ABI tests for xmm. [05/62] AVX512FP16: Add ABI test for ymm. [06/62] AVX512FP16: Add abi test for zmm [07/62] AVX512FP16: Add vaddph/vsubph/vdivph/vmulph. [08/62] AVX512FP16: Add testcase for vaddph/vsubph/vmulph/vdivph. [09/62] AVX512FP16: Enable _Float16 autovectorization [10/62] AVX512FP16: Add vaddsh/vsubsh/vmulsh/vdivsh. [11/62] AVX512FP16: Add testcase for vaddsh/vsubsh/vmulsh/vdivsh. [12/62] AVX512FP16: Add vmaxph/vminph/vmaxsh/vminsh. [13/62] AVX512FP16: Add testcase for vmaxph/vmaxsh/vminph/vminsh. [14/62] AVX512FP16: Add vcmpph/vcmpsh/vcomish/vucomish. [15/62] AVX512FP16: Add testcase for vcmpph/vcmpsh/vcomish/vucomish. [16/62] AVX512FP16: Add vsqrtph/vrsqrtph/vsqrtsh/vrsqrtsh. [17/62] AVX512FP16: Add testcase for vsqrtph/vsqrtsh/vrsqrtph/vrsqrtsh. [18/62] AVX512FP16: Add vrcpph/vrcpsh/vscalefph/vscalefsh. [19/62] AVX512FP16: Add testcase for vrcpph/vrcpsh/vscalefph/vscalefsh. [20/62] AVX512FP16: Add vreduceph/vreducesh/vrndscaleph/vrndscalesh. [21/62] AVX512FP16: Add testcase for vreduceph/vreducesh/vrndscaleph/vrndscalesh. [22/62] AVX512FP16: Add fpclass/getexp/getmant instructions. [23/62] AVX512FP16: Add testcase for fpclass/getmant/getexp instructions. [24/62] AVX512FP16: Add vmovw/vmovsh. [25/62] AVX512FP16: Add testcase for vmovsh/vmovw. [26/62] AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udq [27/62] AVX512FP16: Add testcase for vcvtph2w/vcvtph2uw/vcvtph2dq/vcvtph2udq/vcvtph2qq/vcvtph2uqq. [28/62] AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph [29/62] AVX512FP16: Add testcase for vcvtw2ph/vcvtuw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2ph. [30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh. [31/62] AVX512FP16: Add testcase for vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh. [32/62] AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq [33/62] AVX512FP16: Add testcase for vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph… [34/62] AVX512FP16: Add vcvttsh2si/vcvttsh2usi. [35/62] AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx. [36/62] AVX512FP16: Add testcase for vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx. [37/62] AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh. [38/62] AVX512FP16: Add testcase for vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh. [39/62] AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/in… [40/62] AVX512FP16: Add vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph. [41/62] AVX512FP16: Add testcase for vfmaddsub[132, 213, 231]ph/vfmsubadd[132, 213, 231]ph. [42/62] AVX512FP16: Add FP16 fma instructions. [43/62] AVX512FP16: Add testcase for fma instructions [44/62] AVX512FP16: Add scalar/vector bitwise operations, including [45/62] AVX512FP16: Add testcase for fp16 bitwise operations. [46/62] AVX512FP16: Enable FP16 mask load/store. [47/62] AVX512FP16: Add scalar fma instructions. [48/62] AVX512FP16: Add testcase for scalar FMA instructions. [49/62] AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph [50/62] AVX512FP16: Add testcases for vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph. [51/62] AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh. [52/62] AVX512FP16: Add testcases for vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh. [53/62] AVX512FP16: Add expander for sqrthf2. [54/62] AVX512FP16: Add expander for ceil/floor/trunc/roundeven. [55/62] AVX512FP16: Add expander for cstorehf4. [56/62] AVX512FP16: Optimize (_Float16) sqrtf ((float) f16) to sqrtf16 (f16). [57/62] AVX512FP16: Add expander for fmahf4 [58/62] AVX512FP16: Optimize for code like (_Float16) __builtin_ceif ((float) f16). [59/62] AVX512FP16: Support load/store/abs intrinsics. [60/62] AVX512FP16: Add reduce operators(add/mul/min/max). [61/62] AVX512FP16: Add complex conjugation intrinsic instructions. [62/62] AVX512FP16: Add permutation and mask blend intrinsics.

Message ID

20210701061648.9447-58-hongtao.liu@intel.com

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3F9C13858034
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 57/62] AVX512FP16: Add expander for fmahf4
Date: Thu,  1 Jul 2021 14:16:43 +0800
Message-Id: <20210701061648.9447-58-hongtao.liu@intel.com>
In-Reply-To: <20210701061648.9447-1-hongtao.liu@intel.com>
References: <20210701061648.9447-1-hongtao.liu@intel.com>
Precedence: list
From: liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org>
Reply-To: liuhongt <hongtao.liu@intel.com>
Cc: jakub@redhat.com
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>

Series

Support all AVX512FP16 intrinsics. | expand

Commit Message

liuhongt July 1, 2021, 6:16 a.m. UTC

gcc/ChangeLog:

	* config/i386/sse.md (FMAMODEM): extend to handle FP16.
	(VFH_SF_AVX512VL): Extend to handle HFmode.
	(VF_SF_AVX512VL): Deleted.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512fp16-fma-1.c: New test.
	* gcc.target/i386/avx512fp16vl-fma-1.c: New test.
	* gcc.target/i386/avx512fp16vl-fma-vectorize-1.c: New test.
---
 gcc/config/i386/sse.md                        | 11 +--
 .../gcc.target/i386/avx512fp16-fma-1.c        | 69 ++++++++++++++++++
 .../gcc.target/i386/avx512fp16vl-fma-1.c      | 70 +++++++++++++++++++
 .../i386/avx512fp16vl-fma-vectorize-1.c       | 45 ++++++++++++
 4 files changed, 190 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f87f6893835..2b8d12086f4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4489,7 +4489,11 @@  (define_mode_iterator FMAMODEM
    (V8SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
    (V4DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512VL")
    (V16SF "TARGET_AVX512F")
-   (V8DF "TARGET_AVX512F")])
+   (V8DF "TARGET_AVX512F")
+   (HF "TARGET_AVX512FP16")
+   (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (V32HF "TARGET_AVX512FP16")])
 
 (define_expand "fma<mode>4"
   [(set (match_operand:FMAMODEM 0 "register_operand")
@@ -4597,14 +4601,11 @@  (define_insn "*fma_fmadd_<mode>"
    (set_attr "mode" "<MODE>")])
 
 ;; Suppose AVX-512F as baseline
-(define_mode_iterator VF_SF_AVX512VL
-  [SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
-   DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
-
 (define_mode_iterator VFH_SF_AVX512VL
   [(V32HF "TARGET_AVX512FP16")
    (V16HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
    (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+   (HF "TARGET_AVX512FP16")
    SF V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    DF V8DF (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c
new file mode 100644
index 00000000000..d78d7629838
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-fma-1.c
@@ -0,0 +1,69 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16" } */
+
+typedef _Float16 v32hf __attribute__ ((__vector_size__ (64)));
+
+_Float16
+foo1 (_Float16 a, _Float16 b, _Float16 c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+_Float16
+foo2 (_Float16 a, _Float16 b, _Float16 c)
+{
+  return -a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+_Float16
+foo3 (_Float16 a, _Float16 b, _Float16 c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+_Float16
+foo4 (_Float16 a, _Float16 b, _Float16 c)
+{
+  return -a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132sh\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v32hf
+foo5 (v32hf a, v32hf b, v32hf c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */
+
+v32hf
+foo6 (v32hf a, v32hf b, v32hf c)
+{
+  return -a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */
+
+v32hf
+foo7 (v32hf a, v32hf b, v32hf c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */
+
+v32hf
+foo8 (v32hf a, v32hf b, v32hf c)
+{
+  return -a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*zmm\[0-9\]" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c
new file mode 100644
index 00000000000..1a832f37d6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-1.c
@@ -0,0 +1,70 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */
+
+typedef _Float16 v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 v16hf __attribute__ ((__vector_size__ (32)));
+
+v8hf
+foo1 (v8hf a, v8hf b, v8hf c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v8hf
+foo2 (v8hf a, v8hf b, v8hf c)
+{
+  return -a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v8hf
+foo3 (v8hf a, v8hf b, v8hf c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v8hf
+foo4 (v8hf a, v8hf b, v8hf c)
+{
+  return -a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+v16hf
+foo5 (v16hf a, v16hf b, v16hf c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */
+
+v16hf
+foo6 (v16hf a, v16hf b, v16hf c)
+{
+  return -a * b + c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */
+
+v16hf
+foo7 (v16hf a, v16hf b, v16hf c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */
+
+v16hf
+foo8 (v16hf a, v16hf b, v16hf c)
+{
+  return -a * b - c;
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*ymm\[0-9\]" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c
new file mode 100644
index 00000000000..d0b8bec34f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-fma-vectorize-1.c
@@ -0,0 +1,45 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512fp16 -mavx512vl" } */
+
+typedef _Float16 v8hf __attribute__ ((__vector_size__ (16)));
+typedef _Float16 v16hf __attribute__ ((__vector_size__ (32)));
+
+void
+foo1 (_Float16* __restrict pa, _Float16* __restrict pb,
+      _Float16* __restrict pc, _Float16* __restrict pd)
+{
+  for (int i = 0; i != 8; i++)
+    pd[i] = pa[i] * pb[i] + pc[i];
+}
+
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+void
+foo2 (_Float16* __restrict pa, _Float16* __restrict pb,
+      _Float16* __restrict pc, _Float16* __restrict pd)
+{
+    for (int i = 0; i != 8; i++)
+    pd[i] = -pa[i] * pb[i] + pc[i];
+}
+
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+void
+foo3 (_Float16* __restrict pa, _Float16* __restrict pb,
+      _Float16* __restrict pc, _Float16* __restrict pd)
+{
+  for (int i = 0; i != 8; i++)
+    pd[i] = pa[i] * pb[i] - pc[i];
+}
+
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */
+
+void
+foo4 (_Float16* __restrict pa, _Float16* __restrict pb,
+      _Float16* __restrict pc, _Float16* __restrict pd)
+{
+  for (int i = 0; i != 8; i++)
+    pd[i] = -pa[i] * pb[i] - pc[i];
+}
+
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*xmm\[0-9\]" 1 } } */

[57/62] AVX512FP16: Add expander for fmahf4

Commit Message

Patch