From patchwork Mon Jan 13 06:02:23 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kirill Yukhin X-Patchwork-Id: 309690 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 92F252C0097 for ; Mon, 13 Jan 2014 17:02:53 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; q=dns; s=default; b=tp8+IrqSdL5rbYkzO smbDEZt7Z3/Qhx4eLE/+PGMkhWGEv+3uz0LXgHywARAxmKDd3eDTPNUXRwwnR/M6 u5EM16F9v0Sj4rsEKE4tIQliLb61NIr8U2qE97lbZoEn3+Yt/Q99HN2R7j5OIovg 3GVK+LIp3dgzUPnoaJPzMVf+Pk= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=default; bh=uC06dSmf+4ChWCWEuklyZFn lTb8=; b=PNpDYuIF8/mVQhUScHuhWm5aH1tNQhqIvRf4MPDj3DK38E2AnPAiqxI MlkwOsBqzuCCGmBOv20wmoirDx0N2/dfY5UWUOGUTepoalX4RBJ28L2XOMeOlaPE F5Zk0BCZiqW0ngfVKrN5N2j/oetKwdZf0CFYkG8LEBZBgqMOC/mU= Received: (qmail 25900 invoked by alias); 13 Jan 2014 06:02:45 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 25883 invoked by uid 89); 13 Jan 2014 06:02:44 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-gg0-f175.google.com Received: from mail-gg0-f175.google.com (HELO mail-gg0-f175.google.com) (209.85.161.175) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Mon, 13 Jan 2014 06:02:37 +0000 Received: by mail-gg0-f175.google.com with SMTP id c2so1325741ggn.6 for ; Sun, 12 Jan 2014 22:02:35 -0800 (PST) X-Received: by 10.236.0.34 with SMTP id 22mr24698203yha.24.1389592955805; Sun, 12 Jan 2014 22:02:35 -0800 (PST) Received: from msticlxl57.ims.intel.com ([192.55.54.42]) by mx.google.com with ESMTPSA id d32sm26095979yhq.27.2014.01.12.22.02.32 for (version=TLSv1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sun, 12 Jan 2014 22:02:34 -0800 (PST) Date: Mon, 13 Jan 2014 09:02:23 +0300 From: Kirill Yukhin To: Uros Bizjak Cc: Jakub Jelinek , GCC Patches Subject: Re: [PATCH i386 10/8] [AVX512] Add missing AVX-512ER patterns, intrinsics, tests. Message-ID: <20140113060223.GA24431@msticlxl57.ims.intel.com> References: <20140110162038.GC63041@msticlxl57.ims.intel.com> <20140110162439.GD892@tucnak.redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Hello, On 11 Jan 12:42, Uros Bizjak wrote: > On Fri, Jan 10, 2014 at 5:24 PM, Jakub Jelinek wrote: > > This means you should ensure aligned_mem will be set for > > CODE_FOR_avx512f_movntdqa in ix86_expand_special_args_builtin. Fixed. Updated patch in the bottom. > > Leaving the rest of review to Uros/Richard. > > The rest is OK. Thanks! I'll check it in tomorrow if no more issues! --- Thanks, K gcc/config/i386/avx512erintrin.h | 62 +++++++++++++++++++ gcc/config/i386/avx512fintrin.h | 7 +++ gcc/config/i386/i386-builtin-types.def | 1 + gcc/config/i386/i386.c | 14 +++++ gcc/config/i386/sse.md | 71 +++++++++++++++------- gcc/config/i386/subst.md | 4 -- gcc/testsuite/gcc.target/i386/avx-1.c | 20 ++++-- gcc/testsuite/gcc.target/i386/avx512er-vexp2pd-1.c | 12 ++-- gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-1.c | 12 ++-- .../gcc.target/i386/avx512er-vrcp28pd-1.c | 12 ++-- .../gcc.target/i386/avx512er-vrcp28ps-1.c | 12 ++-- .../gcc.target/i386/avx512er-vrcp28sd-1.c | 15 +++++ .../gcc.target/i386/avx512er-vrcp28sd-2.c | 29 +++++++++ .../gcc.target/i386/avx512er-vrcp28ss-1.c | 15 +++++ .../gcc.target/i386/avx512er-vrcp28ss-2.c | 29 +++++++++ .../gcc.target/i386/avx512er-vrsqrt28pd-1.c | 12 ++-- .../gcc.target/i386/avx512er-vrsqrt28ps-1.c | 12 ++-- .../gcc.target/i386/avx512er-vrsqrt28sd-1.c | 15 +++++ .../gcc.target/i386/avx512er-vrsqrt28sd-2.c | 29 +++++++++ .../gcc.target/i386/avx512er-vrsqrt28ss-1.c | 15 +++++ .../gcc.target/i386/avx512er-vrsqrt28ss-2.c | 29 +++++++++ .../gcc.target/i386/avx512f-vmovntdqa-1.c | 14 +++++ .../gcc.target/i386/avx512f-vmovntdqa-2.c | 17 ++++++ gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c | 6 +- gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c | 10 +-- gcc/testsuite/gcc.target/i386/sse-22.c | 40 ++++++------ gcc/testsuite/gcc.target/i386/sse-23.c | 16 +++-- 27 files changed, 430 insertions(+), 100 deletions(-) diff --git a/gcc/config/i386/avx512erintrin.h b/gcc/config/i386/avx512erintrin.h index f442f2b..6fe05bc 100644 --- a/gcc/config/i386/avx512erintrin.h +++ b/gcc/config/i386/avx512erintrin.h @@ -159,6 +159,24 @@ _mm512_maskz_rcp28_round_ps (__mmask16 __U, __m512 __A, int __R) (__mmask16) __U, __R); } +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp28_round_sd (__m128d __A, __m128d __B, int __R) +{ + return (__m128d) __builtin_ia32_rcp28sd_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp28_round_ss (__m128 __A, __m128 __B, int __R) +{ + return (__m128) __builtin_ia32_rcp28ss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + extern __inline __m512d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_rsqrt28_round_pd (__m512d __A, int __R) @@ -214,6 +232,25 @@ _mm512_maskz_rsqrt28_round_ps (__mmask16 __U, __m512 __A, int __R) (__v16sf) _mm512_setzero_ps (), (__mmask16) __U, __R); } + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt28_round_sd (__m128d __A, __m128d __B, int __R) +{ + return (__m128d) __builtin_ia32_rsqrt28sd_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt28_round_ss (__m128 __A, __m128 __B, int __R) +{ + return (__m128) __builtin_ia32_rsqrt28ss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + #else #define _mm512_exp2a23_round_pd(A, C) \ __builtin_ia32_exp2pd_mask(A, (__v8df)_mm512_setzero_pd(), -1, C) @@ -268,6 +305,19 @@ _mm512_maskz_rsqrt28_round_ps (__mmask16 __U, __m512 __A, int __R) #define _mm512_maskz_rsqrt28_round_ps(U, A, C) \ __builtin_ia32_rsqrt28ps_mask(A, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm_rcp28_round_sd(A, B, R) \ + __builtin_ia32_rcp28sd_round(A, B, R) + +#define _mm_rcp28_round_ss(A, B, R) \ + __builtin_ia32_rcp28ss_round(A, B, R) + +#define _mm_rsqrt28_round_sd(A, B, R) \ + __builtin_ia32_rsqrt28sd_round(A, B, R) + +#define _mm_rsqrt28_round_ss(A, B, R) \ + __builtin_ia32_rsqrt28ss_round(A, B, R) + #endif #define _mm512_exp2a23_pd(A) \ @@ -324,6 +374,18 @@ _mm512_maskz_rsqrt28_round_ps (__mmask16 __U, __m512 __A, int __R) #define _mm512_maskz_rsqrt28_ps(U, A) \ _mm512_maskz_rsqrt28_round_ps(U, A, _MM_FROUND_CUR_DIRECTION) +#define _mm_rcp28_sd(A, B) \ + __builtin_ia32_rcp28sd_round(A, B, _MM_FROUND_CUR_DIRECTION) + +#define _mm_rcp28_ss(A, B) \ + __builtin_ia32_rcp28ss_round(A, B, _MM_FROUND_CUR_DIRECTION) + +#define _mm_rsqrt28_sd(A, B) \ + __builtin_ia32_rsqrt28sd_round(A, B, _MM_FROUND_CUR_DIRECTION) + +#define _mm_rsqrt28_ss(A, B) \ + __builtin_ia32_rsqrt28ss_round(A, B, _MM_FROUND_CUR_DIRECTION) + #ifdef __DISABLE_AVX512ER__ #undef __DISABLE_AVX512ER__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h index a2ee88e..26f8cb6 100644 --- a/gcc/config/i386/avx512fintrin.h +++ b/gcc/config/i386/avx512fintrin.h @@ -7809,6 +7809,13 @@ _mm512_stream_pd (double *__P, __m512d __A) __builtin_ia32_movntpd512 (__P, (__v8df) __A); } +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_stream_load_si512 (void *__P) +{ + return __builtin_ia32_movntdqa512 ((__v8di *)__P); +} + #ifdef __OPTIMIZE__ extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index d19ca84..acf2f32 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -287,6 +287,7 @@ DEF_FUNCTION_TYPE (V8DI, PV4DI) DEF_FUNCTION_TYPE (V8DF, PV4DF) DEF_FUNCTION_TYPE (V8UHI, V8UHI) DEF_FUNCTION_TYPE (V8USI, V8USI) +DEF_FUNCTION_TYPE (V8DI, PV8DI) DEF_FUNCTION_TYPE (DI, V2DI, INT) DEF_FUNCTION_TYPE (DOUBLE, V2DF, INT) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 52ad5c1..3cda147 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -28050,6 +28050,7 @@ enum ix86_builtins IX86_BUILTIN_MOVDQA64STORE512, IX86_BUILTIN_MOVDQA64_512, IX86_BUILTIN_MOVNTDQ512, + IX86_BUILTIN_MOVNTDQA512, IX86_BUILTIN_MOVNTPD512, IX86_BUILTIN_MOVNTPS512, IX86_BUILTIN_MOVSHDUP512, @@ -28326,13 +28327,19 @@ enum ix86_builtins IX86_BUILTIN_GATHERPFQPS, IX86_BUILTIN_SCATTERPFDPS, IX86_BUILTIN_SCATTERPFQPS, + + /* AVX-512ER */ IX86_BUILTIN_EXP2PD_MASK, IX86_BUILTIN_EXP2PS_MASK, IX86_BUILTIN_EXP2PS, IX86_BUILTIN_RCP28PD, IX86_BUILTIN_RCP28PS, + IX86_BUILTIN_RCP28SD, + IX86_BUILTIN_RCP28SS, IX86_BUILTIN_RSQRT28PD, IX86_BUILTIN_RSQRT28PS, + IX86_BUILTIN_RSQRT28SD, + IX86_BUILTIN_RSQRT28SS, /* SHA builtins. */ IX86_BUILTIN_SHA1MSG1, @@ -28920,6 +28927,7 @@ static const struct builtin_description bdesc_special_args[] = { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_movntv16sf, "__builtin_ia32_movntps512", IX86_BUILTIN_MOVNTPS512, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V16SF }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_movntv8df, "__builtin_ia32_movntpd512", IX86_BUILTIN_MOVNTPD512, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V8DF }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_movntv8di, "__builtin_ia32_movntdq512", IX86_BUILTIN_MOVNTDQ512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI }, + { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_movntdqa, "__builtin_ia32_movntdqa512", IX86_BUILTIN_MOVNTDQA512, UNKNOWN, (int) V8DI_FTYPE_PV8DI }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_storedquv16si_mask, "__builtin_ia32_storedqusi512_mask", IX86_BUILTIN_STOREDQUSI512, UNKNOWN, (int) VOID_FTYPE_PV16SI_V16SI_HI }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_storedquv8di_mask, "__builtin_ia32_storedqudi512_mask", IX86_BUILTIN_STOREDQUDI512, UNKNOWN, (int) VOID_FTYPE_PV8DI_V8DI_QI }, { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_storeupd512_mask, "__builtin_ia32_storeupd512_mask", IX86_BUILTIN_STOREUPD512, UNKNOWN, (int) VOID_FTYPE_PV8DF_V8DF_QI }, @@ -30133,8 +30141,12 @@ static const struct builtin_description bdesc_round_args[] = { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_exp2v16sf_mask_round, "__builtin_ia32_exp2ps_mask", IX86_BUILTIN_EXP2PS_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT }, { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_rcp28v8df_mask_round, "__builtin_ia32_rcp28pd_mask", IX86_BUILTIN_RCP28PD, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT }, { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_rcp28v16sf_mask_round, "__builtin_ia32_rcp28ps_mask", IX86_BUILTIN_RCP28PS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT }, + { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrcp28v2df_round, "__builtin_ia32_rcp28sd_round", IX86_BUILTIN_RCP28SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT }, + { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrcp28v4sf_round, "__builtin_ia32_rcp28ss_round", IX86_BUILTIN_RCP28SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT }, { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_rsqrt28v8df_mask_round, "__builtin_ia32_rsqrt28pd_mask", IX86_BUILTIN_RSQRT28PD, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI_INT }, { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_rsqrt28v16sf_mask_round, "__builtin_ia32_rsqrt28ps_mask", IX86_BUILTIN_RSQRT28PS, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI_INT }, + { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrsqrt28v2df_round, "__builtin_ia32_rsqrt28sd_round", IX86_BUILTIN_RSQRT28SD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT }, + { OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrsqrt28v4sf_round, "__builtin_ia32_rsqrt28ss_round", IX86_BUILTIN_RSQRT28SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT }, }; /* FMA4 and XOP. */ @@ -34367,6 +34379,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case V16SI_FTYPE_PV4SI: case V16SF_FTYPE_PV4SF: case V8DI_FTYPE_PV4DI: + case V8DI_FTYPE_PV8DI: case V8DF_FTYPE_PV4DF: nargs = 1; klass = load; @@ -34375,6 +34388,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, { case CODE_FOR_sse4_1_movntdqa: case CODE_FOR_avx2_movntdqa: + case CODE_FOR_avx512f_movntdqa: aligned_mem = true; break; default: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index dfc98ba..31e94fe 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -356,7 +356,7 @@ [(V16QI "sse4_1") (V32QI "avx2") (V8HI "sse4_1") (V16HI "avx2") (V4SI "sse4_1") (V8SI "avx2") (V16SI "avx512f") - (V2DI "sse4_1") (V4DI "avx2")]) + (V2DI "sse4_1") (V4DI "avx2") (V8DI "avx512f")]) (define_mode_attr avx_avx2 [(V4SF "avx") (V2DF "avx") @@ -1463,13 +1463,12 @@ [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 (unspec:VF_128 - [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "nonimmediate_operand" "vm")] + [(match_operand:VF_128 1 "nonimmediate_operand" "vm")] UNSPEC_RCP14) - (match_dup 1) + (match_operand:VF_128 2 "register_operand" "v") (const_int 1)))] "TARGET_AVX512F" - "vrcp14\t{%2, %1, %0|, %1, %2}" + "vrcp14\t{%1, %2, %0|%0, %2, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -6570,7 +6569,7 @@ (vec_merge:VF_128 (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "nonimmediate_operand" "")] + (match_operand:VF_128 2 "" "")] UNSPEC_SCALEF) (match_dup 1) (const_int 1)))] @@ -6650,7 +6649,7 @@ (vec_merge:VF_128 (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "nonimmediate_operand" "")] + (match_operand:VF_128 2 "" "")] UNSPEC_GETEXP) (match_dup 1) (const_int 1)))] @@ -6815,7 +6814,7 @@ (vec_merge:VF_128 (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "nonimmediate_operand" "") + (match_operand:VF_128 2 "" "") (match_operand:SI 3 "const_0_to_255_operand")] UNSPEC_ROUND) (match_dup 1) @@ -11499,14 +11498,14 @@ (set_attr "mode" "")]) (define_insn "_movntdqa" - [(set (match_operand:VI8_AVX2 0 "register_operand" "=x") - (unspec:VI8_AVX2 [(match_operand:VI8_AVX2 1 "memory_operand" "m")] + [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand" "=x, v") + (unspec:VI8_AVX2_AVX512F [(match_operand:VI8_AVX2_AVX512F 1 "memory_operand" "m, m")] UNSPEC_MOVNTDQA))] "TARGET_SSE4_1" "%vmovntdqa\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") - (set_attr "prefix_extra" "1") - (set_attr "prefix" "maybe_vex") + (set_attr "prefix_extra" "1, *") + (set_attr "prefix" "maybe_vex, evex") (set_attr "mode" "")]) (define_insn "_mpsadbw" @@ -12635,36 +12634,64 @@ (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn "avx512er_exp2" +(define_insn "avx512er_exp2" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 - [(match_operand:VF_512 1 "" "")] + [(match_operand:VF_512 1 "" "")] UNSPEC_EXP2))] "TARGET_AVX512ER" - "vexp2\t{%1, %0|%0, %1}" + "vexp2\t{%1, %0|%0, %1}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512er_rcp28" +(define_insn "avx512er_rcp28" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 - [(match_operand:VF_512 1 "" "")] + [(match_operand:VF_512 1 "" "")] UNSPEC_RCP28))] "TARGET_AVX512ER" - "vrcp28\t{%1, %0|%0, %1}" + "vrcp28\t{%1, %0|%0, %1}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512er_rsqrt28" +(define_insn "avx512er_vmrcp28" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (unspec:VF_128 + [(match_operand:VF_128 1 "" "")] + UNSPEC_RCP28) + (match_operand:VF_128 2 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512ER" + "vrcp28\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512er_rsqrt28" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 - [(match_operand:VF_512 1 "" "")] + [(match_operand:VF_512 1 "" "")] UNSPEC_RSQRT28))] "TARGET_AVX512ER" - "vrsqrt28\t{%1, %0|%0, %1}" + "vrsqrt28\t{%1, %0|%0, %1}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512er_vmrsqrt28" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (unspec:VF_128 + [(match_operand:VF_128 1 "" "")] + UNSPEC_RSQRT28) + (match_operand:VF_128 2 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512ER" + "vrsqrt28\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; XOP instructions @@ -15201,7 +15228,7 @@ (vec_merge:VF_128 (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "nonimmediate_operand" "") + (match_operand:VF_128 2 "" "") (match_operand:SI 3 "const_0_to_15_operand")] UNSPEC_GETMANT) (match_dup 1) diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index 7fd3948..7948e78 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -133,8 +133,6 @@ (define_subst_attr "round_saeonly_name" "round_saeonly" "" "_round") (define_subst_attr "round_saeonly_mask_operand2" "mask" "%R2" "%R4") (define_subst_attr "round_saeonly_mask_operand3" "mask" "%R3" "%R5") -(define_subst_attr "round_saeonly_mask_scalar_operand3" "mask_scalar" "%R3" "%R5") -(define_subst_attr "round_saeonly_mask_scalar_operand4" "mask_scalar" "%R4" "%R6") (define_subst_attr "round_saeonly_mask_scalar_merge_operand4" "mask_scalar_merge" "%R4" "%R5") (define_subst_attr "round_saeonly_sd_mask_operand5" "sd" "%R5" "%R7") (define_subst_attr "round_saeonly_op2" "round_saeonly" "" "%R2") @@ -145,8 +143,6 @@ (define_subst_attr "round_saeonly_prefix" "round_saeonly" "vex" "evex") (define_subst_attr "round_saeonly_mask_op2" "round_saeonly" "" "") (define_subst_attr "round_saeonly_mask_op3" "round_saeonly" "" "") -(define_subst_attr "round_saeonly_mask_scalar_op3" "round_saeonly" "" "") -(define_subst_attr "round_saeonly_mask_scalar_op4" "round_saeonly" "" "") (define_subst_attr "round_saeonly_mask_scalar_merge_op4" "round_saeonly" "" "") (define_subst_attr "round_saeonly_sd_mask_op5" "round_saeonly" "" "") (define_subst_attr "round_saeonly_constraint" "round_saeonly" "vm" "v") diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 7201592..12674ad 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -344,12 +344,20 @@ #define __builtin_ia32_vfnmsubps512_mask3(A, B, C, D, E) __builtin_ia32_vfnmsubps512_mask3(A, B, C, D, 1) #define __builtin_ia32_vpermilpd512_mask(A, E, C, D) __builtin_ia32_vpermilpd512_mask(A, 1, C, D) #define __builtin_ia32_vpermilps512_mask(A, E, C, D) __builtin_ia32_vpermilps512_mask(A, 1, C, D) -#define __builtin_ia32_exp2ps_mask(A, B, C, D) __builtin_ia32_exp2ps_mask(A, B, C, 1) -#define __builtin_ia32_exp2pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask(A, B, C, 1) -#define __builtin_ia32_rcp28ps_mask(A, B, C, D) __builtin_ia32_exp2ps_mask(A, B, C, 1) -#define __builtin_ia32_rcp28pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask(A, B, C, 1) -#define __builtin_ia32_rsqrt28ps_mask(A, B, C, D) __builtin_ia32_rsqrt28ps_mask(A, B, C, 1) -#define __builtin_ia32_rsqrt28pd_mask(A, B, C, D) __builtin_ia32_rsqrt28pd_mask(A, B, C, 1) + +/* avx512erintrin.h */ +#define __builtin_ia32_exp2ps_mask(A, B, C, D) __builtin_ia32_exp2ps_mask(A, B, C, 5) +#define __builtin_ia32_exp2pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask(A, B, C, 5) +#define __builtin_ia32_rcp28ps_mask(A, B, C, D) __builtin_ia32_rcp28ps_mask(A, B, C, 5) +#define __builtin_ia32_rcp28pd_mask(A, B, C, D) __builtin_ia32_rcp28pd_mask(A, B, C, 5) +#define __builtin_ia32_rsqrt28ps_mask(A, B, C, D) __builtin_ia32_rsqrt28ps_mask(A, B, C, 5) +#define __builtin_ia32_rsqrt28pd_mask(A, B, C, D) __builtin_ia32_rsqrt28pd_mask(A, B, C, 5) +#define __builtin_ia32_rcp28ss_round(A, B, C) __builtin_ia32_rcp28ss_round(A, B, 5) +#define __builtin_ia32_rcp28sd_round(A, B, C) __builtin_ia32_rcp28sd_round(A, B, 5) +#define __builtin_ia32_rsqrt28ss_round(A, B, C) __builtin_ia32_rsqrt28ss_round(A, B, 5) +#define __builtin_ia32_rsqrt28sd_round(A, B, C) __builtin_ia32_rsqrt28sd_round(A, B, 5) + +/* avx512pfintrin.h */ #define __builtin_ia32_gatherpfdps(A, B, C, D, E) __builtin_ia32_gatherpfdps(A, B, C, 1, 1) #define __builtin_ia32_gatherpfqps(A, B, C, D, E) __builtin_ia32_gatherpfqps(A, B, C, 1, 1) #define __builtin_ia32_scatterpfdps(A, B, C, D, E) __builtin_ia32_scatterpfdps(A, B, C, 1, 1) diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vexp2pd-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vexp2pd-1.c index 9fb87cf..22c086d 100644 --- a/gcc/testsuite/gcc.target/i386/avx512er-vexp2pd-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512er-vexp2pd-1.c @@ -3,9 +3,9 @@ /* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[\\n\]" 2 } } */ /* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 2 } } */ /* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 2 } } */ -/* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%zmm\[0-9\]" 1 } } */ -/* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ -/* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\[^\{\]*\n" 1 } } */ +/* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vexp2pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ #include @@ -18,7 +18,7 @@ avx512er_test (void) x = _mm512_exp2a23_pd (x); x = _mm512_mask_exp2a23_pd (x, m, x); x = _mm512_maskz_exp2a23_pd (m, x); - x = _mm512_exp2a23_round_pd (x, _MM_FROUND_TO_NEAREST_INT); - x = _mm512_mask_exp2a23_round_pd (x, m, x, _MM_FROUND_TO_NEG_INF); - x = _mm512_maskz_exp2a23_round_pd (m, x, _MM_FROUND_TO_ZERO); + x = _mm512_exp2a23_round_pd (x, _MM_FROUND_NO_EXC); + x = _mm512_mask_exp2a23_round_pd (x, m, x, _MM_FROUND_NO_EXC); + x = _mm512_maskz_exp2a23_round_pd (m, x, _MM_FROUND_NO_EXC); } diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-1.c index a7e7009e..9d1178e 100644 --- a/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-1.c @@ -3,9 +3,9 @@ /* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[\\n\]" 2 } } */ /* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 2 } } */ /* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 2 } } */ -/* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%zmm\[0-9\]" 1 } } */ -/* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ -/* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\[^\{\]*\n" 1 } } */ +/* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vexp2ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ #include @@ -18,7 +18,7 @@ avx512er_test (void) x = _mm512_exp2a23_ps (x); x = _mm512_mask_exp2a23_ps (x, m, x); x = _mm512_maskz_exp2a23_ps (m, x); - x = _mm512_exp2a23_round_ps (x, _MM_FROUND_TO_NEAREST_INT); - x = _mm512_mask_exp2a23_round_ps (x, m, x, _MM_FROUND_TO_POS_INF); - x = _mm512_maskz_exp2a23_round_ps (m, x, _MM_FROUND_TO_ZERO); + x = _mm512_exp2a23_round_ps (x, _MM_FROUND_NO_EXC); + x = _mm512_mask_exp2a23_round_ps (x, m, x, _MM_FROUND_NO_EXC); + x = _mm512_maskz_exp2a23_round_ps (m, x, _MM_FROUND_NO_EXC); } diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28pd-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28pd-1.c index 06b6160..505c0eb 100644 --- a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28pd-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28pd-1.c @@ -3,9 +3,9 @@ /* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[\\n\]" 2 } } */ /* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 2 } } */ /* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 2 } } */ -/* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%zmm\[0-9\]" 1 } } */ -/* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ -/* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\[^\{\]*\n" 1 } } */ +/* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vrcp28pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ #include @@ -18,7 +18,7 @@ avx512er_test (void) x = _mm512_rcp28_pd (x); x = _mm512_mask_rcp28_pd (x, m, x); x = _mm512_maskz_rcp28_pd (m, x); - x = _mm512_rcp28_round_pd (x, _MM_FROUND_TO_NEAREST_INT); - x = _mm512_mask_rcp28_round_pd (x, m, x, _MM_FROUND_TO_NEG_INF); - x = _mm512_maskz_rcp28_round_pd (m, x, _MM_FROUND_TO_ZERO); + x = _mm512_rcp28_round_pd (x, _MM_FROUND_NO_EXC); + x = _mm512_mask_rcp28_round_pd (x, m, x, _MM_FROUND_NO_EXC); + x = _mm512_maskz_rcp28_round_pd (m, x, _MM_FROUND_NO_EXC); } diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-1.c index 023d6b2..e9245ba 100644 --- a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ps-1.c @@ -3,9 +3,9 @@ /* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[\\n\]" 2 } } */ /* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 2 } } */ /* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 2 } } */ -/* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%zmm\[0-9\]" 1 } } */ -/* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ -/* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\[^\{\]*\n" 1 } } */ +/* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vrcp28ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ #include @@ -18,7 +18,7 @@ avx512er_test (void) x = _mm512_rcp28_ps (x); x = _mm512_mask_rcp28_ps (x, m, x); x = _mm512_maskz_rcp28_ps (m, x); - x = _mm512_rcp28_round_ps (x, _MM_FROUND_TO_NEAREST_INT); - x = _mm512_mask_rcp28_round_ps (x, m, x, _MM_FROUND_TO_POS_INF); - x = _mm512_maskz_rcp28_round_ps (m, x, _MM_FROUND_TO_ZERO); + x = _mm512_rcp28_round_ps (x, _MM_FROUND_NO_EXC); + x = _mm512_mask_rcp28_round_ps (x, m, x, _MM_FROUND_NO_EXC); + x = _mm512_maskz_rcp28_round_ps (m, x, _MM_FROUND_NO_EXC); } diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28sd-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28sd-1.c new file mode 100644 index 0000000..d09ba57 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28sd-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512er -O2" } */ +/* { dg-final { scan-assembler-times "vrcp28sd\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[\\n\]" 2 } } */ +/* { dg-final { scan-assembler-times "vrcp28sd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]*\n" 1 } } */ + +#include + +volatile __m128d x, y; + +void extern +avx512er_test (void) +{ + x = _mm_rcp28_sd (x, y); + x = _mm_rcp28_round_sd (x, y, _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28sd-2.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28sd-2.c new file mode 100644 index 0000000..d30f088 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28sd-2.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512er } */ +/* { dg-options "-O2 -mavx512er" } */ + +#include "avx512er-check.h" +#include "avx512f-mask-type.h" +#include "avx512f-helper.h" +#include + +void static +avx512er_test (void) +{ + union128d src, res; + double res_ref[2]; + int i; + + for (i = 0; i < 2; i++) + { + src.a[i] = 179.345 - 6.5645 * i; + res_ref[i] = src.a[i]; + } + + res_ref[0] = 1.0 / src.a[0]; + + res.x = _mm_rcp28_round_sd (src.x, src.x, _MM_FROUND_NO_EXC); + + if (checkVd (res.a, res_ref, 2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ss-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ss-1.c new file mode 100644 index 0000000..3f5ccea --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ss-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512er -O2" } */ +/* { dg-final { scan-assembler-times "vrcp28ss\[ \\t\]+\[^\n\]*%xmm\[0-9\]\[\\n\]" 2 } } */ +/* { dg-final { scan-assembler-times "vrcp28ss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]\[^\{\]*\n" 1 } } */ + +#include + +volatile __m128 x, y; + +void extern +avx512er_test (void) +{ + x = _mm_rcp28_ss (x, y); + x = _mm_rcp28_round_ss (x, y, _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ss-2.c b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ss-2.c new file mode 100644 index 0000000..499a977 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrcp28ss-2.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512er } */ +/* { dg-options "-O2 -mavx512er" } */ + +#include "avx512er-check.h" +#include "avx512f-mask-type.h" +#include "avx512f-helper.h" +#include + +void static +avx512er_test (void) +{ + union128 src, res; + float res_ref[4]; + int i; + + for (i = 0; i < 4; i++) + { + src.a[i] = 179.345 - 6.5645 * i; + res_ref[i] = src.a[i]; + } + + res_ref[0] = 1.0 / src.a[0]; + + res.x = _mm_rsqrt28_round_ss (src.x, src.x, _MM_FROUND_NO_EXC); + + if (checkVf (res.a, res_ref, 4)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28pd-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28pd-1.c index dfb95b2..5d264ac 100644 --- a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28pd-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28pd-1.c @@ -3,9 +3,9 @@ /* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[\\n\]" 2 } } */ /* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 2 } } */ /* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 2 } } */ -/* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%zmm\[0-9\]" 1 } } */ -/* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*\{rd-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ -/* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\[^\{\]*\n" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrt28pd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ #include @@ -18,7 +18,7 @@ avx512er_test (void) x = _mm512_rsqrt28_pd (x); x = _mm512_mask_rsqrt28_pd (x, m, x); x = _mm512_maskz_rsqrt28_pd (m, x); - x = _mm512_rsqrt28_round_pd (x, _MM_FROUND_TO_NEAREST_INT); - x = _mm512_mask_rsqrt28_round_pd (x, m, x, _MM_FROUND_TO_NEG_INF); - x = _mm512_maskz_rsqrt28_round_pd (m, x, _MM_FROUND_TO_ZERO); + x = _mm512_rsqrt28_round_pd (x, _MM_FROUND_NO_EXC); + x = _mm512_mask_rsqrt28_round_pd (x, m, x, _MM_FROUND_NO_EXC); + x = _mm512_maskz_rsqrt28_round_pd (m, x, _MM_FROUND_NO_EXC); } diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-1.c index ecd3a6f..bfdb9ac 100644 --- a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ps-1.c @@ -3,9 +3,9 @@ /* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\[\\n\]" 2 } } */ /* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 2 } } */ /* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 2 } } */ -/* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%zmm\[0-9\]" 1 } } */ -/* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*\{ru-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ -/* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*\{rz-sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\[^\{\]*\n" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrt28ps\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%zmm\[0-9\]\{%k\[1-7\]\}\{z\}" 1 } } */ #include @@ -18,7 +18,7 @@ avx512er_test (void) x = _mm512_rsqrt28_ps (x); x = _mm512_mask_rsqrt28_ps (x, m, x); x = _mm512_maskz_rsqrt28_ps (m, x); - x = _mm512_rsqrt28_round_ps (x, _MM_FROUND_TO_NEAREST_INT); - x = _mm512_mask_rsqrt28_round_ps (x, m, x, _MM_FROUND_TO_POS_INF); - x = _mm512_maskz_rsqrt28_round_ps (m, x, _MM_FROUND_TO_ZERO); + x = _mm512_rsqrt28_round_ps (x, _MM_FROUND_NO_EXC); + x = _mm512_mask_rsqrt28_round_ps (x, m, x, _MM_FROUND_NO_EXC); + x = _mm512_maskz_rsqrt28_round_ps (m, x, _MM_FROUND_NO_EXC); } diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28sd-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28sd-1.c new file mode 100644 index 0000000..59dff78 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28sd-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512er -O2" } */ +/* { dg-final { scan-assembler-times "vrsqrt28sd\[ \\t\]+\[^\{^\n\]*%xmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrt28sd\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */ + +#include + +volatile __m128d x, y; + +void extern +avx512er_test (void) +{ + x = _mm_rsqrt28_sd (x, y); + x = _mm_rsqrt28_round_sd (x, y, _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28sd-2.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28sd-2.c new file mode 100644 index 0000000..1537a59 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28sd-2.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512er } */ +/* { dg-options "-O2 -mavx512er" } */ + +#include "avx512er-check.h" +#include "avx512f-mask-type.h" +#include "avx512f-helper.h" +#include + +void static +avx512er_test (void) +{ + union128d src, res; + double res_ref[2]; + int i; + + for (i = 0; i < 2; i++) + { + src.a[i] = 179.345 - 6.5645 * i; + res_ref[i] = src.a[i]; + } + + res_ref[0] = 1.0 / sqrt (src.a[0]); + + res.x = _mm_rsqrt28_round_sd (src.x, src.x, _MM_FROUND_NO_EXC); + + if (checkVd (res.a, res_ref, 2)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ss-1.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ss-1.c new file mode 100644 index 0000000..a334375 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ss-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512er -O2" } */ +/* { dg-final { scan-assembler-times "vrsqrt28ss\[ \\t\]+\[^\{^\n\]*%xmm\[0-9\]" 1 } } */ +/* { dg-final { scan-assembler-times "vrsqrt28ss\[ \\t\]+\[^\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]" 1 } } */ + +#include + +volatile __m128 x, y; + +void extern +avx512er_test (void) +{ + x = _mm_rsqrt28_ss (x, y); + x = _mm_rsqrt28_round_ss (x, y, _MM_FROUND_NO_EXC); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ss-2.c b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ss-2.c new file mode 100644 index 0000000..f88422e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512er-vrsqrt28ss-2.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512er } */ +/* { dg-options "-O2 -mavx512er" } */ + +#include "avx512er-check.h" +#include "avx512f-mask-type.h" +#include "avx512f-helper.h" +#include + +void static +avx512er_test (void) +{ + union128 src, res; + float res_ref[4]; + int i; + + for (i = 0; i < 4; i++) + { + src.a[i] = 179.345 - 6.5645 * i; + res_ref[i] = src.a[i]; + } + + res_ref[0] = 1.0 / sqrt (src.a[0]); + + res.x = _mm_rsqrt28_round_ss (src.x, src.x, _MM_FROUND_NO_EXC); + + if (checkVf (res.a, res_ref, 4)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmovntdqa-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vmovntdqa-1.c new file mode 100644 index 0000000..d5be976 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-vmovntdqa-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler "vmovntdqa\[ \\t\]+\[^\n\]*%zmm\[0-9\]" } } */ + +#include + +__m512i *x; +volatile __m512i y; + +void extern +avx512f_test (void) +{ + y = _mm512_stream_load_si512 (x); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vmovntdqa-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vmovntdqa-2.c new file mode 100644 index 0000000..0825781 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-vmovntdqa-2.c @@ -0,0 +1,17 @@ +/* { dg-do run } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-require-effective-target avx512f } */ + +#include "avx512f-check.h" + +void static +avx512f_test (void) +{ + union512i_q s, res; + + s.x = _mm512_set_epi64 (39578, -429496, 7856, 0, 85632, -1234, 47563, -1); + res.x = _mm512_stream_load_si512 (&s.x); + + if (check_union512i_q (s, res.a)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c index 9ff3541..0c9211a 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14sd-2.c @@ -8,8 +8,8 @@ static void compute_vrcp14sd (double *s1, double *s2, double *r) { - r[0] = 1.0 / s2[0]; - r[1] = s1[1]; + r[0] = 1.0 / s1[0]; + r[1] = s2[1]; } static void @@ -26,6 +26,6 @@ avx512f_test (void) compute_vrcp14sd (s1.a, s2.a, res_ref); - if (check_union128d (res1, res_ref)) + if (checkVd (res1.a, res_ref, 2)) abort (); } diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c index fe8989a..3344dad 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-vrcp14ss-2.c @@ -8,10 +8,10 @@ static void compute_vrcp14ss (float *s1, float *s2, float *r) { - r[0] = 1.0 / s2[0]; - r[1] = s1[1]; - r[2] = s1[2]; - r[3] = s1[3]; + r[0] = 1.0 / s1[0]; + r[1] = s2[1]; + r[2] = s2[2]; + r[3] = s2[3]; } static void @@ -28,6 +28,6 @@ avx512f_test (void) compute_vrcp14ss (s1.a, s2.a, res_ref); - if (check_union128 (res1, res_ref)) + if (checkVf (res1.a, res_ref, 4)) abort (); } diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 05b4af0..630c952 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -647,24 +647,28 @@ test_3vx (_mm512_mask_prefetch_i64gather_ps, __m512i, __mmask8, void const *, 1, test_3vx (_mm512_mask_prefetch_i64scatter_ps, void const *, __mmask8, __m512i, 1, 1) /* avx512erintrin.h */ -test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 1) -test_1 (_mm512_exp2a23_round_ps, __m512, __m512, 1) -test_1 (_mm512_rcp28_round_pd, __m512d, __m512d, 1) -test_1 (_mm512_rcp28_round_ps, __m512, __m512, 1) -test_1 (_mm512_rsqrt28_round_pd, __m512d, __m512d, 1) -test_1 (_mm512_rsqrt28_round_ps, __m512, __m512, 1) -test_2 (_mm512_maskz_exp2a23_round_pd, __m512d, __mmask8, __m512d, 1) -test_2 (_mm512_maskz_exp2a23_round_ps, __m512, __mmask16, __m512, 1) -test_2 (_mm512_maskz_rcp28_round_pd, __m512d, __mmask8, __m512d, 1) -test_2 (_mm512_maskz_rcp28_round_ps, __m512, __mmask16, __m512, 1) -test_2 (_mm512_maskz_rsqrt28_round_pd, __m512d, __mmask8, __m512d, 1) -test_2 (_mm512_maskz_rsqrt28_round_ps, __m512, __mmask16, __m512, 1) -test_3 (_mm512_mask_exp2a23_round_pd, __m512d, __m512d, __mmask8, __m512d, 1) -test_3 (_mm512_mask_exp2a23_round_ps, __m512, __m512, __mmask16, __m512, 1) -test_3 (_mm512_mask_rcp28_round_pd, __m512d, __m512d, __mmask8, __m512d, 1) -test_3 (_mm512_mask_rcp28_round_ps, __m512, __m512, __mmask16, __m512, 1) -test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 1) -test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 1) +test_1 (_mm512_exp2a23_round_pd, __m512d, __m512d, 5) +test_1 (_mm512_exp2a23_round_ps, __m512, __m512, 5) +test_1 (_mm512_rcp28_round_pd, __m512d, __m512d, 5) +test_1 (_mm512_rcp28_round_ps, __m512, __m512, 5) +test_1 (_mm512_rsqrt28_round_pd, __m512d, __m512d, 5) +test_1 (_mm512_rsqrt28_round_ps, __m512, __m512, 5) +test_2 (_mm512_maskz_exp2a23_round_pd, __m512d, __mmask8, __m512d, 5) +test_2 (_mm512_maskz_exp2a23_round_ps, __m512, __mmask16, __m512, 5) +test_2 (_mm512_maskz_rcp28_round_pd, __m512d, __mmask8, __m512d, 5) +test_2 (_mm512_maskz_rcp28_round_ps, __m512, __mmask16, __m512, 5) +test_2 (_mm512_maskz_rsqrt28_round_pd, __m512d, __mmask8, __m512d, 5) +test_2 (_mm512_maskz_rsqrt28_round_ps, __m512, __mmask16, __m512, 5) +test_3 (_mm512_mask_exp2a23_round_pd, __m512d, __m512d, __mmask8, __m512d, 5) +test_3 (_mm512_mask_exp2a23_round_ps, __m512, __m512, __mmask16, __m512, 5) +test_3 (_mm512_mask_rcp28_round_pd, __m512d, __m512d, __mmask8, __m512d, 5) +test_3 (_mm512_mask_rcp28_round_ps, __m512, __m512, __mmask16, __m512, 5) +test_3 (_mm512_mask_rsqrt28_round_pd, __m512d, __m512d, __mmask8, __m512d, 5) +test_3 (_mm512_mask_rsqrt28_round_ps, __m512, __m512, __mmask16, __m512, 5) +test_2 (_mm_rcp28_round_sd, __m128d, __m128d, __m128d, 5) +test_2 (_mm_rcp28_round_ss, __m128, __m128, __m128, 5) +test_2 (_mm_rsqrt28_round_sd, __m128d, __m128d, __m128d, 5) +test_2 (_mm_rsqrt28_round_ss, __m128, __m128, __m128, 5) /* shaintrin.h */ test_2 (_mm_sha1rnds4_epu32, __m128i, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index a6a7b39..309cd73 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -367,12 +367,16 @@ #define __builtin_ia32_scatterpfqps(A, B, C, D, E) __builtin_ia32_scatterpfqps(A, B, C, 1, 1) /* avx512erintrin.h */ -#define __builtin_ia32_exp2pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask (A, B, C, 1) -#define __builtin_ia32_exp2ps_mask(A, B, C, D) __builtin_ia32_exp2ps_mask (A, B, C, 1) -#define __builtin_ia32_rcp28pd_mask(A, B, C, D) __builtin_ia32_rcp28pd_mask (A, B, C, 1) -#define __builtin_ia32_rcp28ps_mask(A, B, C, D) __builtin_ia32_rcp28ps_mask (A, B, C, 1) -#define __builtin_ia32_rsqrt28pd_mask(A, B, C, D) __builtin_ia32_rsqrt28pd_mask (A, B, C, 1) -#define __builtin_ia32_rsqrt28ps_mask(A, B, C, D) __builtin_ia32_rsqrt28ps_mask (A, B, C, 1) +#define __builtin_ia32_exp2pd_mask(A, B, C, D) __builtin_ia32_exp2pd_mask (A, B, C, 5) +#define __builtin_ia32_exp2ps_mask(A, B, C, D) __builtin_ia32_exp2ps_mask (A, B, C, 5) +#define __builtin_ia32_rcp28pd_mask(A, B, C, D) __builtin_ia32_rcp28pd_mask (A, B, C, 5) +#define __builtin_ia32_rcp28ps_mask(A, B, C, D) __builtin_ia32_rcp28ps_mask (A, B, C, 5) +#define __builtin_ia32_rsqrt28pd_mask(A, B, C, D) __builtin_ia32_rsqrt28pd_mask (A, B, C, 5) +#define __builtin_ia32_rsqrt28ps_mask(A, B, C, D) __builtin_ia32_rsqrt28ps_mask (A, B, C, 5) +#define __builtin_ia32_rcp28sd_round(A, B, C) __builtin_ia32_rcp28sd_round(A, B, 5) +#define __builtin_ia32_rcp28ss_round(A, B, C) __builtin_ia32_rcp28ss_round(A, B, 5) +#define __builtin_ia32_rsqrt28sd_round(A, B, C) __builtin_ia32_rsqrt28sd_round(A, B, 5) +#define __builtin_ia32_rsqrt28ss_round(A, B, C) __builtin_ia32_rsqrt28ss_round(A, B, 5) /* shaintrin.h */ #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)