From patchwork Fri Nov  1 12:19:56 2013
X-Patchwork-Submitter: Kirill Yukhin
X-Patchwork-Id: 287799
Date: Fri, 1 Nov 2013 16:19:56 +0400
From: Kirill Yukhin
To: Richard Henderson
Cc: GCC Patches, Uros Bizjak, Jakub Jelinek
Subject: Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Message-ID: <20131101121956.GA54822@msticlxl57.ims.intel.com>
References: <20130814074404.GE52726@msticlxl57.ims.intel.com>
 <20130822141006.GA3556@msticlxl57.ims.intel.com>
 <20131017141513.GC18369@msticlxl57.ims.intel.com>
 <5265B231.1040609@redhat.com>
In-Reply-To: <5265B231.1040609@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hello Richard,

On 21 Oct 16:01, Richard Henderson wrote:
> Error on V16SF. Probably better to fill this out.
Thanks, fixed.

> Better to just use here, as it's a compile-time constant.
Fixed.

> > +(define_insn "avx512f_store_mask"
> Likewise.
Fixed.

> Nested vec_merge? That seems... odd to say the least.
> How in the world does this get matched?
Moved to separate patch.

> > +(define_insn "*avx512f_loads_mask"
> Likewise.
Moved to separate patch.

> > +(define_insn "avx512f_stores_mask"
> This seems similar, though of course it's an extract.
> I still can't imagine how it could be used.
Separate patch.
> > -(define_insn "rcp14"
> > +(define_insn "rcp14"
> What, this name isn't used for non-masked anymore?
Bogus original pattern. Introduced by me here:
svn+ssh://gcc.gnu.org/svn/gcc/trunk@203609
This (and the subsequent patterns you mention) are not used by name.
In the general case, for every built-in, whether it is masked or not, we
use the masked version of the built-in.  E.g.:

extern __inline __m512d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_rcp14_pd (__m512d __A)
{
  return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A,
						   (__v8df)
						   _mm512_setzero_pd (),
						   (__mmask8) -1);
}

Then, while expanding, if we can prove that the mask is const -1, we
remove the redundant vec_merge, getting the non-masked variant of the
built-in.  The same holds for all insns you mentioned.

> > -(define_insn "srcp14"
> Likewise. These changes don't belong in this patch.
Ditto.

> > -(define_insn "rsqrt14"
> Likewise.
Ditto.

> > - (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))]
> > + (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x")
> > + (match_operand:FMAMODE 2 "nonimmediate_operand" "vm,v,vm,x,m")
> > + (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x")))]
> Unrelated changes. Repeated throughout the fma patterns.
Reverted.
> > +(define_insn "*fmai_fmadd__maskz"
> > + [(set (match_operand:VF_128 0 "register_operand" "=v,v")
> > + (vec_merge:VF_128
> > + (vec_merge:VF_128
> > + (fma:VF_128
> > + (match_operand:VF_128 1 "nonimmediate_operand" "0,0")
> > + (match_operand:VF_128 2 "nonimmediate_operand" "vm,v")
> > + (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))
> > + (match_operand:VF_128 4 "const0_operand")
> > + (match_operand:QI 5 "register_operand" "k,k"))
> > + (match_dup 1)
> > + (const_int 1)))]
> > + "TARGET_AVX512F"
> > + "@
> > + vfmadd132\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, %2}
> > + vfmadd213\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3}"
> > + [(set_attr "type" "ssemuladd")
> > + (set_attr "mode" "")])
> These seem like useless patterns. If they're for builtins,
> then they seem like useless builtins.
See above. Moved to dedicated patch.

> > @@ -3686,8 +4328,8 @@
> >  (set_attr "athlon_decode" "vector,double,*")
> >  (set_attr "amdfam10_decode" "vector,double,*")
> >  (set_attr "bdver1_decode" "direct,direct,*")
> > - (set_attr "btver2_decode" "double,double,double")
> >  (set_attr "prefix" "orig,orig,vex")
> > + (set_attr "btver2_decode" "double,double,double")
> >  (set_attr "mode" "SF")])
> Unrelated changes.
Ugh, reverted.
> > +(define_expand "vec_unpacku_float_hi_v16si"
> > + [(match_operand:V8DF 0 "register_operand")
> > + (match_operand:V16SI 1 "register_operand")]
> > + "TARGET_AVX512F"
> > +{
> > + REAL_VALUE_TYPE TWO32r;
> > + rtx k, x, tmp[4];
> > +
> > + real_ldexp (&TWO32r, &dconst1, 32);
> > + x = const_double_from_real_value (TWO32r, DFmode);
> > +
> > + tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode));
> > + tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x));
> > + tmp[2] = gen_reg_rtx (V8DFmode);
> > + tmp[3] = gen_reg_rtx (V8SImode);
> > + k = gen_reg_rtx (QImode);
> > +
> > + emit_insn (gen_vec_extract_hi_v16si (tmp[3], operands[1]));
> > + emit_insn (gen_floatv8siv8df2 (tmp[2], tmp[3]));
> > + emit_insn (gen_rtx_SET (VOIDmode, k,
> > + gen_rtx_LT (QImode, tmp[2], tmp[0])));
> > + emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k));
> > + emit_move_insn (operands[0], tmp[2]);
> > + DONE;
> > +})
> Separate patch. And this is too complicated, since vcvtudq2pd exists.
Moved to [5/8] Extend hooks.

> Non-masked name change again.
See above.

> > +(define_insn "avx512f_unpcklps512"
> Ditto.
Above.

> > +(define_insn "avx512f_movshdup512"
> Ditto.
Above.

> > +(define_insn "avx512f_movsldup512"
> Ditto.
Above.

Updated patch at the bottom. Bootstrapped. Could you please take a look?

---
Thanks, K
---
 gcc/config/i386/i386.c        |    5 +
 gcc/config/i386/predicates.md |   10 +
 gcc/config/i386/sse.md        | 1440 +++++++++++++++++++++++++++++++++--------
 gcc/config/i386/subst.md      |   56 ++
 4 files changed, 1249 insertions(+), 262 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index febceca..0e91500 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14780,6 +14780,11 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	 /* We do not want to print value of the operand.
*/ return; + case 'N': + if (x == const0_rtx || x == CONST0_RTX (GET_MODE (x))) + fputs ("{z}", file); + return; + case '*': if (ASSEMBLER_DIALECT == ASM_ATT) putc ('*', file); diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 261335d..00a203e 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -687,6 +687,16 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 0, 3)"))) +;; Match 0 to 4. +(define_predicate "const_0_to_4_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 4)"))) + +;; Match 0 to 5. +(define_predicate "const_0_to_5_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 5)"))) + ;; Match 0 to 7. (define_predicate "const_0_to_7_operand" (and (match_code "const_int") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 939cc33..ac7f108 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -87,6 +87,7 @@ ;; For AVX512F support UNSPEC_VPERMI2 UNSPEC_VPERMT2 + UNSPEC_VPERMI2_MASK UNSPEC_UNSIGNED_FIX_NOTRUNC UNSPEC_UNSIGNED_PCMP UNSPEC_TESTM @@ -101,9 +102,15 @@ UNSPEC_GETMANT UNSPEC_ALIGN UNSPEC_CONFLICT + UNSPEC_COMPRESS + UNSPEC_COMPRESS_STORE + UNSPEC_EXPAND UNSPEC_MASKED_EQ UNSPEC_MASKED_GT + ;; For embed. 
rounding feature + UNSPEC_EMBEDDED_ROUNDING + ;; For AVX512PF support UNSPEC_GATHER_PREFETCH UNSPEC_SCATTER_PREFETCH @@ -554,6 +561,12 @@ (V8SF "7") (V4DF "3") (V4SF "3") (V2DF "1")]) +(define_mode_attr ssescalarsize + [(V8DI "64") (V4DI "64") (V2DI "64") + (V32HI "16") (V16HI "16") (V8HI "16") + (V16SI "32") (V8SI "32") (V4SI "32") + (V16SF "32") (V8DF "64")]) + ;; SSE prefix for integer vector modes (define_mode_attr sseintprefix [(V2DI "p") (V2DF "") @@ -610,6 +623,9 @@ (define_mode_attr bcstscalarsuff [(V16SI "d") (V16SF "ss") (V8DI "q") (V8DF "sd")]) +;; Include define_subst patterns for instructions with mask +(include "subst.md") + ;; Patterns whose name begins with "sse{,2,3}_" are invoked by intrinsics. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; @@ -749,6 +765,28 @@ ] (const_string "")))]) +(define_insn "avx512f_load_mask" + [(set (match_operand:VI48F_512 0 "register_operand" "=v,v") + (vec_merge:VI48F_512 + (match_operand:VI48F_512 1 "nonimmediate_operand" "v,m") + (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C") + (match_operand: 3 "register_operand" "k,k")))] + "TARGET_AVX512F" +{ + switch (mode) + { + case MODE_V8DF: + case MODE_V16SF: + return "vmova\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"; + default: + return "vmovdqa\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"; + } +} + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "memory" "none,load") + (set_attr "mode" "")]) + (define_insn "avx512f_blendm" [(set (match_operand:VI48F_512 0 "register_operand" "=v") (vec_merge:VI48F_512 @@ -761,6 +799,28 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_store_mask" + [(set (match_operand:VI48F_512 0 "memory_operand" "=m") + (vec_merge:VI48F_512 + (match_operand:VI48F_512 1 "register_operand" "v") + (match_dup 0) + (match_operand: 2 "register_operand" "k")))] + "TARGET_AVX512F" +{ + switch (mode) + { + case MODE_V8DF: + case MODE_V16SF: + return "vmova\t{%1, %0%{%2%}|%0%{%2%}, %1}"; + default: + 
return "vmovdqa\t{%1, %0%{%2%}|%0%{%2%}, %1}"; + } +} + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "memory" "store") + (set_attr "mode" "")]) + (define_insn "sse2_movq128" [(set (match_operand:V2DI 0 "register_operand" "=x") (vec_concat:V2DI @@ -852,21 +912,21 @@ DONE; }) -(define_insn "_loadu" +(define_insn "_loadu" [(set (match_operand:VF 0 "register_operand" "=v") (unspec:VF [(match_operand:VF 1 "nonimmediate_operand" "vm")] UNSPEC_LOADU))] - "TARGET_SSE" + "TARGET_SSE && " { switch (get_attr_mode (insn)) { case MODE_V16SF: case MODE_V8SF: case MODE_V4SF: - return "%vmovups\t{%1, %0|%0, %1}"; + return "%vmovups\t{%1, %0|%0, %1}"; default: - return "%vmovu\t{%1, %0|%0, %1}"; + return "%vmovu\t{%1, %0|%0, %1}"; } } [(set_attr "type" "ssemov") @@ -913,12 +973,36 @@ ] (const_string "")))]) -(define_insn "_loaddqu" +(define_insn "avx512f_storeu512_mask" + [(set (match_operand:VF_512 0 "memory_operand" "=m") + (vec_merge:VF_512 + (unspec:VF_512 + [(match_operand:VF_512 1 "register_operand" "v")] + UNSPEC_STOREU) + (match_dup 0) + (match_operand: 2 "register_operand" "k")))] + "TARGET_AVX512F" +{ + switch (get_attr_mode (insn)) + { + case MODE_V16SF: + return "vmovups\t{%1, %0%{%2%}|%0%{%2%}, %1}"; + default: + return "vmovu\t{%1, %0%{%2%}|%0%{%2%}, %1}"; + } +} + [(set_attr "type" "ssemov") + (set_attr "movu" "1") + (set_attr "memory" "store") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "_loaddqu" [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand" "=v") (unspec:VI_UNALIGNED_LOADSTORE [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand" "vm")] UNSPEC_LOADU))] - "TARGET_SSE2" + "TARGET_SSE2 && " { switch (get_attr_mode (insn)) { @@ -927,9 +1011,9 @@ return "%vmovups\t{%1, %0|%0, %1}"; case MODE_XI: if (mode == V8DImode) - return "vmovdqu64\t{%1, %0|%0, %1}"; + return "vmovdqu64\t{%1, %0|%0, %1}"; else - return "vmovdqu32\t{%1, %0|%0, %1}"; + return "vmovdqu32\t{%1, %0|%0, %1}"; default: return 
"%vmovdqu\t{%1, %0|%0, %1}"; } @@ -992,6 +1076,27 @@ ] (const_string "")))]) +(define_insn "avx512f_storedqu_mask" + [(set (match_operand:VI48_512 0 "memory_operand" "=m") + (vec_merge:VI48_512 + (unspec:VI48_512 + [(match_operand:VI48_512 1 "register_operand" "v")] + UNSPEC_STOREU) + (match_dup 0) + (match_operand: 2 "register_operand" "k")))] + "TARGET_AVX512F" +{ + if (mode == V8DImode) + return "vmovdqu64\t{%1, %0%{%2%}|%0%{%2%}, %1}"; + else + return "vmovdqu32\t{%1, %0%{%2%}|%0%{%2%}, %1}"; +} + [(set_attr "type" "ssemov") + (set_attr "movu" "1") + (set_attr "memory" "store") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_lddqu" [(set (match_operand:VI1 0 "register_operand" "=x") (unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")] @@ -1119,26 +1224,26 @@ } [(set_attr "isa" "noavx,noavx,avx,avx")]) -(define_expand "3" +(define_expand "3" [(set (match_operand:VF 0 "register_operand") (plusminus:VF (match_operand:VF 1 "nonimmediate_operand") (match_operand:VF 2 "nonimmediate_operand")))] - "TARGET_SSE" + "TARGET_SSE && " "ix86_fixup_binary_operands_no_copy (, mode, operands);") -(define_insn "*3" +(define_insn "*3" [(set (match_operand:VF 0 "register_operand" "=x,v") (plusminus:VF (match_operand:VF 1 "nonimmediate_operand" "0,v") (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE && ix86_binary_operator_ok (, mode, operands)" + "TARGET_SSE && ix86_binary_operator_ok (, mode, operands) && " "@ \t{%2, %0|%0, %2} - v\t{%2, %1, %0|%0, %1, %2}" + v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "mode" "")]) (define_insn "_vm3" @@ -1158,26 +1263,26 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_expand "mul3" +(define_expand "mul3" [(set (match_operand:VF 0 "register_operand") (mult:VF (match_operand:VF 1 "nonimmediate_operand") (match_operand:VF 2 "nonimmediate_operand")))] - "TARGET_SSE" + 
"TARGET_SSE && " "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);") -(define_insn "*mul3" +(define_insn "*mul3" [(set (match_operand:VF 0 "register_operand" "=x,v") (mult:VF (match_operand:VF 1 "nonimmediate_operand" "%0,v") (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE && ix86_binary_operator_ok (MULT, mode, operands)" + "TARGET_SSE && ix86_binary_operator_ok (MULT, mode, operands) && " "@ mul\t{%2, %0|%0, %2} - vmul\t{%2, %1, %0|%0, %1, %2}" + vmul\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "ssemul") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "btver2_decode" "direct,double") (set_attr "mode" "")]) @@ -1195,7 +1300,7 @@ v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sse") - (set_attr "prefix" "orig,maybe_evex") + (set_attr "prefix" "orig,vex") (set_attr "btver2_decode" "direct,double") (set_attr "mode" "")]) @@ -1225,18 +1330,18 @@ } }) -(define_insn "_div3" +(define_insn "_div3" [(set (match_operand:VF 0 "register_operand" "=x,v") (div:VF (match_operand:VF 1 "register_operand" "0,v") (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE" + "TARGET_SSE && " "@ div\t{%2, %0|%0, %2} - vdiv\t{%2, %1, %0|%0, %1, %2}" + vdiv\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "ssediv") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "mode" "")]) (define_insn "_rcp2" @@ -1269,18 +1374,18 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "SF")]) -(define_insn "rcp14" +(define_insn "rcp14" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")] UNSPEC_RCP14))] "TARGET_AVX512F" - "vrcp14\t{%1, %0|%0, %1}" + "vrcp14\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "srcp14" +(define_insn "*srcp14" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 
(unspec:VF_128 @@ -1316,11 +1421,11 @@ } }) -(define_insn "_sqrt2" +(define_insn "_sqrt2" [(set (match_operand:VF 0 "register_operand" "=v") (sqrt:VF (match_operand:VF 1 "nonimmediate_operand" "vm")))] - "TARGET_SSE" - "%vsqrt\t{%1, %0|%0, %1}" + "TARGET_SSE && " + "%vsqrt\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "atom_sse_attr" "sqrt") (set_attr "btver2_sse_attr" "sqrt") @@ -1341,8 +1446,8 @@ [(set_attr "isa" "noavx,avx") (set_attr "type" "sse") (set_attr "atom_sse_attr" "sqrt") - (set_attr "btver2_sse_attr" "sqrt") (set_attr "prefix" "orig,vex") + (set_attr "btver2_sse_attr" "sqrt") (set_attr "mode" "")]) (define_expand "rsqrt2" @@ -1365,18 +1470,18 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "")]) -(define_insn "rsqrt14" +(define_insn "rsqrt14" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")] UNSPEC_RSQRT14))] "TARGET_AVX512F" - "vrsqrt14\t{%1, %0|%0, %1}" + "vrsqrt14\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "rsqrt14" +(define_insn "*rsqrt14" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 (unspec:VF_128 @@ -1411,47 +1516,49 @@ ;; isn't really correct, as those rtl operators aren't defined when ;; applied to NaNs. Hopefully the optimizers won't get too smart on us. 
-(define_expand "3" +(define_expand "3" [(set (match_operand:VF 0 "register_operand") (smaxmin:VF (match_operand:VF 1 "nonimmediate_operand") (match_operand:VF 2 "nonimmediate_operand")))] - "TARGET_SSE" + "TARGET_SSE && " { if (!flag_finite_math_only) operands[1] = force_reg (mode, operands[1]); ix86_fixup_binary_operands_no_copy (, mode, operands); }) -(define_insn "*3_finite" +(define_insn "*3_finite" [(set (match_operand:VF 0 "register_operand" "=x,v") (smaxmin:VF (match_operand:VF 1 "nonimmediate_operand" "%0,v") (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))] "TARGET_SSE && flag_finite_math_only - && ix86_binary_operator_ok (, mode, operands)" + && ix86_binary_operator_ok (, mode, operands) + && " "@ \t{%2, %0|%0, %2} - v\t{%2, %1, %0|%0, %1, %2}" + v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") (set_attr "btver2_sse_attr" "maxmin") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "mode" "")]) -(define_insn "*3" +(define_insn "*3" [(set (match_operand:VF 0 "register_operand" "=x,v") (smaxmin:VF (match_operand:VF 1 "register_operand" "0,v") (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE && !flag_finite_math_only" + "TARGET_SSE && !flag_finite_math_only + && " "@ \t{%2, %0|%0, %2} - v\t{%2, %1, %0|%0, %1, %2}" + v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") (set_attr "btver2_sse_attr" "maxmin") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "mode" "")]) (define_insn "_vm3" @@ -2029,6 +2136,24 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_vmcmp3_mask" + [(set (match_operand: 0 "register_operand" "=k") + (and: + (unspec: + [(match_operand:VF_128 1 "register_operand" "v") + (match_operand:VF_128 2 "nonimmediate_operand" "vm") + (match_operand:SI 3 "const_0_to_31_operand" "n")] + UNSPEC_PCMP) + (and: + (match_operand: 4 "register_operand" "k") + (const_int 1))))] + "TARGET_AVX512F" + 
"vcmp\t{%3, %2, %1, %0%{%4%}|%0%{%4%}, %1, %2, %3}" + [(set_attr "type" "ssecmp") + (set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "avx512f_maskcmp3" [(set (match_operand: 0 "register_operand" "=k") (match_operator: 3 "sse_comparison_operator" @@ -2579,7 +2704,39 @@ (set_attr "type" "ssemuladd") (set_attr "mode" "")]) -(define_insn "fma_fmsub_" +(define_insn "avx512f_fmadd__mask" + [(set (match_operand:VF_512 0 "register_operand" "=v,v") + (vec_merge:VF_512 + (fma:VF_512 + (match_operand:VF_512 1 "register_operand" "0,0") + (match_operand:VF_512 2 "nonimmediate_operand" "vm,v") + (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")) + (match_dup 1) + (match_operand: 4 "register_operand" "k,k")))] + "TARGET_AVX512F" + "@ + vfmadd132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfmadd213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "isa" "fma_avx512f,fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx512f_fmadd__mask3" + [(set (match_operand:VF_512 0 "register_operand" "=x") + (vec_merge:VF_512 + (fma:VF_512 + (match_operand:VF_512 1 "register_operand" "x") + (match_operand:VF_512 2 "nonimmediate_operand" "vm") + (match_operand:VF_512 3 "register_operand" "0")) + (match_dup 3) + (match_operand: 4 "register_operand" "k")))] + "TARGET_AVX512F" + "vfmadd231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "isa" "fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "*fma_fmsub_" [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x") (fma:FMAMODE (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x") @@ -2597,7 +2754,41 @@ (set_attr "type" "ssemuladd") (set_attr "mode" "")]) -(define_insn "fma_fnmadd_" +(define_insn "avx512f_fmsub__mask" + [(set (match_operand:VF_512 0 "register_operand" "=v,v") + (vec_merge:VF_512 + (fma:VF_512 + (match_operand:VF_512 1 "register_operand" "0,0") + (match_operand:VF_512 2 
"nonimmediate_operand" "vm,v") + (neg:VF_512 + (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))) + (match_dup 1) + (match_operand: 4 "register_operand" "k,k")))] + "TARGET_AVX512F" + "@ + vfmsub132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfmsub213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "isa" "fma_avx512f,fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx512f_fmsub__mask3" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (vec_merge:VF_512 + (fma:VF_512 + (match_operand:VF_512 1 "register_operand" "v") + (match_operand:VF_512 2 "nonimmediate_operand" "vm") + (neg:VF_512 + (match_operand:VF_512 3 "register_operand" "0"))) + (match_dup 3) + (match_operand: 4 "register_operand" "k")))] + "TARGET_AVX512F" + "vfmsub231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "isa" "fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "*fma_fnmadd_" [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x") (fma:FMAMODE (neg:FMAMODE @@ -2615,6 +2806,40 @@ (set_attr "type" "ssemuladd") (set_attr "mode" "")]) +(define_insn "avx512f_fnmadd__mask" + [(set (match_operand:VF_512 0 "register_operand" "=v,v") + (vec_merge:VF_512 + (fma:VF_512 + (neg:VF_512 + (match_operand:VF_512 1 "register_operand" "0,0")) + (match_operand:VF_512 2 "nonimmediate_operand" "vm,v") + (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")) + (match_dup 1) + (match_operand: 4 "register_operand" "k,k")))] + "TARGET_AVX512F" + "@ + vfnmadd132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfnmadd213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "isa" "fma_avx512f,fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx512f_fnmadd__mask3" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (vec_merge:VF_512 + (fma:VF_512 + (neg:VF_512 + (match_operand:VF_512 1 "register_operand" "v")) + (match_operand:VF_512 2 "nonimmediate_operand" "vm") + (match_operand:VF_512 3 
"register_operand" "0")) + (match_dup 3) + (match_operand: 4 "register_operand" "k")))] + "TARGET_AVX512F" + "vfnmadd231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "isa" "fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + (define_insn "*fma_fnmsub_" [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x") (fma:FMAMODE @@ -2634,6 +2859,42 @@ (set_attr "type" "ssemuladd") (set_attr "mode" "")]) +(define_insn "avx512f_fnmsub__mask" + [(set (match_operand:VF_512 0 "register_operand" "=v,v") + (vec_merge:VF_512 + (fma:VF_512 + (neg:VF_512 + (match_operand:VF_512 1 "register_operand" "0,0")) + (match_operand:VF_512 2 "nonimmediate_operand" "vm,v") + (neg:VF_512 + (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))) + (match_dup 1) + (match_operand: 4 "register_operand" "k,k")))] + "TARGET_AVX512F" + "@ + vfnmsub132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfnmsub213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "isa" "fma_avx512f,fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx512f_fnmsub__mask3" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (vec_merge:VF_512 + (fma:VF_512 + (neg:VF_512 + (match_operand:VF_512 1 "register_operand" "v")) + (match_operand:VF_512 2 "nonimmediate_operand" "vm") + (neg:VF_512 + (match_operand:VF_512 3 "register_operand" "0"))) + (match_dup 3) + (match_operand: 4 "register_operand" "k")))] + "TARGET_AVX512F" + "vfnmsub231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "isa" "fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + ;; FMA parallel floating point multiply addsub and subadd operations. 
;; It would be possible to represent these without the UNSPEC as @@ -2672,6 +2933,40 @@ (set_attr "type" "ssemuladd") (set_attr "mode" "")]) +(define_insn "avx512f_fmaddsub__mask" + [(set (match_operand:VF_512 0 "register_operand" "=v,v") + (vec_merge:VF_512 + (unspec:VF_512 + [(match_operand:VF_512 1 "register_operand" "0,0") + (match_operand:VF_512 2 "nonimmediate_operand" "vm,v") + (match_operand:VF_512 3 "nonimmediate_operand" "v,vm")] + UNSPEC_FMADDSUB) + (match_dup 1) + (match_operand: 4 "register_operand" "k,k")))] + "TARGET_AVX512F" + "@ + vfmaddsub132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfmaddsub213\t{%3, %2, %0%{%4%}|%0%{%4%}, %2, %3}" + [(set_attr "isa" "fma_avx512f,fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx512f_fmaddsub__mask3" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (vec_merge:VF_512 + (unspec:VF_512 + [(match_operand:VF_512 1 "register_operand" "v") + (match_operand:VF_512 2 "nonimmediate_operand" "vm") + (match_operand:VF_512 3 "register_operand" "0")] + UNSPEC_FMADDSUB) + (match_dup 3) + (match_operand: 4 "register_operand" "k")))] + "TARGET_AVX512F" + "vfmaddsub231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "isa" "fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + (define_insn "*fma_fmsubadd_" [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x") (unspec:VF @@ -2691,6 +2986,42 @@ (set_attr "type" "ssemuladd") (set_attr "mode" "")]) +(define_insn "avx512f_fmsubadd__mask" + [(set (match_operand:VF_512 0 "register_operand" "=v,v") + (vec_merge:VF_512 + (unspec:VF_512 + [(match_operand:VF_512 1 "register_operand" "0,0") + (match_operand:VF_512 2 "nonimmediate_operand" "vm,v") + (neg:VF_512 + (match_operand:VF_512 3 "nonimmediate_operand" "v,vm"))] + UNSPEC_FMADDSUB) + (match_dup 1) + (match_operand: 4 "register_operand" "k,k")))] + "TARGET_AVX512F" + "@ + vfmsubadd132\t{%2, %3, %0%{%4%}|%0%{%4%}, %3, %2} + vfmsubadd213\t{%3, %2, %0%{%4%}|%0%{%4%}, 
%2, %3}" + [(set_attr "isa" "fma_avx512f,fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + +(define_insn "avx512f_fmsubadd__mask3" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (vec_merge:VF_512 + (unspec:VF_512 + [(match_operand:VF_512 1 "register_operand" "v") + (match_operand:VF_512 2 "nonimmediate_operand" "vm") + (neg:VF_512 + (match_operand:VF_512 3 "register_operand" "0"))] + UNSPEC_FMADDSUB) + (match_dup 3) + (match_operand: 4 "register_operand" "k")))] + "TARGET_AVX512F" + "vfmsubadd231\t{%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}" + [(set_attr "isa" "fma_avx512f") + (set_attr "type" "ssemuladd") + (set_attr "mode" "")]) + ;; FMA3 floating point scalar intrinsics. These merge result with ;; high-order elements from the destination register. @@ -3014,7 +3345,7 @@ [(set (match_operand:DI 0 "register_operand" "=r,r") (fix:DI (vec_select:SF - (match_operand:V4SF 1 "nonimmediate_operand" "v,m") + (match_operand:V4SF 1 "nonimmediate_operand" "v,vm") (parallel [(const_int 0)]))))] "TARGET_SSE && TARGET_64BIT" "%vcvttss2si{q}\t{%1, %0|%0, %k1}" @@ -3054,22 +3385,22 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "float2" +(define_insn "float2" [(set (match_operand:VF1 0 "register_operand" "=v") (float:VF1 (match_operand: 1 "nonimmediate_operand" "vm")))] - "TARGET_SSE2" - "%vcvtdq2ps\t{%1, %0|%0, %1}" + "TARGET_SSE2 && " + "%vcvtdq2ps\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "maybe_vex") (set_attr "mode" "")]) -(define_insn "ufloatv16siv16sf2" +(define_insn "ufloatv16siv16sf2" [(set (match_operand:V16SF 0 "register_operand" "=v") (unsigned_float:V16SF (match_operand:V16SI 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vcvtudq2ps\t{%1, %0|%0, %1}" + "vcvtudq2ps\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "V16SF")]) @@ -3104,34 +3435,34 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "")]) -(define_insn "avx512f_fix_notruncv16sfv16si" 
+(define_insn "avx512f_fix_notruncv16sfv16si" [(set (match_operand:V16SI 0 "register_operand" "=v") (unspec:V16SI [(match_operand:V16SF 1 "nonimmediate_operand" "vm")] UNSPEC_FIX_NOTRUNC))] "TARGET_AVX512F" - "vcvtps2dq\t{%1, %0|%0, %1}" + "vcvtps2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn "avx512f_ufix_notruncv16sfv16si" +(define_insn "avx512f_ufix_notruncv16sfv16si" [(set (match_operand:V16SI 0 "register_operand" "=v") (unspec:V16SI [(match_operand:V16SF 1 "nonimmediate_operand" "vm")] UNSPEC_UNSIGNED_FIX_NOTRUNC))] "TARGET_AVX512F" - "vcvtps2udq\t{%1, %0|%0, %1}" + "vcvtps2udq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn "fix_truncv16sfv16si2" +(define_insn "fix_truncv16sfv16si2" [(set (match_operand:V16SI 0 "register_operand" "=v") (any_fix:V16SI (match_operand:V16SF 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vcvttps2dq\t{%1, %0|%0, %1}" + "vcvttps2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -3461,20 +3792,21 @@ (define_mode_attr si2dfmodelower [(V8DF "v8si") (V4DF "v4si")]) -(define_insn "float2" +(define_insn "float2" [(set (match_operand:VF2_512_256 0 "register_operand" "=v") (float:VF2_512_256 (match_operand: 1 "nonimmediate_operand" "vm")))] - "TARGET_AVX" - "vcvtdq2pd\t{%1, %0|%0, %1}" + "TARGET_AVX && " + "vcvtdq2pd\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "maybe_vex") (set_attr "mode" "")]) -(define_insn "ufloatv8siv8df" +(define_insn "ufloatv8siv8df" [(set (match_operand:V8DF 0 "register_operand" "=v") - (unsigned_float:V8DF (match_operand:V8SI 1 "nonimmediate_operand" "vm")))] + (unsigned_float:V8DF + (match_operand:V8SI 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vcvtudq2pd\t{%1, %0|%0, %1}" + "vcvtudq2pd\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "V8DF")]) @@ 
-3519,12 +3851,13 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "V2DF")]) -(define_insn "avx512f_cvtpd2dq512" +(define_insn "avx512f_cvtpd2dq512" [(set (match_operand:V8SI 0 "register_operand" "=v") - (unspec:V8SI [(match_operand:V8DF 1 "nonimmediate_operand" "vm")] - UNSPEC_FIX_NOTRUNC))] + (unspec:V8SI + [(match_operand:V8DF 1 "nonimmediate_operand" "vm")] + UNSPEC_FIX_NOTRUNC))] "TARGET_AVX512F" - "vcvtpd2dq\t{%1, %0|%0, %1}" + "vcvtpd2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "OI")]) @@ -3592,22 +3925,23 @@ (set_attr "athlon_decode" "vector") (set_attr "bdver1_decode" "double")]) -(define_insn "avx512f_ufix_notruncv8dfv8si" +(define_insn "avx512f_ufix_notruncv8dfv8si" [(set (match_operand:V8SI 0 "register_operand" "=v") (unspec:V8SI [(match_operand:V8DF 1 "nonimmediate_operand" "vm")] UNSPEC_UNSIGNED_FIX_NOTRUNC))] "TARGET_AVX512F" - "vcvtpd2udq\t{%1, %0|%0, %1}" + "vcvtpd2udq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "OI")]) -(define_insn "fix_truncv8dfv8si2" +(define_insn "fix_truncv8dfv8si2" [(set (match_operand:V8SI 0 "register_operand" "=v") - (any_fix:V8SI (match_operand:V8DF 1 "nonimmediate_operand" "vm")))] + (any_fix:V8SI + (match_operand:V8DF 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vcvttpd2dq\t{%1, %0|%0, %1}" + "vcvttpd2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "OI")]) @@ -3713,12 +4047,12 @@ (set_attr "prefix" "orig,orig,vex") (set_attr "mode" "DF")]) -(define_insn "avx512f_cvtpd2ps512" +(define_insn "avx512f_cvtpd2ps512" [(set (match_operand:V8SF 0 "register_operand" "=v") (float_truncate:V8SF (match_operand:V8DF 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vcvtpd2ps\t{%1, %0|%0, %1}" + "vcvtpd2ps\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "V8SF")]) @@ -3768,12 +4102,12 @@ (define_mode_attr sf2dfmode [(V8DF "V8SF") (V4DF 
"V4SF")]) -(define_insn "_cvtps2pd" +(define_insn "_cvtps2pd" [(set (match_operand:VF2_512_256 0 "register_operand" "=v") (float_extend:VF2_512_256 (match_operand: 1 "nonimmediate_operand" "vm")))] - "TARGET_AVX" - "vcvtps2pd\t{%1, %0|%0, %1}" + "TARGET_AVX && " + "vcvtps2pd\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "maybe_vex") (set_attr "mode" "")]) @@ -4122,6 +4456,30 @@ DONE; }) +(define_expand "vec_unpacku_float_lo_v16si" + [(match_operand:V8DF 0 "register_operand") + (match_operand:V16SI 1 "nonimmediate_operand")] + "TARGET_AVX512F" +{ + REAL_VALUE_TYPE TWO32r; + rtx k, x, tmp[3]; + + real_ldexp (&TWO32r, &dconst1, 32); + x = const_double_from_real_value (TWO32r, DFmode); + + tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode)); + tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x)); + tmp[2] = gen_reg_rtx (V8DFmode); + k = gen_reg_rtx (QImode); + + emit_insn (gen_avx512f_cvtdq2pd512_2 (tmp[2], operands[1])); + emit_insn (gen_rtx_SET (VOIDmode, k, + gen_rtx_LT (QImode, tmp[2], tmp[0]))); + emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k)); + emit_move_insn (operands[0], tmp[2]); + DONE; +}) + (define_expand "vec_pack_trunc_" [(set (match_dup 3) (float_truncate: @@ -4409,7 +4767,7 @@ (set_attr "prefix" "orig,vex,orig,vex,maybe_vex") (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")]) -(define_insn "avx512f_unpckhps512" +(define_insn "avx512f_unpckhps512" [(set (match_operand:V16SF 0 "register_operand" "=v") (vec_select:V16SF (vec_concat:V32SF @@ -4424,7 +4782,7 @@ (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] "TARGET_AVX512F" - "vunpckhps\t{%2, %1, %0|%0, %1, %2}" + "vunpckhps\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set_attr "mode" "V16SF")]) @@ -4497,7 +4855,7 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "V4SF")]) -(define_insn "avx512f_unpcklps512" +(define_insn "avx512f_unpcklps512" [(set (match_operand:V16SF 0 "register_operand" "=v") 
(vec_select:V16SF (vec_concat:V32SF @@ -4512,7 +4870,7 @@ (const_int 12) (const_int 28) (const_int 13) (const_int 29)])))] "TARGET_AVX512F" - "vunpcklps\t{%2, %1, %0|%0, %1, %2}" + "vunpcklps\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set_attr "mode" "V16SF")]) @@ -4620,7 +4978,7 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "V4SF")]) -(define_insn "avx512f_movshdup512" +(define_insn "avx512f_movshdup512" [(set (match_operand:V16SF 0 "register_operand" "=v") (vec_select:V16SF (vec_concat:V32SF @@ -4635,7 +4993,7 @@ (const_int 13) (const_int 13) (const_int 15) (const_int 15)])))] "TARGET_AVX512F" - "vmovshdup\t{%1, %0|%0, %1}" + "vmovshdup\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "V16SF")]) @@ -4673,7 +5031,7 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "V4SF")]) -(define_insn "avx512f_movsldup512" +(define_insn "avx512f_movsldup512" [(set (match_operand:V16SF 0 "register_operand" "=v") (vec_select:V16SF (vec_concat:V32SF @@ -4688,7 +5046,7 @@ (const_int 12) (const_int 12) (const_int 14) (const_int 14)])))] "TARGET_AVX512F" - "vmovsldup\t{%1, %0|%0, %1}" + "vmovsldup\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "V16SF")]) @@ -5222,8 +5580,71 @@ operands[1] = adjust_address (operands[1], SFmode, INTVAL (operands[2]) * 4); }) -(define_insn "avx512f_vextract32x4_1" - [(set (match_operand: 0 "nonimmediate_operand" "=vm") +(define_expand "avx512f_vextract32x4_mask" + [(match_operand: 0 "nonimmediate_operand") + (match_operand:V16FI 1 "register_operand") + (match_operand:SI 2 "const_0_to_3_operand") + (match_operand: 3 "nonimmediate_operand") + (match_operand:QI 4 "register_operand")] + "TARGET_AVX512F" +{ + if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR) + operands[0] = force_reg (mode, operands[0]); + switch (INTVAL (operands[2])) + { + case 0: + emit_insn (gen_avx512f_vextract32x4_1_mask (operands[0], + 
operands[1], GEN_INT (0), GEN_INT (1), GEN_INT (2), + GEN_INT (3), operands[3], operands[4])); + break; + case 1: + emit_insn (gen_avx512f_vextract32x4_1_mask (operands[0], + operands[1], GEN_INT (4), GEN_INT (5), GEN_INT (6), + GEN_INT (7), operands[3], operands[4])); + break; + case 2: + emit_insn (gen_avx512f_vextract32x4_1_mask (operands[0], + operands[1], GEN_INT (8), GEN_INT (9), GEN_INT (10), + GEN_INT (11), operands[3], operands[4])); + break; + case 3: + emit_insn (gen_avx512f_vextract32x4_1_mask (operands[0], + operands[1], GEN_INT (12), GEN_INT (13), GEN_INT (14), + GEN_INT (15), operands[3], operands[4])); + break; + default: + gcc_unreachable (); + } + DONE; +}) + +(define_insn "avx512f_vextract32x4_1_maskm" + [(set (match_operand: 0 "memory_operand" "=m") + (vec_merge: + (vec_select: + (match_operand:V16FI 1 "register_operand" "v") + (parallel [(match_operand 2 "const_0_to_15_operand") + (match_operand 3 "const_0_to_15_operand") + (match_operand 4 "const_0_to_15_operand") + (match_operand 5 "const_0_to_15_operand")])) + (match_operand: 6 "memory_operand" "0") + (match_operand:QI 7 "register_operand" "k")))] + "TARGET_AVX512F && (INTVAL (operands[2]) == INTVAL (operands[3]) - 1) + && (INTVAL (operands[3]) == INTVAL (operands[4]) - 1) + && (INTVAL (operands[4]) == INTVAL (operands[5]) - 1)" +{ + operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2); + return "vextract32x4\t{%2, %1, %0%{%7%}|%0%{%7%}, %1, %2}"; +} + [(set_attr "type" "sselog") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "memory" "store") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512f_vextract32x4_1" + [(set (match_operand: 0 "" "=") + (vec_select: + (match_operand:V16FI 1 "register_operand" "v") + (parallel [(match_operand 2 "const_0_to_15_operand") @@ -5235,7 +5656,7 @@ && (INTVAL (operands[4]) == INTVAL (operands[5]) - 1)" { operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2); - return "vextract32x4\t{%2, %1, %0|%0, %1, %2}"; + 
return "vextract32x4\t{%2, %1, %0|%0, %1, %2}"; } [(set_attr "type" "sselog") (set_attr "prefix_extra" "1") @@ -5247,6 +5668,35 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_expand "avx512f_vextract64x4_mask" + [(match_operand: 0 "nonimmediate_operand") + (match_operand:V8FI 1 "register_operand") + (match_operand:SI 2 "const_0_to_1_operand") + (match_operand: 3 "nonimmediate_operand") + (match_operand:QI 4 "register_operand")] + "TARGET_AVX512F" +{ + rtx (*insn)(rtx, rtx, rtx, rtx); + + if (MEM_P (operands[0]) && GET_CODE (operands[3]) == CONST_VECTOR) + operands[0] = force_reg (mode, operands[0]); + + switch (INTVAL (operands[2])) + { + case 0: + insn = gen_vec_extract_lo__mask; + break; + case 1: + insn = gen_vec_extract_hi__mask; + break; + default: + gcc_unreachable (); + } + + emit_insn (insn (operands[0], operands[1], operands[3], operands[4])); + DONE; +}) + (define_split [(set (match_operand: 0 "nonimmediate_operand") (vec_select: @@ -5266,14 +5716,36 @@ DONE; }) -(define_insn "vec_extract_lo_" - [(set (match_operand: 0 "nonimmediate_operand" "=vm") +(define_insn "vec_extract_lo__maskm" + [(set (match_operand: 0 "memory_operand" "=m") + (vec_merge: + (vec_select: + (match_operand:V8FI 1 "register_operand" "v") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)])) + (match_operand: 2 "memory_operand" "0") + (match_operand:QI 3 "register_operand" "k")))] + "TARGET_AVX512F" +"vextract64x4\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}" + [(set_attr "type" "sselog") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "vec_extract_lo_" + [(set (match_operand: 0 "" "=") (vec_select: (match_operand:V8FI 1 "nonimmediate_operand" "vm") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])))] "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))" - "#" +{ + if () + return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; + else + 
return "#"; +} [(set_attr "type" "sselog") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") @@ -5284,14 +5756,32 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "vec_extract_hi_" - [(set (match_operand: 0 "nonimmediate_operand" "=vm") +(define_insn "vec_extract_hi__maskm" + [(set (match_operand: 0 "memory_operand" "=m") + (vec_merge: + (vec_select: + (match_operand:V8FI 1 "register_operand" "v") + (parallel [(const_int 4) (const_int 5) + (const_int 6) (const_int 7)])) + (match_operand: 2 "memory_operand" "0") + (match_operand:QI 3 "register_operand" "k")))] + "TARGET_AVX512F" + "vextract64x4\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}" + [(set_attr "type" "sselog") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "memory" "store") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "vec_extract_hi_" + [(set (match_operand: 0 "" "=") (vec_select: (match_operand:V8FI 1 "register_operand" "v") (parallel [(const_int 4) (const_int 5) (const_int 6) (const_int 7)])))] "TARGET_AVX512F" - "vextract64x4\t{$0x1, %1, %0|%0, %1, 0x1}" + "vextract64x4\t{$0x1, %1, %0|%0, %1, 0x1}" [(set_attr "type" "sselog") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") @@ -5643,7 +6133,7 @@ ;; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; -(define_insn "avx512f_unpckhpd512" +(define_insn "avx512f_unpckhpd512" [(set (match_operand:V8DF 0 "register_operand" "=v") (vec_select:V8DF (vec_concat:V16DF @@ -5654,7 +6144,7 @@ (const_int 5) (const_int 13) (const_int 7) (const_int 15)])))] "TARGET_AVX512F" - "vunpckhpd\t{%2, %1, %0|%0, %1, %2}" + "vunpckhpd\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set_attr "mode" "V8DF")]) @@ -5739,7 +6229,7 @@ (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex") (set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,V1DF")]) -(define_expand "avx512f_movddup512" +(define_expand "avx512f_movddup512" [(set 
(match_operand:V8DF 0 "register_operand") (vec_select:V8DF (vec_concat:V16DF @@ -5751,7 +6241,7 @@ (const_int 6) (const_int 14)])))] "TARGET_AVX512F") -(define_expand "avx512f_unpcklpd512" +(define_expand "avx512f_unpcklpd512" [(set (match_operand:V8DF 0 "register_operand") (vec_select:V8DF (vec_concat:V16DF @@ -5763,7 +6253,7 @@ (const_int 6) (const_int 14)])))] "TARGET_AVX512F") -(define_insn "*avx512f_unpcklpd512" +(define_insn "*avx512f_unpcklpd512" [(set (match_operand:V8DF 0 "register_operand" "=v,v") (vec_select:V8DF (vec_concat:V16DF @@ -5775,8 +6265,8 @@ (const_int 6) (const_int 14)])))] "TARGET_AVX512F" "@ - vunpcklpd\t{%2, %1, %0|%0, %1, %2} - vmovddup\t{%1, %0|%0, %1}" + vunpcklpd\t{%2, %1, %0|%0, %1, %2} + vmovddup\t{%1, %0|%0, %1}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set_attr "mode" "V8DF")]) @@ -5913,12 +6403,13 @@ operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8); }) -(define_insn "avx512f_vmscalef" +(define_insn "*avx512f_vmscalef" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 - (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "nonimmediate_operand" "vm")] - UNSPEC_SCALEF) + (unspec:VF_128 + [(match_operand:VF_128 1 "register_operand" "v") + (match_operand:VF_128 2 "nonimmediate_operand" "vm")] + UNSPEC_SCALEF) (match_dup 1) (const_int 1)))] "TARGET_AVX512F" @@ -5926,13 +6417,14 @@ [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_scalef" +(define_insn "avx512f_scalef" [(set (match_operand:VF_512 0 "register_operand" "=v") - (unspec:VF_512 [(match_operand:VF_512 1 "register_operand" "v") - (match_operand:VF_512 2 "nonimmediate_operand" "vm")] - UNSPEC_SCALEF))] + (unspec:VF_512 + [(match_operand:VF_512 1 "register_operand" "v") + (match_operand:VF_512 2 "nonimmediate_operand" "vm")] + UNSPEC_SCALEF))] "TARGET_AVX512F" - "%vscalef\t{%2, %1, %0|%0, %1, %2}" + "%vscalef\t{%2, %1, %0|%0, %1, %2}" [(set_attr "prefix" 
"evex") (set_attr "mode" "")]) @@ -5950,21 +6442,39 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_getexp" +(define_insn "avx512f_vternlog_mask" + [(set (match_operand:VI48_512 0 "register_operand" "=v") + (vec_merge:VI48_512 + (unspec:VI48_512 + [(match_operand:VI48_512 1 "register_operand" "0") + (match_operand:VI48_512 2 "register_operand" "v") + (match_operand:VI48_512 3 "nonimmediate_operand" "vm") + (match_operand:SI 4 "const_0_to_255_operand")] + UNSPEC_VTERNLOG) + (match_dup 1) + (match_operand: 5 "register_operand" "k")))] + "TARGET_AVX512F" + "vpternlog\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512f_getexp" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")] UNSPEC_GETEXP))] "TARGET_AVX512F" - "vgetexp\t{%1, %0|%0, %1}"; + "vgetexp\t{%1, %0|%0, %1}"; [(set_attr "prefix" "evex") (set_attr "mode" "")]) (define_insn "avx512f_sgetexp" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 - (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "nonimmediate_operand" "vm")] - UNSPEC_GETEXP) + (unspec:VF_128 + [(match_operand:VF_128 1 "register_operand" "v") + (match_operand:VF_128 2 "nonimmediate_operand" "vm")] + UNSPEC_GETEXP) (match_dup 1) (const_int 1)))] "TARGET_AVX512F" @@ -5972,17 +6482,48 @@ [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_align" +(define_insn "avx512f_align" [(set (match_operand:VI48_512 0 "register_operand" "=v") (unspec:VI48_512 [(match_operand:VI48_512 1 "register_operand" "v") (match_operand:VI48_512 2 "nonimmediate_operand" "vm") (match_operand:SI 3 "const_0_to_255_operand")] UNSPEC_ALIGN))] "TARGET_AVX512F" - "valign\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + "valign\t{%3, %2, %1, %0|%0, %1, %2, %3}"; [(set_attr "prefix" "evex") (set_attr "mode" "")]) 
+(define_expand "avx512f_shufps512_mask" + [(match_operand:V16SF 0 "register_operand") + (match_operand:V16SF 1 "register_operand") + (match_operand:V16SF 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_255_operand") + (match_operand:V16SF 4 "register_operand") + (match_operand:HI 5 "register_operand")] + "TARGET_AVX512F" +{ + int mask = INTVAL (operands[3]); + emit_insn (gen_avx512f_shufps512_1_mask (operands[0], operands[1], operands[2], + GEN_INT ((mask >> 0) & 3), + GEN_INT ((mask >> 2) & 3), + GEN_INT (((mask >> 4) & 3) + 16), + GEN_INT (((mask >> 6) & 3) + 16), + GEN_INT (((mask >> 0) & 3) + 4), + GEN_INT (((mask >> 2) & 3) + 4), + GEN_INT (((mask >> 4) & 3) + 20), + GEN_INT (((mask >> 6) & 3) + 20), + GEN_INT (((mask >> 0) & 3) + 8), + GEN_INT (((mask >> 2) & 3) + 8), + GEN_INT (((mask >> 4) & 3) + 24), + GEN_INT (((mask >> 6) & 3) + 24), + GEN_INT (((mask >> 0) & 3) + 12), + GEN_INT (((mask >> 2) & 3) + 12), + GEN_INT (((mask >> 4) & 3) + 28), + GEN_INT (((mask >> 6) & 3) + 28), + operands[4], operands[5])); + DONE; +}) + (define_insn "avx512f_fixupimm" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 @@ -5996,6 +6537,22 @@ [(set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_fixupimm_mask" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (vec_merge:VF_512 + (unspec:VF_512 + [(match_operand:VF_512 1 "register_operand" "0") + (match_operand:VF_512 2 "register_operand" "v") + (match_operand: 3 "nonimmediate_operand" "vm") + (match_operand:SI 4 "const_0_to_255_operand")] + UNSPEC_FIXUPIMM) + (match_dup 1) + (match_operand: 5 "register_operand" "k")))] + "TARGET_AVX512F" + "vfixupimm\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}"; + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "avx512f_sfixupimm" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 @@ -6012,19 +6569,38 @@ [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_rndscale" 
+(define_insn "avx512f_sfixupimm_mask" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (vec_merge:VF_128 + (unspec:VF_128 + [(match_operand:VF_128 1 "register_operand" "0") + (match_operand:VF_128 2 "register_operand" "v") + (match_operand: 3 "nonimmediate_operand" "vm") + (match_operand:SI 4 "const_0_to_255_operand")] + UNSPEC_FIXUPIMM) + (match_dup 1) + (const_int 1)) + (match_dup 1) + (match_operand: 5 "register_operand" "k")))] + "TARGET_AVX512F" + "vfixupimm\t{%4, %3, %2, %0%{%5%}|%0%{%5%}, %2, %3, %4}"; + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512f_rndscale" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm") (match_operand:SI 2 "const_0_to_255_operand")] UNSPEC_ROUND))] "TARGET_AVX512F" - "vrndscale\t{%2, %1, %0|%0, %1, %2}" + "vrndscale\t{%2, %1, %0|%0, %1, %2}" [(set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_rndscale" +(define_insn "*avx512f_rndscale" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 (unspec:VF_128 @@ -6041,7 +6617,7 @@ (set_attr "mode" "")]) ;; One bit in mask selects 2 elements. 
-(define_insn "avx512f_shufps512_1" +(define_insn "avx512f_shufps512_1" [(set (match_operand:V16SF 0 "register_operand" "=v") (vec_select:V16SF (vec_concat:V32SF @@ -6084,14 +6660,37 @@ mask |= (INTVAL (operands[6]) - 16) << 6; operands[3] = GEN_INT (mask); - return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } [(set_attr "type" "sselog") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "V16SF")]) -(define_insn "avx512f_shufpd512_1" +(define_expand "avx512f_shufpd512_mask" + [(match_operand:V8DF 0 "register_operand") + (match_operand:V8DF 1 "register_operand") + (match_operand:V8DF 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_255_operand") + (match_operand:V8DF 4 "register_operand") + (match_operand:QI 5 "register_operand")] + "TARGET_AVX512F" +{ + int mask = INTVAL (operands[3]); + emit_insn (gen_avx512f_shufpd512_1_mask (operands[0], operands[1], operands[2], + GEN_INT (mask & 1), + GEN_INT (mask & 2 ? 9 : 8), + GEN_INT (mask & 4 ? 3 : 2), + GEN_INT (mask & 8 ? 11 : 10), + GEN_INT (mask & 16 ? 5 : 4), + GEN_INT (mask & 32 ? 13 : 12), + GEN_INT (mask & 64 ? 7 : 6), + GEN_INT (mask & 128 ? 
15 : 14), + operands[4], operands[5])); + DONE; +}) + +(define_insn "avx512f_shufpd512_1" [(set (match_operand:V8DF 0 "register_operand" "=v") (vec_select:V8DF (vec_concat:V16DF @@ -6118,7 +6717,7 @@ mask |= (INTVAL (operands[10]) - 14) << 7; operands[3] = GEN_INT (mask); - return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } [(set_attr "type" "sselog") (set_attr "length_immediate" "1") @@ -6198,7 +6797,7 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn "avx512f_interleave_highv8di" +(define_insn "avx512f_interleave_highv8di" [(set (match_operand:V8DI 0 "register_operand" "=v") (vec_select:V8DI (vec_concat:V16DI @@ -6209,7 +6808,7 @@ (const_int 5) (const_int 13) (const_int 7) (const_int 15)])))] "TARGET_AVX512F" - "vpunpckhqdq\t{%2, %1, %0|%0, %1, %2}" + "vpunpckhqdq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -6248,7 +6847,7 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn "avx512f_interleave_lowv8di" +(define_insn "avx512f_interleave_lowv8di" [(set (match_operand:V8DI 0 "register_operand" "=v") (vec_select:V8DI (vec_concat:V16DI @@ -6259,7 +6858,7 @@ (const_int 4) (const_int 12) (const_int 6) (const_int 14)])))] "TARGET_AVX512F" - "vpunpcklqdq\t{%2, %1, %0|%0, %1, %2}" + "vpunpcklqdq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -6630,6 +7229,20 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_2_mask" + [(set (match_operand:PMOV_DST_MODE 0 "nonimmediate_operand" "=v,m") + (vec_merge:PMOV_DST_MODE + (any_truncate:PMOV_DST_MODE + (match_operand: 1 "register_operand" "v,v")) + (match_operand:PMOV_DST_MODE 2 "vector_move_operand" "0C,0") + (match_operand: 3 "register_operand" "k,k")))] + "TARGET_AVX512F" + "vpmov\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + [(set_attr "type" "ssemov") + (set_attr "memory" "none,store") + (set_attr 
"prefix" "evex") + (set_attr "mode" "")]) + (define_insn "*avx512f_v8div16qi2" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_concat:V16QI @@ -6663,6 +7276,55 @@ (set_attr "prefix" "evex") (set_attr "mode" "TI")]) +(define_insn "avx512f_v8div16qi2_mask" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_concat:V16QI + (vec_merge:V8QI + (any_truncate:V8QI + (match_operand:V8DI 1 "register_operand" "v")) + (vec_select:V8QI + (match_operand:V16QI 2 "vector_move_operand" "0C") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])) + (match_operand:QI 3 "register_operand" "k")) + (const_vector:V8QI [(const_int 0) (const_int 0) + (const_int 0) (const_int 0) + (const_int 0) (const_int 0) + (const_int 0) (const_int 0)])))] + "TARGET_AVX512F" + "vpmovqb\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + +(define_insn "*avx512f_v8div16qi2_store_mask" + [(set (match_operand:V16QI 0 "memory_operand" "=m") + (vec_concat:V16QI + (vec_merge:V8QI + (any_truncate:V8QI + (match_operand:V8DI 1 "register_operand" "v")) + (vec_select:V8QI + (match_dup 0) + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])) + (match_operand:QI 2 "register_operand" "k")) + (vec_select:V8QI + (match_dup 0) + (parallel [(const_int 8) (const_int 9) + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)]))))] + "TARGET_AVX512F" + "vpmovqb\t{%1, %0%{%2%}|%0%{%2%}, %1}" + [(set_attr "type" "ssemov") + (set_attr "memory" "store") + (set_attr "prefix" "evex") + (set_attr "mode" "TI")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; Parallel integral arithmetic @@ -6677,27 +7339,27 @@ "TARGET_SSE2" "operands[2] = force_reg (mode, CONST0_RTX (mode));") -(define_expand "3" +(define_expand "3" 
[(set (match_operand:VI_AVX2 0 "register_operand") (plusminus:VI_AVX2 (match_operand:VI_AVX2 1 "nonimmediate_operand") (match_operand:VI_AVX2 2 "nonimmediate_operand")))] - "TARGET_SSE2" + "TARGET_SSE2 && " "ix86_fixup_binary_operands_no_copy (, mode, operands);") -(define_insn "*3" +(define_insn "*3" [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v") (plusminus:VI_AVX2 (match_operand:VI_AVX2 1 "nonimmediate_operand" "0,v") (match_operand:VI_AVX2 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE2 && ix86_binary_operator_ok (, mode, operands)" + "TARGET_SSE2 && ix86_binary_operator_ok (, mode, operands) && " "@ p\t{%2, %0|%0, %2} - vp\t{%2, %1, %0|%0, %1, %2}" + vp\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseiadd") (set_attr "prefix_data16" "1,*") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "mode" "")]) (define_expand "_3" @@ -6787,7 +7449,7 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_expand "vec_widen_umult_even_v16si" +(define_expand "vec_widen_umult_even_v16si" [(set (match_operand:V8DI 0 "register_operand") (mult:V8DI (zero_extend:V8DI @@ -6807,7 +7469,7 @@ "TARGET_AVX512F" "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);") -(define_insn "*vec_widen_umult_even_v16si" +(define_insn "*vec_widen_umult_even_v16si" [(set (match_operand:V8DI 0 "register_operand" "=v") (mult:V8DI (zero_extend:V8DI @@ -6825,7 +7487,7 @@ (const_int 8) (const_int 10) (const_int 12) (const_int 14)])))))] "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)" - "vpmuludq\t{%2, %1, %0|%0, %1, %2}" + "vpmuludq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "avx512f") (set_attr "type" "sseimul") (set_attr "prefix_extra" "1") @@ -6902,7 +7564,7 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) -(define_expand "vec_widen_smult_even_v16si" +(define_expand "vec_widen_smult_even_v16si" [(set (match_operand:V8DI 0 "register_operand") (mult:V8DI (sign_extend:V8DI @@ -6922,7 
+7584,7 @@ "TARGET_AVX512F" "ix86_fixup_binary_operands_no_copy (MULT, V16SImode, operands);") -(define_insn "*vec_widen_smult_even_v16si" +(define_insn "*vec_widen_smult_even_v16si" [(set (match_operand:V8DI 0 "register_operand" "=x") (mult:V8DI (sign_extend:V8DI @@ -6940,7 +7602,7 @@ (const_int 8) (const_int 10) (const_int 12) (const_int 14)])))))] "TARGET_AVX512F && ix86_binary_operator_ok (MULT, V16SImode, operands)" - "vpmuldq\t{%2, %1, %0|%0, %1, %2}" + "vpmuldq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "avx512f") (set_attr "type" "sseimul") (set_attr "prefix_extra" "1") @@ -7151,12 +7813,12 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) -(define_expand "mul3" +(define_expand "mul3" [(set (match_operand:VI4_AVX512F 0 "register_operand") (mult:VI4_AVX512F (match_operand:VI4_AVX512F 1 "general_vector_operand") (match_operand:VI4_AVX512F 2 "general_vector_operand")))] - "TARGET_SSE2" + "TARGET_SSE2 && " { if (TARGET_SSE4_1) { @@ -7173,19 +7835,19 @@ } }) -(define_insn "*_mul3" +(define_insn "*_mul3" [(set (match_operand:VI4_AVX512F 0 "register_operand" "=x,v") (mult:VI4_AVX512F (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,v") (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, mode, operands)" + "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, mode, operands) && " "@ pmulld\t{%2, %0|%0, %2} - vpmulld\t{%2, %1, %0|%0, %1, %2}" + vpmulld\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseimul") (set_attr "prefix_extra" "1") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "btver2_decode" "vector,vector") (set_attr "mode" "")]) @@ -7298,6 +7960,20 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) +(define_insn "ashr3" + [(set (match_operand:VI48_512 0 "register_operand" "=v,v") + (ashiftrt:VI48_512 + (match_operand:VI48_512 1 "nonimmediate_operand" "v,vm") + (match_operand:SI 2 "nonmemory_operand" "v,N")))] + "TARGET_AVX512F && 
" + "vpsra\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sseishft") + (set (attr "length_immediate") + (if_then_else (match_operand 2 "const_int_operand") + (const_string "1") + (const_string "0"))) + (set_attr "mode" "")]) + (define_insn "3" [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x") (any_lshift:VI248_AVX2 @@ -7317,13 +7993,13 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_insn "3" +(define_insn "3" [(set (match_operand:VI48_512 0 "register_operand" "=v,v") (any_lshift:VI48_512 (match_operand:VI48_512 1 "register_operand" "v,m") (match_operand:SI 2 "nonmemory_operand" "vN,N")))] - "TARGET_AVX512F" - "vp\t{%2, %1, %0|%0, %1, %2}" + "TARGET_AVX512F && " + "vp\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "avx512f") (set_attr "type" "sseishft") (set (attr "length_immediate") @@ -7333,6 +8009,7 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) + (define_expand "vec_shl_" [(set (match_operand:VI_128 0 "register_operand") (ashift:V1TI @@ -7408,41 +8085,42 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_insn "avx512f_v" +(define_insn "avx512f_v" [(set (match_operand:VI48_512 0 "register_operand" "=v") (any_rotate:VI48_512 (match_operand:VI48_512 1 "register_operand" "v") (match_operand:VI48_512 2 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vpv\t{%2, %1, %0|%0, %1, %2}" + "vpv\t{%2, %1, %0|%0, %1, %2}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_" +(define_insn "avx512f_" [(set (match_operand:VI48_512 0 "register_operand" "=v") (any_rotate:VI48_512 (match_operand:VI48_512 1 "nonimmediate_operand" "vm") (match_operand:SI 2 "const_0_to_255_operand")))] "TARGET_AVX512F" - "vp\t{%2, %1, %0|%0, %1, %2}" + "vp\t{%2, %1, %0|%0, %1, %2}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_expand "3" +(define_expand "3" [(set (match_operand:VI124_256_48_512 0 "register_operand") (maxmin:VI124_256_48_512 (match_operand:VI124_256_48_512 1 "nonimmediate_operand") 
(match_operand:VI124_256_48_512 2 "nonimmediate_operand")))] - "TARGET_AVX2" + "TARGET_AVX2 && " "ix86_fixup_binary_operands_no_copy (, mode, operands);") -(define_insn "*avx2_3" +(define_insn "*avx2_3" [(set (match_operand:VI124_256_48_512 0 "register_operand" "=v") (maxmin:VI124_256_48_512 (match_operand:VI124_256_48_512 1 "nonimmediate_operand" "%v") (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "vm")))] - "TARGET_AVX2 && ix86_binary_operator_ok (, mode, operands)" - "vp\t{%2, %1, %0|%0, %1, %2}" + "TARGET_AVX2 && ix86_binary_operator_ok (, mode, operands) + && " + "vp\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseiadd") (set_attr "prefix_extra" "1") (set_attr "prefix" "maybe_evex") @@ -7978,19 +8656,19 @@ operands[2] = force_reg (mode, gen_rtx_CONST_VECTOR (mode, v)); }) -(define_expand "_andnot3" +(define_expand "_andnot3" [(set (match_operand:VI_AVX2 0 "register_operand") (and:VI_AVX2 (not:VI_AVX2 (match_operand:VI_AVX2 1 "register_operand")) (match_operand:VI_AVX2 2 "nonimmediate_operand")))] - "TARGET_SSE2") + "TARGET_SSE2 && ") -(define_insn "*andnot3" +(define_insn "*andnot3" [(set (match_operand:VI 0 "register_operand" "=x,v") (and:VI (not:VI (match_operand:VI 1 "register_operand" "0,v")) (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE" + "TARGET_SSE && " { static char buf[64]; const char *ops; @@ -8030,7 +8708,7 @@ ops = "%s\t{%%2, %%0|%%0, %%2}"; break; case 1: - ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}"; + ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}"; break; default: gcc_unreachable (); @@ -8047,7 +8725,7 @@ (eq_attr "mode" "TI")) (const_string "1") (const_string "*"))) - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set (attr "mode") (cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") (const_string "") @@ -8075,12 +8753,12 @@ DONE; }) -(define_insn "*3" +(define_insn "3" [(set (match_operand:VI 0 "register_operand" "=x,v") (any_logic:VI (match_operand:VI 1 "nonimmediate_operand" "%0,v") 
(match_operand:VI 2 "nonimmediate_operand" "xm,vm")))] - "TARGET_SSE + "TARGET_SSE && && ix86_binary_operator_ok (, mode, operands)" { static char buf[64]; @@ -8122,7 +8800,7 @@ ops = "%s\t{%%2, %%0|%%0, %%2}"; break; case 1: - ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}"; + ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, %%2}"; break; default: gcc_unreachable (); @@ -8139,7 +8817,7 @@ (eq_attr "mode" "TI")) (const_string "1") (const_string "*"))) - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set (attr "mode") (cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") (const_string "") @@ -8447,7 +9125,7 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn "avx512f_interleave_highv16si" +(define_insn "avx512f_interleave_highv16si" [(set (match_operand:V16SI 0 "register_operand" "=v") (vec_select:V16SI (vec_concat:V32SI @@ -8462,7 +9140,7 @@ (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] "TARGET_AVX512F" - "vpunpckhdq\t{%2, %1, %0|%0, %1, %2}" + "vpunpckhdq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -8502,7 +9180,7 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn "avx512f_interleave_lowv16si" +(define_insn "avx512f_interleave_lowv16si" [(set (match_operand:V16SI 0 "register_operand" "=v") (vec_select:V16SI (vec_concat:V32SI @@ -8517,7 +9195,7 @@ (const_int 12) (const_int 28) (const_int 13) (const_int 29)])))] "TARGET_AVX512F" - "vpunpckldq\t{%2, %1, %0|%0, %1, %2}" + "vpunpckldq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -8640,7 +9318,45 @@ (set_attr "prefix" "orig,orig,vex,vex") (set_attr "mode" "TI")]) -(define_insn "avx512f_vinsert32x4_1" +(define_expand "avx512f_vinsert32x4_mask" + [(match_operand:V16FI 0 "register_operand") + (match_operand:V16FI 1 "register_operand") + (match_operand: 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_3_operand") + (match_operand:V16FI 
4 "register_operand") + (match_operand: 5 "register_operand")] + "TARGET_AVX512F" +{ + switch (INTVAL (operands[3])) + { + case 0: + emit_insn (gen_avx512f_vinsert32x4_1_mask (operands[0], + operands[1], operands[2], GEN_INT (0xFFF), operands[4], + operands[5])); + break; + case 1: + emit_insn (gen_avx512f_vinsert32x4_1_mask (operands[0], + operands[1], operands[2], GEN_INT (0xF0FF), operands[4], + operands[5])); + break; + case 2: + emit_insn (gen_avx512f_vinsert32x4_1_mask (operands[0], + operands[1], operands[2], GEN_INT (0xFF0F), operands[4], + operands[5])); + break; + case 3: + emit_insn (gen_avx512f_vinsert32x4_1_mask (operands[0], + operands[1], operands[2], GEN_INT (0xFFF0), operands[4], + operands[5])); + break; + default: + gcc_unreachable (); + } + DONE; + +}) + +(define_insn "avx512f_vinsert32x4_1" [(set (match_operand:V16FI 0 "register_operand" "=v") (vec_merge:V16FI (match_operand:V16FI 1 "register_operand" "v") @@ -8663,14 +9379,35 @@ operands[3] = GEN_INT (mask); - return "vinsert32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + return "vinsert32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } [(set_attr "type" "sselog") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "vec_set_lo_" +(define_expand "avx512f_vinsert64x4_mask" + [(match_operand:V8FI 0 "register_operand") + (match_operand:V8FI 1 "register_operand") + (match_operand: 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_1_operand") + (match_operand:V8FI 4 "register_operand") + (match_operand: 5 "register_operand")] + "TARGET_AVX512F" +{ + int mask = INTVAL (operands[3]); + if (mask == 0) + emit_insn (gen_vec_set_lo__mask + (operands[0], operands[1], operands[2], + operands[4], operands[5])); + else + emit_insn (gen_vec_set_hi__mask + (operands[0], operands[1], operands[2], + operands[4], operands[5])); + DONE; +}) + +(define_insn "vec_set_lo_" [(set (match_operand:V8FI 0 "register_operand" "=v") (vec_concat:V8FI (match_operand: 2 
"nonimmediate_operand" "vm") @@ -8679,13 +9416,13 @@ (parallel [(const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] "TARGET_AVX512F" - "vinsert64x4\t{$0x0, %2, %1, %0|%0, %1, %2, $0x0}" + "vinsert64x4\t{$0x0, %2, %1, %0|%0, %1, %2, $0x0}" [(set_attr "type" "sselog") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn "vec_set_hi_" +(define_insn "vec_set_hi_" [(set (match_operand:V8FI 0 "register_operand" "=v") (vec_concat:V8FI (match_operand: 2 "nonimmediate_operand" "vm") @@ -8694,13 +9431,37 @@ (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)]))))] "TARGET_AVX512F" - "vinsert64x4\t{$0x1, %2, %1, %0|%0, %1, %2, $0x1}" + "vinsert64x4\t{$0x1, %2, %1, %0|%0, %1, %2, $0x1}" [(set_attr "type" "sselog") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) -(define_insn "avx512f_shuf_64x2_1" +(define_expand "avx512f_shuf_64x2_mask" + [(match_operand:V8FI 0 "register_operand") + (match_operand:V8FI 1 "register_operand") + (match_operand:V8FI 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_255_operand") + (match_operand:V8FI 4 "register_operand") + (match_operand:QI 5 "register_operand")] + "TARGET_AVX512F" +{ + int mask = INTVAL (operands[3]); + emit_insn (gen_avx512f_shuf_64x2_1_mask + (operands[0], operands[1], operands[2], + GEN_INT (((mask >> 0) & 3) * 2), + GEN_INT (((mask >> 0) & 3) * 2 + 1), + GEN_INT (((mask >> 2) & 3) * 2), + GEN_INT (((mask >> 2) & 3) * 2 + 1), + GEN_INT (((mask >> 4) & 3) * 2 + 8), + GEN_INT (((mask >> 4) & 3) * 2 + 9), + GEN_INT (((mask >> 6) & 3) * 2 + 8), + GEN_INT (((mask >> 6) & 3) * 2 + 9), + operands[4], operands[5])); + DONE; +}) + +(define_insn "avx512f_shuf_64x2_1" [(set (match_operand:V8FI 0 "register_operand" "=v") (vec_select:V8FI (vec_concat: @@ -8727,14 +9488,46 @@ mask |= (INTVAL (operands[9]) - 8) / 2 << 6; operands[3] = GEN_INT (mask); - return "vshuf64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + return 
"vshuf64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } [(set_attr "type" "sselog") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_shuf_32x4_1" +(define_expand "avx512f_shuf_32x4_mask" + [(match_operand:V16FI 0 "register_operand") + (match_operand:V16FI 1 "register_operand") + (match_operand:V16FI 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_255_operand") + (match_operand:V16FI 4 "register_operand") + (match_operand:HI 5 "register_operand")] + "TARGET_AVX512F" +{ + int mask = INTVAL (operands[3]); + emit_insn (gen_avx512f_shuf_32x4_1_mask + (operands[0], operands[1], operands[2], + GEN_INT (((mask >> 0) & 3) * 4), + GEN_INT (((mask >> 0) & 3) * 4 + 1), + GEN_INT (((mask >> 0) & 3) * 4 + 2), + GEN_INT (((mask >> 0) & 3) * 4 + 3), + GEN_INT (((mask >> 2) & 3) * 4), + GEN_INT (((mask >> 2) & 3) * 4 + 1), + GEN_INT (((mask >> 2) & 3) * 4 + 2), + GEN_INT (((mask >> 2) & 3) * 4 + 3), + GEN_INT (((mask >> 4) & 3) * 4 + 16), + GEN_INT (((mask >> 4) & 3) * 4 + 17), + GEN_INT (((mask >> 4) & 3) * 4 + 18), + GEN_INT (((mask >> 4) & 3) * 4 + 19), + GEN_INT (((mask >> 6) & 3) * 4 + 16), + GEN_INT (((mask >> 6) & 3) * 4 + 17), + GEN_INT (((mask >> 6) & 3) * 4 + 18), + GEN_INT (((mask >> 6) & 3) * 4 + 19), + operands[4], operands[5])); + DONE; +}) + +(define_insn "avx512f_shuf_32x4_1" [(set (match_operand:V16FI 0 "register_operand" "=v") (vec_select:V16FI (vec_concat: @@ -8777,14 +9570,44 @@ mask |= (INTVAL (operands[15]) - 16) / 4 << 6; operands[3] = GEN_INT (mask); - return "vshuf32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + return "vshuf32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } [(set_attr "type" "sselog") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_pshufd_1" +(define_expand "avx512f_pshufdv3_mask" + [(match_operand:V16SI 0 "register_operand") + (match_operand:V16SI 1 "nonimmediate_operand") + (match_operand:SI 2 "const_0_to_255_operand") + 
(match_operand:V16SI 3 "register_operand") + (match_operand:HI 4 "register_operand")] + "TARGET_AVX512F" +{ + int mask = INTVAL (operands[2]); + emit_insn (gen_avx512f_pshufd_1_mask (operands[0], operands[1], + GEN_INT ((mask >> 0) & 3), + GEN_INT ((mask >> 2) & 3), + GEN_INT ((mask >> 4) & 3), + GEN_INT ((mask >> 6) & 3), + GEN_INT (((mask >> 0) & 3) + 4), + GEN_INT (((mask >> 2) & 3) + 4), + GEN_INT (((mask >> 4) & 3) + 4), + GEN_INT (((mask >> 6) & 3) + 4), + GEN_INT (((mask >> 0) & 3) + 8), + GEN_INT (((mask >> 2) & 3) + 8), + GEN_INT (((mask >> 4) & 3) + 8), + GEN_INT (((mask >> 6) & 3) + 8), + GEN_INT (((mask >> 0) & 3) + 12), + GEN_INT (((mask >> 2) & 3) + 12), + GEN_INT (((mask >> 4) & 3) + 12), + GEN_INT (((mask >> 6) & 3) + 12), + operands[3], operands[4])); + DONE; +}) + +(define_insn "avx512f_pshufd_1" [(set (match_operand:V16SI 0 "register_operand" "=v") (vec_select:V16SI (match_operand:V16SI 1 "nonimmediate_operand" "vm") @@ -8825,7 +9648,7 @@ mask |= INTVAL (operands[5]) << 6; operands[2] = GEN_INT (mask); - return "vpshufd\t{%2, %1, %0|%0, %1, %2}"; + return "vpshufd\t{%2, %1, %0|%0, %1, %2}"; } [(set_attr "type" "sselog1") (set_attr "prefix" "evex") @@ -10276,12 +11099,12 @@ (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)")) (set_attr "mode" "DI")]) -(define_insn "abs2" +(define_insn "abs2" [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v") (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand" "vm")))] - "TARGET_SSSE3" - "%vpabs\t{%1, %0|%0, %1}" + "TARGET_SSSE3 && " + "%vpabs\t{%1, %0|%0, %1}" [(set_attr "type" "sselog1") (set_attr "prefix_data16" "1") (set_attr "prefix_extra" "1") @@ -10622,12 +11445,12 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "TI")]) -(define_insn "avx512f_v16qiv16si2" +(define_insn "avx512f_v16qiv16si2" [(set (match_operand:V16SI 0 "register_operand" "=v") (any_extend:V16SI (match_operand:V16QI 1 "nonimmediate_operand" "vm")))] 
"TARGET_AVX512F" - "vpmovbd\t{%1, %0|%0, %q1}" + "vpmovbd\t{%1, %0|%0, %q1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -10662,12 +11485,12 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "TI")]) -(define_insn "avx512f_v16hiv16si2" +(define_insn "avx512f_v16hiv16si2" [(set (match_operand:V16SI 0 "register_operand" "=v") (any_extend:V16SI (match_operand:V16HI 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vpmovwd\t{%1, %0|%0, %1}" + "vpmovwd\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -10697,7 +11520,7 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "TI")]) -(define_insn "avx512f_v8qiv8di2" +(define_insn "avx512f_v8qiv8di2" [(set (match_operand:V8DI 0 "register_operand" "=v") (any_extend:V8DI (vec_select:V8QI @@ -10707,7 +11530,7 @@ (const_int 4) (const_int 5) (const_int 6) (const_int 7)]))))] "TARGET_AVX512F" - "vpmovbq\t{%1, %0|%0, %k1}" + "vpmovbq\t{%1, %0|%0, %k1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -10739,12 +11562,12 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "TI")]) -(define_insn "avx512f_v8hiv8di2" +(define_insn "avx512f_v8hiv8di2" [(set (match_operand:V8DI 0 "register_operand" "=v") (any_extend:V8DI (match_operand:V8HI 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vpmovwq\t{%1, %0|%0, %q1}" + "vpmovwq\t{%1, %0|%0, %q1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -10776,12 +11599,12 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "TI")]) -(define_insn "avx512f_v8siv8di2" +(define_insn "avx512f_v8siv8di2" [(set (match_operand:V8DI 0 "register_operand" "=v") (any_extend:V8DI (match_operand:V8SI 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512F" - "vpmovdq\t{%1, %0|%0, %1}" + "vpmovdq\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) @@ -11564,33 +12387,33 @@ (set_attr "prefix" "evex") (set_attr 
"mode" "XI")]) -(define_insn "*avx512er_exp2" +(define_insn "avx512er_exp2" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")] UNSPEC_EXP2))] "TARGET_AVX512ER" - "vexp2\t{%1, %0|%0, %1}" + "vexp2\t{%1, %0|%0, %1}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "*avx512er_rcp28" +(define_insn "avx512er_rcp28" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")] UNSPEC_RCP28))] "TARGET_AVX512ER" - "vrcp28\t{%1, %0|%0, %1}" + "vrcp28\t{%1, %0|%0, %1}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512er_rsqrt28" +(define_insn "avx512er_rsqrt28" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")] UNSPEC_RSQRT28))] "TARGET_AVX512ER" - "vrsqrt28\t{%1, %0|%0, %1}" + "vrsqrt28\t{%1, %0|%0, %1}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -12640,16 +13463,16 @@ (set_attr "prefix" "vex") (set_attr "mode" "")]) -(define_insn "_permvar" +(define_insn "_permvar" [(set (match_operand:VI48F_256_512 0 "register_operand" "=v") (unspec:VI48F_256_512 [(match_operand:VI48F_256_512 1 "nonimmediate_operand" "vm") (match_operand: 2 "register_operand" "v")] UNSPEC_VPERMVAR))] - "TARGET_AVX2" - "vperm\t{%1, %2, %0|%0, %2, %1}" + "TARGET_AVX2 && " + "vperm\t{%1, %2, %0|%0, %2, %1}" [(set_attr "type" "sselog") - (set_attr "prefix" "vex") + (set_attr "prefix" "") (set_attr "mode" "")]) (define_expand "_perm" @@ -12660,14 +13483,32 @@ { int mask = INTVAL (operands[2]); emit_insn (gen__perm_1 (operands[0], operands[1], - GEN_INT ((mask >> 0) & 3), - GEN_INT ((mask >> 2) & 3), - GEN_INT ((mask >> 4) & 3), - GEN_INT ((mask >> 6) & 3))); + GEN_INT ((mask >> 0) & 3), + GEN_INT ((mask >> 2) & 3), + GEN_INT ((mask >> 4) & 3), + GEN_INT ((mask >> 6) & 3))); + DONE; +}) + +(define_expand "avx512f_perm_mask" + [(match_operand:V8FI 0 
"register_operand") + (match_operand:V8FI 1 "nonimmediate_operand") + (match_operand:SI 2 "const_0_to_255_operand") + (match_operand:V8FI 3 "vector_move_operand") + (match_operand: 4 "register_operand")] + "TARGET_AVX512F" +{ + int mask = INTVAL (operands[2]); + emit_insn (gen__perm_1_mask (operands[0], operands[1], + GEN_INT ((mask >> 0) & 3), + GEN_INT ((mask >> 2) & 3), + GEN_INT ((mask >> 4) & 3), + GEN_INT ((mask >> 6) & 3), + operands[3], operands[4])); DONE; }) -(define_insn "_perm_1" +(define_insn "_perm_1" [(set (match_operand:VI8F_256_512 0 "register_operand" "=v") (vec_select:VI8F_256_512 (match_operand:VI8F_256_512 1 "nonimmediate_operand" "vm") @@ -12675,7 +13516,7 @@ (match_operand 3 "const_0_to_3_operand") (match_operand 4 "const_0_to_3_operand") (match_operand 5 "const_0_to_3_operand")])))] - "TARGET_AVX2" + "TARGET_AVX2 && " { int mask = 0; mask |= INTVAL (operands[2]) << 0; @@ -12683,10 +13524,10 @@ mask |= INTVAL (operands[4]) << 4; mask |= INTVAL (operands[5]) << 6; operands[2] = GEN_INT (mask); - return "vperm\t{%2, %1, %0|%0, %1, %2}"; + return "vperm\t{%2, %1, %0|%0, %1, %2}"; } [(set_attr "type" "sselog") - (set_attr "prefix" "vex") + (set_attr "prefix" "") (set_attr "mode" "")]) (define_insn "avx2_permv2ti" @@ -12733,58 +13574,58 @@ (set_attr "isa" "*,avx2,noavx2") (set_attr "mode" "V8SF")]) -(define_insn "avx512f_vec_dup" +(define_insn "avx512f_vec_dup" [(set (match_operand:VI48F_512 0 "register_operand" "=v") (vec_duplicate:VI48F_512 (vec_select: (match_operand: 1 "nonimmediate_operand" "vm") (parallel [(const_int 0)]))))] "TARGET_AVX512F" - "vbroadcast\t{%1, %0|%0, %1}" + "vbroadcast\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_broadcast" +(define_insn "avx512f_broadcast" [(set (match_operand:V16FI 0 "register_operand" "=v,v") (vec_duplicate:V16FI (match_operand: 1 "nonimmediate_operand" "v,m")))] "TARGET_AVX512F" "@ - vshuf32x4\t{$0x0, %g1, %g1, %0|%0, %g1, %g1, 
0x0} - vbroadcast32x4\t{%1, %0|%0, %1}" + vshuf32x4\t{$0x0, %g1, %g1, %0|%0, %g1, %g1, 0x0} + vbroadcast32x4\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_broadcast" +(define_insn "avx512f_broadcast" [(set (match_operand:V8FI 0 "register_operand" "=v,v") (vec_duplicate:V8FI (match_operand: 1 "nonimmediate_operand" "v,m")))] "TARGET_AVX512F" "@ - vshuf64x2\t{$0x44, %g1, %g1, %0|%0, %g1, %g1, 0x44} - vbroadcast64x4\t{%1, %0|%0, %1}" + vshuf64x2\t{$0x44, %g1, %g1, %0|%0, %g1, %g1, 0x44} + vbroadcast64x4\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_vec_dup_gpr" +(define_insn "avx512f_vec_dup_gpr" [(set (match_operand:VI48_512 0 "register_operand" "=v") (vec_duplicate:VI48_512 (match_operand: 1 "register_operand" "r")))] "TARGET_AVX512F && (mode != V8DImode || TARGET_64BIT)" - "vpbroadcast\t{%1, %0|%0, %1}" + "vpbroadcast\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_vec_dup_mem" +(define_insn "avx512f_vec_dup_mem" [(set (match_operand:VI48F_512 0 "register_operand" "=x") (vec_duplicate:VI48F_512 (match_operand: 1 "nonimmediate_operand" "xm")))] "TARGET_AVX512F" - "vbroadcast\t{%1, %0|%0, %1}" + "vbroadcast\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -12924,12 +13765,12 @@ elt * GET_MODE_SIZE (mode)); }) -(define_expand "_vpermil" +(define_expand "_vpermil" [(set (match_operand:VF2 0 "register_operand") (vec_select:VF2 (match_operand:VF2 1 "nonimmediate_operand") (match_operand:SI 2 "const_0_to_255_operand")))] - "TARGET_AVX" + "TARGET_AVX && " { int mask = INTVAL (operands[2]); rtx perm[]; @@ -12945,12 +13786,12 @@ = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (, perm)); }) -(define_expand "_vpermil" +(define_expand "_vpermil" [(set (match_operand:VF1 0 "register_operand") (vec_select:VF1 
(match_operand:VF1 1 "nonimmediate_operand") (match_operand:SI 2 "const_0_to_255_operand")))] - "TARGET_AVX" + "TARGET_AVX && " { int mask = INTVAL (operands[2]); rtx perm[]; @@ -12968,37 +13809,37 @@ = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (, perm)); }) -(define_insn "*_vpermilp" +(define_insn "*_vpermilp" [(set (match_operand:VF 0 "register_operand" "=v") (vec_select:VF (match_operand:VF 1 "nonimmediate_operand" "vm") (match_parallel 2 "" [(match_operand 3 "const_int_operand")])))] - "TARGET_AVX + "TARGET_AVX && && avx_vpermilp_parallel (operands[2], mode)" { int mask = avx_vpermilp_parallel (operands[2], mode) - 1; operands[2] = GEN_INT (mask); - return "vpermil\t{%2, %1, %0|%0, %1, %2}"; + return "vpermil\t{%2, %1, %0|%0, %1, %2}"; } [(set_attr "type" "sselog") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") - (set_attr "prefix" "vex") + (set_attr "prefix" "") (set_attr "mode" "")]) -(define_insn "_vpermilvar3" +(define_insn "_vpermilvar3" [(set (match_operand:VF 0 "register_operand" "=v") (unspec:VF [(match_operand:VF 1 "register_operand" "v") (match_operand: 2 "nonimmediate_operand" "vm")] UNSPEC_VPERMIL))] - "TARGET_AVX" - "vpermil\t{%2, %1, %0|%0, %1, %2}" + "TARGET_AVX && " + "vpermil\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix_extra" "1") (set_attr "btver2_decode" "vector") - (set_attr "prefix" "vex") + (set_attr "prefix" "") (set_attr "mode" "")]) (define_insn "avx512f_vpermi2var3" @@ -13014,6 +13855,22 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_vpermi2var3_mask" + [(set (match_operand:VI48F_512 0 "register_operand" "=v") + (vec_merge:VI48F_512 + (unspec:VI48F_512 + [(match_operand:VI48F_512 1 "register_operand" "v") + (match_operand: 2 "register_operand" "0") + (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")] + UNSPEC_VPERMI2_MASK) + (match_dup 0) + (match_operand: 4 "register_operand" "k")))] + "TARGET_AVX512F" + "vpermi2\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}" + 
[(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "avx512f_vpermt2var3" [(set (match_operand:VI48F_512 0 "register_operand" "=v") (unspec:VI48F_512 @@ -13027,6 +13884,22 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_vpermt2var3_mask" + [(set (match_operand:VI48F_512 0 "register_operand" "=v") + (vec_merge:VI48F_512 + (unspec:VI48F_512 + [(match_operand: 1 "register_operand" "v") + (match_operand:VI48F_512 2 "register_operand" "0") + (match_operand:VI48F_512 3 "nonimmediate_operand" "vm")] + UNSPEC_VPERMT2) + (match_dup 2) + (match_operand: 4 "register_operand" "k")))] + "TARGET_AVX512F" + "vpermt2\t{%3, %1, %0%{%4%}|%0%{%4%}, %1, %3}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_expand "avx_vperm2f1283" [(set (match_operand:AVX256MODE2P 0 "register_operand") (unspec:AVX256MODE2P @@ -13417,24 +14290,24 @@ DONE; }) -(define_insn "_ashrv" +(define_insn "_ashrv" [(set (match_operand:VI48_AVX512F 0 "register_operand" "=v") (ashiftrt:VI48_AVX512F (match_operand:VI48_AVX512F 1 "register_operand" "v") (match_operand:VI48_AVX512F 2 "nonimmediate_operand" "vm")))] - "TARGET_AVX2" - "vpsrav\t{%2, %1, %0|%0, %1, %2}" + "TARGET_AVX2 && " + "vpsrav\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseishft") (set_attr "prefix" "maybe_evex") (set_attr "mode" "")]) -(define_insn "_v" +(define_insn "_v" [(set (match_operand:VI48_AVX2_48_AVX512F 0 "register_operand" "=v") (any_lshift:VI48_AVX2_48_AVX512F (match_operand:VI48_AVX2_48_AVX512F 1 "register_operand" "v") (match_operand:VI48_AVX2_48_AVX512F 2 "nonimmediate_operand" "vm")))] - "TARGET_AVX2" - "vpv\t{%2, %1, %0|%0, %1, %2}" + "TARGET_AVX2 && " + "vpv\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseishft") (set_attr "prefix" "maybe_evex") (set_attr "mode" "")]) @@ -13517,12 +14390,13 @@ (set_attr "btver2_decode" "double") (set_attr "mode" "V8SF")]) -(define_insn "avx512f_vcvtph2ps512" +(define_insn 
"avx512f_vcvtph2ps512" [(set (match_operand:V16SF 0 "register_operand" "=v") - (unspec:V16SF [(match_operand:V16HI 1 "nonimmediate_operand" "vm")] - UNSPEC_VCVTPH2PS))] + (unspec:V16SF + [(match_operand:V16HI 1 "nonimmediate_operand" "vm")] + UNSPEC_VCVTPH2PS))] "TARGET_AVX512F" - "vcvtph2ps\t{%1, %0|%0, %1}" + "vcvtph2ps\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "V16SF")]) @@ -13573,13 +14447,14 @@ (set_attr "btver2_decode" "vector") (set_attr "mode" "V8SF")]) -(define_insn "avx512f_vcvtps2ph512" +(define_insn "avx512f_vcvtps2ph512" [(set (match_operand:V16HI 0 "nonimmediate_operand" "=vm") - (unspec:V16HI [(match_operand:V16SF 1 "register_operand" "v") - (match_operand:SI 2 "const_0_to_255_operand" "N")] - UNSPEC_VCVTPS2PH))] + (unspec:V16HI + [(match_operand:V16SF 1 "register_operand" "v") + (match_operand:SI 2 "const_0_to_255_operand" "N")] + UNSPEC_VCVTPS2PH))] "TARGET_AVX512F" - "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}" + "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "ssecvt") (set_attr "prefix" "evex") (set_attr "mode" "V16SF")]) @@ -13969,14 +14844,55 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_getmant" +(define_insn "avx512f_compress_mask" + [(set (match_operand:VI48F_512 0 "register_operand" "=v") + (unspec:VI48F_512 + [(match_operand:VI48F_512 1 "register_operand" "v") + (match_operand:VI48F_512 2 "vector_move_operand" "0C") + (match_operand: 3 "register_operand" "k")] + UNSPEC_COMPRESS))] + "TARGET_AVX512F" + "vcompress\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512f_compressstore_mask" + [(set (match_operand:VI48F_512 0 "memory_operand" "=m") + (unspec:VI48F_512 + [(match_operand:VI48F_512 1 "register_operand" "x") + (match_dup 0) + (match_operand: 2 "register_operand" "k")] + UNSPEC_COMPRESS_STORE))] + "TARGET_AVX512F" + "vcompress\t{%1, %0%{%2%}|%0%{%2%}, %1}" + [(set_attr 
"type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "memory" "store") + (set_attr "mode" "")]) + +(define_insn "avx512f_expand_mask" + [(set (match_operand:VI48F_512 0 "register_operand" "=v,v") + (unspec:VI48F_512 + [(match_operand:VI48F_512 1 "nonimmediate_operand" "v,m") + (match_operand:VI48F_512 2 "vector_move_operand" "0C,0C") + (match_operand: 3 "register_operand" "k,k")] + UNSPEC_EXPAND))] + "TARGET_AVX512F" + "vexpand\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "memory" "none,load") + (set_attr "mode" "")]) + +(define_insn "avx512f_getmant" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm") (match_operand:SI 2 "const_0_to_15_operand")] UNSPEC_GETMANT))] "TARGET_AVX512F" - "vgetmant\t{%2, %1, %0|%0, %1, %2}"; + "vgetmant\t{%2, %1, %0|%0, %1, %2}"; [(set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -13995,23 +14911,23 @@ [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "clz2" +(define_insn "clz2" [(set (match_operand:VI48_512 0 "register_operand" "=v") (clz:VI48_512 (match_operand:VI48_512 1 "nonimmediate_operand" "vm")))] "TARGET_AVX512CD" - "vplzcnt\t{%1, %0|%0, %1}" + "vplzcnt\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "conflict" +(define_insn "conflict" [(set (match_operand:VI48_512 0 "register_operand" "=v") (unspec:VI48_512 [(match_operand:VI48_512 1 "nonimmediate_operand" "vm")] UNSPEC_CONFLICT))] "TARGET_AVX512CD" - "vpconflict\t{%1, %0|%0, %1}" + "vpconflict\t{%1, %0|%0, %1}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md new file mode 100644 index 0000000..6b45d05 --- /dev/null +++ b/gcc/config/i386/subst.md @@ -0,0 +1,56 @@ +;; GCC machine description for AVX512F instructions +;; Copyright (C) 2013 Free Software Foundation, Inc. 
+;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. + +;; Some iterators for extending subst as much as possible +;; All vectors (Use it for destination) +(define_mode_iterator SUBST_V + [V16QI + V16HI V8HI + V16SI V8SI V4SI + V8DI V4DI V2DI + V16SF V8SF V4SF + V8DF V4DF V2DF]) + +(define_subst_attr "mask_name" "mask" "" "_mask") +(define_subst_attr "mask_applied" "mask" "false" "true") +(define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2") +(define_subst_attr "mask_operand3" "mask" "" "%{%4%}%N3") +(define_subst_attr "mask_operand3_1" "mask" "" "%%{%%4%%}%%N3") ;; for sprintf +(define_subst_attr "mask_operand4" "mask" "" "%{%5%}%N4") +(define_subst_attr "mask_operand6" "mask" "" "%{%7%}%N6") +(define_subst_attr "mask_operand11" "mask" "" "%{%12%}%N11") +(define_subst_attr "mask_operand18" "mask" "" "%{%19%}%N18") +(define_subst_attr "mask_operand19" "mask" "" "%{%20%}%N19") +(define_subst_attr "mask_codefor" "mask" "*" "") +(define_subst_attr "mask_mode512bit_condition" "mask" "1" "(GET_MODE_SIZE (GET_MODE (operands[0])) == 64)") +(define_subst_attr "store_mask_constraint" "mask" "vm" "v") +(define_subst_attr "store_mask_predicate" "mask" "nonimmediate_operand" "register_operand") +(define_subst_attr "mask_prefix" "mask" "vex" "evex") +(define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex") +(define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex") + +(define_subst
"mask" + [(set (match_operand:SUBST_V 0) + (match_operand:SUBST_V 1))] + "TARGET_AVX512F" + [(set (match_dup 0) + (vec_merge:SUBST_V + (match_dup 1) + (match_operand:SUBST_V 2 "vector_move_operand" "0C") + (match_operand: 3 "register_operand" "k")))])