From patchwork Tue Aug 27 18:37:17 2013
X-Patchwork-Submitter: Kirill Yukhin
X-Patchwork-Id: 270193
Date: Tue, 27 Aug 2013 22:37:17 +0400
From: Kirill Yukhin
To: Richard Henderson
Cc: Uros Bizjak, Vladimir Makarov, Jakub Jelinek, GCC Patches
Subject: Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Message-ID: <20130827183717.GB42618@msticlxl57.ims.intel.com>
References: <20130808112524.GA40277@msticlxl57.ims.intel.com> <20130814072638.GD52726@msticlxl57.ims.intel.com> <52129604.6040305@redhat.com>
In-Reply-To: <52129604.6040305@redhat.com>

Hello,

> This patch is still far too large.
>
> I think you should split it up based on every single mode iterator that
> you need to add or change.

The problem is that some iterators depend on each other, so the patches
are not going to be tiny.

Here is the first one. It extends the VF iterator, which I believe has the
biggest impact.

Is it OK?

Testing:
  1. Bootstrap passes.
  2. make check shows no regressions.
  3. Spec 2000 & 2006 builds show no regressions both with and without
     the -mavx512f option.
  4. Spec 2000 & 2006 runs show no stability regressions without
     the -mavx512f option.

---
Thanks, K

PS.
If it is, I am going to strip the ChangeLog lines out of the big patch.

---
 gcc/config/i386/i386.c  |  62 +++++++++--
 gcc/config/i386/i386.md |   1 +
 gcc/config/i386/sse.md  | 283 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 241 insertions(+), 105 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8325919..5f50533 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -16538,8 +16538,8 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1)
       gcc_unreachable ();
     case V32QImode:
       extract = gen_avx_vextractf128v32qi;
-      load_unaligned = gen_avx_loaddqu256;
-      store_unaligned = gen_avx_storedqu256;
+      load_unaligned = gen_avx_loaddquv32qi;
+      store_unaligned = gen_avx_storedquv32qi;
       mode = V16QImode;
       break;
     case V8SFmode:
@@ -16642,10 +16642,56 @@ void
 ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
 {
   rtx op0, op1, m;
+  rtx (*load_unaligned) (rtx, rtx);
+  rtx (*store_unaligned) (rtx, rtx);
 
   op0 = operands[0];
   op1 = operands[1];
 
+  if (GET_MODE_SIZE (mode) == 64)
+    {
+      switch (GET_MODE_CLASS (mode))
+        {
+        case MODE_VECTOR_INT:
+        case MODE_INT:
+          op0 = gen_lowpart (V16SImode, op0);
+          op1 = gen_lowpart (V16SImode, op1);
+          /* FALLTHRU */
+
+        case MODE_VECTOR_FLOAT:
+          switch (GET_MODE (op0))
+            {
+            default:
+              gcc_unreachable ();
+            case V16SImode:
+              load_unaligned = gen_avx512f_loaddquv16si;
+              store_unaligned = gen_avx512f_storedquv16si;
+              break;
+            case V16SFmode:
+              load_unaligned = gen_avx512f_loadups512;
+              store_unaligned = gen_avx512f_storeups512;
+              break;
+            case V8DFmode:
+              load_unaligned = gen_avx512f_loadupd512;
+              store_unaligned = gen_avx512f_storeupd512;
+              break;
+            }
+
+          if (MEM_P (op1))
+            emit_insn (load_unaligned (op0, op1));
+          else if (MEM_P (op0))
+            emit_insn (store_unaligned (op0, op1));
+          else
+            gcc_unreachable ();
+          break;
+
+        default:
+          gcc_unreachable ();
+        }
+
+      return;
+    }
+
   if (TARGET_AVX
       && GET_MODE_SIZE (mode) == 32)
     {
@@ -16678,7 +16724,7 @@ ix86_expand_vector_move_misalign (enum
machine_mode mode, rtx operands[])
           op0 = gen_lowpart (V16QImode, op0);
           op1 = gen_lowpart (V16QImode, op1);
           /* We will eventually emit movups based on insn attributes. */
-          emit_insn (gen_sse2_loaddqu (op0, op1));
+          emit_insn (gen_sse2_loaddquv16qi (op0, op1));
         }
       else if (TARGET_SSE2 && mode == V2DFmode)
         {
@@ -16753,7 +16799,7 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
           op0 = gen_lowpart (V16QImode, op0);
           op1 = gen_lowpart (V16QImode, op1);
           /* We will eventually emit movups based on insn attributes. */
-          emit_insn (gen_sse2_storedqu (op0, op1));
+          emit_insn (gen_sse2_storedquv16qi (op0, op1));
         }
       else if (TARGET_SSE2 && mode == V2DFmode)
         {
@@ -27473,13 +27519,13 @@ static const struct builtin_description bdesc_special_args[] =
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_lfence, "__builtin_ia32_lfence", IX86_BUILTIN_LFENCE, UNKNOWN, (int) VOID_FTYPE_VOID },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_mfence, 0, IX86_BUILTIN_MFENCE, UNKNOWN, (int) VOID_FTYPE_VOID },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storeupd, "__builtin_ia32_storeupd", IX86_BUILTIN_STOREUPD, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storedqu, "__builtin_ia32_storedqu", IX86_BUILTIN_STOREDQU, UNKNOWN, (int) VOID_FTYPE_PCHAR_V16QI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storedquv16qi, "__builtin_ia32_storedqu", IX86_BUILTIN_STOREDQU, UNKNOWN, (int) VOID_FTYPE_PCHAR_V16QI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntv2df, "__builtin_ia32_movntpd", IX86_BUILTIN_MOVNTPD, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntv2di, "__builtin_ia32_movntdq", IX86_BUILTIN_MOVNTDQ, UNKNOWN, (int) VOID_FTYPE_PV2DI_V2DI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntisi, "__builtin_ia32_movnti", IX86_BUILTIN_MOVNTI, UNKNOWN, (int) VOID_FTYPE_PINT_INT },
   { OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_64BIT, CODE_FOR_sse2_movntidi, "__builtin_ia32_movnti64", IX86_BUILTIN_MOVNTI64, UNKNOWN, (int) VOID_FTYPE_PLONGLONG_LONGLONG },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_loadupd, "__builtin_ia32_loadupd", IX86_BUILTIN_LOADUPD, UNKNOWN, (int) V2DF_FTYPE_PCDOUBLE },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_loaddqu, "__builtin_ia32_loaddqu", IX86_BUILTIN_LOADDQU, UNKNOWN, (int) V16QI_FTYPE_PCCHAR },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_loaddquv16qi, "__builtin_ia32_loaddqu", IX86_BUILTIN_LOADDQU, UNKNOWN, (int) V16QI_FTYPE_PCCHAR },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_loadhpd_exp, "__builtin_ia32_loadhpd", IX86_BUILTIN_LOADHPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_PCDOUBLE },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_loadlpd_exp, "__builtin_ia32_loadlpd", IX86_BUILTIN_LOADLPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_PCDOUBLE },
 
@@ -27508,8 +27554,8 @@ static const struct builtin_description bdesc_special_args[] =
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_loadups256, "__builtin_ia32_loadups256", IX86_BUILTIN_LOADUPS256, UNKNOWN, (int) V8SF_FTYPE_PCFLOAT },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_storeupd256, "__builtin_ia32_storeupd256", IX86_BUILTIN_STOREUPD256, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_storeups256, "__builtin_ia32_storeups256", IX86_BUILTIN_STOREUPS256, UNKNOWN, (int) VOID_FTYPE_PFLOAT_V8SF },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_loaddqu256, "__builtin_ia32_loaddqu256", IX86_BUILTIN_LOADDQU256, UNKNOWN, (int) V32QI_FTYPE_PCCHAR },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_storedqu256, "__builtin_ia32_storedqu256", IX86_BUILTIN_STOREDQU256, UNKNOWN, (int) VOID_FTYPE_PCHAR_V32QI },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_loaddquv32qi, "__builtin_ia32_loaddqu256", IX86_BUILTIN_LOADDQU256, UNKNOWN, (int) V32QI_FTYPE_PCCHAR },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_storedquv32qi, "__builtin_ia32_storedqu256", IX86_BUILTIN_STOREDQU256, UNKNOWN, (int) VOID_FTYPE_PCHAR_V32QI },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_lddqu256, "__builtin_ia32_lddqu256", IX86_BUILTIN_LDDQU256, UNKNOWN, (int) V32QI_FTYPE_PCCHAR },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_movntv4di,
"__builtin_ia32_movntdq256", IX86_BUILTIN_MOVNTDQ256, UNKNOWN, (int) VOID_FTYPE_PV4DI_V4DI }, diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 01c85d8..e458932 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -778,6 +778,7 @@ ;; Mapping of logic operators (define_code_iterator any_logic [and ior xor]) (define_code_iterator any_or [ior xor]) +(define_code_iterator fpint_logic [and xor]) ;; Base name for insn mnemonic. (define_code_attr logic [(and "and") (ior "or") (xor "xor")]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9d9469e..10637cc 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -97,13 +97,13 @@ ;; All vector modes including V?TImode, used in move patterns. (define_mode_iterator VMOVE - [(V32QI "TARGET_AVX") V16QI - (V16HI "TARGET_AVX") V8HI - (V8SI "TARGET_AVX") V4SI - (V4DI "TARGET_AVX") V2DI + [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI + (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI + (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI (V2TI "TARGET_AVX") V1TI - (V8SF "TARGET_AVX") V4SF - (V4DF "TARGET_AVX") V2DF]) + (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) ;; All vector modes (define_mode_iterator V @@ -124,6 +124,11 @@ ;; All vector float modes (define_mode_iterator VF + [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + +;; 128- and 256-bit float vector modes +(define_mode_iterator VF_128_256 [(V8SF "TARGET_AVX") V4SF (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) @@ -143,6 +148,10 @@ (define_mode_iterator VF_256 [V8SF V4DF]) +;; All 512bit vector float modes +(define_mode_iterator VF_512 + [V16SF V8DF]) + ;; All vector integer modes (define_mode_iterator VI [(V32QI "TARGET_AVX") V16QI @@ -160,6 +169,10 @@ (define_mode_iterator VI1 [(V32QI "TARGET_AVX") V16QI]) 
+(define_mode_iterator VI_UNALIGNED_LOADSTORE + [(V32QI "TARGET_AVX") V16QI + (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")]) + ;; All DImode vector integer modes (define_mode_iterator VI8 [(V4DI "TARGET_AVX") V2DI]) @@ -212,11 +225,18 @@ (V4SI "TARGET_AVX2") (V2DI "TARGET_AVX2") (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")]) +(define_mode_attr sse2_avx_avx512f + [(V16QI "sse2") (V32QI "avx") (V64QI "avx512f") + (V4SI "sse2") (V8SI "avx") (V16SI "avx512f") + (V8DI "avx512f") + (V16SF "avx512f") (V8SF "avx") (V4SF "avx") + (V8DF "avx512f") (V4DF "avx") (V2DF "avx")]) + (define_mode_attr sse2_avx2 [(V16QI "sse2") (V32QI "avx2") (V8HI "sse2") (V16HI "avx2") - (V4SI "sse2") (V8SI "avx2") - (V2DI "sse2") (V4DI "avx2") + (V4SI "sse2") (V8SI "avx2") (V16SI "avx512f") + (V2DI "sse2") (V4DI "avx2") (V8DI "avx512f") (V1TI "sse2") (V2TI "avx2")]) (define_mode_attr ssse3_avx2 @@ -229,7 +249,7 @@ (define_mode_attr sse4_1_avx2 [(V16QI "sse4_1") (V32QI "avx2") (V8HI "sse4_1") (V16HI "avx2") - (V4SI "sse4_1") (V8SI "avx2") + (V4SI "sse4_1") (V8SI "avx2") (V16SI "avx512f") (V2DI "sse4_1") (V4DI "avx2")]) (define_mode_attr avx_avx2 @@ -244,6 +264,12 @@ (V4SI "vec") (V8SI "avx2") (V2DI "vec") (V4DI "avx2")]) +(define_mode_attr avx2_avx512f + [(V4SI "avx2") (V8SI "avx2") (V16SI "avx512f") + (V2DI "avx2") (V4DI "avx2") (V8DI "avx512f") + (V8SF "avx2") (V16SF "avx512f") + (V4DF "avx2") (V8DF "avx512f")]) + (define_mode_attr shuffletype [(V16SF "f") (V16SI "i") (V8DF "f") (V8DI "i") (V8SF "f") (V8SI "i") (V4DF "f") (V4DI "i") @@ -287,22 +313,26 @@ (define_mode_attr sse [(SF "sse") (DF "sse2") (V4SF "sse") (V2DF "sse2") - (V8SF "avx") (V4DF "avx")]) + (V16SF "avx512f") (V8SF "avx") + (V8DF "avx512f") (V4DF "avx")]) (define_mode_attr sse2 - [(V16QI "sse2") (V32QI "avx") - (V2DI "sse2") (V4DI "avx")]) + [(V16QI "sse2") (V32QI "avx") (V64QI "avx512f") + (V2DI "sse2") (V4DI "avx") (V8DI "avx512f")]) (define_mode_attr sse3 [(V16QI "sse3") (V32QI "avx")]) (define_mode_attr sse4_1 [(V4SF 
"sse4_1") (V2DF "sse4_1") - (V8SF "avx") (V4DF "avx")]) + (V8SF "avx") (V4DF "avx") + (V8DF "avx512f")]) (define_mode_attr avxsizesuffix - [(V32QI "256") (V16HI "256") (V8SI "256") (V4DI "256") + [(V64QI "512") (V32HI "512") (V16SI "512") (V8DI "512") + (V32QI "256") (V16HI "256") (V8SI "256") (V4DI "256") (V16QI "") (V8HI "") (V4SI "") (V2DI "") + (V16SF "512") (V8DF "512") (V8SF "256") (V4DF "256") (V4SF "") (V2DF "")]) @@ -318,11 +348,13 @@ ;; Mapping of vector float modes to an integer mode of the same size (define_mode_attr sseintvecmode - [(V8SF "V8SI") (V4DF "V4DI") - (V4SF "V4SI") (V2DF "V2DI") - (V8SI "V8SI") (V4DI "V4DI") - (V4SI "V4SI") (V2DI "V2DI") - (V16HI "V16HI") (V8HI "V8HI") + [(V16SF "V16SI") (V8DF "V8DI") + (V8SF "V8SI") (V4DF "V4DI") + (V4SF "V4SI") (V2DF "V2DI") + (V16SI "V16SI") (V8DI "V8DI") + (V8SI "V8SI") (V4DI "V4DI") + (V4SI "V4SI") (V2DI "V2DI") + (V16HI "V16HI") (V8HI "V8HI") (V32QI "V32QI") (V16QI "V16QI")]) (define_mode_attr sseintvecmodelower @@ -349,8 +381,10 @@ ;; Mapping of vector modes ti packed single mode of the same size (define_mode_attr ssePSmode - [(V32QI "V8SF") (V16QI "V4SF") - (V16HI "V8SF") (V8HI "V4SF") + [(V16SI "V16SF") (V8DF "V16SF") + (V16SF "V16SF") (V8DI "V16SF") + (V64QI "V16SF") (V32QI "V8SF") (V16QI "V4SF") + (V32HI "V16SF") (V16HI "V8SF") (V8HI "V4SF") (V8SI "V8SF") (V4SI "V4SF") (V4DI "V8SF") (V2DI "V4SF") (V2TI "V8SF") (V1TI "V4SF") @@ -665,12 +699,13 @@ (define_insn "_loadu" [(set (match_operand:VF 0 "register_operand" "=v") (unspec:VF - [(match_operand:VF 1 "memory_operand" "m")] + [(match_operand:VF 1 "nonimmediate_operand" "vm")] UNSPEC_LOADU))] "TARGET_SSE" { switch (get_attr_mode (insn)) { + case MODE_V16SF: case MODE_V8SF: case MODE_V4SF: return "%vmovups\t{%1, %0|%0, %1}"; @@ -694,12 +729,13 @@ (define_insn "_storeu" [(set (match_operand:VF 0 "memory_operand" "=m") (unspec:VF - [(match_operand:VF 1 "register_operand" "x")] + [(match_operand:VF 1 "register_operand" "v")] UNSPEC_STOREU))] 
"TARGET_SSE" { switch (get_attr_mode (insn)) { + case MODE_V16SF: case MODE_V8SF: case MODE_V4SF: return "%vmovups\t{%1, %0|%0, %1}"; @@ -721,10 +757,11 @@ ] (const_string "")))]) -(define_insn "_loaddqu" - [(set (match_operand:VI1 0 "register_operand" "=v") - (unspec:VI1 [(match_operand:VI1 1 "memory_operand" "m")] - UNSPEC_LOADU))] +(define_insn "_loaddqu" + [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand" "=v") + (unspec:VI_UNALIGNED_LOADSTORE + [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand" "vm")] + UNSPEC_LOADU))] "TARGET_SSE2" { switch (get_attr_mode (insn)) @@ -732,6 +769,11 @@ case MODE_V8SF: case MODE_V4SF: return "%vmovups\t{%1, %0|%0, %1}"; + case MODE_XI: + if (mode == V8DImode) + return "vmovdqu64\t{%1, %0|%0, %1}"; + else + return "vmovdqu32\t{%1, %0|%0, %1}"; default: return "%vmovdqu\t{%1, %0|%0, %1}"; } @@ -754,10 +796,11 @@ ] (const_string "")))]) -(define_insn "_storedqu" - [(set (match_operand:VI1 0 "memory_operand" "=m") - (unspec:VI1 [(match_operand:VI1 1 "register_operand" "v")] - UNSPEC_STOREU))] +(define_insn "_storedqu" + [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "memory_operand" "=m") + (unspec:VI_UNALIGNED_LOADSTORE + [(match_operand:VI_UNALIGNED_LOADSTORE 1 "register_operand" "v")] + UNSPEC_STOREU))] "TARGET_SSE2" { switch (get_attr_mode (insn)) @@ -765,6 +808,11 @@ case MODE_V8SF: case MODE_V4SF: return "%vmovups\t{%1, %0|%0, %1}"; + case MODE_XI: + if (mode == V8DImode) + return "vmovdqu64\t{%1, %0|%0, %1}"; + else + return "vmovdqu32\t{%1, %0|%0, %1}"; default: return "%vmovdqu\t{%1, %0|%0, %1}"; } @@ -821,8 +869,9 @@ (define_insn "_movnt" [(set (match_operand:VF 0 "memory_operand" "=m") - (unspec:VF [(match_operand:VF 1 "register_operand" "x")] - UNSPEC_MOVNT))] + (unspec:VF + [(match_operand:VF 1 "register_operand" "v")] + UNSPEC_MOVNT))] "TARGET_SSE" "%vmovnt\t{%1, %0|%0, %1}" [(set_attr "type" "ssemov") @@ -852,9 +901,9 @@ (define_mode_iterator STORENT_MODE [(DI "TARGET_SSE2 && TARGET_64BIT") 
(SI "TARGET_SSE2") (SF "TARGET_SSE4A") (DF "TARGET_SSE4A") - (V4DI "TARGET_AVX") (V2DI "TARGET_SSE2") - (V8SF "TARGET_AVX") V4SF - (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") (V2DI "TARGET_SSE2") + (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) (define_expand "storent" [(set (match_operand:STORENT_MODE 0 "memory_operand") @@ -877,10 +926,10 @@ "ix86_expand_fp_absneg_operator (, mode, operands); DONE;") (define_insn_and_split "*absneg2" - [(set (match_operand:VF 0 "register_operand" "=x,x,x,x") + [(set (match_operand:VF 0 "register_operand" "=x,x,v,v") (match_operator:VF 3 "absneg_operator" - [(match_operand:VF 1 "nonimmediate_operand" "0, xm,x, m")])) - (use (match_operand:VF 2 "nonimmediate_operand" "xm,0, xm,x"))] + [(match_operand:VF 1 "nonimmediate_operand" "0, xm, v, m")])) + (use (match_operand:VF 2 "nonimmediate_operand" "xm, 0, vm,v"))] "TARGET_SSE" "#" "&& reload_completed" @@ -962,10 +1011,10 @@ "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);") (define_insn "*mul3" - [(set (match_operand:VF 0 "register_operand" "=x,x") + [(set (match_operand:VF 0 "register_operand" "=x,v") (mult:VF - (match_operand:VF 1 "nonimmediate_operand" "%0,x") - (match_operand:VF 2 "nonimmediate_operand" "xm,xm")))] + (match_operand:VF 1 "nonimmediate_operand" "%0,v") + (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))] "TARGET_SSE && ix86_binary_operator_ok (MULT, mode, operands)" "@ mul\t{%2, %0|%0, %2} @@ -1239,10 +1288,10 @@ ;; presence of -0.0 and NaN. 
(define_insn "*ieee_smin3" - [(set (match_operand:VF 0 "register_operand" "=x,x") + [(set (match_operand:VF 0 "register_operand" "=v,v") (unspec:VF - [(match_operand:VF 1 "register_operand" "0,x") - (match_operand:VF 2 "nonimmediate_operand" "xm,xm")] + [(match_operand:VF 1 "register_operand" "0,v") + (match_operand:VF 2 "nonimmediate_operand" "vm,vm")] UNSPEC_IEEE_MIN))] "TARGET_SSE" "@ @@ -1254,10 +1303,10 @@ (set_attr "mode" "")]) (define_insn "*ieee_smax3" - [(set (match_operand:VF 0 "register_operand" "=x,x") + [(set (match_operand:VF 0 "register_operand" "=v,v") (unspec:VF - [(match_operand:VF 1 "register_operand" "0,x") - (match_operand:VF 2 "nonimmediate_operand" "xm,xm")] + [(match_operand:VF 1 "register_operand" "0,v") + (match_operand:VF 2 "nonimmediate_operand" "vm,vm")] UNSPEC_IEEE_MAX))] "TARGET_SSE" "@ @@ -1632,10 +1681,10 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (define_insn "avx_cmp3" - [(set (match_operand:VF 0 "register_operand" "=x") - (unspec:VF - [(match_operand:VF 1 "register_operand" "x") - (match_operand:VF 2 "nonimmediate_operand" "xm") + [(set (match_operand:VF_128_256 0 "register_operand" "=x") + (unspec:VF_128_256 + [(match_operand:VF_128_256 1 "register_operand" "x") + (match_operand:VF_128_256 2 "nonimmediate_operand" "xm") (match_operand:SI 3 "const_0_to_31_operand" "n")] UNSPEC_PCMP))] "TARGET_AVX" @@ -1663,10 +1712,10 @@ (set_attr "mode" "")]) (define_insn "*_maskcmp3_comm" - [(set (match_operand:VF 0 "register_operand" "=x,x") - (match_operator:VF 3 "sse_comparison_operator" - [(match_operand:VF 1 "register_operand" "%0,x") - (match_operand:VF 2 "nonimmediate_operand" "xm,xm")]))] + [(set (match_operand:VF_128_256 0 "register_operand" "=x,x") + (match_operator:VF_128_256 3 "sse_comparison_operator" + [(match_operand:VF_128_256 1 "register_operand" "%0,x") + (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")]))] "TARGET_SSE && GET_RTX_CLASS (GET_CODE (operands[3])) == RTX_COMM_COMPARE" "@ @@ 
-1679,10 +1728,10 @@ (set_attr "mode" "")]) (define_insn "_maskcmp3" - [(set (match_operand:VF 0 "register_operand" "=x,x") - (match_operator:VF 3 "sse_comparison_operator" - [(match_operand:VF 1 "register_operand" "0,x") - (match_operand:VF 2 "nonimmediate_operand" "xm,xm")]))] + [(set (match_operand:VF_128_256 0 "register_operand" "=x,x") + (match_operator:VF_128_256 3 "sse_comparison_operator" + [(match_operand:VF_128_256 1 "register_operand" "0,x") + (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm")]))] "TARGET_SSE" "@ cmp%D3\t{%2, %0|%0, %2} @@ -1792,11 +1841,11 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (define_insn "_andnot3" - [(set (match_operand:VF 0 "register_operand" "=x,x") + [(set (match_operand:VF 0 "register_operand" "=x,v") (and:VF (not:VF - (match_operand:VF 1 "register_operand" "0,x")) - (match_operand:VF 2 "nonimmediate_operand" "xm,xm")))] + (match_operand:VF 1 "register_operand" "0,v")) + (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))] "TARGET_SSE" { static char buf[32]; @@ -1825,12 +1874,19 @@ gcc_unreachable (); } + /* There is no vandnp[sd]. Use vpandnq. 
*/ + if (GET_MODE_SIZE (mode) == 64) + { + suffix = "q"; + ops = "vpandn%s\t{%%2, %%1, %%0|%%0, %%1, %%2}"; + } + snprintf (buf, sizeof (buf), ops, suffix); return buf; } [(set_attr "isa" "noavx,avx") (set_attr "type" "sselog") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "orig,maybe_evex") (set (attr "mode") (cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") (const_string "") @@ -1842,13 +1898,21 @@ (const_string "")))]) (define_expand "3" - [(set (match_operand:VF 0 "register_operand") - (any_logic:VF - (match_operand:VF 1 "nonimmediate_operand") - (match_operand:VF 2 "nonimmediate_operand")))] + [(set (match_operand:VF_128_256 0 "register_operand") + (any_logic:VF_128_256 + (match_operand:VF_128_256 1 "nonimmediate_operand") + (match_operand:VF_128_256 2 "nonimmediate_operand")))] "TARGET_SSE" "ix86_fixup_binary_operands_no_copy (, mode, operands);") +(define_expand "3" + [(set (match_operand:VF_512 0 "register_operand") + (fpint_logic:VF_512 + (match_operand:VF_512 1 "nonimmediate_operand") + (match_operand:VF_512 2 "nonimmediate_operand")))] + "TARGET_AVX512F" + "ix86_fixup_binary_operands_no_copy (, mode, operands);") + (define_insn "*3" [(set (match_operand:VF 0 "register_operand" "=x,v") (any_logic:VF @@ -1882,12 +1946,19 @@ gcc_unreachable (); } + /* There is no vp[sd]. Use vpq. */ + if (GET_MODE_SIZE (mode) == 64) + { + suffix = "q"; + ops = "vp%s\t{%%2, %%1, %%0|%%0, %%1, %%2}"; + } + snprintf (buf, sizeof (buf), ops, suffix); return buf; } [(set_attr "isa" "noavx,avx") (set_attr "type" "sselog") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "orig,maybe_evex") (set (attr "mode") (cond [(match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL") (const_string "") @@ -2105,6 +2176,23 @@ ] (const_string "TI")))]) +;; There are no floating point xor for V16SF and V8DF in avx512f +;; but we need them for negation. Instead we use int versions of +;; xor. Maybe there could be a better way to do that. 
+ +(define_mode_attr avx512flogicsuff + [(V16SF "d") (V8DF "q")]) + +(define_insn "avx512f_" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (fpint_logic:VF_512 + (match_operand:VF_512 1 "register_operand" "v") + (match_operand:VF_512 2 "nonimmediate_operand" "vm")))] + "TARGET_AVX512F" + "vp\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex")]) + ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; ;; FMA floating point multiply/accumulate instructions. These include @@ -7747,7 +7835,7 @@ (define_insn "_movmsk" [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI - [(match_operand:VF 1 "register_operand" "x")] + [(match_operand:VF_128_256 1 "register_operand" "x")] UNSPEC_MOVMSK))] "TARGET_SSE" "%vmovmsk\t{%1, %0|%0, %1}" @@ -8537,10 +8625,10 @@ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; (define_insn "_blend" - [(set (match_operand:VF 0 "register_operand" "=x,x") - (vec_merge:VF - (match_operand:VF 2 "nonimmediate_operand" "xm,xm") - (match_operand:VF 1 "register_operand" "0,x") + [(set (match_operand:VF_128_256 0 "register_operand" "=x,x") + (vec_merge:VF_128_256 + (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm") + (match_operand:VF_128_256 1 "register_operand" "0,x") (match_operand:SI 3 "const_0_to__operand")))] "TARGET_SSE4_1" "@ @@ -8555,11 +8643,11 @@ (set_attr "mode" "")]) (define_insn "_blendv" - [(set (match_operand:VF 0 "register_operand" "=x,x") - (unspec:VF - [(match_operand:VF 1 "register_operand" "0,x") - (match_operand:VF 2 "nonimmediate_operand" "xm,xm") - (match_operand:VF 3 "register_operand" "Yz,x")] + [(set (match_operand:VF_128_256 0 "register_operand" "=x,x") + (unspec:VF_128_256 + [(match_operand:VF_128_256 1 "register_operand" "0,x") + (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm") + (match_operand:VF_128_256 3 "register_operand" "Yz,x")] UNSPEC_BLENDV))] "TARGET_SSE4_1" "@ @@ -8575,10 +8663,10 @@ (set_attr "mode" "")]) 
(define_insn "_dp" - [(set (match_operand:VF 0 "register_operand" "=x,x") - (unspec:VF - [(match_operand:VF 1 "nonimmediate_operand" "%0,x") - (match_operand:VF 2 "nonimmediate_operand" "xm,xm") + [(set (match_operand:VF_128_256 0 "register_operand" "=x,x") + (unspec:VF_128_256 + [(match_operand:VF_128_256 1 "nonimmediate_operand" "%0,x") + (match_operand:VF_128_256 2 "nonimmediate_operand" "xm,xm") (match_operand:SI 3 "const_0_to_255_operand" "n,n")] UNSPEC_DP))] "TARGET_SSE4_1" @@ -8909,8 +8997,8 @@ ;; setting FLAGS_REG. But it is not a really compare instruction. (define_insn "avx_vtest" [(set (reg:CC FLAGS_REG) - (unspec:CC [(match_operand:VF 0 "register_operand" "x") - (match_operand:VF 1 "nonimmediate_operand" "xm")] + (unspec:CC [(match_operand:VF_128_256 0 "register_operand" "x") + (match_operand:VF_128_256 1 "nonimmediate_operand" "xm")] UNSPEC_VTESTP))] "TARGET_AVX" "vtest\t{%1, %0|%0, %1}" @@ -8947,9 +9035,9 @@ (set_attr "mode" "TI")]) (define_insn "_round" - [(set (match_operand:VF 0 "register_operand" "=x") - (unspec:VF - [(match_operand:VF 1 "nonimmediate_operand" "xm") + [(set (match_operand:VF_128_256 0 "register_operand" "=x") + (unspec:VF_128_256 + [(match_operand:VF_128_256 1 "nonimmediate_operand" "xm") (match_operand:SI 2 "const_0_to_15_operand" "n")] UNSPEC_ROUND))] "TARGET_ROUND" @@ -10341,10 +10429,10 @@ (set_attr "mode" "TI")]) (define_insn "xop_vpermil23" - [(set (match_operand:VF 0 "register_operand" "=x") - (unspec:VF - [(match_operand:VF 1 "register_operand" "x") - (match_operand:VF 2 "nonimmediate_operand" "%x") + [(set (match_operand:VF_128_256 0 "register_operand" "=x") + (unspec:VF_128_256 + [(match_operand:VF_128_256 1 "register_operand" "x") + (match_operand:VF_128_256 2 "nonimmediate_operand" "%x") (match_operand: 3 "nonimmediate_operand" "xm") (match_operand:SI 4 "const_0_to_3_operand" "n")] UNSPEC_VPERMIL2))] @@ -10794,7 +10882,7 @@ = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (, perm)); }) -(define_insn "*avx_vpermilp" 
+(define_insn "*_vpermilp" [(set (match_operand:VF 0 "register_operand" "=v") (vec_select:VF (match_operand:VF 1 "nonimmediate_operand" "vm") @@ -10811,9 +10899,9 @@ (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") (set_attr "prefix" "vex") - (set_attr "mode" "")]) + (set_attr "mode" "")]) -(define_insn "avx_vpermilvar3" +(define_insn "_vpermilvar3" [(set (match_operand:VF 0 "register_operand" "=v") (unspec:VF [(match_operand:VF 1 "register_operand" "v") @@ -10823,9 +10911,10 @@ "vpermil\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog") (set_attr "prefix_extra" "1") - (set_attr "prefix" "vex") (set_attr "btver2_decode" "vector") - (set_attr "mode" "")]) + (set_attr "prefix" "vex") + (set_attr "mode" "")]) + (define_expand "avx_vperm2f1283" [(set (match_operand:AVX256MODE2P 0 "register_operand")