From patchwork Fri Oct  7 18:23:51 2011
From: Richard Henderson
Date: Fri, 07 Oct 2011 11:23:51 -0700
To: Uros Bizjak
CC: hjl.tools@gmail.com, GCC Patches
Subject: Fix avx2 incorrect representations of shifts
Message-ID: <4E8F43B7.9090806@redhat.com>
References: <4E8CA8AA.20901@redhat.com>

On 10/05/2011 12:07 PM, Uros Bizjak wrote:
> We already have V2TImode, but hidden in the VIMAX_AVX2 mode iterator.
> Based on that, I would suggest that we model the correct insn
> functionality and try to avoid the unspec.  On a related note, there
> is no move insn for V2TImode, so V2TI should be added to the V16 mode
> iterator and a couple of other places (please grep for V1TImode, used
> for SSE full-register shift insns only).

Ah, so we do.  And, interestingly, we already had a pattern for the
shifts using that VIMAX_AVX2 iterator.

At the same time I found that palignr was using the wrong mode, so I
fixed that too.

Tested --with-cpu=core-avx2 on the intel sde.  Committed.


r~

	* config/i386/i386.c (bdesc_args): Update code for
	__builtin_ia32_palignr256.  Change type of __builtin_ia32_pslldqi256
	and __builtin_ia32_psrldqi256 to V4DI_FTYPE_V4DI_INT_CONVERT.
	(ix86_expand_args_builtin): Handle V4DI_FTYPE_V4DI_INT_CONVERT.
	* config/i386/sse.md (mode iterator V16): Add V2TI.
	(mode iterator SSESCALARMODE): Use V2TI not V4DI.
	(mode attr ssse3_avx2): Add V2TI.
	(avx2_lshrqv4di3, avx2_lshlqv4di3): Remove.
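To see why V2TImode is the right representation, note that the VEX.256
forms of vpslldq/vpsrldq operate on each 128-bit lane independently;
bytes never cross the lane boundary, so the operation is a pair of
128-bit shifts, not one 256-bit (V4DI-wide) shift.  An illustrative
test, not part of the patch -- it assumes an AVX2-capable compiler,
<immintrin.h>, and avx2 hardware or the sde to run:

#include <immintrin.h>
#include <stdio.h>

int
main (void)
{
  unsigned char in[32], out[32];
  int i;

  for (i = 0; i < 32; i++)
    in[i] = i;

  __m256i v = _mm256_loadu_si256 ((const __m256i *) in);

  /* __builtin_ia32_pslldqi256 underlies _mm256_slli_si256: each
     128-bit lane shifts left by 4 bytes, so zeros enter at bytes
     0-3 *and* 16-19, and bytes 12-15 do not move into the high
     lane.  That is a V2TI shift by 32 bits.  */
  v = _mm256_slli_si256 (v, 4);

  _mm256_storeu_si256 ((__m256i *) out, v);
  for (i = 0; i < 32; i++)
    printf ("%02x%c", out[i], i == 15 ? '|' : ' ');
  printf ("\n");
  return 0;
}

On a correct implementation the zeros appear at the start of each
16-byte half of the output, confirming the per-lane behavior.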
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 85dccf9..9611f1f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -26107,7 +26107,7 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_ssaddv16hi3, "__builtin_ia32_paddsw256", IX86_BUILTIN_PADDSW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_usaddv32qi3, "__builtin_ia32_paddusb256", IX86_BUILTIN_PADDUSB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_usaddv16hi3, "__builtin_ia32_paddusw256", IX86_BUILTIN_PADDUSW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI },
-  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_palignrv4di, "__builtin_ia32_palignr256", IX86_BUILTIN_PALIGNR256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT_CONVERT },
+  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_palignrv2ti, "__builtin_ia32_palignr256", IX86_BUILTIN_PALIGNR256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT_CONVERT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_andv4di3, "__builtin_ia32_andsi256", IX86_BUILTIN_AND256I, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_andnotv4di3, "__builtin_ia32_andnotsi256", IX86_BUILTIN_ANDNOT256I, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_uavgv32qi3, "__builtin_ia32_pavgb256", IX86_BUILTIN_PAVGB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI },
@@ -26171,7 +26171,7 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_psignv32qi3, "__builtin_ia32_psignb256", IX86_BUILTIN_PSIGNB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_psignv16hi3, "__builtin_ia32_psignw256", IX86_BUILTIN_PSIGNW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_psignv8si3 , "__builtin_ia32_psignd256", IX86_BUILTIN_PSIGND256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI },
-  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_lshlqv4di3, "__builtin_ia32_pslldqi256", IX86_BUILTIN_PSLLDQI256, UNKNOWN, (int) V4DI_FTYPE_V4DI_INT },
+  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_ashlv2ti3, "__builtin_ia32_pslldqi256", IX86_BUILTIN_PSLLDQI256, UNKNOWN, (int) V4DI_FTYPE_V4DI_INT_CONVERT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_lshlv16hi3, "__builtin_ia32_psllwi256", IX86_BUILTIN_PSLLWI256 , UNKNOWN, (int) V16HI_FTYPE_V16HI_SI_COUNT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_lshlv16hi3, "__builtin_ia32_psllw256", IX86_BUILTIN_PSLLW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V8HI_COUNT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_lshlv8si3, "__builtin_ia32_pslldi256", IX86_BUILTIN_PSLLDI256, UNKNOWN, (int) V8SI_FTYPE_V8SI_SI_COUNT },
@@ -26182,7 +26182,7 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX2, CODE_FOR_ashrv16hi3, "__builtin_ia32_psraw256", IX86_BUILTIN_PSRAW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V8HI_COUNT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_ashrv8si3, "__builtin_ia32_psradi256", IX86_BUILTIN_PSRADI256, UNKNOWN, (int) V8SI_FTYPE_V8SI_SI_COUNT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_ashrv8si3, "__builtin_ia32_psrad256", IX86_BUILTIN_PSRAD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V4SI_COUNT },
-  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_lshrqv4di3, "__builtin_ia32_psrldqi256", IX86_BUILTIN_PSRLDQI256, UNKNOWN, (int) V4DI_FTYPE_V4DI_INT },
+  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_lshrv2ti3, "__builtin_ia32_psrldqi256", IX86_BUILTIN_PSRLDQI256, UNKNOWN, (int) V4DI_FTYPE_V4DI_INT_CONVERT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_lshrv16hi3, "__builtin_ia32_psrlwi256", IX86_BUILTIN_PSRLWI256 , UNKNOWN, (int) V16HI_FTYPE_V16HI_SI_COUNT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_lshrv16hi3, "__builtin_ia32_psrlw256", IX86_BUILTIN_PSRLW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V8HI_COUNT },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_lshrv8si3, "__builtin_ia32_psrldi256", IX86_BUILTIN_PSRLDI256, UNKNOWN, (int) V8SI_FTYPE_V8SI_SI_COUNT },
@@ -27812,6 +27812,11 @@ ix86_expand_args_builtin (const struct builtin_description *d,
       rmode = V1TImode;
       nargs_constant = 1;
       break;
+    case V4DI_FTYPE_V4DI_INT_CONVERT:
+      nargs = 2;
+      rmode = V2TImode;
+      nargs_constant = 1;
+      break;
     case V8HI_FTYPE_V8HI_INT:
     case V8HI_FTYPE_V8SF_INT:
     case V8HI_FTYPE_V4SF_INT:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bf1d448..a7df221 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18,13 +18,13 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-;; All vector modes including V1TImode, used in move patterns.
+;; All vector modes including V?TImode, used in move patterns.
 (define_mode_iterator V16
   [(V32QI "TARGET_AVX") V16QI
    (V16HI "TARGET_AVX") V8HI
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI
-   V1TI
+   (V2TI "TARGET_AVX") V1TI
    (V8SF "TARGET_AVX") V4SF
    (V4DF "TARGET_AVX") V2DF])
 
@@ -99,11 +99,13 @@
 (define_mode_iterator VI8_AVX2
   [(V4DI "TARGET_AVX2") V2DI])
 
+;; ??? We should probably use TImode instead.
 (define_mode_iterator VIMAX_AVX2
   [(V2TI "TARGET_AVX2") V1TI])
 
+;; ??? This should probably be dropped in favor of VIMAX_AVX2.
 (define_mode_iterator SSESCALARMODE
-  [(V4DI "TARGET_AVX2") TI])
+  [(V2TI "TARGET_AVX2") TI])
 
 (define_mode_iterator VI12_AVX2
   [(V32QI "TARGET_AVX2") V16QI
@@ -147,7 +149,7 @@
    (V8HI "ssse3") (V16HI "avx2")
    (V4SI "ssse3") (V8SI "avx2")
    (V2DI "ssse3") (V4DI "avx2")
-   (TI "ssse3")])
+   (TI "ssse3") (V2TI "avx2")])
 
 (define_mode_attr sse4_1_avx2
   [(V16QI "sse4_1") (V32QI "avx2")
@@ -5649,21 +5651,6 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx2_lshrqv4di3"
-  [(set (match_operand:V4DI 0 "register_operand" "=x")
-	(lshiftrt:V4DI
-	  (match_operand:V4DI 1 "register_operand" "x")
-	  (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n")))]
-  "TARGET_AVX2"
-{
-  operands[2] = GEN_INT (INTVAL (operands[2]) / 8);
-  return "vpsrldq\t{%2, %1, %0|%0, %1, %2}";
-}
-  [(set_attr "type" "sseishft")
-   (set_attr "prefix" "vex")
-   (set_attr "length_immediate" "1")
-   (set_attr "mode" "OI")])
-
 (define_insn "lshr<mode>3"
   [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
 	(lshiftrt:VI248_AVX2
@@ -5683,20 +5670,6 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_insn "avx2_lshlqv4di3"
-  [(set (match_operand:V4DI 0 "register_operand" "=x")
-	(ashift:V4DI (match_operand:V4DI 1 "register_operand" "x")
-		     (match_operand:SI 2 "const_0_to_255_mul_8_operand" "n")))]
-  "TARGET_AVX2"
-{
-  operands[2] = GEN_INT (INTVAL (operands[2]) / 8);
-  return "vpslldq\t{%2, %1, %0|%0, %1, %2}";
-}
-  [(set_attr "type" "sseishft")
-   (set_attr "prefix" "vex")
-   (set_attr "length_immediate" "1")
-   (set_attr "mode" "OI")])
-
 (define_insn "avx2_lshl<mode>3"
   [(set (match_operand:VI248_256 0 "register_operand" "=x")
 	(ashift:VI248_256
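The palignr fix in the first hunk follows the same logic.  Another
illustration, again not part of the patch and under the same
assumptions (AVX2 compiler, <immintrin.h>; the arrays a/b/out are
arbitrary demo names): vpalignr concatenates and shifts within each
128-bit lane, so CODE_FOR_avx2_palignrv2ti describes what the insn
actually does, where V4DI would wrongly suggest a 256-bit-wide
concatenation.

#include <immintrin.h>
#include <stdio.h>

int
main (void)
{
  unsigned char a[32], b[32], out[32];
  int i;

  for (i = 0; i < 32; i++)
    {
      a[i] = 0x40 + i;		/* 0x40..0x5f */
      b[i] = i;			/* 0x00..0x1f */
    }

  __m256i va = _mm256_loadu_si256 ((const __m256i *) a);
  __m256i vb = _mm256_loadu_si256 ((const __m256i *) b);

  /* __builtin_ia32_palignr256 underlies _mm256_alignr_epi8: the low
     16 bytes of the result draw only from a[0..15]/b[0..15], the
     high 16 bytes only from a[16..31]/b[16..31].  */
  __m256i vr = _mm256_alignr_epi8 (va, vb, 4);

  _mm256_storeu_si256 ((__m256i *) out, vr);
  for (i = 0; i < 32; i++)
    printf ("%02x%c", out[i], i == 15 ? '|' : ' ');
  printf ("\n");
  return 0;
}

With a byte offset of 4, the low half comes out as b[4..15] followed
by a[0..3], and the high half as b[20..31] followed by a[16..19]; no
byte crosses the lane boundary.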