From patchwork Wed Dec 5 14:22:04 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marc Glisse X-Patchwork-Id: 203880 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 53FCD2C0087 for ; Thu, 6 Dec 2012 01:22:42 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1355322164; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Received: Date:From:To:cc:Subject:In-Reply-To:Message-ID:References: User-Agent:MIME-Version:Content-Type:Mailing-List:Precedence: List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender: Delivered-To; bh=LpGUBMOJr0OErCsXU3Nn2v3cRLs=; b=MjEYdrzerkEDv5v EtyADjW38vSWIit3ZuS4M5ZwnXiSRRnfk8NHU3UjZw7r/2LpGLMuqpAiEpC1S7So 5Wpt4XucrvVsY0s7PzXZ4FZlTBI35dDV5WTRTOvG+PCqvvR6S1VQhv/FT2a1/9Sj zNv7cK7/UK1Gs7DhBLSDgf0ovcfs= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Received:Date:From:To:cc:Subject:In-Reply-To:Message-ID:References:User-Agent:MIME-Version:Content-Type:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=FudE+gGXkp9+taKsJysAeSHN65xt8V/qVHlYK9RaFntFWlU3tVimNcs/MpL+Dx r+eEhu2dWpcpIMmG+RoBv0wdDSeqC0csYd0ss9IHzLbsJI9sGnZWG6gX9zyiCYcv xt2coXwgQxXrRdDDd+7XB8xZMiUSv8iZOFHlenh4fG/8g=; Received: (qmail 14003 invoked by alias); 5 Dec 2012 14:22:36 -0000 Received: (qmail 13989 invoked by uid 22791); 5 Dec 2012 14:22:33 -0000 X-SWARE-Spam-Status: No, hits=-8.0 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, KHOP_THREADED, RCVD_IN_DNSWL_HI, RCVD_IN_HOSTKARMA_W, RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail1-relais-roc.national.inria.fr (HELO mail1-relais-roc.national.inria.fr) (192.134.164.82) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 05 Dec 2012 14:22:06 +0000 Received: from stedding.saclay.inria.fr ([193.55.250.194]) by mail1-relais-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES128-SHA; 05 Dec 2012 15:22:04 +0100 Received: from glisse (helo=localhost) by stedding.saclay.inria.fr with local-esmtp (Exim 4.80) (envelope-from ) id 1TgFrQ-0006Q9-HT; Wed, 05 Dec 2012 15:22:04 +0100 Date: Wed, 5 Dec 2012 15:22:04 +0100 (CET) From: Marc Glisse To: ebotcazou@adacore.com cc: gcc-patches@gcc.gnu.org, Uros Bizjak , "H.J. Lu" Subject: Re: [i386] scalar ops that preserve the high part of a vector In-Reply-To: Message-ID: References: User-Agent: Alpine 2.02 (DEB 1266 2009-07-14) MIME-Version: 1.0 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org >> 2012-12-04 Marc Glisse >> >> PR target/54855 >> gcc/ >> * simplify-rtx.c (simplify_binary_operation_1) : Replace >> with VEC_MERGE. >> >> * config/i386/sse.md (_vm3): Rewrite >> pattern. >> * config/i386/i386-builtin-types.def: New function types. >> * config/i386/i386.c (ix86_expand_args_builtin): Likewise. >> (bdesc_args) <__builtin_ia32_addss, __builtin_ia32_subss, >> __builtin_ia32_addsd, __builtin_ia32_subsd>: Change prototype. >> * config/i386/xmmintrin.h: Adapt to new builtin prototype. >> * config/i386/emmintrin.h: Likewise. >> * doc/extend.texi (X86 Built-in Functions): Document changed >> prototype. >> >> >> testsuite/ >> * gcc.target/i386/pr54855-1.c: New testcase. >> * gcc.target/i386/pr54855-2.c: New testcase. Hello Eric, could you take a look at the small simplify-rtx bit of this patch to see if the general approach makes sense to you? (this targets 4.9 and passes bootstrap+testsuite on x86_64-linux) The point of this transformation is to avoid writing a second define_insn in config/i386/sse.md as in the older patch: http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00028.html (similar patches for multiplication, division, etc will follow, and this will avoid an extra entry in sse.md for each of these operations) Thanks, Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c (revision 194199) +++ gcc/simplify-rtx.c (working copy) @@ -3588,20 +3588,34 @@ simplify_binary_operation_1 (enum rtx_co int len0 = XVECLEN (par0, 0); int len1 = XVECLEN (par1, 0); rtvec vec = rtvec_alloc (len0 + len1); for (int i = 0; i < len0; i++) RTVEC_ELT (vec, i) = XVECEXP (par0, 0, i); for (int i = 0; i < len1; i++) RTVEC_ELT (vec, len0 + i) = XVECEXP (par1, 0, i); return simplify_gen_binary (VEC_SELECT, mode, XEXP (trueop0, 0), gen_rtx_PARALLEL (VOIDmode, vec)); } + + /* The x86 back-end uses VEC_CONCAT to set an element in a V2DF, but + VEC_MERGE for scalar operations that preserve the other elements + of a vector. */ + if (GET_CODE (trueop1) == VEC_SELECT + && GET_MODE (XEXP (trueop1, 0)) == mode + && XVECLEN (XEXP (trueop1, 1), 0) == 1 + && INTVAL (XVECEXP (XEXP (trueop1, 1), 0, 0)) == 1) + { + rtx newop0 = gen_rtx_fmt_e (VEC_DUPLICATE, mode, trueop0); + rtx newop1 = XEXP (trueop1, 0); + return gen_rtx_fmt_eee (VEC_MERGE, mode, newop0, newop1, + const1_rtx); + } } return 0; default: gcc_unreachable (); } return 0; } Index: gcc/testsuite/gcc.target/i386/pr54855-2.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr54855-2.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr54855-2.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O -msse" } */ + +typedef float vec __attribute__((vector_size(16))); + +vec f (vec x) +{ + x[0] += 2; + return x; +} + +vec g (vec x) +{ + x[0] -= 1; + return x; +} + +/* { dg-final { scan-assembler-not "mov" } } */ Property changes on: gcc/testsuite/gcc.target/i386/pr54855-2.c ___________________________________________________________________ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: gcc/testsuite/gcc.target/i386/pr54855-1.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr54855-1.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr54855-1.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O -msse2" } */ + +typedef double vec __attribute__((vector_size(16))); + +vec f (vec x) +{ + x[0] += 2; + return x; +} + +vec g (vec x) +{ + x[0] -= 1; + return x; +} + +/* { dg-final { scan-assembler-not "mov" } } */ Property changes on: gcc/testsuite/gcc.target/i386/pr54855-1.c ___________________________________________________________________ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: gcc/doc/extend.texi =================================================================== --- gcc/doc/extend.texi (revision 194199) +++ gcc/doc/extend.texi (working copy) @@ -9843,22 +9843,22 @@ int __builtin_ia32_comige (v4sf, v4sf) int __builtin_ia32_ucomieq (v4sf, v4sf) int __builtin_ia32_ucomineq (v4sf, v4sf) int __builtin_ia32_ucomilt (v4sf, v4sf) int __builtin_ia32_ucomile (v4sf, v4sf) int __builtin_ia32_ucomigt (v4sf, v4sf) int __builtin_ia32_ucomige (v4sf, v4sf) v4sf __builtin_ia32_addps (v4sf, v4sf) v4sf __builtin_ia32_subps (v4sf, v4sf) v4sf __builtin_ia32_mulps (v4sf, v4sf) v4sf __builtin_ia32_divps (v4sf, v4sf) -v4sf __builtin_ia32_addss (v4sf, v4sf) -v4sf __builtin_ia32_subss (v4sf, v4sf) +v4sf __builtin_ia32_addss (v4sf, float) +v4sf __builtin_ia32_subss (v4sf, float) v4sf __builtin_ia32_mulss (v4sf, v4sf) v4sf __builtin_ia32_divss (v4sf, v4sf) v4si __builtin_ia32_cmpeqps (v4sf, v4sf) v4si __builtin_ia32_cmpltps (v4sf, v4sf) v4si __builtin_ia32_cmpleps (v4sf, v4sf) v4si __builtin_ia32_cmpgtps (v4sf, v4sf) v4si __builtin_ia32_cmpgeps (v4sf, v4sf) v4si __builtin_ia32_cmpunordps (v4sf, v4sf) v4si __builtin_ia32_cmpneqps (v4sf, v4sf) v4si __builtin_ia32_cmpnltps (v4sf, v4sf) @@ -9964,22 +9964,22 @@ v2df __builtin_ia32_cmpunordsd (v2df, v2 v2df __builtin_ia32_cmpneqsd (v2df, v2df) v2df __builtin_ia32_cmpnltsd (v2df, v2df) v2df __builtin_ia32_cmpnlesd (v2df, v2df) v2df __builtin_ia32_cmpordsd (v2df, v2df) v2di __builtin_ia32_paddq (v2di, v2di) v2di __builtin_ia32_psubq (v2di, v2di) v2df __builtin_ia32_addpd (v2df, v2df) v2df __builtin_ia32_subpd (v2df, v2df) v2df __builtin_ia32_mulpd (v2df, v2df) v2df __builtin_ia32_divpd (v2df, v2df) -v2df __builtin_ia32_addsd (v2df, v2df) -v2df __builtin_ia32_subsd (v2df, v2df) +v2df __builtin_ia32_addsd (v2df, double) +v2df __builtin_ia32_subsd (v2df, double) v2df __builtin_ia32_mulsd (v2df, v2df) v2df __builtin_ia32_divsd (v2df, v2df) v2df __builtin_ia32_minpd (v2df, v2df) v2df __builtin_ia32_maxpd (v2df, v2df) v2df __builtin_ia32_minsd (v2df, v2df) v2df __builtin_ia32_maxsd (v2df, v2df) v2df __builtin_ia32_andpd (v2df, v2df) v2df __builtin_ia32_andnpd (v2df, v2df) v2df __builtin_ia32_orpd (v2df, v2df) v2df __builtin_ia32_xorpd (v2df, v2df) Index: gcc/config/i386/sse.md =================================================================== --- gcc/config/i386/sse.md (revision 194199) +++ gcc/config/i386/sse.md (working copy) @@ -858,23 +858,26 @@ \t{%2, %0|%0, %2} v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) (define_insn "_vm3" [(set (match_operand:VF_128 0 "register_operand" "=x,x") (vec_merge:VF_128 - (plusminus:VF_128 - (match_operand:VF_128 1 "register_operand" "0,x") - (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm")) + (vec_duplicate:VF_128 + (plusminus: + (vec_select: + (match_operand:VF_128 1 "register_operand" "0,x") + (parallel [(const_int 0)])) + (match_operand: 2 "nonimmediate_operand" "xm,xm"))) (match_dup 1) (const_int 1)))] "TARGET_SSE" "@ \t{%2, %0|%0, %2} v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) Index: gcc/config/i386/i386-builtin-types.def =================================================================== --- gcc/config/i386/i386-builtin-types.def (revision 194199) +++ gcc/config/i386/i386-builtin-types.def (working copy) @@ -263,20 +263,21 @@ DEF_FUNCTION_TYPE (UINT64, UINT64, UINT6 DEF_FUNCTION_TYPE (UINT8, UINT8, INT) DEF_FUNCTION_TYPE (V16QI, V16QI, SI) DEF_FUNCTION_TYPE (V16QI, V16QI, V16QI) DEF_FUNCTION_TYPE (V16QI, V8HI, V8HI) DEF_FUNCTION_TYPE (V1DI, V1DI, SI) DEF_FUNCTION_TYPE (V1DI, V1DI, V1DI) DEF_FUNCTION_TYPE (V1DI, V2SI, V2SI) DEF_FUNCTION_TYPE (V1DI, V8QI, V8QI) DEF_FUNCTION_TYPE (V2DF, PCV2DF, V2DI) DEF_FUNCTION_TYPE (V2DF, V2DF, DI) +DEF_FUNCTION_TYPE (V2DF, V2DF, DOUBLE) DEF_FUNCTION_TYPE (V2DF, V2DF, INT) DEF_FUNCTION_TYPE (V2DF, V2DF, PCDOUBLE) DEF_FUNCTION_TYPE (V2DF, V2DF, SI) DEF_FUNCTION_TYPE (V2DF, V2DF, V2DF) DEF_FUNCTION_TYPE (V2DF, V2DF, V2DI) DEF_FUNCTION_TYPE (V2DF, V2DF, V4SF) DEF_FUNCTION_TYPE (V2DF, V4DF, INT) DEF_FUNCTION_TYPE (V2DI, V16QI, V16QI) DEF_FUNCTION_TYPE (V2DI, V2DF, V2DF) DEF_FUNCTION_TYPE (V2DI, V2DI, INT) @@ -296,20 +297,21 @@ DEF_FUNCTION_TYPE (V4DF, PCV4DF, V4DI) DEF_FUNCTION_TYPE (V4DF, V4DF, INT) DEF_FUNCTION_TYPE (V4DF, V4DF, V4DF) DEF_FUNCTION_TYPE (V4DF, V4DF, V4DI) DEF_FUNCTION_TYPE (V4HI, V2SI, V2SI) DEF_FUNCTION_TYPE (V4HI, V4HI, INT) DEF_FUNCTION_TYPE (V4HI, V4HI, SI) DEF_FUNCTION_TYPE (V4HI, V4HI, V4HI) DEF_FUNCTION_TYPE (V4HI, V8QI, V8QI) DEF_FUNCTION_TYPE (V4SF, PCV4SF, V4SI) DEF_FUNCTION_TYPE (V4SF, V4SF, DI) +DEF_FUNCTION_TYPE (V4SF, V4SF, FLOAT) DEF_FUNCTION_TYPE (V4SF, V4SF, INT) DEF_FUNCTION_TYPE (V4SF, V4SF, PCV2SF) DEF_FUNCTION_TYPE (V4SF, V4SF, SI) DEF_FUNCTION_TYPE (V4SF, V4SF, V2DF) DEF_FUNCTION_TYPE (V4SF, V4SF, V2SI) DEF_FUNCTION_TYPE (V4SF, V4SF, V4SF) DEF_FUNCTION_TYPE (V4SF, V4SF, V4SI) DEF_FUNCTION_TYPE (V4SF, V8SF, INT) DEF_FUNCTION_TYPE (V4SI, V2DF, V2DF) DEF_FUNCTION_TYPE (V4SI, V4SF, V4SF) Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c (revision 194199) +++ gcc/config/i386/i386.c (working copy) @@ -27059,22 +27059,22 @@ static const struct builtin_description { OPTION_MASK_ISA_SSE, CODE_FOR_sse_cvttps2pi, "__builtin_ia32_cvttps2pi", IX86_BUILTIN_CVTTPS2PI, UNKNOWN, (int) V2SI_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_cvttss2si, "__builtin_ia32_cvttss2si", IX86_BUILTIN_CVTTSS2SI, UNKNOWN, (int) INT_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_64BIT, CODE_FOR_sse_cvttss2siq, "__builtin_ia32_cvttss2si64", IX86_BUILTIN_CVTTSS2SI64, UNKNOWN, (int) INT64_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_shufps, "__builtin_ia32_shufps", IX86_BUILTIN_SHUFPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT }, { OPTION_MASK_ISA_SSE, CODE_FOR_addv4sf3, "__builtin_ia32_addps", IX86_BUILTIN_ADDPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_subv4sf3, "__builtin_ia32_subps", IX86_BUILTIN_SUBPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_mulv4sf3, "__builtin_ia32_mulps", IX86_BUILTIN_MULPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_divv4sf3, "__builtin_ia32_divps", IX86_BUILTIN_DIVPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF }, - { OPTION_MASK_ISA_SSE, CODE_FOR_sse_vmaddv4sf3, "__builtin_ia32_addss", IX86_BUILTIN_ADDSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF }, - { OPTION_MASK_ISA_SSE, CODE_FOR_sse_vmsubv4sf3, "__builtin_ia32_subss", IX86_BUILTIN_SUBSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF }, + { OPTION_MASK_ISA_SSE, CODE_FOR_sse_vmaddv4sf3, "__builtin_ia32_addss", IX86_BUILTIN_ADDSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_FLOAT }, + { OPTION_MASK_ISA_SSE, CODE_FOR_sse_vmsubv4sf3, "__builtin_ia32_subss", IX86_BUILTIN_SUBSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_FLOAT }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_vmmulv4sf3, "__builtin_ia32_mulss", IX86_BUILTIN_MULSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_vmdivv4sf3, "__builtin_ia32_divss", IX86_BUILTIN_DIVSS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_maskcmpv4sf3, "__builtin_ia32_cmpeqps", IX86_BUILTIN_CMPEQPS, EQ, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_maskcmpv4sf3, "__builtin_ia32_cmpltps", IX86_BUILTIN_CMPLTPS, LT, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_maskcmpv4sf3, "__builtin_ia32_cmpleps", IX86_BUILTIN_CMPLEPS, LE, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_maskcmpv4sf3, "__builtin_ia32_cmpgtps", IX86_BUILTIN_CMPGTPS, LT, (int) V4SF_FTYPE_V4SF_V4SF_SWAP }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_maskcmpv4sf3, "__builtin_ia32_cmpgeps", IX86_BUILTIN_CMPGEPS, LE, (int) V4SF_FTYPE_V4SF_V4SF_SWAP }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_maskcmpv4sf3, "__builtin_ia32_cmpunordps", IX86_BUILTIN_CMPUNORDPS, UNORDERED, (int) V4SF_FTYPE_V4SF_V4SF }, { OPTION_MASK_ISA_SSE, CODE_FOR_sse_maskcmpv4sf3, "__builtin_ia32_cmpneqps", IX86_BUILTIN_CMPNEQPS, NE, (int) V4SF_FTYPE_V4SF_V4SF }, @@ -27163,22 +27163,22 @@ static const struct builtin_description { OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_64BIT, CODE_FOR_sse2_cvttsd2siq, "__builtin_ia32_cvttsd2si64", IX86_BUILTIN_CVTTSD2SI64, UNKNOWN, (int) INT64_FTYPE_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtps2dq, "__builtin_ia32_cvtps2dq", IX86_BUILTIN_CVTPS2DQ, UNKNOWN, (int) V4SI_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtps2pd, "__builtin_ia32_cvtps2pd", IX86_BUILTIN_CVTPS2PD, UNKNOWN, (int) V2DF_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_fix_truncv4sfv4si2, "__builtin_ia32_cvttps2dq", IX86_BUILTIN_CVTTPS2DQ, UNKNOWN, (int) V4SI_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_addv2df3, "__builtin_ia32_addpd", IX86_BUILTIN_ADDPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_subv2df3, "__builtin_ia32_subpd", IX86_BUILTIN_SUBPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_mulv2df3, "__builtin_ia32_mulpd", IX86_BUILTIN_MULPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_divv2df3, "__builtin_ia32_divpd", IX86_BUILTIN_DIVPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_vmaddv2df3, "__builtin_ia32_addsd", IX86_BUILTIN_ADDSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_vmsubv2df3, "__builtin_ia32_subsd", IX86_BUILTIN_SUBSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, + { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_vmaddv2df3, "__builtin_ia32_addsd", IX86_BUILTIN_ADDSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_DOUBLE }, + { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_vmsubv2df3, "__builtin_ia32_subsd", IX86_BUILTIN_SUBSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_DOUBLE }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_vmmulv2df3, "__builtin_ia32_mulsd", IX86_BUILTIN_MULSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_vmdivv2df3, "__builtin_ia32_divsd", IX86_BUILTIN_DIVSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_maskcmpv2df3, "__builtin_ia32_cmpeqpd", IX86_BUILTIN_CMPEQPD, EQ, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_maskcmpv2df3, "__builtin_ia32_cmpltpd", IX86_BUILTIN_CMPLTPD, LT, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_maskcmpv2df3, "__builtin_ia32_cmplepd", IX86_BUILTIN_CMPLEPD, LE, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_maskcmpv2df3, "__builtin_ia32_cmpgtpd", IX86_BUILTIN_CMPGTPD, LT, (int) V2DF_FTYPE_V2DF_V2DF_SWAP }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_maskcmpv2df3, "__builtin_ia32_cmpgepd", IX86_BUILTIN_CMPGEPD, LE, (int) V2DF_FTYPE_V2DF_V2DF_SWAP}, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_maskcmpv2df3, "__builtin_ia32_cmpunordpd", IX86_BUILTIN_CMPUNORDPD, UNORDERED, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_maskcmpv2df3, "__builtin_ia32_cmpneqpd", IX86_BUILTIN_CMPNEQPD, NE, (int) V2DF_FTYPE_V2DF_V2DF }, @@ -30790,34 +30790,36 @@ ix86_expand_args_builtin (const struct b case V4HI_FTYPE_V8QI_V8QI: case V4HI_FTYPE_V2SI_V2SI: case V4DF_FTYPE_V4DF_V4DF: case V4DF_FTYPE_V4DF_V4DI: case V4SF_FTYPE_V4SF_V4SF: case V4SF_FTYPE_V4SF_V4SI: case V4SF_FTYPE_V4SF_V2SI: case V4SF_FTYPE_V4SF_V2DF: case V4SF_FTYPE_V4SF_DI: case V4SF_FTYPE_V4SF_SI: + case V4SF_FTYPE_V4SF_FLOAT: case V2DI_FTYPE_V2DI_V2DI: case V2DI_FTYPE_V16QI_V16QI: case V2DI_FTYPE_V4SI_V4SI: case V2UDI_FTYPE_V4USI_V4USI: case V2DI_FTYPE_V2DI_V16QI: case V2DI_FTYPE_V2DF_V2DF: case V2SI_FTYPE_V2SI_V2SI: case V2SI_FTYPE_V4HI_V4HI: case V2SI_FTYPE_V2SF_V2SF: case V2DF_FTYPE_V2DF_V2DF: case V2DF_FTYPE_V2DF_V4SF: case V2DF_FTYPE_V2DF_V2DI: case V2DF_FTYPE_V2DF_DI: case V2DF_FTYPE_V2DF_SI: + case V2DF_FTYPE_V2DF_DOUBLE: case V2SF_FTYPE_V2SF_V2SF: case V1DI_FTYPE_V1DI_V1DI: case V1DI_FTYPE_V8QI_V8QI: case V1DI_FTYPE_V2SI_V2SI: case V32QI_FTYPE_V16HI_V16HI: case V16HI_FTYPE_V8SI_V8SI: case V32QI_FTYPE_V32QI_V32QI: case V16HI_FTYPE_V32QI_V32QI: case V16HI_FTYPE_V16HI_V16HI: case V8SI_FTYPE_V4DF_V4DF: Index: gcc/config/i386/xmmintrin.h =================================================================== --- gcc/config/i386/xmmintrin.h (revision 194199) +++ gcc/config/i386/xmmintrin.h (working copy) @@ -92,27 +92,27 @@ _mm_setzero_ps (void) return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f }; } /* Perform the respective operation on the lower SPFP (single-precision floating-point) values of A and B; the upper three SPFP values are passed through from A. */ extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_add_ss (__m128 __A, __m128 __B) { - return (__m128) __builtin_ia32_addss ((__v4sf)__A, (__v4sf)__B); + return (__m128) __builtin_ia32_addss ((__v4sf)__A, __B[0]); } extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sub_ss (__m128 __A, __m128 __B) { - return (__m128) __builtin_ia32_subss ((__v4sf)__A, (__v4sf)__B); + return (__m128) __builtin_ia32_subss ((__v4sf)__A, __B[0]); } extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mul_ss (__m128 __A, __m128 __B) { return (__m128) __builtin_ia32_mulss ((__v4sf)__A, (__v4sf)__B); } extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_div_ss (__m128 __A, __m128 __B) Index: gcc/config/i386/emmintrin.h =================================================================== --- gcc/config/i386/emmintrin.h (revision 194199) +++ gcc/config/i386/emmintrin.h (working copy) @@ -226,33 +226,33 @@ _mm_cvtsi128_si64x (__m128i __A) extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_add_pd (__m128d __A, __m128d __B) { return (__m128d)__builtin_ia32_addpd ((__v2df)__A, (__v2df)__B); } extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_add_sd (__m128d __A, __m128d __B) { - return (__m128d)__builtin_ia32_addsd ((__v2df)__A, (__v2df)__B); + return (__m128d)__builtin_ia32_addsd ((__v2df)__A, __B[0]); } extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sub_pd (__m128d __A, __m128d __B) { return (__m128d)__builtin_ia32_subpd ((__v2df)__A, (__v2df)__B); } extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sub_sd (__m128d __A, __m128d __B) { - return (__m128d)__builtin_ia32_subsd ((__v2df)__A, (__v2df)__B); + return (__m128d)__builtin_ia32_subsd ((__v2df)__A, __B[0]); } extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mul_pd (__m128d __A, __m128d __B) { return (__m128d)__builtin_ia32_mulpd ((__v2df)__A, (__v2df)__B); } extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mul_sd (__m128d __A, __m128d __B)