From patchwork Mon Oct 8 19:36:40 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [i386] recognize haddpd
Date: Mon, 08 Oct 2012 09:36:40 -0000
From: Marc Glisse
X-Patchwork-Id: 190112
Message-Id:
To: Uros Bizjak
Cc: gcc-patches@gcc.gnu.org

On Mon, 8 Oct 2012, Uros Bizjak wrote:

> You missed the most important sseadd1 addition, the one that prevents
> checking of operand2 when calculating "memory" attribute:
>
>     (and (eq_attr "type"
>            "!alu1,negnot,ishift1,
>             imov,imovx,icmp,test,bitmanip,
>             fmov,fcmp,fsgn,
>             sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
>             sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
>          (match_operand 2 "memory_operand"))
>
> Please note "!" in the above expression.

[...]

> Also note that you have to add handling of sseadd1 attribute in other
> (scheduler) *.md files. Simply grep for sseadd and add ",sseadd1"
> everywhere.

Thank you, it makes more sense now.

The attached passed bootstrap+testsuite.

I didn't know if I should be more precise in the ChangeLog, but it would
make the ChangeLog as long as the patch with about 23 entries like:
(define_insn_reservation bdver1_ssemuladd_256): Likewise

Next goal would be to further recognize some DPPD potential uses, but that
seems harder.

2012-10-09  Marc Glisse

gcc/
	PR target/54400
	* config/i386/i386.md (type attribute): Add sseadd1.
	(unit attribute): Add support for sseadd1.
	(memory attribute): Likewise.
	* config/i386/athlon.md: Likewise.
	* config/i386/core2.md: Likewise.
	* config/i386/atom.md: Likewise.
	* config/i386/ppro.md: Likewise.
	* config/i386/bdver1.md: Likewise.
	* config/i386/sse.md (sse3_hv2df3): split into...
	(sse3_haddv2df3): ... expander.
	(*sse3_haddv2df3): ... define_insn. Accept permuted operands.
	(sse3_hsubv2df3): ... define_insn.
	(*sse3_haddv2df3_low): New define_insn.
	(*sse3_hsubv2df3_low): New define_insn.

gcc/testsuite/
	PR target/54400
	* gcc.target/i386/pr54400.c: New testcase.
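For readers unfamiliar with the instructions involved, the lane arithmetic that the new patterns are meant to match can be sketched in plain C. This is only an illustrative scalar model, not code from the patch: the `v2df` struct and the `hadd_model` / `hsub_model` names are hypothetical stand-ins for `__m128d` and for the haddpd / hsubpd instructions.

```c
/* Hypothetical two-lane double vector standing in for __m128d.  */
typedef struct { double lane[2]; } v2df;

/* Scalar model of haddpd: each result lane is the sum of the two
   lanes of one input operand.  */
v2df hadd_model (v2df p, v2df q)
{
  v2df r = { { p.lane[0] + p.lane[1], q.lane[0] + q.lane[1] } };
  return r;
}

/* Scalar model of hsubpd: each result lane is lane 0 minus lane 1
   of one input operand.  */
v2df hsub_model (v2df p, v2df q)
{
  v2df r = { { p.lane[0] - p.lane[1], q.lane[0] - q.lane[1] } };
  return r;
}
```

Since addition commutes, `p[1] + p[0]` describes the same operation as `p[0] + p[1]`, which is presumably why the haddpd patterns accept either lane order while the hsubpd patterns fix the order at lane 0 minus lane 1.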
Index: testsuite/gcc.target/i386/pr54400.c
===================================================================
--- testsuite/gcc.target/i386/pr54400.c (revision 0)
+++ testsuite/gcc.target/i386/pr54400.c (revision 0)
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse3 -mfpmath=sse" } */
+
+#include
+
+double f (__m128d p)
+{
+  return p[0] - p[1];
+}
+
+double g1 (__m128d p)
+{
+  return p[0] + p[1];
+}
+
+double g2 (__m128d p)
+{
+  return p[1] + p[0];
+}
+
+__m128d h (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] - p[1], q[0] - q[1] };
+  return r;
+}
+
+__m128d i1 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[0] + q[1] };
+  return r;
+}
+
+__m128d i2 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[1] + q[0] };
+  return r;
+}
+
+__m128d i3 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[0] + q[1] };
+  return r;
+}
+
+__m128d i4 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[1] + q[0] };
+  return r;
+}
+
+/* { dg-final { scan-assembler-times "hsubpd" 2 } } */
+/* { dg-final { scan-assembler-times "haddpd" 6 } } */
+/* { dg-final { scan-assembler-not "unpck" } } */

Property changes on: testsuite/gcc.target/i386/pr54400.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 192214)
+++ config/i386/i386.md (working copy)
@@ -320,36 +320,36 @@
 ;; provided in other attributes.
 (define_attr "type"
   "other,multi,
    alu,alu1,negnot,imov,imovx,lea,
    incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
    icmp,test,ibr,setcc,icmov,
    push,pop,call,callv,leave,
    str,bitmanip,
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
    sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,lwp,
+   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   ssediv,sseins,ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))

 ;; Main data type used by the insn
 (define_attr "mode"
   "unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,XF,TF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF"
   (const_string "unknown"))

 ;; The CPU unit operations uses.
 (define_attr "unit" "integer,i387,sse,mmx,unknown"
   (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
            (const_string "i387")
          (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-                          sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
+                          sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
                           ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
            (const_string "sse")
          (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
            (const_string "mmx")
          (eq_attr "type" "other")
            (const_string "unknown")]
          (const_string "integer")))

 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
@@ -592,21 +592,21 @@
            (const_string "both")
          (match_operand 0 "memory_operand")
            (const_string "store")
          (match_operand 1 "memory_operand")
            (const_string "load")
          (and (eq_attr "type"
                 "!alu1,negnot,ishift1,
                   imov,imovx,icmp,test,bitmanip,
                   fmov,fcmp,fsgn,
                   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
-                  sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
+                  sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
               (match_operand 2 "memory_operand"))
            (const_string "load")
          (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
               (match_operand 3 "memory_operand"))
            (const_string "load")
         ]
         (const_string "none")))

 ;; Indicates if an instruction has both an immediate and a displacement.

Index: config/i386/athlon.md
===================================================================
--- config/i386/athlon.md (revision 192214)
+++ config/i386/athlon.md (working copy)
@@ -800,61 +800,61 @@
                          (and (eq_attr "cpu" "athlon,k8,generic64")
                               (eq_attr "type" "ssecomi"))
                          "athlon-vector,athlon-fpsched,athlon-fadd")
 (define_insn_reservation "athlon_ssecomi_amdfam10" 3
                          (and (eq_attr "cpu" "amdfam10")
 ;; It seems athlon_ssecomi has a bug in the attr_type, fixed for amdfam10
                               (eq_attr "type" "ssecomi"))
                          "athlon-direct,athlon-fpsched,athlon-fadd")
 (define_insn_reservation "athlon_sseadd_load" 4
                          (and (eq_attr "cpu" "athlon")
-                              (and (eq_attr "type" "sseadd")
+                              (and (eq_attr "type" "sseadd,sseadd1")
                                    (and (eq_attr "mode" "SF,DF,DI")
                                         (eq_attr "memory" "load"))))
                          "athlon-direct,athlon-fpload,athlon-fadd")
 (define_insn_reservation "athlon_sseadd_load_k8" 6
                          (and (eq_attr "cpu" "k8,generic64,amdfam10")
-                              (and (eq_attr "type" "sseadd")
+                              (and (eq_attr "type" "sseadd,sseadd1")
                                    (and (eq_attr "mode" "SF,DF,DI")
                                         (eq_attr "memory" "load"))))
                          "athlon-direct,athlon-fploadk8,athlon-fadd")
 (define_insn_reservation "athlon_sseadd" 4
                          (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
-                              (and (eq_attr "type" "sseadd")
+                              (and (eq_attr "type" "sseadd,sseadd1")
                                    (eq_attr "mode" "SF,DF,DI")))
                          "athlon-direct,athlon-fpsched,athlon-fadd")
 (define_insn_reservation "athlon_sseaddvector_load" 5
                          (and (eq_attr "cpu" "athlon")
-                              (and (eq_attr "type" "sseadd")
+                              (and (eq_attr "type" "sseadd,sseadd1")
                                    (eq_attr "memory" "load")))
                          "athlon-vector,athlon-fpload2,(athlon-fadd*2)")
 (define_insn_reservation "athlon_sseaddvector_load_k8" 7
                          (and (eq_attr "cpu" "k8,generic64")
-                              (and (eq_attr "type" "sseadd")
+                              (and (eq_attr "type" "sseadd,sseadd1")
                                    (eq_attr "memory" "load")))
                          "athlon-double,athlon-fpload2k8,(athlon-fadd*2)")
 (define_insn_reservation "athlon_sseaddvector_load_amdfam10" 6
                          (and (eq_attr "cpu" "amdfam10")
-                              (and (eq_attr "type" "sseadd")
+                              (and (eq_attr "type" "sseadd,sseadd1")
                                    (eq_attr "memory" "load")))
                          "athlon-direct,athlon-fploadk8,athlon-fadd")
 (define_insn_reservation "athlon_sseaddvector" 5
                          (and (eq_attr "cpu" "athlon")
-                              (eq_attr "type" "sseadd"))
+                              (eq_attr "type" "sseadd,sseadd1"))
                          "athlon-vector,athlon-fpsched,(athlon-fadd*2)")
 (define_insn_reservation "athlon_sseaddvector_k8" 5
                          (and (eq_attr "cpu" "k8,generic64")
-                              (eq_attr "type" "sseadd"))
+                              (eq_attr "type" "sseadd,sseadd1"))
                          "athlon-double,athlon-fpsched,(athlon-fadd*2)")
 (define_insn_reservation "athlon_sseaddvector_amdfam10" 4
                          (and (eq_attr "cpu" "amdfam10")
-                              (eq_attr "type" "sseadd"))
+                              (eq_attr "type" "sseadd,sseadd1"))
                          "athlon-direct,athlon-fpsched,athlon-fadd")

 ;; Conversions behaves very irregularly and the scheduling is critical here.
 ;; Take each instruction separately. Assume that the mode is always set to the
 ;; destination one and athlon_decode is set to the K8 versions.

 ;; cvtss2sd
 (define_insn_reservation "athlon_ssecvt_cvtss2sd_load_k8" 4
                          (and (eq_attr "cpu" "k8,athlon,generic64")
                               (and (eq_attr "type" "ssecvt")
Index: config/i386/core2.md
===================================================================
--- config/i386/core2.md (revision 192214)
+++ config/i386/core2.md (working copy)
@@ -29,21 +29,21 @@
 ;; The core2_idiv, core2_fdiv and core2_ssediv automata are used to
 ;; model issue latencies of idiv, fdiv and ssediv type insns.

 (define_automaton "core2_decoder,core2_core,core2_idiv,core2_fdiv,core2_ssediv,core2_load,core2_store")

 ;; The CPU domain, used for Core i7 bypass latencies
 (define_attr "i7_domain" "int,float,simd"
   (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
            (const_string "float")
          (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-                          sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
+                          sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
                           ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
            (cond [(eq_attr "mode" "V4DF,V8SF,V2DF,V4SF,SF,DF")
                     (const_string "float")
                   (eq_attr "mode" "SI")
                     (const_string "int")]
                   (const_string "simd"))
          (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
            (const_string "simd")]
         (const_string "int")))

@@ -521,27 +521,27 @@

 (define_insn_reservation "c2_sse_V4SF" 4
                          (and (eq_attr "cpu" "core2,corei7")
                               (and (eq_attr "mode" "V4SF")
                                    (eq_attr "type" "sse")))
                          "c2_decoder0,c2_p1*2")

 (define_insn_reservation "c2_sse_addcmp" 3
                          (and (eq_attr "cpu" "core2,corei7")
                               (and (eq_attr "memory" "none")
-                                   (eq_attr "type" "sseadd,ssecmp,ssecomi")))
+                                   (eq_attr "type" "sseadd,sseadd1,ssecmp,ssecomi")))
                          "c2_decodern,c2_p1")

 (define_insn_reservation "c2_sse_addcmp_load" 3
                          (and (eq_attr "cpu" "core2,corei7")
                               (and (eq_attr "memory" "load")
-                                   (eq_attr "type" "sseadd,ssecmp,ssecomi")))
+                                   (eq_attr "type" "sseadd,sseadd1,ssecmp,ssecomi")))
                          "c2_decodern,c2_p2+c2_p1")

 (define_insn_reservation "c2_sse_mul_SF" 4
                          (and (eq_attr "cpu" "core2,corei7")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "SF,V4SF")
                                         (eq_attr "type" "ssemul"))))
                          "c2_decodern,c2_p0")

 (define_insn_reservation "c2_sse_mul_SF_load" 4
Index: config/i386/atom.md
===================================================================
--- config/i386/atom.md (revision 192214)
+++ config/i386/atom.md (working copy)
@@ -589,39 +589,39 @@
 ;; movu mem
 (define_insn_reservation "atom_ssemov_5" 2
   (and (eq_attr "cpu" "atom")
        (and (eq_attr "type" "ssemov")
             (ior (eq_attr "movu" "1") (eq_attr "memory" "!none"))))
   "atom-complex, atom-all-eu")

 ;; no memory simple
 (define_insn_reservation "atom_sseadd" 5
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd")
+       (and (eq_attr "type" "sseadd,sseadd1")
             (and (eq_attr "memory" "none")
                  (and (eq_attr "mode" "!V2DF")
                       (eq_attr "atom_unit" "!complex")))))
   "atom-fadd-5c")

 ;; memory simple
 (define_insn_reservation "atom_sseadd_mem" 5
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd")
+       (and (eq_attr "type" "sseadd,sseadd1")
             (and (eq_attr "memory" "!none")
                  (and (eq_attr "mode" "!V2DF")
                       (eq_attr "atom_unit" "!complex")))))
   "atom-dual-5c")

 ;; maxps, minps, *pd, hadd, hsub
 (define_insn_reservation "atom_sseadd_3" 8
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd")
+       (and (eq_attr "type" "sseadd,sseadd1")
             (ior (eq_attr "mode" "V2DF")
                  (eq_attr "atom_unit" "complex"))))
   "atom-complex, atom-all-eu*7")

 ;; Except dppd/dpps
 (define_insn_reservation "atom_ssemul" 5
   (and (eq_attr "cpu" "atom")
        (and (eq_attr "type" "ssemul")
             (eq_attr "mode" "!SF")))
   "atom-fmul-5c")

Index: config/i386/ppro.md
===================================================================
--- config/i386/ppro.md (revision 192214)
+++ config/i386/ppro.md (working copy)
@@ -502,28 +502,28 @@
 (define_insn_reservation "ppro_sse_SF" 3
                          (and (eq_attr "cpu" "pentiumpro")
                               (and (eq_attr "mode" "SF")
                                    (eq_attr "type" "sse")))
                          "decodern,p0")

 (define_insn_reservation "ppro_sse_add_SF" 3
                          (and (eq_attr "cpu" "pentiumpro")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "SF")
-                                        (eq_attr "type" "sseadd"))))
+                                        (eq_attr "type" "sseadd,sseadd1"))))
                          "decodern,p1")

 (define_insn_reservation "ppro_sse_add_SF_load" 3
                          (and (eq_attr "cpu" "pentiumpro")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "SF")
-                                        (eq_attr "type" "sseadd"))))
+                                        (eq_attr "type" "sseadd,sseadd1"))))
                          "decoder0,p2+p1")

 (define_insn_reservation "ppro_sse_cmp_SF" 3
                          (and (eq_attr "cpu" "pentiumpro")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "SF")
                                         (eq_attr "type" "ssecmp"))))
                          "decoder0,p1")

 (define_insn_reservation "ppro_sse_cmp_SF_load" 3
@@ -612,28 +612,28 @@
 (define_insn_reservation "ppro_sse_V4SF" 4
                          (and (eq_attr "cpu" "pentiumpro")
                               (and (eq_attr "mode" "V4SF")
                                    (eq_attr "type" "sse")))
                          "decoder0,p1*2")

 (define_insn_reservation "ppro_sse_add_V4SF" 3
                          (and (eq_attr "cpu" "pentiumpro")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "V4SF")
-                                        (eq_attr "type" "sseadd"))))
+                                        (eq_attr "type" "sseadd,sseadd1"))))
                          "decoder0,p1*2")

 (define_insn_reservation "ppro_sse_add_V4SF_load" 3
                          (and (eq_attr "cpu" "pentiumpro")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "V4SF")
-                                        (eq_attr "type" "sseadd"))))
+                                        (eq_attr "type" "sseadd,sseadd1"))))
                          "decoder0,(p2+p1)*2")

 (define_insn_reservation "ppro_sse_cmp_V4SF" 3
                          (and (eq_attr "cpu" "pentiumpro")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "V4SF")
                                         (eq_attr "type" "ssecmp"))))
                          "decoder0,p1*2")

 (define_insn_reservation "ppro_sse_cmp_V4SF_load" 3
Index: config/i386/bdver1.md
===================================================================
--- config/i386/bdver1.md (revision 192214)
+++ config/i386/bdver1.md (working copy)
@@ -690,38 +690,38 @@
                               (and (eq_attr "type" "ssecvt")
                                    (and (eq_attr "memory" "none")
                                         (and (match_operand:V4SF 1 "nonimmediate_operand")
                                              (ior (match_operand: V2SI 0 "register_operand")
                                                   (match_operand: V4SI 0 "register_operand"))))))
                          "bdver1-direct,bdver1-fpsched,bdver1-fcvt")

 ;; SSE MUL, ADD, and MULADD.
 (define_insn_reservation "bdver1_ssemuladd_load_256" 11
                          (and (eq_attr "cpu" "bdver1,bdver2")
-                              (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
+                              (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
                                    (and (eq_attr "mode" "V8SF,V4DF")
                                         (eq_attr "memory" "load"))))
                          "bdver1-double,bdver1-fpload,bdver1-ffma")
 (define_insn_reservation "bdver1_ssemuladd_256" 7
                          (and (eq_attr "cpu" "bdver1,bdver2")
-                              (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
+                              (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
                                    (and (eq_attr "mode" "V8SF,V4DF")
                                         (eq_attr "memory" "none"))))
                          "bdver1-double,bdver1-fpsched,bdver1-ffma")
 (define_insn_reservation "bdver1_ssemuladd_load" 10
                          (and (eq_attr "cpu" "bdver1,bdver2")
-                              (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
+                              (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
                                    (eq_attr "memory" "load")))
                          "bdver1-direct,bdver1-fpload,bdver1-ffma")
 (define_insn_reservation "bdver1_ssemuladd" 6
                          (and (eq_attr "cpu" "bdver1,bdver2")
-                              (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
+                              (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
                                    (eq_attr "memory" "none")))
                          "bdver1-direct,bdver1-fpsched,bdver1-ffma")
 (define_insn_reservation "bdver1_sseimul_load" 8
                          (and (eq_attr "cpu" "bdver1,bdver2")
                               (and (eq_attr "type" "sseimul")
                                    (eq_attr "memory" "load")))
                          "bdver1-direct,bdver1-fpload,bdver1-fmma")
 (define_insn_reservation "bdver1_sseimul" 4
                          (and (eq_attr "cpu" "bdver1,bdver2")
                               (and (eq_attr "type" "sseimul")
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md (revision 192214)
+++ config/i386/sse.md (working copy)
@@ -1209,42 +1209,120 @@
               (vec_select:DF (match_dup 1) (parallel [(const_int 3)])))
             (plusminus:DF
               (vec_select:DF (match_dup 2) (parallel [(const_int 2)]))
               (vec_select:DF (match_dup 2) (parallel [(const_int 3)]))))))]
   "TARGET_AVX"
   "vhpd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseadd")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])

-(define_insn "sse3_hv2df3"
+(define_expand "sse3_haddv2df3"
+  [(set (match_operand:V2DF 0 "register_operand")
+        (vec_concat:V2DF
+          (plus:DF
+            (vec_select:DF
+              (match_operand:V2DF 1 "register_operand")
+              (parallel [(const_int 0)]))
+            (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
+          (plus:DF
+            (vec_select:DF
+              (match_operand:V2DF 2 "nonimmediate_operand")
+              (parallel [(const_int 0)]))
+            (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))]
+  "TARGET_SSE3")
+
+(define_insn "*sse3_haddv2df3"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x")
         (vec_concat:V2DF
-          (plusminus:DF
+          (plus:DF
+            (vec_select:DF
+              (match_operand:V2DF 1 "register_operand" "0,x")
+              (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))
+            (vec_select:DF
+              (match_dup 1)
+              (parallel [(match_operand:SI 4 "const_0_to_1_operand")])))
+          (plus:DF
+            (vec_select:DF
+              (match_operand:V2DF 2 "nonimmediate_operand" "xm,xm")
+              (parallel [(match_operand:SI 5 "const_0_to_1_operand")]))
+            (vec_select:DF
+              (match_dup 2)
+              (parallel [(match_operand:SI 6 "const_0_to_1_operand")])))))]
+  "TARGET_SSE3 && INTVAL (operands[3]) != INTVAL (operands[4])
+   && INTVAL (operands[5]) != INTVAL (operands[6])"
+  "@
+   haddpd\t{%2, %0|%0, %2}
+   vhaddpd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "V2DF")])
+
+(define_insn "sse3_hsubv2df3"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+        (vec_concat:V2DF
+          (minus:DF
             (vec_select:DF
               (match_operand:V2DF 1 "register_operand" "0,x")
               (parallel [(const_int 0)]))
             (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
-          (plusminus:DF
+          (minus:DF
             (vec_select:DF
               (match_operand:V2DF 2 "nonimmediate_operand" "xm,xm")
               (parallel [(const_int 0)]))
             (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))]
   "TARGET_SSE3"
   "@
-   hpd\t{%2, %0|%0, %2}
-   vhpd\t{%2, %1, %0|%0, %1, %2}"
+   hsubpd\t{%2, %0|%0, %2}
+   vhsubpd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])

+(define_insn "*sse3_haddv2df3_low"
+  [(set (match_operand:DF 0 "register_operand" "=x,x")
+        (plus:DF
+          (vec_select:DF
+            (match_operand:V2DF 1 "register_operand" "0,x")
+            (parallel [(match_operand:SI 2 "const_0_to_1_operand")]))
+          (vec_select:DF
+            (match_dup 1)
+            (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))))]
+  "TARGET_SSE3 && INTVAL (operands[2]) != INTVAL (operands[3])"
+  "@
+   haddpd\t{%0, %0|%0, %0}
+   vhaddpd\t{%1, %1, %0|%0, %1, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd1")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "V2DF")])
+
+(define_insn "*sse3_hsubv2df3_low"
+  [(set (match_operand:DF 0 "register_operand" "=x,x")
+        (minus:DF
+          (vec_select:DF
+            (match_operand:V2DF 1 "register_operand" "0,x")
+            (parallel [(const_int 0)]))
+          (vec_select:DF
+            (match_dup 1)
+            (parallel [(const_int 1)]))))]
+  "TARGET_SSE3"
+  "@
+   hsubpd\t{%0, %0|%0, %0}
+   vhsubpd\t{%1, %1, %0|%0, %1, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd1")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "V2DF")])
+
 (define_insn "avx_hv8sf3"
   [(set (match_operand:V8SF 0 "register_operand" "=x")
         (vec_concat:V8SF
           (vec_concat:V4SF
             (vec_concat:V2SF
               (plusminus:SF
                 (vec_select:SF
                   (match_operand:V8SF 1 "register_operand" "x")
                   (parallel [(const_int 0)]))
                 (vec_select:SF (match_dup 1) (parallel [(const_int 1)])))