
[i386] recognize haddpd

Message ID alpine.DEB.2.02.1210082114190.3732@laptop-mg.saclay.inria.fr
State New

Commit Message

Marc Glisse Oct. 8, 2012, 7:36 p.m. UTC
On Mon, 8 Oct 2012, Uros Bizjak wrote:

> You missed the most important sseadd1 addition, the one that prevents
> checking of operand2 when calculating "memory" attribute:
>
> 	 (and (eq_attr "type"
> 		 "!alu1,negnot,ishift1,
> 		   imov,imovx,icmp,test,bitmanip,
> 		   fmov,fcmp,fsgn,
> 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
> 		   sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
> 	      (match_operand 2 "memory_operand"))
>
> Please note "!" in the above expression.
[...]
> Also note that you have to add handling of sseadd1 attribute in other
> (scheduler) *.md files. Simply grep for sseadd and add ",sseadd1"
> everywhere.
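
To make the role of the "!" concrete, here is a simplified C model of the clause quoted above, together with the neighbouring clauses of the same cond in i386.md. It is only a sketch, not GCC code: the enum, the struct and the function names are hypothetical, only a few "type" values are modelled, and the earlier clauses of the real cond are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the i386 "type" attribute values. */
enum insn_type { TYPE_SSEADD, TYPE_SSEADD1, TYPE_SSEIADD1, TYPE_SSEMULADD };

enum memory_attr { MEMORY_NONE, MEMORY_LOAD, MEMORY_STORE };

/* Records which of operands 0..3 are memory operands. */
struct insn_model {
  enum insn_type type;
  bool op_is_mem[4];
};

/* Types named after the "!" (subset): operand 2 is not examined. */
static bool skips_operand2(enum insn_type type)
{
  return type == TYPE_SSEADD1 || type == TYPE_SSEIADD1;
}

/* Follows the order of the cond clauses; earlier clauses are omitted. */
static enum memory_attr memory_attr_tail(const struct insn_model *insn)
{
  assert(insn != NULL);
  if (insn->op_is_mem[0])
    return MEMORY_STORE;
  if (insn->op_is_mem[1])
    return MEMORY_LOAD;
  if (!skips_operand2(insn->type) && insn->op_is_mem[2])
    return MEMORY_LOAD;
  if (insn->type == TYPE_SSEMULADD && insn->op_is_mem[3])
    return MEMORY_LOAD;
  return MEMORY_NONE;
}
```

In this model an insn of type sseadd with a memory operand 2 is classified as a load, while an insn of type sseadd1 with the same operand flags falls through to none, because its operand 2 is never looked at.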

Thank you, it makes more sense now. The attached patch passed bootstrap
and the testsuite. I wasn't sure whether the ChangeLog should be more
precise, but that would make it as long as the patch itself, with about
23 entries like:
 	(define_insn_reservation bdver1_ssemuladd_256): Likewise

The next goal would be to recognize some potential uses of DPPD, but that
seems harder.


2012-10-09  Marc Glisse  <marc.glisse@inria.fr>

gcc/
	PR target/54400
	* config/i386/i386.md (type attribute): Add sseadd1.
	(unit attribute): Add support for sseadd1.
	(memory attribute): Likewise.
	* config/i386/athlon.md: Likewise.
	* config/i386/core2.md: Likewise.
	* config/i386/atom.md: Likewise.
	* config/i386/ppro.md: Likewise.
	* config/i386/bdver1.md: Likewise.
	* config/i386/sse.md (sse3_h<plusminus_insn>v2df3): Split into...
	(sse3_haddv2df3): ... expander.
	(*sse3_haddv2df3): ... define_insn.  Accept permuted operands.
	(sse3_hsubv2df3): ... define_insn.
	(*sse3_haddv2df3_low): New define_insn.
	(*sse3_hsubv2df3_low): New define_insn.

gcc/testsuite/
	PR target/54400
	* gcc.target/i386/pr54400.c: New testcase.

Comments

Uros Bizjak Oct. 8, 2012, 7:58 p.m. UTC | #1
On Mon, Oct 8, 2012 at 9:36 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 8 Oct 2012, Uros Bizjak wrote:
>
>> You missed the most important sseadd1 addition, the one that prevents
>> checking of operand2 when calculating "memory" attribute:
>>
>>          (and (eq_attr "type"
>>                  "!alu1,negnot,ishift1,
>>                    imov,imovx,icmp,test,bitmanip,
>>                    fmov,fcmp,fsgn,
>>                    sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
>>                    sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
>>               (match_operand 2 "memory_operand"))
>>
>> Please note "!" in the above expression.
>
> [...]
>
>> Also note that you have to add handling of sseadd1 attribute in other
>> (scheduler) *.md files. Simply grep for sseadd and add ",sseadd1"
>> everywhere.
>
>
> Thank you, it makes more sense now. The attached passed bootstrap+testsuite.
> I didn't know if I should be more precise in the ChangeLog, but it would
> make the ChangeLog as long as the patch with about 23 entries like:
>         (define_insn_reservation bdver1_ssemuladd_256): Likewise
>
> Next goal would be to further recognize some DPPD potential uses, but that
> seems harder.
>
>
> 2012-10-09  Marc Glisse  <marc.glisse@inria.fr>
>
>
> gcc/
>         PR target/54400
>         * config/i386/i386.md (type attribute): Add sseadd1.
>         (unit attribute): Add support for sseadd1.
>         (memory attribute): Likewise.
>         * config/i386/athlon.md: Likewise.
>         * config/i386/core2.md: Likewise.
>         * config/i386/atom.md: Likewise.
>         * config/i386/ppro.md: Likewise.
>         * config/i386/bdver1.md: Likewise.
>
>         * config/i386/sse.md (sse3_h<plusminus_insn>v2df3): split into...
>         (sse3_haddv2df3): ... expander.
>         (*sse3_haddv2df3): ... define_insn. Accept permuted operands.
>         (sse3_hsubv2df3): ... define_insn.
>         (*sse3_haddv2df3_low): New define_insn.
>         (*sse3_hsubv2df3_low): New define_insn.
>
> gcc/testsuite/
>         PR target/54400
>         * gcc.target/i386/pr54400.c: New testcase.

OK for mainline SVN with a couple of small changes below ...

> +(define_insn "*sse3_haddv2df3"
>    [(set (match_operand:V2DF 0 "register_operand" "=x,x")
>         (vec_concat:V2DF
> -         (plusminus:DF
> +         (plus:DF
> +           (vec_select:DF
> +             (match_operand:V2DF 1 "register_operand" "0,x")
> +             (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))
> +           (vec_select:DF
> +             (match_dup 1)
> +             (parallel [(match_operand:SI 4 "const_0_to_1_operand")])))
> +         (plus:DF
> +           (vec_select:DF
> +             (match_operand:V2DF 2 "nonimmediate_operand" "xm,xm")
> +             (parallel [(match_operand:SI 5 "const_0_to_1_operand")]))
> +           (vec_select:DF
> +             (match_dup 2)
> +             (parallel [(match_operand:SI 6 "const_0_to_1_operand")])))))]
> +  "TARGET_SSE3 && INTVAL (operands[3]) != INTVAL (operands[4])
> +   && INTVAL (operands[5]) != INTVAL (operands[6])"

Please put every && expression on its own line:

"TARGET_SSE3
  && INTVAL (operands[3]) != INTVAL (operands[4])
  && INTVAL (operands[5]) != INTVAL (operands[6])"
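
For reference, the two inequality tests are what let the pattern accept permuted operands: each half of the result must add both lanes of its source vector, in either order. A rough C model of that condition follows. The function names are hypothetical, TARGET_SSE3 is left out, and the scalar haddpd model is only meant to illustrate the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the *sse3_haddv2df3 insn condition (TARGET_SSE3 omitted).
   i3 and i4 select lanes of operand 1; i5 and i6 select lanes of
   operand 2.  const_0_to_1_operand restricts every index to 0 or 1.  */
static bool hadd_v2df_cond(int i3, int i4, int i5, int i6)
{
  assert(i3 >= 0 && i3 <= 1 && i4 >= 0 && i4 <= 1);
  assert(i5 >= 0 && i5 <= 1 && i6 >= 0 && i6 <= 1);
  return i3 != i4 && i5 != i6;
}

/* Scalar model of haddpd: { a[0] + a[1], b[0] + b[1] }.  */
static void haddpd_model(const double a[2], const double b[2], double out[2])
{
  out[0] = a[0] + a[1];
  out[1] = b[0] + b[1];
}
```

The four index combinations used by i1 to i4 in the testcase all satisfy hadd_v2df_cond, whereas a pair such as (0, 0) does not, since p[0] + p[0] is not a horizontal add.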

> +(define_insn "*sse3_haddv2df3_low"
> +  [(set (match_operand:DF 0 "register_operand" "=x,x")
> +       (plus:DF
> +         (vec_select:DF
> +           (match_operand:V2DF 1 "register_operand" "0,x")
> +           (parallel [(match_operand:SI 2 "const_0_to_1_operand")]))
> +         (vec_select:DF
> +           (match_dup 1)
> +           (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))))]
> +  "TARGET_SSE3 && INTVAL (operands[2]) != INTVAL (operands[3])"

Also here.
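
As background on the _low patterns: they produce a single double from one vector by passing the same register as both inputs and reading the low lane of the result. A hedged scalar sketch in C, with illustrative names that are not GCC's:

```c
#include <assert.h>
#include <stddef.h>

/* Scalar models of the SSE3 horizontal operations on two-double vectors. */
static void haddpd_lanes(const double *a, const double *b, double *out)
{
  out[0] = a[0] + a[1];
  out[1] = b[0] + b[1];
}

static void hsubpd_lanes(const double *a, const double *b, double *out)
{
  out[0] = a[0] - a[1];
  out[1] = b[0] - b[1];
}

/* Low lane of "haddpd x, x": the sum of both lanes of x. */
static double hadd_low(const double *x)
{
  double r[2];
  assert(x != NULL);
  haddpd_lanes(x, x, r);
  return r[0];
}

/* Low lane of "hsubpd x, x": x[0] - x[1], in that order only. */
static double hsub_low(const double *x)
{
  double r[2];
  assert(x != NULL);
  hsubpd_lanes(x, x, r);
  return r[0];
}
```

This also shows why only the add pattern takes variable lane indices: p[1] + p[0] equals p[0] + p[1], but p[1] - p[0] is not the low lane of hsubpd, so the hsub pattern keeps the fixed indices 0 and 1.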

Thanks,
Uros.

Patch

Index: testsuite/gcc.target/i386/pr54400.c
===================================================================
--- testsuite/gcc.target/i386/pr54400.c	(revision 0)
+++ testsuite/gcc.target/i386/pr54400.c	(revision 0)
@@ -0,0 +1,53 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse3 -mfpmath=sse" } */
+
+#include <x86intrin.h>
+
+double f (__m128d p)
+{
+  return p[0] - p[1];
+}
+
+double g1 (__m128d p)
+{
+  return p[0] + p[1];
+}
+
+double g2 (__m128d p)
+{
+  return p[1] + p[0];
+}
+
+__m128d h (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] - p[1], q[0] - q[1] };
+  return r;
+}
+
+__m128d i1 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[0] + q[1] };
+  return r;
+}
+
+__m128d i2 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[1] + q[0] };
+  return r;
+}
+
+__m128d i3 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[0] + q[1] };
+  return r;
+}
+
+__m128d i4 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[1] + q[0] };
+  return r;
+}
+
+/* { dg-final { scan-assembler-times "hsubpd" 2 } } */
+/* { dg-final { scan-assembler-times "haddpd" 6 } } */
+/* { dg-final { scan-assembler-not "unpck" } } */

Property changes on: testsuite/gcc.target/i386/pr54400.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 192214)
+++ config/i386/i386.md	(working copy)
@@ -320,36 +320,36 @@ 
 ;; provided in other attributes.
 (define_attr "type"
   "other,multi,
    alu,alu1,negnot,imov,imovx,lea,
    incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
    icmp,test,ibr,setcc,icmov,
    push,pop,call,callv,leave,
    str,bitmanip,
    fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
    sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,lwp,
+   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   ssediv,sseins,ssemuladd,sse4arg,lwp,
    mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
 ;; Main data type used by the insn
 (define_attr "mode"
   "unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,XF,TF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF"
   (const_string "unknown"))
 
 ;; The CPU unit operations uses.
 (define_attr "unit" "integer,i387,sse,mmx,unknown"
   (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
 	   (const_string "i387")
 	 (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-			  sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
+			  sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
 			  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
 	   (const_string "sse")
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "mmx")
 	 (eq_attr "type" "other")
 	   (const_string "unknown")]
 	 (const_string "integer")))
 
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
@@ -592,21 +592,21 @@ 
 	   (const_string "both")
 	 (match_operand 0 "memory_operand")
 	   (const_string "store")
 	 (match_operand 1 "memory_operand")
 	   (const_string "load")
 	 (and (eq_attr "type"
 		 "!alu1,negnot,ishift1,
 		   imov,imovx,icmp,test,bitmanip,
 		   fmov,fcmp,fsgn,
 		   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
-		   sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
+		   sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
 	      (match_operand 2 "memory_operand"))
 	   (const_string "load")
 	 (and (eq_attr "type" "icmov,ssemuladd,sse4arg")
 	      (match_operand 3 "memory_operand"))
 	   (const_string "load")
 	]
 	(const_string "none")))
 
 ;; Indicates if an instruction has both an immediate and a displacement.
 
Index: config/i386/athlon.md
===================================================================
--- config/i386/athlon.md	(revision 192214)
+++ config/i386/athlon.md	(working copy)
@@ -800,61 +800,61 @@ 
 			 (and (eq_attr "cpu" "athlon,k8,generic64")
 			      (eq_attr "type" "ssecomi"))
 			 "athlon-vector,athlon-fpsched,athlon-fadd")
 (define_insn_reservation "athlon_ssecomi_amdfam10" 3
 			 (and (eq_attr "cpu" "amdfam10")
 ;; It seems athlon_ssecomi has a bug in the attr_type, fixed for amdfam10
 			      (eq_attr "type" "ssecomi"))
 			 "athlon-direct,athlon-fpsched,athlon-fadd")
 (define_insn_reservation "athlon_sseadd_load" 4
 			 (and (eq_attr "cpu" "athlon")
-			      (and (eq_attr "type" "sseadd")
+			      (and (eq_attr "type" "sseadd,sseadd1")
 				   (and (eq_attr "mode" "SF,DF,DI")
 					(eq_attr "memory" "load"))))
 			 "athlon-direct,athlon-fpload,athlon-fadd")
 (define_insn_reservation "athlon_sseadd_load_k8" 6
 			 (and (eq_attr "cpu" "k8,generic64,amdfam10")
-			      (and (eq_attr "type" "sseadd")
+			      (and (eq_attr "type" "sseadd,sseadd1")
 				   (and (eq_attr "mode" "SF,DF,DI")
 					(eq_attr "memory" "load"))))
 			 "athlon-direct,athlon-fploadk8,athlon-fadd")
 (define_insn_reservation "athlon_sseadd" 4
 			 (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
-			      (and (eq_attr "type" "sseadd")
+			      (and (eq_attr "type" "sseadd,sseadd1")
 				   (eq_attr "mode" "SF,DF,DI")))
 			 "athlon-direct,athlon-fpsched,athlon-fadd")
 (define_insn_reservation "athlon_sseaddvector_load" 5
 			 (and (eq_attr "cpu" "athlon")
-			      (and (eq_attr "type" "sseadd")
+			      (and (eq_attr "type" "sseadd,sseadd1")
 				   (eq_attr "memory" "load")))
 			 "athlon-vector,athlon-fpload2,(athlon-fadd*2)")
 (define_insn_reservation "athlon_sseaddvector_load_k8" 7
 			 (and (eq_attr "cpu" "k8,generic64")
-			      (and (eq_attr "type" "sseadd")
+			      (and (eq_attr "type" "sseadd,sseadd1")
 				   (eq_attr "memory" "load")))
 			 "athlon-double,athlon-fpload2k8,(athlon-fadd*2)")
 (define_insn_reservation "athlon_sseaddvector_load_amdfam10" 6
 			 (and (eq_attr "cpu" "amdfam10")
-			      (and (eq_attr "type" "sseadd")
+			      (and (eq_attr "type" "sseadd,sseadd1")
 				   (eq_attr "memory" "load")))
 			 "athlon-direct,athlon-fploadk8,athlon-fadd")
 (define_insn_reservation "athlon_sseaddvector" 5
 			 (and (eq_attr "cpu" "athlon")
-			      (eq_attr "type" "sseadd"))
+			      (eq_attr "type" "sseadd,sseadd1"))
 			 "athlon-vector,athlon-fpsched,(athlon-fadd*2)")
 (define_insn_reservation "athlon_sseaddvector_k8" 5
 			 (and (eq_attr "cpu" "k8,generic64")
-			      (eq_attr "type" "sseadd"))
+			      (eq_attr "type" "sseadd,sseadd1"))
 			 "athlon-double,athlon-fpsched,(athlon-fadd*2)")
 (define_insn_reservation "athlon_sseaddvector_amdfam10" 4
 			 (and (eq_attr "cpu" "amdfam10")
-			      (eq_attr "type" "sseadd"))
+			      (eq_attr "type" "sseadd,sseadd1"))
 			 "athlon-direct,athlon-fpsched,athlon-fadd")
 
 ;; Conversions behaves very irregularly and the scheduling is critical here.
 ;; Take each instruction separately.  Assume that the mode is always set to the
 ;; destination one and athlon_decode is set to the K8 versions.
 
 ;; cvtss2sd
 (define_insn_reservation "athlon_ssecvt_cvtss2sd_load_k8" 4
 			 (and (eq_attr "cpu" "k8,athlon,generic64")
 			      (and (eq_attr "type" "ssecvt")
Index: config/i386/core2.md
===================================================================
--- config/i386/core2.md	(revision 192214)
+++ config/i386/core2.md	(working copy)
@@ -29,21 +29,21 @@ 
 
 ;; The core2_idiv, core2_fdiv and core2_ssediv automata are used to
 ;; model issue latencies of idiv, fdiv and ssediv type insns.
 (define_automaton "core2_decoder,core2_core,core2_idiv,core2_fdiv,core2_ssediv,core2_load,core2_store")
 
 ;; The CPU domain, used for Core i7 bypass latencies
 (define_attr "i7_domain" "int,float,simd"
   (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
 	   (const_string "float")
 	 (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-			  sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
+			  sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
 			  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
 	   (cond [(eq_attr "mode" "V4DF,V8SF,V2DF,V4SF,SF,DF")
 		    (const_string "float")
 		  (eq_attr "mode" "SI")
 		    (const_string "int")]
 		  (const_string "simd"))
 	 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
 	   (const_string "simd")]
 	(const_string "int")))
 
@@ -521,27 +521,27 @@ 
 
 (define_insn_reservation "c2_sse_V4SF" 4
 			 (and (eq_attr "cpu" "core2,corei7")
 			      (and (eq_attr "mode" "V4SF")
 				   (eq_attr "type" "sse")))
 			 "c2_decoder0,c2_p1*2")
 
 (define_insn_reservation "c2_sse_addcmp" 3
 			 (and (eq_attr "cpu" "core2,corei7")
 			      (and (eq_attr "memory" "none")
-				   (eq_attr "type" "sseadd,ssecmp,ssecomi")))
+				   (eq_attr "type" "sseadd,sseadd1,ssecmp,ssecomi")))
 			 "c2_decodern,c2_p1")
 
 (define_insn_reservation "c2_sse_addcmp_load" 3
 			 (and (eq_attr "cpu" "core2,corei7")
 			      (and (eq_attr "memory" "load")
-				   (eq_attr "type" "sseadd,ssecmp,ssecomi")))
+				   (eq_attr "type" "sseadd,sseadd1,ssecmp,ssecomi")))
 			 "c2_decodern,c2_p2+c2_p1")
 
 (define_insn_reservation "c2_sse_mul_SF" 4
 			 (and (eq_attr "cpu" "core2,corei7")
 			      (and (eq_attr "memory" "none")
 				   (and (eq_attr "mode" "SF,V4SF")
 					(eq_attr "type" "ssemul"))))
 			"c2_decodern,c2_p0")
 
 (define_insn_reservation "c2_sse_mul_SF_load" 4
Index: config/i386/atom.md
===================================================================
--- config/i386/atom.md	(revision 192214)
+++ config/i386/atom.md	(working copy)
@@ -589,39 +589,39 @@ 
 ;; movu mem
 (define_insn_reservation  "atom_ssemov_5" 2
   (and (eq_attr "cpu" "atom")
        (and (eq_attr "type" "ssemov")
             (ior (eq_attr "movu" "1") (eq_attr "memory" "!none"))))
   "atom-complex, atom-all-eu")
 
 ;; no memory simple
 (define_insn_reservation  "atom_sseadd" 5
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd")
+       (and (eq_attr "type" "sseadd,sseadd1")
             (and (eq_attr "memory" "none")
                  (and (eq_attr "mode" "!V2DF")
                       (eq_attr "atom_unit" "!complex")))))
   "atom-fadd-5c")
 
 ;; memory simple
 (define_insn_reservation  "atom_sseadd_mem" 5
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd")
+       (and (eq_attr "type" "sseadd,sseadd1")
             (and (eq_attr "memory" "!none")
                  (and (eq_attr "mode" "!V2DF")
                       (eq_attr "atom_unit" "!complex")))))
   "atom-dual-5c")
 
 ;; maxps, minps, *pd, hadd, hsub
 (define_insn_reservation  "atom_sseadd_3" 8
   (and (eq_attr "cpu" "atom")
-       (and (eq_attr "type" "sseadd")
+       (and (eq_attr "type" "sseadd,sseadd1")
             (ior (eq_attr "mode" "V2DF") (eq_attr "atom_unit" "complex"))))
   "atom-complex, atom-all-eu*7")
 
 ;; Except dppd/dpps
 (define_insn_reservation  "atom_ssemul" 5
   (and (eq_attr "cpu" "atom")
        (and (eq_attr "type" "ssemul")
             (eq_attr "mode" "!SF")))
   "atom-fmul-5c")
 
Index: config/i386/ppro.md
===================================================================
--- config/i386/ppro.md	(revision 192214)
+++ config/i386/ppro.md	(working copy)
@@ -502,28 +502,28 @@ 
 (define_insn_reservation "ppro_sse_SF" 3
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "mode" "SF")
 				   (eq_attr "type" "sse")))
 			 "decodern,p0")
 
 (define_insn_reservation "ppro_sse_add_SF" 3
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "none")
 				   (and (eq_attr "mode" "SF")
-					(eq_attr "type" "sseadd"))))
+					(eq_attr "type" "sseadd,sseadd1"))))
 			 "decodern,p1")
 
 (define_insn_reservation "ppro_sse_add_SF_load" 3
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "load")
 				   (and (eq_attr "mode" "SF")
-					(eq_attr "type" "sseadd"))))
+					(eq_attr "type" "sseadd,sseadd1"))))
 			 "decoder0,p2+p1")
 
 (define_insn_reservation "ppro_sse_cmp_SF" 3
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "none")
 				   (and (eq_attr "mode" "SF")
 					(eq_attr "type" "ssecmp"))))
 			 "decoder0,p1")
 
 (define_insn_reservation "ppro_sse_cmp_SF_load" 3
@@ -612,28 +612,28 @@ 
 (define_insn_reservation "ppro_sse_V4SF" 4
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "mode" "V4SF")
 				   (eq_attr "type" "sse")))
 			 "decoder0,p1*2")
 
 (define_insn_reservation "ppro_sse_add_V4SF" 3
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "none")
 				   (and (eq_attr "mode" "V4SF")
-					(eq_attr "type" "sseadd"))))
+					(eq_attr "type" "sseadd,sseadd1"))))
 			 "decoder0,p1*2")
 
 (define_insn_reservation "ppro_sse_add_V4SF_load" 3
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "load")
 				   (and (eq_attr "mode" "V4SF")
-					(eq_attr "type" "sseadd"))))
+					(eq_attr "type" "sseadd,sseadd1"))))
 			 "decoder0,(p2+p1)*2")
 
 (define_insn_reservation "ppro_sse_cmp_V4SF" 3
 			 (and (eq_attr "cpu" "pentiumpro")
 			      (and (eq_attr "memory" "none")
 				   (and (eq_attr "mode" "V4SF")
 					(eq_attr "type" "ssecmp"))))
 			 "decoder0,p1*2")
 
 (define_insn_reservation "ppro_sse_cmp_V4SF_load" 3
Index: config/i386/bdver1.md
===================================================================
--- config/i386/bdver1.md	(revision 192214)
+++ config/i386/bdver1.md	(working copy)
@@ -690,38 +690,38 @@ 
 			      (and (eq_attr "type" "ssecvt")
 				   (and (eq_attr "memory" "none")
 				        (and (match_operand:V4SF 1 "nonimmediate_operand")
 				             (ior (match_operand: V2SI 0 "register_operand")
 						  (match_operand: V4SI 0 "register_operand"))))))
 			 "bdver1-direct,bdver1-fpsched,bdver1-fcvt")
 
 ;; SSE MUL, ADD, and MULADD.
 (define_insn_reservation "bdver1_ssemuladd_load_256" 11
 			 (and (eq_attr "cpu" "bdver1,bdver2")
-			      (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
+			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
 				   (and (eq_attr "mode" "V8SF,V4DF")
 					(eq_attr "memory" "load"))))
 			 "bdver1-double,bdver1-fpload,bdver1-ffma")
 (define_insn_reservation "bdver1_ssemuladd_256" 7
 			 (and (eq_attr "cpu" "bdver1,bdver2")
-			      (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
+			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
 				   (and (eq_attr "mode" "V8SF,V4DF")
 					(eq_attr "memory" "none"))))
 			 "bdver1-double,bdver1-fpsched,bdver1-ffma")
 (define_insn_reservation "bdver1_ssemuladd_load" 10
 			 (and (eq_attr "cpu" "bdver1,bdver2")
-			      (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
+			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
 				   (eq_attr "memory" "load")))
 			 "bdver1-direct,bdver1-fpload,bdver1-ffma")
 (define_insn_reservation "bdver1_ssemuladd" 6
 			 (and (eq_attr "cpu" "bdver1,bdver2")
-			      (and (eq_attr "type" "ssemul,sseadd,ssemuladd")
+			      (and (eq_attr "type" "ssemul,sseadd,sseadd1,ssemuladd")
 				   (eq_attr "memory" "none")))
 			 "bdver1-direct,bdver1-fpsched,bdver1-ffma")
 (define_insn_reservation "bdver1_sseimul_load" 8
 			 (and (eq_attr "cpu" "bdver1,bdver2")
 			      (and (eq_attr "type" "sseimul")
 				   (eq_attr "memory" "load")))
 			 "bdver1-direct,bdver1-fpload,bdver1-fmma")
 (define_insn_reservation "bdver1_sseimul" 4
 			 (and (eq_attr "cpu" "bdver1,bdver2")
 			      (and (eq_attr "type" "sseimul")
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md	(revision 192214)
+++ config/i386/sse.md	(working copy)
@@ -1209,42 +1209,120 @@ 
 	      (vec_select:DF (match_dup 1) (parallel [(const_int 3)])))
 	    (plusminus:DF
 	      (vec_select:DF (match_dup 2) (parallel [(const_int 2)]))
 	      (vec_select:DF (match_dup 2) (parallel [(const_int 3)]))))))]
   "TARGET_AVX"
   "vh<plusminus_mnemonic>pd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sseadd")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V4DF")])
 
-(define_insn "sse3_h<plusminus_insn>v2df3"
+(define_expand "sse3_haddv2df3"
+  [(set (match_operand:V2DF 0 "register_operand")
+	(vec_concat:V2DF
+	  (plus:DF
+	    (vec_select:DF
+	      (match_operand:V2DF 1 "register_operand")
+	      (parallel [(const_int 0)]))
+	    (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
+	  (plus:DF
+	    (vec_select:DF
+	      (match_operand:V2DF 2 "nonimmediate_operand")
+	      (parallel [(const_int 0)]))
+	    (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))]
+  "TARGET_SSE3")
+
+(define_insn "*sse3_haddv2df3"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x")
 	(vec_concat:V2DF
-	  (plusminus:DF
+	  (plus:DF
+	    (vec_select:DF
+	      (match_operand:V2DF 1 "register_operand" "0,x")
+	      (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))
+	    (vec_select:DF
+	      (match_dup 1)
+	      (parallel [(match_operand:SI 4 "const_0_to_1_operand")])))
+	  (plus:DF
+	    (vec_select:DF
+	      (match_operand:V2DF 2 "nonimmediate_operand" "xm,xm")
+	      (parallel [(match_operand:SI 5 "const_0_to_1_operand")]))
+	    (vec_select:DF
+	      (match_dup 2)
+	      (parallel [(match_operand:SI 6 "const_0_to_1_operand")])))))]
+  "TARGET_SSE3 && INTVAL (operands[3]) != INTVAL (operands[4])
+   && INTVAL (operands[5]) != INTVAL (operands[6])"
+  "@
+   haddpd\t{%2, %0|%0, %2}
+   vhaddpd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "V2DF")])
+
+(define_insn "sse3_hsubv2df3"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+	(vec_concat:V2DF
+	  (minus:DF
 	    (vec_select:DF
 	      (match_operand:V2DF 1 "register_operand" "0,x")
 	      (parallel [(const_int 0)]))
 	    (vec_select:DF (match_dup 1) (parallel [(const_int 1)])))
-	  (plusminus:DF
+	  (minus:DF
 	    (vec_select:DF
 	      (match_operand:V2DF 2 "nonimmediate_operand" "xm,xm")
 	      (parallel [(const_int 0)]))
 	    (vec_select:DF (match_dup 2) (parallel [(const_int 1)])))))]
   "TARGET_SSE3"
   "@
-   h<plusminus_mnemonic>pd\t{%2, %0|%0, %2}
-   vh<plusminus_mnemonic>pd\t{%2, %1, %0|%0, %1, %2}"
+   hsubpd\t{%2, %0|%0, %2}
+   vhsubpd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V2DF")])
 
+(define_insn "*sse3_haddv2df3_low"
+  [(set (match_operand:DF 0 "register_operand" "=x,x")
+	(plus:DF
+	  (vec_select:DF
+	    (match_operand:V2DF 1 "register_operand" "0,x")
+	    (parallel [(match_operand:SI 2 "const_0_to_1_operand")]))
+	  (vec_select:DF
+	    (match_dup 1)
+	    (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))))]
+  "TARGET_SSE3 && INTVAL (operands[2]) != INTVAL (operands[3])"
+  "@
+   haddpd\t{%0, %0|%0, %0}
+   vhaddpd\t{%1, %1, %0|%0, %1, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd1")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "V2DF")])
+
+(define_insn "*sse3_hsubv2df3_low"
+  [(set (match_operand:DF 0 "register_operand" "=x,x")
+	(minus:DF
+	  (vec_select:DF
+	    (match_operand:V2DF 1 "register_operand" "0,x")
+	    (parallel [(const_int 0)]))
+	  (vec_select:DF
+	    (match_dup 1)
+	    (parallel [(const_int 1)]))))]
+  "TARGET_SSE3"
+  "@
+   hsubpd\t{%0, %0|%0, %0}
+   vhsubpd\t{%1, %1, %0|%0, %1, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd1")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "V2DF")])
+
 (define_insn "avx_h<plusminus_insn>v8sf3"
   [(set (match_operand:V8SF 0 "register_operand" "=x")
 	(vec_concat:V8SF
 	  (vec_concat:V4SF
 	    (vec_concat:V2SF
 	      (plusminus:SF
 		(vec_select:SF
 		  (match_operand:V8SF 1 "register_operand" "x")
 		  (parallel [(const_int 0)]))
 		(vec_select:SF (match_dup 1) (parallel [(const_int 1)])))