Patchwork [i386] fma,fma4 and xop flags

login
register
mail settings
Submitter Uros Bizjak
Date Aug. 10, 2012, 7:59 p.m.
Message ID <CAFULd4Z2u2y64BiUjZk6FoO7MtuWjepbZ7mRQmMRW0Rc7+T=FQ@mail.gmail.com>
Download mbox | patch
Permalink /patch/176618/
State New
Headers show

Comments

Uros Bizjak - Aug. 10, 2012, 7:59 p.m.
On Fri, Aug 10, 2012 at 7:27 AM, Gopalasubramanian, Ganesh
<Ganesh.Gopalasubramanian@amd.com> wrote:
>> -mxop implies -mfma4, but reverse is not true.
>
> I think this handling went in for bdver1.
> But, with bdver2, we have both fma and fma4.
> So for bdver2, -mxop should not be enabling one of them.
>
>> if someone set -mfma4 together
>> with -mfma on the command line, we should NOT disable selected ISA
>> behind user's back
>
> If both -mfma4 and -mfma are enabled, GCC outputs fma4 instructions.
> This, I think is because fma4 instruction patterns are read before fma
> instruction patterns from the ".md" files.

Actually, this is the problem you are trying to solve. The fma4
patterns are defined before fma3, so gcc prefers these.

> So, enabling both -mfma4 and -mfma is not good for bdver2.

You can enable both, in fact - both should be enabled, since you have
various FMA4 intrinsics defined.

> Moreover, if user tries to use, -mfma -mno-fma4 -mxop, the order in which these
> options are used becomes crucial. -mxop enables -mfma4 and by instruction
> patterns fma4 instructions gets listed in the assembly file.
>
> For the below test,
>
> double a,b,c,d;
> int fn(){
>         a = b + c * d ;
>         return a;
> }
>
> #1) Using options "-O2 -mno-fma4 -mfma -mxop" outputs fma4.
> (vfmaddsd        b(%rip), %xmm2, %xmm1, %xmm0)
> #2) Using options "-O2 -mfma -mno-fma4 -mxop" outputs fma4.
> (vfmaddsd        b(%rip), %xmm2, %xmm1, %xmm0)
> #3) Using options "-mxop -mno-fma4 -mfma" outpts fma.
> (vfmadd132sd     d(%rip), %xmm1, %xmm0)

Attached patch solves this problem by rearranging sse.md patterns to
match preferred order. So, for all your cases, fma3 instructions are
now output, while we still support FMA3 and FMA4 intrinsics.

2012-08-10  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/sse.md (*fma_fmadd_<mode>, *fma_fmsub_<mode>,
	*fma_fnmadd_<mode>, *fma_fnmsub_<mode>, *fma_fmaddsub_<mode>,
	*fma_fmsubadd_<mode>): Move FMA3 insn patterns before FMA4 patterns.

Testing on x86_64-pc-linux-gnu {,-m32} in progress. Patch will be
committed to mainline SVN.

BTW: Since this patch only rearranges the patterns, it is also safe
for 4.7. A follow-up patch will merge patterns using enabled
attribute.

Uros.
Richard Henderson - Aug. 10, 2012, 8:02 p.m.
On 2012-08-10 12:59, Uros Bizjak wrote:
> Actually, this is the problem you are trying to solve. The fma4
> patterns are defined before fma3, so gcc prefers these.

The Real Problem is that they should not be separate patterns.
They should be a single pattern that selects alternatives via
the enabled isa.


r~
Uros Bizjak - Aug. 10, 2012, 8:21 p.m.
On Fri, Aug 10, 2012 at 10:02 PM, Richard Henderson <rth@redhat.com> wrote:
> On 2012-08-10 12:59, Uros Bizjak wrote:
>> Actually, this is the problem you are trying to solve. The fma4
>> patterns are defined before fma3, so gcc prefers these.
>
> The Real Problem is that they should not be separate patterns.
> They should be a single pattern that selects alternatives via
> the enabled isa.

Yes, I am just writing the patch ;)

Do you think that enabled isa is also appropriate for 4.7?

Uros.
Richard Henderson - Aug. 10, 2012, 8:22 p.m.
On 2012-08-10 13:21, Uros Bizjak wrote:
> Do you think that enabled isa is also appropriate for 4.7?

I think it's more reliable than relying on pattern ordering.


r~
Uros Bizjak - Aug. 10, 2012, 8:25 p.m.
On Fri, Aug 10, 2012 at 10:22 PM, Richard Henderson <rth@redhat.com> wrote:
> On 2012-08-10 13:21, Uros Bizjak wrote:
>> Do you think that enabled isa is also appropriate for 4.7?
>
> I think it's more reliable than relying on pattern ordering.

OK, I will write incremental patch that uses enabled attribute for
mainline and then backport everything to 4.7 after a couple of days.

Uros.

Patch

Index: sse.md
===================================================================
--- sse.md	(revision 190298)
+++ sse.md	(working copy)
@@ -403,8 +403,6 @@ 
 ;; Mix-n-match
 (define_mode_iterator AVX256MODE2P [V8SI V8SF V4DF])
 
-(define_mode_iterator FMAMODE [SF DF V4SF V2DF V8SF V4DF])
-
 ;; Mapping of immediate bits for blend instructions
 (define_mode_attr blendbits
   [(V8SF "255") (V4SF "15") (V4DF "15") (V2DF "3")])
@@ -1886,12 +1884,13 @@ 
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
-;; FMA4 floating point multiply/accumulate instructions.  This
-;; includes the scalar version of the instructions as well as the
-;; vector.
+;; FMA floating point multiply/accumulate instructions.  These include
+;; scalar versions of the instructions as well as vector versions.
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
+(define_mode_iterator FMAMODE [SF DF V4SF V2DF V8SF V4DF])
+
 ;; In order to match (*a * *b) + *c, particularly when vectorizing, allow
 ;; combine to generate a multiply/add with two memory references.  We then
 ;; split this insn, into loading up the destination register with one of the
@@ -1907,8 +1906,6 @@ 
 ;; We could now properly represent that only one memory operand is
 ;; allowed and not be penalized during optimization.
 
-;; Intrinsic FMA operations.
-
 ;; The standard names for fma is only available with SSE math enabled.
 (define_expand "fma<mode>4"
   [(set (match_operand:FMAMODE 0 "register_operand")
@@ -1942,7 +1939,7 @@ 
 	  (neg:FMAMODE (match_operand:FMAMODE 3 "nonimmediate_operand"))))]
   "(TARGET_FMA || TARGET_FMA4) && TARGET_SSE_MATH")
 
-;; The builtin for fma4intrin.h is not constrained by SSE math enabled.
+;; The builtin for intrinsics is not constrained by SSE math enabled.
 (define_expand "fma4i_fmadd_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand")
 	(fma:FMAMODE
@@ -1951,7 +1948,71 @@ 
 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "TARGET_FMA || TARGET_FMA4")
 
-(define_insn "*fma4i_fmadd_<mode>"
+;; FMA3 version
+
+(define_insn "*fma_fmadd_<mode>"
+  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
+	(fma:FMAMODE
+	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x")
+	  (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm")
+	  (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0")))]
+  "TARGET_FMA"
+  "@
+   vfmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
+   vfmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   vfmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fma_fmsub_<mode>"
+  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
+	(fma:FMAMODE
+	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0,x")
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "xm, x,xm")
+	  (neg:FMAMODE
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0"))))]
+  "TARGET_FMA"
+  "@
+   vfmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
+   vfmsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   vfmsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fma_fnmadd_<mode>"
+  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
+	(fma:FMAMODE
+	  (neg:FMAMODE
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "xm, x,xm")
+	  (match_operand:FMAMODE   3 "nonimmediate_operand" " x,xm,0")))]
+  "TARGET_FMA"
+  "@
+   vfnmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
+   vfnmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   vfnmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fma_fnmsub_<mode>"
+  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
+	(fma:FMAMODE
+	  (neg:FMAMODE
+	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x"))
+	  (match_operand:FMAMODE   2 "nonimmediate_operand" "xm, x,xm")
+	  (neg:FMAMODE
+	    (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0"))))]
+  "TARGET_FMA"
+  "@
+   vfnmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
+   vfnmsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   vfnmsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; FMA4 version
+
+(define_insn "*fma4_fmadd_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%x,x")
@@ -1962,7 +2023,7 @@ 
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma4i_fmsub_<mode>"
+(define_insn "*fma4_fmsub_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=x,x")
 	(fma:FMAMODE
 	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%x,x")
@@ -1974,7 +2035,7 @@ 
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma4i_fnmadd_<mode>"
+(define_insn "*fma4_fnmadd_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
@@ -1986,7 +2047,7 @@ 
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "*fma4i_fnmsub_<mode>"
+(define_insn "*fma4_fnmsub_<mode>"
   [(set (match_operand:FMAMODE 0 "register_operand" "=x,x")
 	(fma:FMAMODE
 	  (neg:FMAMODE
@@ -1999,23 +2060,89 @@ 
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
-;; Scalar versions of the above.  Unlike ADDSS et al, these write the
-;; entire destination register, with the high-order elements zeroed.
+;; FMA parallel floating point multiply addsub and subadd operations.
 
-(define_expand "fma4i_vmfmadd_<mode>"
-  [(set (match_operand:VF_128 0 "register_operand")
-	(vec_merge:VF_128
-	  (fma:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand")
-	    (match_operand:VF_128 2 "nonimmediate_operand")
-	    (match_operand:VF_128 3 "nonimmediate_operand"))
-	  (match_dup 4)
-	  (const_int 1)))]
+;; It would be possible to represent these without the UNSPEC as
+;;
+;; (vec_merge
+;;   (fma op1 op2 op3)
+;;   (fma op1 op2 (neg op3))
+;;   (merge-const))
+;;
+;; But this doesn't seem useful in practice.
+
+(define_expand "fmaddsub_<mode>"
+  [(set (match_operand:VF 0 "register_operand")
+	(unspec:VF
+	  [(match_operand:VF 1 "nonimmediate_operand")
+	   (match_operand:VF 2 "nonimmediate_operand")
+	   (match_operand:VF 3 "nonimmediate_operand")]
+	  UNSPEC_FMADDSUB))]
+  "TARGET_FMA || TARGET_FMA4")
+
+;; FMA3 version
+
+(define_insn "*fma_fmaddsub_<mode>"
+  [(set (match_operand:VF 0 "register_operand" "=x,x,x")
+	(unspec:VF
+	  [(match_operand:VF 1 "nonimmediate_operand" "%0, 0,x")
+	   (match_operand:VF 2 "nonimmediate_operand" "xm, x,xm")
+	   (match_operand:VF 3 "nonimmediate_operand" " x,xm,0")]
+	  UNSPEC_FMADDSUB))]
+  "TARGET_FMA"
+  "@
+   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
+   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "*fma_fmsubadd_<mode>"
+  [(set (match_operand:VF 0 "register_operand" "=x,x,x")
+	(unspec:VF
+	  [(match_operand:VF   1 "nonimmediate_operand" "%0, 0,x")
+	   (match_operand:VF   2 "nonimmediate_operand" "xm, x,xm")
+	   (neg:VF
+	     (match_operand:VF 3 "nonimmediate_operand" " x,xm,0"))]
+	  UNSPEC_FMADDSUB))]
+  "TARGET_FMA"
+  "@
+   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
+   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
+   vfmsubadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; FMA4 version
+
+(define_insn "*fma4_fmaddsub_<mode>"
+  [(set (match_operand:VF 0 "register_operand" "=x,x")
+	(unspec:VF
+	  [(match_operand:VF 1 "nonimmediate_operand" "%x,x")
+	   (match_operand:VF 2 "nonimmediate_operand" " x,m")
+	   (match_operand:VF 3 "nonimmediate_operand" "xm,x")]
+	  UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-{
-  operands[4] = CONST0_RTX (<MODE>mode);
-})
+  "vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
 
+(define_insn "*fma4_fmsubadd_<mode>"
+  [(set (match_operand:VF 0 "register_operand" "=x,x")
+	(unspec:VF
+	  [(match_operand:VF 1 "nonimmediate_operand" "%x,x")
+	   (match_operand:VF 2 "nonimmediate_operand" " x,m")
+	   (neg:VF
+	     (match_operand:VF 3 "nonimmediate_operand" "xm,x"))]
+	  UNSPEC_FMADDSUB))]
+  "TARGET_FMA4"
+  "vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ssemuladd")
+   (set_attr "mode" "<MODE>")])
+
+;; FMA3 floating point scalar intrinsics. These merge result with
+;; high-order elements from the destination register.
+
 (define_expand "fmai_vmfmadd_<mode>"
   [(set (match_operand:VF_128 0 "register_operand")
 	(vec_merge:VF_128
@@ -2099,6 +2226,23 @@ 
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
+;; FMA4 floating point scalar intrinsics.  These write the
+;; entire destination register, with the high-order elements zeroed.
+
+(define_expand "fma4i_vmfmadd_<mode>"
+  [(set (match_operand:VF_128 0 "register_operand")
+	(vec_merge:VF_128
+	  (fma:VF_128
+	    (match_operand:VF_128 1 "nonimmediate_operand")
+	    (match_operand:VF_128 2 "nonimmediate_operand")
+	    (match_operand:VF_128 3 "nonimmediate_operand"))
+	  (match_dup 4)
+	  (const_int 1)))]
+  "TARGET_FMA4"
+{
+  operands[4] = CONST0_RTX (<MODE>mode);
+})
+
 (define_insn "*fma4i_vmfmadd_<mode>"
   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 	(vec_merge:VF_128
@@ -2161,152 +2305,6 @@ 
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
-;; FMA4 Parallel floating point multiply addsub and subadd operations.
-;;
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-;; It would be possible to represent these without the UNSPEC as
-;;
-;; (vec_merge
-;;   (fma op1 op2 op3)
-;;   (fma op1 op2 (neg op3))
-;;   (merge-const))
-;;
-;; But this doesn't seem useful in practice.
-
-(define_expand "fmaddsub_<mode>"
-  [(set (match_operand:VF 0 "register_operand")
-	(unspec:VF
-	  [(match_operand:VF 1 "nonimmediate_operand")
-	   (match_operand:VF 2 "nonimmediate_operand")
-	   (match_operand:VF 3 "nonimmediate_operand")]
-	  UNSPEC_FMADDSUB))]
-  "TARGET_FMA || TARGET_FMA4")
-
-(define_insn "*fma4_fmaddsub_<mode>"
-  [(set (match_operand:VF 0 "register_operand" "=x,x")
-	(unspec:VF
-	  [(match_operand:VF 1 "nonimmediate_operand" "%x,x")
-	   (match_operand:VF 2 "nonimmediate_operand" " x,m")
-	   (match_operand:VF 3 "nonimmediate_operand" "xm,x")]
-	  UNSPEC_FMADDSUB))]
-  "TARGET_FMA4"
-  "vfmaddsub<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "*fma4_fmsubadd_<mode>"
-  [(set (match_operand:VF 0 "register_operand" "=x,x")
-	(unspec:VF
-	  [(match_operand:VF 1 "nonimmediate_operand" "%x,x")
-	   (match_operand:VF 2 "nonimmediate_operand" " x,m")
-	   (neg:VF
-	     (match_operand:VF 3 "nonimmediate_operand" "xm,x"))]
-	  UNSPEC_FMADDSUB))]
-  "TARGET_FMA4"
-  "vfmsubadd<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;
-;; FMA3 floating point multiply/accumulate instructions.
-;;
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-
-(define_insn "*fma_fmadd_<mode>"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
-	(fma:FMAMODE
-	  (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x")
-	  (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm")
-	  (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0")))]
-  "TARGET_FMA"
-  "@
-   vfmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "*fma_fmsub_<mode>"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
-	(fma:FMAMODE
-	  (match_operand:FMAMODE   1 "nonimmediate_operand" "%0, 0,x")
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "xm, x,xm")
-	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0"))))]
-  "TARGET_FMA"
-  "@
-   vfmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "*fma_fnmadd_<mode>"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
-	(fma:FMAMODE
-	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "xm, x,xm")
-	  (match_operand:FMAMODE   3 "nonimmediate_operand" " x,xm,0")))]
-  "TARGET_FMA"
-  "@
-   vfnmadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfnmadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfnmadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "*fma_fnmsub_<mode>"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
-	(fma:FMAMODE
-	  (neg:FMAMODE
-	    (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x"))
-	  (match_operand:FMAMODE   2 "nonimmediate_operand" "xm, x,xm")
-	  (neg:FMAMODE
-	    (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0"))))]
-  "TARGET_FMA"
-  "@
-   vfnmsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfnmsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfnmsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "*fma_fmaddsub_<mode>"
-  [(set (match_operand:VF 0 "register_operand" "=x,x,x")
-	(unspec:VF
-	  [(match_operand:VF 1 "nonimmediate_operand" "%0, 0,x")
-	   (match_operand:VF 2 "nonimmediate_operand" "xm, x,xm")
-	   (match_operand:VF 3 "nonimmediate_operand" " x,xm,0")]
-	  UNSPEC_FMADDSUB))]
-  "TARGET_FMA"
-  "@
-   vfmaddsub132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmaddsub213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmaddsub231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-(define_insn "*fma_fmsubadd_<mode>"
-  [(set (match_operand:VF 0 "register_operand" "=x,x,x")
-	(unspec:VF
-	  [(match_operand:VF   1 "nonimmediate_operand" "%0, 0,x")
-	   (match_operand:VF   2 "nonimmediate_operand" "xm, x,xm")
-	   (neg:VF
-	     (match_operand:VF 3 "nonimmediate_operand" " x,xm,0"))]
-	  UNSPEC_FMADDSUB))]
-  "TARGET_FMA"
-  "@
-   vfmsubadd132<ssemodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmsubadd213<ssemodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmsubadd231<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "<MODE>")])
-
-;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
-;;
 ;; Parallel single-precision floating point conversion operations
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;