diff mbox

Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

Message ID 20120913155258.GF22619@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Sept. 13, 2012, 3:52 p.m. UTC
Hi!

The fma-*.c testcase show that these intrinsics probably mean to preserve
the high elements (other than the lowest) of the first argument of the
fmaintrin.h *_s{s,d} intrinsics in the destination (the HW insn preserve
there the destination register, but that varies - for 132 and 213 it is the
first one (but the negation performed for _mm_fnm*_s[sd] breaks it anyway),
for 231 it is the last one).  What the expander did was to put there
an uninitialized pseudo, so we ended up with pretty random content, before
H.J's http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190492 it happened
to work by accident, but when things changed slightly and reload chose
different alternative, this broke.

The following patch fixes it, by tweaking the header so that the first
argument is not negated (we negate the second one instead), as we don't want
to negate the high elements if e.g. for whatever reason combiner doesn't
match it.  It fixes the expander to use a dup of the X operand as the high
element provider for the pattern, removes the 231 alternatives (because
those provide different destination high elements) and removes commutative
marker (again, that would mean different high elements).

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
with
make check-gcc RUNTESTFLAGS='--target_board=valgrind-sim/-m64 i386.exp=\*fma\*'
Ok for trunk/4.7?

2012-09-13  Jakub Jelinek  <jakub@redhat.com>

	PR target/54564
	* config/i386/sse.md (fmai_vmfmadd_<mode>): Use (match_dup 1)
	instead of (match_dup 0) as second argument to vec_merge.
	(*fmai_fmadd_<mode>, *fmai_fmsub_<mode>): Likewise.
	Remove third alternative.
	(*fmai_fnmadd_<mode>, *fmai_fnmsub_<mode>): Likewise.  Negate
	operand 2 instead of operand 1, but put it as first argument
	of fma.

	* config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
	_mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
	of the first.


	Jakub

Comments

Uros Bizjak Sept. 13, 2012, 5:42 p.m. UTC | #1
On Thu, Sep 13, 2012 at 5:52 PM, Jakub Jelinek <jakub@redhat.com> wrote:

> The fma-*.c testcase show that these intrinsics probably mean to preserve
> the high elements (other than the lowest) of the first argument of the
> fmaintrin.h *_s{s,d} intrinsics in the destination (the HW insn preserve
> there the destination register, but that varies - for 132 and 213 it is the
> first one (but the negation performed for _mm_fnm*_s[sd] breaks it anyway),
> for 231 it is the last one).  What the expander did was to put there
> an uninitialized pseudo, so we ended up with pretty random content, before
> H.J's http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190492 it happened
> to work by accident, but when things changed slightly and reload chose
> different alternative, this broke.
>
> The following patch fixes it, by tweaking the header so that the first
> argument is not negated (we negate the second one instead), as we don't want
> to negate the high elements if e.g. for whatever reason combiner doesn't
> match it.  It fixes the expander to use a dup of the X operand as the high
> element provider for the pattern, removes the 231 alternatives (because
> those provide different destination high elements) and removes commutative
> marker (again, that would mean different high elements).

Can we introduce additional "*fmai_fmadd_<mode>_1" pattern (and
others) that would cover missing 231 alternative?

> 2012-09-13  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/54564
>         * config/i386/sse.md (fmai_vmfmadd_<mode>): Use (match_dup 1)
>         instead of (match_dup 0) as second argument to vec_merge.
>         (*fmai_fmadd_<mode>, *fmai_fmsub_<mode>): Likewise.
>         Remove third alternative.
>         (*fmai_fnmadd_<mode>, *fmai_fnmsub_<mode>): Likewise.  Negate
>         operand 2 instead of operand 1, but put it as first argument
>         of fma.
>
>         * config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
>         _mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
>         of the first.

OK, but header change should be also reviewed by H.J.

Thanks,
Uros.
Jakub Jelinek Sept. 13, 2012, 5:48 p.m. UTC | #2
On Thu, Sep 13, 2012 at 07:42:17PM +0200, Uros Bizjak wrote:
> Can we introduce additional "*fmai_fmadd_<mode>_1" pattern (and
> others) that would cover missing 231 alternative?

Sure.  Will cook up a patch soon.

> > 2012-09-13  Jakub Jelinek  <jakub@redhat.com>
> >
> >         PR target/54564
> >         * config/i386/sse.md (fmai_vmfmadd_<mode>): Use (match_dup 1)
> >         instead of (match_dup 0) as second argument to vec_merge.
> >         (*fmai_fmadd_<mode>, *fmai_fmsub_<mode>): Likewise.
> >         Remove third alternative.
> >         (*fmai_fnmadd_<mode>, *fmai_fnmsub_<mode>): Likewise.  Negate
> >         operand 2 instead of operand 1, but put it as first argument
> >         of fma.
> >
> >         * config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
> >         _mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
> >         of the first.
> 
> OK, but header change should be also reviewed by H.J.

H.J., are you ok with this?

	Jakub
Richard Henderson Sept. 13, 2012, 6:25 p.m. UTC | #3
On 09/13/2012 08:52 AM, Jakub Jelinek wrote:
> The following patch fixes it, by tweaking the header so that the first
> argument is not negated (we negate the second one instead), as we don't want
> to negate the high elements if e.g. for whatever reason combiner doesn't
> match it.  It fixes the expander to use a dup of the X operand as the high
> element provider for the pattern, removes the 231 alternatives (because
> those provide different destination high elements) and removes commutative
> marker (again, that would mean different high elements).

I don't think this is the best way to fix this up.

(1) Negating the second argument is arguably non-canonical rtl.

(2) It's not the best match if we were to extend these builtins to FMA4.
    There we really do have 4 inputs.  Thus

(define_insn "*fmai_fmadd_<mode>_4"
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (fma:VF_128
            (match_operand:VF_128 1 "nonimmediate_operand" "%x,x")
            (match_operand:VF_128 2 "nonimmediate_operand" " x,m")
            (match_operand:VF_128 3 "nonimmediate_operand" "xm,x"))
          (match_operand:VF_128 4 "register_operand" "0,0")
          (const_int 1)))]
  "TARGET_FMA4"
  "vfmadd<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
  [(set_attr "type" "ssemuladd")
   (set_attr "mode" "<MODE>")])

which we can just as easily do by passing the appropriate input to
the expander.

It would be nice if Intel cleaned up their documentation for the
builtin, explicitly saying which high bits to copy.  I agree that
the testcase is probably as normative as we'll get though.


r~
Richard Henderson Sept. 13, 2012, 6:28 p.m. UTC | #4
On 09/13/2012 10:42 AM, Uros Bizjak wrote:
> Can we introduce additional "*fmai_fmadd_<mode>_1" pattern (and
> others) that would cover missing 231 alternative?

I really don't think that's necessary.  For that you'd need to
be computing fma(x, y, x), at which point vfmadd213 matches as well.


r~
Jakub Jelinek Sept. 13, 2012, 6:49 p.m. UTC | #5
On Thu, Sep 13, 2012 at 11:25:42AM -0700, Richard Henderson wrote:
> On 09/13/2012 08:52 AM, Jakub Jelinek wrote:
> > The following patch fixes it, by tweaking the header so that the first
> > argument is not negated (we negate the second one instead), as we don't want
> > to negate the high elements if e.g. for whatever reason combiner doesn't
> > match it.  It fixes the expander to use a dup of the X operand as the high
> > element provider for the pattern, removes the 231 alternatives (because
> > those provide different destination high elements) and removes commutative
> > marker (again, that would mean different high elements).
> 
> I don't think this is the best way to fix this up.
> 
> (1) Negating the second argument is arguably non-canonical rtl.

That is why I've put in the *fmai_fnm{add,sub}_<mode> patterns
operands 2 with the neg as first operand of the FMA rtl.  That way it is
canonical (otherwise it didn't match in combine).  The FMA rtl operand
order doesn't need to imply the order of instruction operands.
The reason why I don't want to negate in fmaintrin.h the first operand
is that the higher elements of it shouldn't be negated.
So, when the second argument to the builtin is negated, combiner substitutes
... (fma (reg1) (neg (reg2) (reg3))) (reg1) ..., then canonicalizes
to ... (fma (neg (reg2) (reg1) (reg3))) (reg1) ... and it matches that way.

> (2) It's not the best match if we were to extend these builtins to FMA4.
>     There we really do have 4 inputs.  Thus

The first thing I've tried was for the *fmai_fmadd_<mode> patterns to use
match_operand:VF_128 4 instead of the match_dup I'm using now.  But that way
I've ended up with:

(define_insn "*fmai_fmadd_<mode>"
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (fma:VF_128
            (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm, x")
            (match_operand:VF_128 3 "nonimmediate_operand" " x,xm"))
          (match_operand:VF_128   4 "nonimmediate_operand" " 0, 0")
          (const_int 1)))]
  "TARGET_FMA"
  "@
   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %3, %2}
   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}"
  [(set_attr "type" "ssemuladd")
   (set_attr "mode" "<MODE>")])

which was apparently too much for reload (supposedly the two "0" constraint
operands, even when the expander used (match_dup 1)).

> It would be nice if Intel cleaned up their documentation for the
> builtin, explicitly saying which high bits to copy.  I agree that
> the testcase is probably as normative as we'll get though.

Usually the docs just document instructions and list intrinsics.
When there is non-obvious match between the intrinsics and instructions, we
have a problem of insufficiently documented semantics.

	Jakub
Jakub Jelinek Sept. 13, 2012, 6:50 p.m. UTC | #6
On Thu, Sep 13, 2012 at 11:28:11AM -0700, Richard Henderson wrote:
> On 09/13/2012 10:42 AM, Uros Bizjak wrote:
> > Can we introduce additional "*fmai_fmadd_<mode>_1" pattern (and
> > others) that would cover missing 231 alternative?
> 
> I really don't think that's necessary.  For that you'd need to
> be computing fma(x, y, x), at which point vfmadd213 matches as well.

I agree.  I've posted the patch just because Uros requested it.

	Jakub
Jakub Jelinek Sept. 13, 2012, 7:03 p.m. UTC | #7
On Thu, Sep 13, 2012 at 11:25:42AM -0700, Richard Henderson wrote:
> (2) It's not the best match if we were to extend these builtins to FMA4.
>     There we really do have 4 inputs.  Thus

How could you extend these builtins to FMA4 BTW?  Doesn't FMA4 zero up the
high elements?  In that case you'd need to expand it as copy of the X
operand register to DEST, doing vfmadd{ss,sd} to a temp register and
followed by vmovss/vmovsd instruction.

> (define_insn "*fmai_fmadd_<mode>_4"
>   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
>         (vec_merge:VF_128
>           (fma:VF_128
>             (match_operand:VF_128 1 "nonimmediate_operand" "%x,x")
>             (match_operand:VF_128 2 "nonimmediate_operand" " x,m")
>             (match_operand:VF_128 3 "nonimmediate_operand" "xm,x"))
>           (match_operand:VF_128 4 "register_operand" "0,0")
>           (const_int 1)))]
>   "TARGET_FMA4"
>   "vfmadd<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>   [(set_attr "type" "ssemuladd")
>    (set_attr "mode" "<MODE>")])

	Jakub
Richard Henderson Sept. 13, 2012, 7:11 p.m. UTC | #8
On 09/13/2012 11:49 AM, Jakub Jelinek wrote:
> On Thu, Sep 13, 2012 at 11:25:42AM -0700, Richard Henderson wrote:
>> (1) Negating the second argument is arguably non-canonical rtl.
> 
> That is why I've put in the *fmai_fnm{add,sub}_<mode> patterns
> operands 2 with the neg as first operand of the FMA rtl.  That way it is
> canonical (otherwise it didn't match in combine).  The FMA rtl operand
> order doesn't need to imply the order of instruction operands.

Sorry, I didn't read the unidiff properly.

>           (fma:VF_128
>             (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
>             (match_operand:VF_128 2 "nonimmediate_operand" "xm, x")
>             (match_operand:VF_128 3 "nonimmediate_operand" " x,xm"))
>           (match_operand:VF_128   4 "nonimmediate_operand" " 0, 0")
...
> which was apparently too much for reload (supposedly the two "0" constraint
> operands, even when the expander used (match_dup 1)).

Yes.  We'd have to have two different patterns to "properly" support fma4.

Though I suppose now that I think about it this is extremely similar to
the vfmadd231 case, in that in order to want to generate

	vfmaddss  %xmm3, %xmm2, %xmm1, %xmm0

given the semantics of the builtin we'd have had to emit a copy of %xmm1
or %xmm2 into %xmm0 anyway.  So we might as well not support this and just do

(define_insn "*fmai_fmadd_<mode>"
  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x,x")
        (vec_merge:VF_128
          (fma:VF_128
            (match_operand:VF_128 1 "nonimmediate_operand" "%0, 0, 0,0")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm, x, x,m")
            (match_operand:VF_128 3 "nonimmediate_operand" " x,xm,xm,x"))
          (match_dup 0)
          (const_int 1)))]
  "TARGET_FMA || TARGET_FMA4"
  "@
   vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %3, %2}
   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
   vfmadd<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
   vfmadd<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
  [(set_attr "isa" "fma,fma,fma4,fma4")
   (set_attr "type" "ssemuladd")
   (set_attr "mode" "<MODE>")])


r~
Richard Henderson Sept. 13, 2012, 7:13 p.m. UTC | #9
On 09/13/2012 12:03 PM, Jakub Jelinek wrote:
> How could you extend these builtins to FMA4 BTW?  Doesn't FMA4 zero up the
> high elements?

Duh.  You're absolutely right.  I'd mis-read the document as clearing the
high lane of the %ymm register only.


r~
H.J. Lu Sept. 14, 2012, 12:18 p.m. UTC | #10
On Thu, Sep 13, 2012 at 10:42 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Sep 13, 2012 at 5:52 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>
>> The fma-*.c testcase show that these intrinsics probably mean to preserve
>> the high elements (other than the lowest) of the first argument of the
>> fmaintrin.h *_s{s,d} intrinsics in the destination (the HW insn preserve
>> there the destination register, but that varies - for 132 and 213 it is the
>> first one (but the negation performed for _mm_fnm*_s[sd] breaks it anyway),
>> for 231 it is the last one).  What the expander did was to put there
>> an uninitialized pseudo, so we ended up with pretty random content, before
>> H.J's http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190492 it happened
>> to work by accident, but when things changed slightly and reload chose
>> different alternative, this broke.
>>
>> The following patch fixes it, by tweaking the header so that the first
>> argument is not negated (we negate the second one instead), as we don't want
>> to negate the high elements if e.g. for whatever reason combiner doesn't
>> match it.  It fixes the expander to use a dup of the X operand as the high
>> element provider for the pattern, removes the 231 alternatives (because
>> those provide different destination high elements) and removes commutative
>> marker (again, that would mean different high elements).
>
> Can we introduce additional "*fmai_fmadd_<mode>_1" pattern (and
> others) that would cover missing 231 alternative?
>
>> 2012-09-13  Jakub Jelinek  <jakub@redhat.com>
>>
>>         PR target/54564
>>         * config/i386/sse.md (fmai_vmfmadd_<mode>): Use (match_dup 1)
>>         instead of (match_dup 0) as second argument to vec_merge.
>>         (*fmai_fmadd_<mode>, *fmai_fmsub_<mode>): Likewise.
>>         Remove third alternative.
>>         (*fmai_fnmadd_<mode>, *fmai_fnmsub_<mode>): Likewise.  Negate
>>         operand 2 instead of operand 1, but put it as first argument
>>         of fma.
>>
>>         * config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
>>         _mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
>>         of the first.
>
> OK, but header change should be also reviewed by H.J.
>

It looks OK to me.

Thanks.
diff mbox

Patch

--- gcc/config/i386/sse.md.jj	2012-09-05 18:27:03.000000000 +0200
+++ gcc/config/i386/sse.md	2012-09-13 13:49:49.504968716 +0200
@@ -2072,79 +2072,75 @@  (define_expand "fmai_vmfmadd_<mode>"
 	    (match_operand:VF_128 1 "nonimmediate_operand")
 	    (match_operand:VF_128 2 "nonimmediate_operand")
 	    (match_operand:VF_128 3 "nonimmediate_operand"))
-	  (match_dup 0)
+	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA")
 
 (define_insn "*fmai_fmadd_<mode>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128 1 "nonimmediate_operand" "%0, 0,x")
-	    (match_operand:VF_128 2 "nonimmediate_operand" "xm, x,xm")
-	    (match_operand:VF_128 3 "nonimmediate_operand" " x,xm,0"))
-	  (match_dup 0)
+	    (match_operand:VF_128 1 "nonimmediate_operand" " 0, 0")
+	    (match_operand:VF_128 2 "nonimmediate_operand" "xm, x")
+	    (match_operand:VF_128 3 "nonimmediate_operand" " x,xm"))
+	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA"
   "@
    vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmadd231<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*fmai_fmsub_<mode>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
         (vec_merge:VF_128
 	  (fma:VF_128
-	    (match_operand:VF_128   1 "nonimmediate_operand" "%0, 0,x")
-	    (match_operand:VF_128   2 "nonimmediate_operand" "xm, x,xm")
+	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
+	    (match_operand:VF_128   2 "nonimmediate_operand" "xm, x")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " x,xm,0")))
-	  (match_dup 0)
+	      (match_operand:VF_128 3 "nonimmediate_operand" " x,xm")))
+	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA"
   "@
    vfmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfmsub231<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vfmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*fmai_fnmadd_<mode>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 1 "nonimmediate_operand" "%0, 0,x"))
-	    (match_operand:VF_128   2 "nonimmediate_operand" "xm, x,xm")
-	    (match_operand:VF_128   3 "nonimmediate_operand" " x,xm,0"))
-	  (match_dup 0)
+	      (match_operand:VF_128 2 "nonimmediate_operand" "xm, x"))
+	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
+	    (match_operand:VF_128   3 "nonimmediate_operand" " x,xm"))
+	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA"
   "@
    vfnmadd132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfnmadd231<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vfnmadd213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*fmai_fnmsub_<mode>"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
         (vec_merge:VF_128
 	  (fma:VF_128
 	    (neg:VF_128
-	      (match_operand:VF_128 1 "nonimmediate_operand" "%0, 0,x"))
-	    (match_operand:VF_128   2 "nonimmediate_operand" "xm, x,xm")
+	      (match_operand:VF_128 2 "nonimmediate_operand" "xm, x"))
+	    (match_operand:VF_128   1 "nonimmediate_operand" " 0, 0")
 	    (neg:VF_128
-	      (match_operand:VF_128 3 "nonimmediate_operand" " x,xm,0")))
-	  (match_dup 0)
+	      (match_operand:VF_128 3 "nonimmediate_operand" " x,xm")))
+	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_FMA"
   "@
    vfnmsub132<ssescalarmodesuffix>\t{%2, %3, %0|%0, %3, %2}
-   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
-   vfnmsub231<ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
+   vfnmsub213<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])
 
--- gcc/config/i386/fmaintrin.h.jj	2011-09-02 16:29:38.000000000 +0200
+++ gcc/config/i386/fmaintrin.h	2012-09-13 13:32:20.162333244 +0200
@@ -1,4 +1,4 @@ 
-/* Copyright (C) 2011 Free Software Foundation, Inc.
+/* Copyright (C) 2011, 2012 Free Software Foundation, Inc.
 
    This file is part of GCC.
 
@@ -164,7 +164,7 @@  extern __inline __m128d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmadd_sd (__m128d __A, __m128d __B, __m128d __C)
 {
-  return (__m128d)__builtin_ia32_vfmaddsd3 (-(__v2df)__A, (__v2df)__B,
+  return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, -(__v2df)__B,
                                             (__v2df)__C);
 }
 
@@ -172,7 +172,7 @@  extern __inline __m128
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmadd_ss (__m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128)__builtin_ia32_vfmaddss3 (-(__v4sf)__A, (__v4sf)__B,
+  return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, -(__v4sf)__B,
                                            (__v4sf)__C);
 }
 
@@ -212,7 +212,7 @@  extern __inline __m128d
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmsub_sd (__m128d __A, __m128d __B, __m128d __C)
 {
-  return (__m128d)__builtin_ia32_vfmaddsd3 (-(__v2df)__A, (__v2df)__B,
+  return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, -(__v2df)__B,
                                             -(__v2df)__C);
 }
 
@@ -220,7 +220,7 @@  extern __inline __m128
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_fnmsub_ss (__m128 __A, __m128 __B, __m128 __C)
 {
-  return (__m128)__builtin_ia32_vfmaddss3 (-(__v4sf)__A, (__v4sf)__B,
+  return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, -(__v4sf)__B,
                                            -(__v4sf)__C);
 }