Patchwork [RFC] Add FMA support to sparc backend

Submitter David Miller
Date Sept. 14, 2011, 8 a.m.
Message ID <20110914.040002.1635102376295493225.davem@davemloft.net>
Permalink /patch/114588/
State New

Comments

David Miller - Sept. 14, 2011, 8 a.m.
Eric, this is a preliminary version of the FMA patch I've been
working on.  Just so you can see what I'm doing.

First, ignore the fact that there are two configure tests for the
presence of support for these instructions.  I'm busy normalizing the
-xarch options which binutils supports so that they are the same as
Sun AS and therefore just one test is necessary.

Second, like rs6000 the sparc negate fused multiply instructions
negate the full result, not the multiply result.  So we cannot use
those instructions for the fnmadf4/fnmsdf4/fnmasf4/fnmssf4 patterns.
Since rs6000 provides patterns for such negate operations (presumably
just in case the combiner creates a match) I have done so for sparc
as well.

I was really surprised that cpu designers haven't settled on a
consistent formula for negated fused multiply add/sub instructions.
Ho hum...

For now my plan is to turn these fused multiply instructions on if you
ask to compile targeting a cpu that supports them.

I'll write a suitable changelog etc. once everything is finalized, this
patch posting is just to elicit feedback.

Thanks!
Michael Meissner - Sept. 15, 2011, 6:16 p.m.
On Wed, Sep 14, 2011 at 04:00:02AM -0400, David Miller wrote:
> 
> Eric, this is a preliminary version of the FMA patch I've been
> working on.  Just so you can see what I'm doing.
> 
> First, ignore the fact that there are two configure tests for the
> presence of support for these instructions.  I'm busy normalizing the
> -xarch options which binutils supports so that they are the same as
> Sun AS and therefore just one test is necessary.
> 
> Second, like rs6000 the sparc negate fused multiply instructions
> negate the full result, not the multiply result.  So we cannot use
> those instructions for the fnmadf4/fnmsdf4/fnmasf4/fnmssf4 patterns.
> Since rs6000 provides patterns for such negate operations (presumably
> just in case the combiner creates a match) I have done so for sparc
> as well.
> 
> I was really surprised that cpu designers haven't settled on a
> consistent formula for negated fused multiply add/sub instructions.
> Ho hum...

On the powerpc, we have an issue with Spec 2006 and calculix when FMAs are
generated and -ffast-math is used, where line 307 of rubber.f is:

	tt=datan2(dsqrt(1.d0-cn*cn),cn)/3.d0

The FNMSUB instruction generates -0.0, whereas a separate multiply and subtract
generates +0.0.  Dsqrt returns a -0.0 when given a -0.0, and datan2 (-0.0, 1.0)
returns -0.0.  Note, calculix is nothing but nested FMAs, and if you disable
FMAs you get about a 10% drop in performance.  I suspect that the issue may be
a powerpc backend issue where the wrong comparison is generated, but I haven't
tracked it down.
Eric Botcazou - Sept. 16, 2011, 8:25 p.m.
> Second, like rs6000 the sparc negate fused multiply instructions
> negate the full result, not the multiply result.  So we cannot use
> those instructions for the fnmadf4/fnmsdf4/fnmasf4/fnmssf4 patterns.
> Since rs6000 provides patterns for such negate operations (presumably
> just in case the combiner creates a match) I have done so for sparc
> as well.

OK, this makes sense indeed.

> For now my plan is to turn these fused multiply instructions on if you
> ask to compile targeting a cpu that supports them.

> I'll write a suitable changelog etc. once everything is finalized, this
> patch posting is just to elicit feedback.

What's the story with TFmode for FMA?

Thanks for working on this!
David Miller - Sept. 16, 2011, 8:39 p.m.
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Fri, 16 Sep 2011 22:25:41 +0200

> What's the story with TFmode for FMA?

There have never been TFmode float operations implemented in hardware
ever for sparc, and I doubt we'll see it in the future.

And this applies also to the FMA instructions.

And especially since the presence of the FMA patterns is meant to be a
performance enhancement, I don't see much value to considering TFmode
cases.

Did you have something specific in mind?

> Thanks for working on this!

No problem.
Eric Botcazou - Sept. 16, 2011, 8:53 p.m.
> There have never been TFmode float operations implemented in hardware
> ever for sparc, and I doubt we'll see it in the future.
>
> And this applies also to the FMA instructions.

Do the specs totally disregard quad floats for FMA or...?

> And especially since the presence of the FMA patterns is meant to be a
> performance enhancement, I don't see much value to considering TFmode
> cases.
>
> Did you have something specific in mind?

No, this was purely for my own education. :-)  Maybe a comment explaining the 
situation/(non-)implementation choice wrt TFmode would be in order.
David Miller - Sept. 16, 2011, 9:04 p.m.
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Fri, 16 Sep 2011 22:53:09 +0200

>> There have never been TFmode float operations implemented in hardware
>> ever for sparc, and I doubt we'll see it in the future.
>>
>> And this applies also to the FMA instructions.
> 
> Do the specs totally disregard quad floats for FMA or...?

The documentation I've read merely states the presence of single and
double precision versions of these instructions, and their behavior.

The same is also the case for all of the HPC instructions (such as
"fhadd" which is "floating point add and halve").  Only single and
double precision versions are provided and described.

Absolutely no consideration is given to quad precision, nor is any
mention made of it.

These are instruction set extensions, rather than an addition or
modification to v9.  So I wouldn't go so far as to say that they have
some requirement to take quad floating point into consideration, or
even mention it at all.
David Miller - Sept. 21, 2011, 9:01 p.m.
From: Michael Meissner <meissner@linux.vnet.ibm.com>
Date: Thu, 15 Sep 2011 14:16:45 -0400

> On the powerpc, we have an issue with Spec 2006 and calculix when FMAs are
> generated and -ffast-math is used, where line 307 of rubber.f is:
> 
> 	tt=datan2(dsqrt(1.d0-cn*cn),cn)/3.d0
> 
> The FNMSUB instruction generates a -0.0 while doing the multiply and subtract
> generates +0.0.  Dsqrt returns a -0.0 when given a -0.0, and datan2 (-0.0, 1.0)
> returns -0.0.  Note, calculix is nothing but nested FMAs, and if you disable
> FMAs you get about a 10% drop in performance.  I suspect that the issue may be
> a powerpc backend issue where the wrong comparison is generated, but I haven't
> tracked it down.

Have you learned anything more about this?  Just curious.
Michael Meissner - Sept. 22, 2011, 6:39 p.m.
On Wed, Sep 21, 2011 at 05:01:31PM -0400, David Miller wrote:
> From: Michael Meissner <meissner@linux.vnet.ibm.com>
> Date: Thu, 15 Sep 2011 14:16:45 -0400
> 
> > On the powerpc, we have an issue with Spec 2006 and calculix when FMAs are
> > generated and -ffast-math is used, where line 307 of rubber.f is:
> > 
> > 	tt=datan2(dsqrt(1.d0-cn*cn),cn)/3.d0
> > 
> > The FNMSUB instruction generates a -0.0 while doing the multiply and subtract
> > generates +0.0.  Dsqrt returns a -0.0 when given a -0.0, and datan2 (-0.0, 1.0)
> > returns -0.0.  Note, calculix is nothing but nested FMAs, and if you disable
> > FMAs you get about a 10% drop in performance.  I suspect that the issue may be
> > a powerpc backend issue where the wrong comparison is generated, but I haven't
> > tracked it down.
> 
> Have you learned anything more about this?  Just curious.

No, I haven't had time to go back to this.  At the moment, I am compiling stuff
with -Wl,--wrap,atan2 and using an atan2 wrapper that adds 0.0 to the value so
-0.0 is not returned.

Patch

diff --git a/gcc/config.in b/gcc/config.in
index d202b038..4bb6271 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -272,6 +272,12 @@ 
 #endif
 
 
+/* Define if your assembler supports FMAF instructions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_GAS_FMAF
+#endif
+
+
 /* Define if your assembler supports the --gdwarf2 option. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_GDWARF2_DEBUG_FLAG
@@ -479,6 +485,12 @@ 
 #endif
 
 
+/* Define if your assembler supports FMAF instructions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_SUN_FMAF
+#endif
+
+
 /* Define if your assembler and linker support thread-local storage. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_TLS
@@ -1047,12 +1059,6 @@ 
 #endif
 
 
-/* Define if _Unwind_GetIPInfo is available. */
-#ifndef USED_FOR_TARGET
-#undef HAVE_GETIPINFO
-#endif
-
-
 /* Define to 1 if you have the `getrlimit' function. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GETRLIMIT
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index cf9e197..ce15730 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -752,9 +752,9 @@  sparc_option_override (void)
     /* UltraSPARC T2 */
     { MASK_ISA, MASK_V9},
     /* UltraSPARC T3 */
-    { MASK_ISA, MASK_V9},
+    { MASK_ISA, MASK_V9 | MASK_FMAF},
     /* UltraSPARC T4 */
-    { MASK_ISA, MASK_V9},
+    { MASK_ISA, MASK_V9 | MASK_FMAF},
   };
   const struct cpu_table *cpu;
   unsigned int i;
@@ -833,9 +833,9 @@  sparc_option_override (void)
   if (target_flags_explicit & MASK_FPU)
     target_flags = (target_flags & ~MASK_FPU) | fpu;
 
-  /* Don't allow -mvis if FPU is disabled.  */
+  /* Don't allow -mvis or -mfmaf if FPU is disabled.  */
   if (! TARGET_FPU)
-    target_flags &= ~MASK_VIS;
+    target_flags &= ~(MASK_VIS | MASK_FMAF);
 
   /* -mvis assumes UltraSPARC+, so we are sure v9 instructions
      are available.
@@ -9505,6 +9505,25 @@  sparc_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
 	*total = COSTS_N_INSNS (1);
       return false;
 
+    case FMA:
+      {
+	rtx sub;
+
+	gcc_assert (float_mode_p);
+	*total = sparc_costs->float_mul;
+
+	sub = XEXP (x, 0);
+	if (GET_CODE (sub) == NEG)
+	  sub = XEXP (sub, 0);
+	*total += rtx_cost (sub, FMA, 0, speed);
+
+	sub = XEXP (x, 2);
+	if (GET_CODE (sub) == NEG)
+	  sub = XEXP (sub, 0);
+	*total += rtx_cost (sub, FMA, 2, speed);
+	return true;
+      }
+
     case MULT:
       if (float_mode_p)
 	*total = sparc_costs->float_mul;
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index afdca1e..c8174ce 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -1880,6 +1880,11 @@  extern int sparc_indent_opcode;
 #define TARGET_SUN_TLS TARGET_TLS
 #define TARGET_GNU_TLS 0
 
+#if !(defined(HAVE_AS_GAS_FMAF) || defined(HAVE_AS_SUN_FMAF))
+#undef TARGET_FMAF
+#define TARGET_FMAF 0
+#endif
+
 /* The number of Pmode words for the setjmp buffer.  */
 #define JMP_BUF_SIZE 12
 
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 721db93..58ac6d7 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -5345,6 +5345,78 @@ 
   "fmuls\t%1, %2, %0"
   [(set_attr "type" "fpmul")])
 
+(define_insn "fmadf4"
+  [(set (match_operand:DF 0 "register_operand" "=e")
+        (fma:DF (match_operand:DF 1 "register_operand" "e")
+		(match_operand:DF 2 "register_operand" "e")
+		(match_operand:DF 3 "register_operand" "e")))]
+  "TARGET_FMAF"
+  "fmaddd\t%1, %2, %3, %0"
+  [(set_attr "type" "fpmul")])
+
+(define_insn "fmsdf4"
+  [(set (match_operand:DF 0 "register_operand" "=e")
+        (fma:DF (match_operand:DF 1 "register_operand" "e")
+		(match_operand:DF 2 "register_operand" "e")
+		(neg:DF (match_operand:DF 3 "register_operand" "e"))))]
+  "TARGET_FMAF"
+  "fmsubd\t%1, %2, %3, %0"
+  [(set_attr "type" "fpmul")])
+
+(define_insn "*nfmadf4"
+  [(set (match_operand:DF 0 "register_operand" "=e")
+        (neg:DF (fma:DF (match_operand:DF 1 "register_operand" "e")
+			(match_operand:DF 2 "register_operand" "e")
+			(match_operand:DF 3 "register_operand" "e"))))]
+  "TARGET_FMAF"
+  "fnmaddd\t%1, %2, %3, %0"
+  [(set_attr "type" "fpmul")])
+
+(define_insn "*nfmsdf4"
+  [(set (match_operand:DF 0 "register_operand" "=e")
+        (neg:DF (fma:DF (match_operand:DF 1 "register_operand" "e")
+			(match_operand:DF 2 "register_operand" "e")
+			(neg:DF (match_operand:DF 3 "register_operand" "e")))))]
+  "TARGET_FMAF"
+  "fnmsubd\t%1, %2, %3, %0"
+  [(set_attr "type" "fpmul")])
+
+(define_insn "fmasf4"
+  [(set (match_operand:SF 0 "register_operand" "=f")
+        (fma:SF (match_operand:SF 1 "register_operand" "f")
+		(match_operand:SF 2 "register_operand" "f")
+		(match_operand:SF 3 "register_operand" "f")))]
+  "TARGET_FMAF"
+  "fmadds\t%1, %2, %3, %0"
+  [(set_attr "type" "fpmul")])
+
+(define_insn "fmssf4"
+  [(set (match_operand:SF 0 "register_operand" "=f")
+        (fma:SF (match_operand:SF 1 "register_operand" "f")
+		(match_operand:SF 2 "register_operand" "f")
+		(neg:SF (match_operand:SF 3 "register_operand" "f"))))]
+  "TARGET_FMAF"
+  "fmsubs\t%1, %2, %3, %0"
+  [(set_attr "type" "fpmul")])
+
+(define_insn "*nfmasf4"
+  [(set (match_operand:SF 0 "register_operand" "=f")
+        (neg:SF (fma:SF (match_operand:SF 1 "register_operand" "f")
+			(match_operand:SF 2 "register_operand" "f")
+			(match_operand:SF 3 "register_operand" "f"))))]
+  "TARGET_FMAF"
+  "fnmadds\t%1, %2, %3, %0"
+  [(set_attr "type" "fpmul")])
+
+(define_insn "*nfmssf4"
+  [(set (match_operand:SF 0 "register_operand" "=f")
+        (neg:SF (fma:SF (match_operand:SF 1 "register_operand" "f")
+			(match_operand:SF 2 "register_operand" "f")
+			(neg:SF (match_operand:SF 3 "register_operand" "f")))))]
+  "TARGET_FMAF"
+  "fnmsubs\t%1, %2, %3, %0"
+  [(set_attr "type" "fpmul")])
+
 (define_insn "*muldf3_extend"
   [(set (match_operand:DF 0 "register_operand" "=e")
 	(mult:DF (float_extend:DF (match_operand:SF 1 "register_operand" "f"))
diff --git a/gcc/config/sparc/sparc.opt b/gcc/config/sparc/sparc.opt
index ce6fa94..8e2c6bd 100644
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@@ -61,6 +61,10 @@  mvis
 Target Report Mask(VIS)
 Use UltraSPARC Visual Instruction Set extensions
 
+mfmaf
+Target Report Mask(FMAF)
+Use UltraSPARC Fused Multiply extensions
+
 mptr64
 Target Report RejectNegative Mask(PTR64)
 Pointers are 64-bit
diff --git a/gcc/configure b/gcc/configure
index b1dd57b..a046e94 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -24037,6 +24037,72 @@  if test $gcc_cv_as_sparc_offsetable_lo10 = yes; then
 $as_echo "#define HAVE_AS_OFFSETABLE_LO10 1" >>confdefs.h
 
 fi
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for FMAF instructions (GAS)" >&5
+$as_echo_n "checking assembler for FMAF instructions (GAS)... " >&6; }
+if test "${gcc_cv_as_sparc_gas_fmaf+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_sparc_gas_fmaf=no
+  if test x$gcc_cv_as != x; then
+    $as_echo '.text
+       .align 4
+       fmaddd %f0, %f2, %f4, %f6' > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -xarch=v9b -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+    then
+	gcc_cv_as_sparc_gas_fmaf=yes
+    else
+      echo "configure: failed program was" >&5
+      cat conftest.s >&5
+    fi
+    rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_sparc_gas_fmaf" >&5
+$as_echo "$gcc_cv_as_sparc_gas_fmaf" >&6; }
+if test $gcc_cv_as_sparc_gas_fmaf = yes; then
+
+$as_echo "#define HAVE_AS_GAS_FMAF 1" >>confdefs.h
+
+fi
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for FMAF instructions (Sun AS)" >&5
+$as_echo_n "checking assembler for FMAF instructions (Sun AS)... " >&6; }
+if test "${gcc_cv_as_sparc_sunas_fmaf+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_sparc_sunas_fmaf=no
+  if test x$gcc_cv_as != x; then
+    $as_echo '.text
+       .align 4
+       fmaddd %f0, %f2, %f4, %f6' > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -xarch=sparcfmaf -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+    then
+	gcc_cv_as_sparc_sunas_fmaf=yes
+    else
+      echo "configure: failed program was" >&5
+      cat conftest.s >&5
+    fi
+    rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_sparc_sunas_fmaf" >&5
+$as_echo "$gcc_cv_as_sparc_sunas_fmaf" >&6; }
+if test $gcc_cv_as_sparc_sunas_fmaf = yes; then
+
+$as_echo "#define HAVE_AS_SUN_FMAF 1" >>confdefs.h
+
+fi
     ;;
 
   i[34567]86-*-* | x86_64-*-*)
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 51ab3ac..e40ffc2 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3473,6 +3473,24 @@  foo:
        fi],
        [AC_DEFINE(HAVE_AS_OFFSETABLE_LO10, 1,
 	         [Define if your assembler supports offsetable %lo().])])
+
+    gcc_GAS_CHECK_FEATURE([FMAF instructions (GAS)],
+      gcc_cv_as_sparc_gas_fmaf,,
+      [-xarch=v9b],
+      [.text
+       .align 4
+       fmaddd %f0, %f2, %f4, %f6],,
+      [AC_DEFINE(HAVE_AS_GAS_FMAF, 1,
+                [Define if your assembler supports FMAF instructions.])])
+
+    gcc_GAS_CHECK_FEATURE([FMAF instructions (Sun AS)],
+      gcc_cv_as_sparc_sunas_fmaf,,
+      [-xarch=sparcfmaf],
+      [.text
+       .align 4
+       fmaddd %f0, %f2, %f4, %f6],,
+      [AC_DEFINE(HAVE_AS_SUN_FMAF, 1,
+                [Define if your assembler supports FMAF instructions.])])
     ;;
 
 changequote(,)dnl