diff mbox series

, Define __FP_FAST_FMAF128 on PowerPC ISA 3.0

Message ID 20170927221818.GA20589@ibm-tiger.the-meissners.org
State New
Headers show
Series , Define __FP_FAST_FMAF128 on PowerPC ISA 3.0 | expand

Commit Message

Michael Meissner Sept. 27, 2017, 10:18 p.m. UTC
The glibc team has requested we define the standard macro (__FP_FAST_FMAF128)
for PowerPC code when we have the IEEE 128-bit floating point hardware
instructions enabled.

This patch does this in the PowerPC backend.  As I look at the whole issue, at
some point we should do this more in the machine independent portion of the
compiler.  I have some initial patches to do this in the c-family files, but at
the present time, the patches are not complete, and I need to think about it
more.

So, I would like to check in this patch now, and if we come up with a machine
independent version, we can back out this particular patch.  I have done a full
bootstrap and regression test, there were no regressions, and the new test case
does run correctly.  Can I check this into the trunk?

[gcc]
2017-09-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
	__FP_FAST_FMAF128 on ISA 3.0.

[gcc/testsuite]
2017-09-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/float128-fma3.c: New test.

Comments

Joseph Myers Sept. 28, 2017, 12:40 a.m. UTC | #1
On Wed, 27 Sep 2017, Michael Meissner wrote:

> The glibc team has requested we define the standard macro (__FP_FAST_FMAF128)
> for PowerPC code when we have the IEEE 128-bit floating point hardware
> instructions enabled.

It's not a standard macro.  TS 18661-3 has FP_FAST_FMAF128 as an optional 
math.h macro (but glibc doesn't define it anywhere at present).

> This patch does this in the PowerPC backend.  As I look at the whole issue, at
> some point we should do this more in the machine independent portion of the
> compiler.  I have some initial patches to do this in the c-family files, but at
> the present time, the patches are not complete, and I need to think about it
> more.

I think a machine-independent definition (for _FloatN / _FloatNx types in 
general) should go along with machine-independent fmafN / fmafNx built-in 
functions; when the built-in function is machine-specific, it's natural 
for the macro to be as well.

But in any case, the new macro should be documented in cpp.texi alongside 
the existing __FP_FAST_FMA* macros (probably in the generic 
__FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form).
Michael Meissner Oct. 2, 2017, 5:54 p.m. UTC | #2
On Thu, Sep 28, 2017 at 12:40:24AM +0000, Joseph Myers wrote:
> On Wed, 27 Sep 2017, Michael Meissner wrote:
> 
> > The glibc team has requested we define the standard macro (__FP_FAST_FMAF128)
> > for PowerPC code when we have the IEEE 128-bit floating point hardware
> > instructions enabled.
> 
> It's not a standard macro.  TS 18661-3 has FP_FAST_FMAF128 as an optional 
> math.h macro (but glibc doesn't define it anywhere at present).
> 
> > This patch does this in the PowerPC backend.  As I look at the whole issue, at
> > some point we should do this more in the machine independent portion of the
> > compiler.  I have some initial patches to do this in the c-family files, but at
> > the present time, the patches are not complete, and I need to think about it
> > more.
> 
> I think a machine-independent definition (for _FloatN / _FloatNx types in 
> general) should go along with machine-independent fmafN / fmafNx built-in 
> functions; when the built-in function is machine-specific, it's natural 
> for the macro to be as well.

I have patches for this that I will submit shortly to replace the rs6000
specific patch.

I haven't yet found all of the places that need to be changed for more
traditional math functions like sqrtf128.

> But in any case, the new macro should be documented in cpp.texi alongside 
> the existing __FP_FAST_FMA* macros (probably in the generic 
> __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form).
Michael Meissner Oct. 2, 2017, 11:51 p.m. UTC | #3
On Thu, Sep 28, 2017 at 12:40:24AM +0000, Joseph Myers wrote:
> On Wed, 27 Sep 2017, Michael Meissner wrote:
> 
> > The glibc team has requested we define the standard macro (__FP_FAST_FMAF128)
> > for PowerPC code when we have the IEEE 128-bit floating point hardware
> > instructions enabled.
> 
> It's not a standard macro.  TS 18661-3 has FP_FAST_FMAF128 as an optional 
> math.h macro (but glibc doesn't define it anywhere at present).
> 
> > This patch does this in the PowerPC backend.  As I look at the whole issue, at
> > some point we should do this more in the machine independent portion of the
> > compiler.  I have some initial patches to do this in the c-family files, but at
> > the present time, the patches are not complete, and I need to think about it
> > more.
> 
> I think a machine-independent definition (for _FloatN / _FloatNx types in 
> general) should go along with machine-independent fmafN / fmafNx built-in 
> functions; when the built-in function is machine-specific, it's natural 
> for the macro to be as well.
> 
> But in any case, the new macro should be documented in cpp.texi alongside 
> the existing __FP_FAST_FMA* macros (probably in the generic 
> __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form).

This patch adds support for adding the built-in __builtin_fmaf<N> and
__builtin_fmaf<N>x functions if the target machine supports an appropriate
fused multiply-add (FMA) instruction.  This patch replaces the original PowerPC
specific patch.

Because it involves changes in the built-in support, both the c and c-family
subdirectories, as well as PowerPC changes, I added the global/release
maintainers to the To: list.

I have done a bootstrap and make check on a little endian Power8 with no
regresions in the tests.  I have verified that the changed and new tests both
ran fine.

I have also bootstrapped the changes on an x86-64 compiler, and it bootstrapped
fine.  I am currently running the unmodified build, but I'm not expecting any
changes in the test suite.

Assuming the x86-64 tests also have no regressions, can I check these changes
into the trunk?

[gcc]
2017-10-02  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* builtins.def (BUILT_IN_FMAF16): Add support for fused
	multiply-add built-in functions for _Float<N> and _Float<N>x
	types.
	(BUILT_IN_FMAF32): Likewise.
	(BUILT_IN_FMAF64): Likewise.
	(BUILT_IN_FMAF128): Likewise.
	(BUILT_IN_FMAF32X): Likewise.
	(BUILT_IN_FMAF64X): Likewise.
	(BUILT_IN_FMAF128X): Likewise.
	* builtin-types.def (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16):
	Likewise.
	(BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32): Likewise.
	(BT_FN_FLOAT64_FLOAT64_FLOAT64_FLOAT64): Likewise.
	(BT_FN_FLOAT128_FLOAT128_FLOAT128_FLOAT128): Likewise.
	(BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_FLOAT32X): Likewise.
	(BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_FLOAT64X): Likewise.
	(BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_FLOAT128X): Likewise.
	* builtins.c (expand_builtin_mathfn_ternary): Likewise.
	(expand_builtin): Add fused multiply-add builtin support for
	_Float<N> and _Float<N>X types.  Issue a warning if the machine
	does not provide an appropriate FMA insn.
	(fold_builtin_3): Add support for fused multiply-add built-in
	functions for _Float<N> and _Float<N>x types.
	* config/rs6000/rs6000-builtins.def (FMAF128): Delete creating
	__builtin_fmaf128, since this is now done in machine independent
	code.
	* doc/cpp.texi (__FP_FAST_FMAF16): Document macros set to declare
	that the appropriate fused multiply-add on _Float<N> and
	_Float<N>X types is implemented.
	(__FP_FAST_FMAF32): Likewise.
	(__FP_FAST_FMAF64): Likewise.
	(__FP_FAST_FMAF128): Likewise.
	(__FP_FAST_FMAF32X): Likewise.
	(__FP_FAST_FMAF64X): Likewise.
	(__FP_FAST_FMAF128X): Likewise.

[gcc/c]
2017-10-02  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* c-decl.c (header_for_builtin_fn): Add support for fused
	multiply-add built-in functions for _Float<N> and _Float<N>x
	types.

[gcc/c-family]
2017-10-02  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* c-cppbuiltin.c (mode_has_fma): Add support for PowerPC _float128
	FMA (KFmode) if long double != __float128.
	(c_cpp_builtins): Define __FP_FAST_FMAF<N> if _Float<N> fused
	multiply-add is supported.  Define __FP_FAST_FMAF<N>X if
	_Float<N>x fused multiply-add is supported.

[gcc/testsuite]
2017-10-02  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/float128-fma2.c: Change error to new
	warning.
	* gcc.target/powerpc/float128-fma3.c: New test.
Michael Meissner Oct. 2, 2017, 11:52 p.m. UTC | #4
Whoops, I forgot to attach the patch.

On Mon, Oct 02, 2017 at 07:51:00PM -0400, Michael Meissner wrote:
> On Thu, Sep 28, 2017 at 12:40:24AM +0000, Joseph Myers wrote:
> > On Wed, 27 Sep 2017, Michael Meissner wrote:
> > 
> > > The glibc team has requested we define the standard macro (__FP_FAST_FMAF128)
> > > for PowerPC code when we have the IEEE 128-bit floating point hardware
> > > instructions enabled.
> > 
> > It's not a standard macro.  TS 18661-3 has FP_FAST_FMAF128 as an optional 
> > math.h macro (but glibc doesn't define it anywhere at present).
> > 
> > > This patch does this in the PowerPC backend.  As I look at the whole issue, at
> > > some point we should do this more in the machine independent portion of the
> > > compiler.  I have some initial patches to do this in the c-family files, but at
> > > the present time, the patches are not complete, and I need to think about it
> > > more.
> > 
> > I think a machine-independent definition (for _FloatN / _FloatNx types in 
> > general) should go along with machine-independent fmafN / fmafNx built-in 
> > functions; when the built-in function is machine-specific, it's natural 
> > for the macro to be as well.
> > 
> > But in any case, the new macro should be documented in cpp.texi alongside 
> > the existing __FP_FAST_FMA* macros (probably in the generic 
> > __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form).
> 
> This patch adds support for adding the built-in __builtin_fmaf<N> and
> __builtin_fmaf<N>x functions if the target machine supports an appropriate
> fused multiply-add (FMA) instruction.  This patch replaces the original PowerPC
> specific patch.
> 
> Because it involves changes in the built-in support, both the c and c-family
> subdirectories, as well as PowerPC changes, I added the global/release
> maintainers to the To: list.
> 
> I have done a bootstrap and make check on a little endian Power8 with no
> regresions in the tests.  I have verified that the changed and new tests both
> ran fine.
> 
> I have also bootstrapped the changes on an x86-64 compiler, and it bootstrapped
> fine.  I am currently running the unmodified build, but I'm not expecting any
> changes in the test suite.
> 
> Assuming the x86-64 tests also have no regressions, can I check these changes
> into the trunk?
> 
> [gcc]
> 2017-10-02  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* builtins.def (BUILT_IN_FMAF16): Add support for fused
> 	multiply-add built-in functions for _Float<N> and _Float<N>x
> 	types.
> 	(BUILT_IN_FMAF32): Likewise.
> 	(BUILT_IN_FMAF64): Likewise.
> 	(BUILT_IN_FMAF128): Likewise.
> 	(BUILT_IN_FMAF32X): Likewise.
> 	(BUILT_IN_FMAF64X): Likewise.
> 	(BUILT_IN_FMAF128X): Likewise.
> 	* builtin-types.def (BT_FN_FLOAT16_FLOAT16_FLOAT16_FLOAT16):
> 	Likewise.
> 	(BT_FN_FLOAT32_FLOAT32_FLOAT32_FLOAT32): Likewise.
> 	(BT_FN_FLOAT64_FLOAT64_FLOAT64_FLOAT64): Likewise.
> 	(BT_FN_FLOAT128_FLOAT128_FLOAT128_FLOAT128): Likewise.
> 	(BT_FN_FLOAT32X_FLOAT32X_FLOAT32X_FLOAT32X): Likewise.
> 	(BT_FN_FLOAT64X_FLOAT64X_FLOAT64X_FLOAT64X): Likewise.
> 	(BT_FN_FLOAT128X_FLOAT128X_FLOAT128X_FLOAT128X): Likewise.
> 	* builtins.c (expand_builtin_mathfn_ternary): Likewise.
> 	(expand_builtin): Add fused multiply-add builtin support for
> 	_Float<N> and _Float<N>X types.  Issue a warning if the machine
> 	does not provide an appropriate FMA insn.
> 	(fold_builtin_3): Add support for fused multiply-add built-in
> 	functions for _Float<N> and _Float<N>x types.
> 	* config/rs6000/rs6000-builtins.def (FMAF128): Delete creating
> 	__builtin_fmaf128, since this is now done in machine independent
> 	code.
> 	* doc/cpp.texi (__FP_FAST_FMAF16): Document macros set to declare
> 	that the appropriate fused multiply-add on _Float<N> and
> 	_Float<N>X types is implemented.
> 	(__FP_FAST_FMAF32): Likewise.
> 	(__FP_FAST_FMAF64): Likewise.
> 	(__FP_FAST_FMAF128): Likewise.
> 	(__FP_FAST_FMAF32X): Likewise.
> 	(__FP_FAST_FMAF64X): Likewise.
> 	(__FP_FAST_FMAF128X): Likewise.
> 
> [gcc/c]
> 2017-10-02  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* c-decl.c (header_for_builtin_fn): Add support for fused
> 	multiply-add built-in functions for _Float<N> and _Float<N>x
> 	types.
> 
> [gcc/c-family]
> 2017-10-02  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* c-cppbuiltin.c (mode_has_fma): Add support for PowerPC _float128
> 	FMA (KFmode) if long double != __float128.
> 	(c_cpp_builtins): Define __FP_FAST_FMAF<N> if _Float<N> fused
> 	multiply-add is supported.  Define __FP_FAST_FMAF<N>X if
> 	_Float<N>x fused multiply-add is supported.
> 
> [gcc/testsuite]
> 2017-10-02  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* gcc.target/powerpc/float128-fma2.c: Change error to new
> 	warning.
> 	* gcc.target/powerpc/float128-fma3.c: New test.
> 
> 
> -- 
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Joseph Myers Oct. 3, 2017, 1:19 a.m. UTC | #5
On Mon, 2 Oct 2017, Michael Meissner wrote:

> > > But in any case, the new macro should be documented in cpp.texi alongside 
> > > the existing __FP_FAST_FMA* macros (probably in the generic 
> > > __FP_FAST_FMAF@var{n} and __FP_FAST_FMAF@var{n}X form).
> > 
> > This patch adds support for adding the built-in __builtin_fmaf<N> and
> > __builtin_fmaf<N>x functions if the target machine supports an appropriate
> > fused multiply-add (FMA) instruction.  This patch replaces the original PowerPC
> > specific patch.

Certainly the <math.h> FP_FAST_FMA* macros are supposed to relate to 
whether the public functions such as fmaf128 are fast rather than to 
__builtin_* names.

I think there's a strong case that you should provide built-in functions 
under the public names when defining __FP_FAST_FMA*.  I.e., add a variant 
of DEF_GCC_FLOATN_NX_BUILTINS that uses DEF_EXT_LIB_BUILTIN instead of 
DEF_GCC_BUILTIN, and use that for the new built-in functions.

Then, the built-in functions, in whatever form they are provided, should 
be documented in extend.texi, alongside the documentation of 
__builtin_fabsf@var{n} etc. (with, of course, the caveats about 
availability when appropriate instruction support isn't available - the 
__builtin_fabsfN, __builtin_copysignfN functions are always inlined, the 
fma ones aren't, and people may well lack C library support for the 
underlying functions).

Given that, I don't think the warning about lack of instruction support is 
appropriate; a call to __builtin_fmaf128, if the type is supported but 
there is no corresponding instruction (on x86_64, say), would just fall 
back to calling an external fmaf128 function (which in that case would 
work with glibc 2.26 or later, though calls to fmaf64 etc. wouldn't), much 
like any other such built-in function (we don't warn about e.g. calling 
__builtin_clog10 on systems whose C library may not have clog10).
Segher Boessenkool Oct. 4, 2017, 1 p.m. UTC | #6
Hi!

On Mon, Oct 02, 2017 at 07:52:50PM -0400, Michael Meissner wrote:
> Whoops, I forgot to attach the patch.

Heh.  The rs6000 parts are of course okay (trivial / obvious, but maybe
you are waiting for an ack).

Thanks,


Segher
diff mbox series

Patch

Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 253236)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -585,7 +585,10 @@  rs6000_target_modify_macros (bool define
   /* OPTION_MASK_FLOAT128_HARDWARE can be turned on if -mcpu=power9 is used or
      via the target attribute/pragma.  */
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
-    rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
+    {
+      rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
+      rs6000_define_or_undefine_macro (define_p, "__FP_FAST_FMAF128");
+    }
 
   /* options from the builtin masks.  */
   /* Note that RS6000_BTM_PAIRED is enabled only if
Index: gcc/testsuite/gcc.target/powerpc/float128-fma3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/float128-fma3.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/float128-fma3.c	(working copy)
@@ -0,0 +1,33 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mpower9-vector -O2" } */
+
+/* Make sure the appropriate FMA fast macros are defined.  */
+
+#ifdef __FP_FAST_FMAF
+float
+do_fmaf (float a, float b, float c)
+{
+  return __builtin_fmaf (a, b, c);
+}
+#endif
+
+#ifdef __FP_FAST_FMA
+double
+do_fma (double a, double b, double c)
+{
+  return __builtin_fma (a, b, c);
+}
+#endif
+
+#ifdef __FP_FAST_FMAF128
+_Float128
+do_fmaf128 (_Float128 a, _Float128 b, _Float128 c)
+{
+  return __builtin_fmaf128 (a, b, c);
+}
+#endif
+
+/* { dg-final { scan-assembler {\mfmadds\M|\mxsmadd.sp\M}  } } */
+/* { dg-final { scan-assembler {\mfmadd\M|\mxsmadd.dp\M}   } } */
+/* { dg-final { scan-assembler {\mxsmaddqp\M}              } } */