
[powerpc] Add -mmass to use XL's MASS vectorization library

Message ID 20100818203218.GA18478@hungry-tiger.westford.ibm.com
State New

Commit Message

Michael Meissner Aug. 18, 2010, 8:32 p.m. UTC
This patch was cloned from the i386 -mveclibabi=<xxx> support.  It adds a
new switch (-mmass) that tells the compiler to vectorize various mathematical
functions (sin, cos, etc.) on power7 systems by calling the vector routines in
IBM's MASS library.  This patch greatly speeds up 3 of the SPEC CPU2006
floating point benchmarks (tonto, wrf, GemsFDTD) that make heavy use of the
math functions.  I have bootstrapped the compiler on my Power systems and run
comparison tests, with no regressions.  Is it ok to install in the tree?

[gcc]
2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.opt (-mmass): New option to enable the
	compiler to autovectorize mathematical functions for power7 using
	the Mathematical Acceleration Subsystem library.

	* config/rs6000/rs6000.c (rs6000_builtin_vectorized_libmass): New
	function to handle auto vectorizing math functions that are in the
	MASS library.
	(rs6000_builtin_vectorized_function): Call it.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mmass.

[gcc/testsuite]
2010-08-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vsx-mass-1.c: New file, test -mmass.
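The naming and eligibility rules applied by rs6000_builtin_vectorized_libmass can be summarized with a small standalone model.  This is an approximation, not GCC code: the `elem_mode` enum, the `mass_simd_name` function, and its integer flag parameters are hypothetical stand-ins for the machine modes, `flag_unsafe_math_optimizations`, and `TARGET_VSX` checks done in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the DFmode/SFmode element modes.  */
enum elem_mode { ELEM_DF, ELEM_SF };

/* Model of the rules in rs6000_builtin_vectorized_libmass.  Writes the
   MASS SIMD entry-point name into BUF and returns it, or returns NULL
   where the hook would return NULL_TREE.  IS_FLOAT_FN says whether the
   builtin is the float variant (powf rather than pow); it cannot be
   inferred from the name alone, since "erf" already ends in 'f'.  */
const char *
mass_simd_name (const char *builtin_name, int is_float_fn,
                enum elem_mode out_mode, int out_lanes,
                enum elem_mode in_mode, int in_lanes,
                int unsafe_math, int have_vsx,
                char *buf, size_t bufsize)
{
  static const char prefix[] = "__builtin_";
  const size_t plen = sizeof prefix - 1;
  int len;

  assert (buf != NULL && bufsize > 0);

  /* MASS is only used with -funsafe-math-optimizations and VSX.  */
  if (!unsafe_math || !have_vsx)
    return NULL;

  /* Input and output vectors must agree in element mode and width.  */
  if (out_mode != in_mode || out_lanes != in_lanes)
    return NULL;

  /* Double functions need 2 lanes ("d2"), float functions 4 lanes ("4").  */
  if (is_float_fn ? (out_mode != ELEM_SF || out_lanes != 4)
                  : (out_mode != ELEM_DF || out_lanes != 2))
    return NULL;

  /* Strip "__builtin_" and append the suffix, like the strcpy/strcat
     sequence in the patch.  */
  if (strncmp (builtin_name, prefix, plen) != 0)
    return NULL;
  len = snprintf (buf, bufsize, "%s%s", builtin_name + plen,
                  is_float_fn ? "4" : "d2");
  if (len < 0 || (size_t) len >= bufsize)
    return NULL;
  return buf;
}
```

For example, calling it with "__builtin_pow", 2 double lanes on both sides, and both flags set yields "powd2", while "__builtin_powf" with 4 float lanes yields "powf4"; clearing either flag yields NULL.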

Comments

Richard Biener Aug. 18, 2010, 8:36 p.m. UTC | #1
On Wed, Aug 18, 2010 at 10:32 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch was cloned from the i386 -mveclibabi=<xxx> support, and it adds a
> new switch (-mmass) that says to vectorize various mathematical functions (sin,
> cos, etc.) on power7 systems.  This patch greatly speeds up 3 of the Spec 2006
> floating point benchmarks (tonto, wrf, GemsFDTD) that heavily use the math
> functions.  I have done bootstraps on my power systems, and comparison tests
> and there were no regressions.  Is it ok to install in the tree?

In the case that we develop a common library for all archs it would be nice
to have the same switch for ppc as we have for x86, so why didn't you
use -mveclibabi=mass?

Richard.

Michael Meissner Aug. 18, 2010, 8:50 p.m. UTC | #2
On Wed, Aug 18, 2010 at 10:36:13PM +0200, Richard Guenther wrote:
> In the case that we develop a common library for all archs it would be nice
> to have the same switch for ppc as we have for x86, so why didn't you
> use -mveclibabi=mass?

That sounds reasonable.

It isn't in this patch, but at some point I think it would be useful to add
an SSA pass that transforms the code to call a function that takes
pointers and a length argument, eliminating the loop.  This way, the library
can properly deal with load delays, etc.  If memory serves, the Intel and AMD
optimized math libraries have similar functions, though the order of the
arguments is different from the MASS arguments.  Is this the case?

If I wasn't clear, consider the loop:

	for (i = 0; i < size; i++)
	  a[i] = __builtin_sin (b[i]);

which is currently transformed into:

	V2DF_a_ptr = (V2DF *)a;
	V2DF_b_ptr = (V2DF *)b;
	for (i = 0; i < size/2; i++)
	  V2DF_a_ptr[i] = sind2 (V2DF_b_ptr[i]);

and instead it should generate:

	len_tmp = size;
	vsin (a, b, &len_tmp);
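The two loop shapes above can be made concrete with a compilable sketch.  It is only an illustration under stated assumptions: `demo_kernel` is a placeholder for sin() so that neither libm nor MASS is needed, and `demo_kernel_d2` and `demo_vkernel` are hypothetical stand-ins for a 2-lane routine like sind2 and for a whole-array routine using the vsin argument order (output, input, pointer to count).  Unlike the pseudo-code, the per-vector version also handles an odd trailing element with a scalar epilogue, which a vectorized loop needs when size is not a multiple of 2.

```c
#include <assert.h>

/* Placeholder element-wise kernel standing in for sin(); illustrative only.  */
static double demo_kernel (double x) { return 2.0 * x + 1.0; }

/* Hypothetical stand-in for a 2-lane SIMD routine such as sind2.  */
typedef struct { double v[2]; } pair2;

static pair2 demo_kernel_d2 (pair2 x)
{
  pair2 r = { { demo_kernel (x.v[0]), demo_kernel (x.v[1]) } };
  return r;
}

/* Hypothetical whole-array routine in the vsin (y, x, &n) argument order.  */
static void demo_vkernel (double *y, const double *x, const int *n)
{
  for (int i = 0; i < *n; i++)
    y[i] = demo_kernel (x[i]);
}

/* Current shape: one SIMD call per pair, plus a scalar epilogue.  */
void apply_per_vector (double *a, const double *b, int size)
{
  int i;
  for (i = 0; i + 1 < size; i += 2)
    {
      pair2 in = { { b[i], b[i + 1] } };
      pair2 out = demo_kernel_d2 (in);
      a[i] = out.v[0];
      a[i + 1] = out.v[1];
    }
  for (; i < size; i++)
    a[i] = demo_kernel (b[i]);
}

/* Proposed shape: the whole loop replaced by a single library call.  */
void apply_whole_array (double *a, const double *b, int size)
{
  int len_tmp = size;
  demo_vkernel (a, b, &len_tmp);
}
```

Both functions produce the same results; the difference is that the second hands the entire array to the library, which can then schedule loads across many elements itself.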
Sebastian Pop Aug. 18, 2010, 9 p.m. UTC | #3
On Wed, Aug 18, 2010 at 15:50, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> It isn't in this patch, but at some point I think it would be useful to add
> an SSA pass that transforms the code to call a function that takes
> pointers and a length argument, eliminating the loop.  This way, the library
> can properly deal with load delays, etc.  If memory serves, the Intel and AMD
> optimized math libraries have similar functions, though the order of the
> arguments is different from the MASS arguments.  Is this the case?
>
> If I wasn't clear, consider the loop:
>
>        for (i = 0; i < size; i++)
>          a[i] = __builtin_sin (b[i]);
>
> which is currently transformed into:
>
>        V2DF_a_ptr = (V2DF *)a;
>        V2DF_b_ptr = (V2DF *)b;
>        for (i = 0; i < size/2; i++)
>          V2DF_a_ptr[i] = sind2 (V2DF_b_ptr[i]);
>
> and instead it should generate:
>
>        len_tmp = size;
>        vsin (a, b, &len_tmp);
>

I also thought about this transform, and I think it
could be called from loop distribution.

Sebastian
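One way to read the loop-distribution suggestion: distribution splits a loop that mixes a math call with other work into separate loops, leaving the call alone in a loop that the proposed transformation could then replace with one whole-array call.  This is a sketch under stated assumptions, not GCC's pass: `dist_kernel` is a placeholder for sin(), `dist_vkernel` is a hypothetical whole-array routine in the vsin argument order, and the arrays are assumed not to overlap.

```c
#include <assert.h>

/* Placeholder for sin(); illustrative only, avoids libm.  */
static double dist_kernel (double x) { return 3.0 * x; }

/* Hypothetical whole-array routine in the vsin (y, x, &n) argument order.  */
static void dist_vkernel (double *y, const double *x, const int *n)
{
  for (int i = 0; i < *n; i++)
    y[i] = dist_kernel (x[i]);
}

/* Original loop: the math call is fused with unrelated work.  */
void fused (double *a, double *c, const double *b, const double *d, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = dist_kernel (b[i]);
      c[i] = a[i] + d[i];
    }
}

/* After distribution: the math-only loop becomes one library call.
   Legal here because the second statement only reads a[i] written in
   the same iteration, and the arrays are assumed not to alias.  */
void distributed (double *a, double *c, const double *b, const double *d, int n)
{
  int len_tmp = n;
  dist_vkernel (a, b, &len_tmp);   /* was: a[i] = dist_kernel (b[i]); */
  for (int i = 0; i < n; i++)
    c[i] = a[i] + d[i];
}
```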

Patch

Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 163347)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -115,6 +115,10 @@  mpopcntd
 Target Report Mask(POPCNTD)
 Use PowerPC V2.06 popcntd instruction
 
+mmass
+Target Report Var(TARGET_MASS) Init(0)
+Use the IBM Mathematical Acceleration Subsystem (MASS) high-performance math libraries.
+
 mvsx
 Target Report Mask(VSX)
 Use vector/scalar (VSX) instructions
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 163347)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -989,6 +989,7 @@  static rtx rs6000_emit_stack_reset (rs60
 static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
 				   enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
+static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
 static int rs6000_savres_strategy (rs6000_stack_t *, bool, int, int);
 static void rs6000_restore_saved_cr (rtx, int);
@@ -3602,6 +3603,145 @@  rs6000_parse_fpu_option (const char *opt
   return FPU_NONE;
 }
 
+
+/* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
+   library with vectorized intrinsics.  */
+
+static tree
+rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
+{
+  char name[32];
+  const char *suffix = NULL;
+  tree fntype, new_fndecl, bdecl = NULL_TREE;
+  int n_args = 1;
+  const char *bname;
+  enum machine_mode el_mode, in_mode;
+  int n, in_n;
+
+  /* Libmass is suitable for unsafe math only as it does not correctly support
+     parts of IEEE with the required precision such as denormals.  Only support
+     it if we have VSX to use the simd d2 or f4 functions.
+     XXX: Add variable length support.  */
+  if (!flag_unsafe_math_optimizations || !TARGET_VSX)
+    return NULL_TREE;
+
+  el_mode = TYPE_MODE (TREE_TYPE (type_out));
+  n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  if (el_mode != in_mode
+      || n != in_n)
+    return NULL_TREE;
+
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
+      switch (fn)
+	{
+	case BUILT_IN_ATAN2:
+	case BUILT_IN_HYPOT:
+	case BUILT_IN_POW:
+	  n_args = 2;
+	  /* fall through */
+
+	case BUILT_IN_ACOS:
+	case BUILT_IN_ACOSH:
+	case BUILT_IN_ASIN:
+	case BUILT_IN_ASINH:
+	case BUILT_IN_ATAN:
+	case BUILT_IN_ATANH:
+	case BUILT_IN_CBRT:
+	case BUILT_IN_COS:
+	case BUILT_IN_COSH:
+	case BUILT_IN_ERF:
+	case BUILT_IN_ERFC:
+	case BUILT_IN_EXP2:
+	case BUILT_IN_EXP:
+	case BUILT_IN_EXPM1:
+	case BUILT_IN_LGAMMA:
+	case BUILT_IN_LOG10:
+	case BUILT_IN_LOG1P:
+	case BUILT_IN_LOG2:
+	case BUILT_IN_LOG:
+	case BUILT_IN_SIN:
+	case BUILT_IN_SINH:
+	case BUILT_IN_SQRT:
+	case BUILT_IN_TAN:
+	case BUILT_IN_TANH:
+	  bdecl = implicit_built_in_decls[fn];
+	  suffix = "d2";				/* pow -> powd2 */
+	  if (el_mode != DFmode
+	      || n != 2)
+	    return NULL_TREE;
+	  break;
+
+	case BUILT_IN_ATAN2F:
+	case BUILT_IN_HYPOTF:
+	case BUILT_IN_POWF:
+	  n_args = 2;
+	  /* fall through */
+
+	case BUILT_IN_ACOSF:
+	case BUILT_IN_ACOSHF:
+	case BUILT_IN_ASINF:
+	case BUILT_IN_ASINHF:
+	case BUILT_IN_ATANF:
+	case BUILT_IN_ATANHF:
+	case BUILT_IN_CBRTF:
+	case BUILT_IN_COSF:
+	case BUILT_IN_COSHF:
+	case BUILT_IN_ERFF:
+	case BUILT_IN_ERFCF:
+	case BUILT_IN_EXP2F:
+	case BUILT_IN_EXPF:
+	case BUILT_IN_EXPM1F:
+	case BUILT_IN_LGAMMAF:
+	case BUILT_IN_LOG10F:
+	case BUILT_IN_LOG1PF:
+	case BUILT_IN_LOG2F:
+	case BUILT_IN_LOGF:
+	case BUILT_IN_SINF:
+	case BUILT_IN_SINHF:
+	case BUILT_IN_SQRTF:
+	case BUILT_IN_TANF:
+	case BUILT_IN_TANHF:
+	  bdecl = implicit_built_in_decls[fn];
+	  suffix = "4";					/* powf -> powf4 */
+	  if (el_mode != SFmode
+	      || n != 4)
+	    return NULL_TREE;
+	  break;
+
+	default:
+	  return NULL_TREE;
+	}
+    }
+  else
+    return NULL_TREE;
+
+  gcc_assert (suffix != NULL);
+  bname = IDENTIFIER_POINTER (DECL_NAME (bdecl));
+  strcpy (name, bname + sizeof ("__builtin_") - 1);
+  strcat (name, suffix);
+
+  if (n_args == 1)
+    fntype = build_function_type_list (type_out, type_in, NULL);
+  else if (n_args == 2)
+    fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+  else
+    gcc_unreachable ();
+
+  /* Build a function declaration for the vectorized function.  */
+  new_fndecl = build_decl (BUILTINS_LOCATION,
+			   FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
 /* Returns a function decl for a vectorized version of the builtin function
    with builtin function code FN and the result vector type TYPE, or NULL_TREE
    if it is not available.  */
@@ -3768,6 +3908,10 @@  rs6000_builtin_vectorized_function (tree
 	}
     }
 
+  /* Generate calls to libmass if appropriate.  */
+  if (TARGET_MASS)
+    return rs6000_builtin_vectorized_libmass (fndecl, type_out, type_in);
+
   return NULL_TREE;
 }
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 163347)
+++ gcc/doc/invoke.texi	(working copy)
@@ -786,7 +786,9 @@  See RS/6000 and PowerPC Options.
 -mprototype  -mno-prototype @gol
 -msim  -mmvme  -mads  -myellowknife  -memb  -msdata @gol
 -msdata=@var{opt}  -mvxworks  -G @var{num}  -pthread @gol
--mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision -mno-recip-precision}
+-mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision
+-mno-recip-precision @gol
+-mmass}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -15847,6 +15849,29 @@  automatically selects @option{-mrecip-pr
 precision square root estimate instructions are not generated by
 default on low precision machines, since they do not provide an
 estimate that converges after three steps.
+
+@item -mmass
+@itemx -mno-mass
+@opindex mmass
+Use IBM's Mathematical Acceleration Subsystem (MASS) libraries to
+vectorize calls to intrinsic math functions.  GCC
+will currently emit calls to @code{acosd2}, @code{acosf4},
+@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4},
+@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4},
+@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4},
+@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4},
+@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4},
+@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4},
+@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4},
+@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4},
+@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4},
+@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4},
+@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2},
+@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2},
+@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code
+for power7 (VSX).  Both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} must be enabled, and the MASS
+libraries must be specified at link time.
 @end table
 
 @node RX Options
Index: gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c	(revision 0)
@@ -0,0 +1,554 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O3 -ftree-vectorize -mcpu=power7 -ffast-math -mmass" } */
+/* { dg-final { scan-assembler "bl atan2d2" } } */
+/* { dg-final { scan-assembler "bl atan2f4" } } */
+/* { dg-final { scan-assembler "bl hypotd2" } } */
+/* { dg-final { scan-assembler "bl hypotf4" } } */
+/* { dg-final { scan-assembler "bl powd2" } } */
+/* { dg-final { scan-assembler "bl powf4" } } */
+/* { dg-final { scan-assembler "bl acosd2" } } */
+/* { dg-final { scan-assembler "bl acosf4" } } */
+/* { dg-final { scan-assembler "bl acoshd2" } } */
+/* { dg-final { scan-assembler "bl acoshf4" } } */
+/* { dg-final { scan-assembler "bl asind2" } } */
+/* { dg-final { scan-assembler "bl asinf4" } } */
+/* { dg-final { scan-assembler "bl asinhd2" } } */
+/* { dg-final { scan-assembler "bl asinhf4" } } */
+/* { dg-final { scan-assembler "bl atand2" } } */
+/* { dg-final { scan-assembler "bl atanf4" } } */
+/* { dg-final { scan-assembler "bl atanhd2" } } */
+/* { dg-final { scan-assembler "bl atanhf4" } } */
+/* { dg-final { scan-assembler "bl cbrtd2" } } */
+/* { dg-final { scan-assembler "bl cbrtf4" } } */
+/* { dg-final { scan-assembler "bl cosd2" } } */
+/* { dg-final { scan-assembler "bl cosf4" } } */
+/* { dg-final { scan-assembler "bl coshd2" } } */
+/* { dg-final { scan-assembler "bl coshf4" } } */
+/* { dg-final { scan-assembler "bl erfd2" } } */
+/* { dg-final { scan-assembler "bl erff4" } } */
+/* { dg-final { scan-assembler "bl erfcd2" } } */
+/* { dg-final { scan-assembler "bl erfcf4" } } */
+/* { dg-final { scan-assembler "bl exp2d2" } } */
+/* { dg-final { scan-assembler "bl exp2f4" } } */
+/* { dg-final { scan-assembler "bl expd2" } } */
+/* { dg-final { scan-assembler "bl expf4" } } */
+/* { dg-final { scan-assembler "bl expm1d2" } } */
+/* { dg-final { scan-assembler "bl expm1f4" } } */
+/* { dg-final { scan-assembler "bl lgamma" } } */
+/* { dg-final { scan-assembler "bl lgammaf" } } */
+/* { dg-final { scan-assembler "bl log10d2" } } */
+/* { dg-final { scan-assembler "bl log10f4" } } */
+/* { dg-final { scan-assembler "bl log1pd2" } } */
+/* { dg-final { scan-assembler "bl log1pf4" } } */
+/* { dg-final { scan-assembler "bl log2d2" } } */
+/* { dg-final { scan-assembler "bl log2f4" } } */
+/* { dg-final { scan-assembler "bl logd2" } } */
+/* { dg-final { scan-assembler "bl logf4" } } */
+/* { dg-final { scan-assembler "bl sind2" } } */
+/* { dg-final { scan-assembler "bl sinf4" } } */
+/* { dg-final { scan-assembler "bl sinhd2" } } */
+/* { dg-final { scan-assembler "bl sinhf4" } } */
+/* { dg-final { scan-assembler "bl tand2" } } */
+/* { dg-final { scan-assembler "bl tanf4" } } */
+/* { dg-final { scan-assembler "bl tanhd2" } } */
+/* { dg-final { scan-assembler "bl tanhf4" } } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+double d1[SIZE] __attribute__((__aligned__(32)));
+double d2[SIZE] __attribute__((__aligned__(32)));
+double d3[SIZE] __attribute__((__aligned__(32)));
+
+float f1[SIZE] __attribute__((__aligned__(32)));
+float f2[SIZE] __attribute__((__aligned__(32)));
+float f3[SIZE] __attribute__((__aligned__(32)));
+
+void
+test_double_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan2 (d2[i], d3[i]);
+}
+
+void
+test_float_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atan2f (f2[i], f3[i]);
+}
+
+void
+test_double_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_hypot (d2[i], d3[i]);
+}
+
+void
+test_float_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_hypotf (f2[i], f3[i]);
+}
+
+void
+test_double_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_pow (d2[i], d3[i]);
+}
+
+void
+test_float_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_powf (f2[i], f3[i]);
+}
+
+void
+test_double_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acos (d2[i]);
+}
+
+void
+test_float_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acosf (f2[i]);
+}
+
+void
+test_double_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acosh (d2[i]);
+}
+
+void
+test_float_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acoshf (f2[i]);
+}
+
+void
+test_double_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asin (d2[i]);
+}
+
+void
+test_float_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinf (f2[i]);
+}
+
+void
+test_double_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asinh (d2[i]);
+}
+
+void
+test_float_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinhf (f2[i]);
+}
+
+void
+test_double_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan (d2[i]);
+}
+
+void
+test_float_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanf (f2[i]);
+}
+
+void
+test_double_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atanh (d2[i]);
+}
+
+void
+test_float_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanhf (f2[i]);
+}
+
+void
+test_double_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cbrt (d2[i]);
+}
+
+void
+test_float_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cbrtf (f2[i]);
+}
+
+void
+test_double_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cos (d2[i]);
+}
+
+void
+test_float_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cosf (f2[i]);
+}
+
+void
+test_double_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cosh (d2[i]);
+}
+
+void
+test_float_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_coshf (f2[i]);
+}
+
+void
+test_double_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erf (d2[i]);
+}
+
+void
+test_float_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erff (f2[i]);
+}
+
+void
+test_double_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erfc (d2[i]);
+}
+
+void
+test_float_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erfcf (f2[i]);
+}
+
+void
+test_double_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp2 (d2[i]);
+}
+
+void
+test_float_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_exp2f (f2[i]);
+}
+
+void
+test_double_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp (d2[i]);
+}
+
+void
+test_float_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expf (f2[i]);
+}
+
+void
+test_double_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_expm1 (d2[i]);
+}
+
+void
+test_float_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expm1f (f2[i]);
+}
+
+void
+test_double_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_lgamma (d2[i]);
+}
+
+void
+test_float_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_lgammaf (f2[i]);
+}
+
+void
+test_double_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log10 (d2[i]);
+}
+
+void
+test_float_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log10f (f2[i]);
+}
+
+void
+test_double_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log1p (d2[i]);
+}
+
+void
+test_float_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log1pf (f2[i]);
+}
+
+void
+test_double_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log2 (d2[i]);
+}
+
+void
+test_float_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log2f (f2[i]);
+}
+
+void
+test_double_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log (d2[i]);
+}
+
+void
+test_float_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_logf (f2[i]);
+}
+
+void
+test_double_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sin (d2[i]);
+}
+
+void
+test_float_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinf (f2[i]);
+}
+
+void
+test_double_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sinh (d2[i]);
+}
+
+void
+test_float_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinhf (f2[i]);
+}
+
+void
+test_double_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sqrt (d2[i]);
+}
+
+void
+test_float_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sqrtf (f2[i]);
+}
+
+void
+test_double_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tan (d2[i]);
+}
+
+void
+test_float_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanf (f2[i]);
+}
+
+void
+test_double_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tanh (d2[i]);
+}
+
+void
+test_float_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanhf (f2[i]);
+}
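The declarations that rs6000_builtin_vectorized_libmass builds with build_function_type_list return a vector and take one vector argument of the same type, or two for atan2, hypot and pow.  With GCC's vector extensions the shapes look roughly like the sketch below.  `demo_unaryd2` and `demo_binaryf4` are hypothetical placeholders whose bodies are not the real math, and `__attribute__ ((const))` is only an approximation of what setting TREE_READONLY and DECL_IS_NOVOPS on the decl conveys.

```c
#include <assert.h>

/* GCC vector-extension types laid out like V2DF and V4SF
   (two doubles or four floats in a 16-byte vector).  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Unary shape, like sind2: v2df f (v2df).  Placeholder body.  */
__attribute__ ((const)) v2df
demo_unaryd2 (v2df x)
{
  return x + x;
}

/* Binary shape, like powf4: v4sf f (v4sf, v4sf).  Placeholder body.  */
__attribute__ ((const)) v4sf
demo_binaryf4 (v4sf x, v4sf y)
{
  return x * y;
}
```

Each call processes a full vector (2 doubles or 4 floats), which is why the test above expects to find calls such as "bl powd2" and "bl powf4" in the assembly.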