From patchwork Wed Aug 18 20:32:18 2010
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 62086
Date: Wed, 18 Aug 2010 16:32:18 -0400
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library
Message-ID: <20100818203218.GA18478@hungry-tiger.westford.ibm.com>

This patch was cloned from the i386 -mveclibabi= support. It adds a new
switch (-mmass) that tells the compiler to vectorize calls to various
mathematical functions (sin, cos, etc.) on power7 systems by calling the
MASS library.

This patch greatly speeds up three of the SPEC 2006 floating-point
benchmarks (tonto, wrf, GemsFDTD), which use the math functions heavily.

I have bootstrapped on my power systems and run comparison tests; there
were no regressions.  Is it OK to install in the tree?

[gcc]
2010-08-18  Michael Meissner

	* config/rs6000/rs6000.opt (-mmass): New option to enable the
	compiler to autovectorize mathematical functions for power7 using
	the Mathematical Acceleration Subsystem library.

	* config/rs6000/rs6000.c (rs6000_builtin_vectorized_libmass): New
	function to handle auto vectorizing math functions that are in the
	MASS library.
	(rs6000_builtin_vectorized_function): Call it.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mmass.

[gcc/testsuite]
2010-08-18  Michael Meissner

	* gcc.target/powerpc/vsx-mass-1.c: New file, test -mmass.
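For reviewers, the naming scheme used by rs6000_builtin_vectorized_libmass
can be sketched in isolation.  The following is a standalone illustration
and is not part of the patch: mass_name is a hypothetical helper that, like
the patch, drops the "__builtin_" prefix from the builtin's name and appends
the vector suffix ("d2" for double; "4" for float, whose builtin names
already end in "f").  The patch itself uses strcpy/strcat into a 32-byte
buffer; the explicit prefix and size checks here are additions made for the
sake of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the patch): build a MASS entry-point name
   from a builtin name by stripping "__builtin_" and appending SUFFIX.
   Returns 0 on success, or -1 if BUILTIN lacks the prefix or the result
   (including the terminating NUL) does not fit in OUT_SIZE bytes.  */
int
mass_name (char *out, size_t out_size, const char *builtin,
           const char *suffix)
{
  static const char prefix[] = "__builtin_";
  const size_t prefix_len = sizeof (prefix) - 1;
  size_t base_len, suffix_len;

  assert (out != NULL && builtin != NULL && suffix != NULL);

  /* Only names of the form "__builtin_<base>" are accepted.  */
  if (strncmp (builtin, prefix, prefix_len) != 0)
    return -1;

  base_len = strlen (builtin + prefix_len);
  suffix_len = strlen (suffix);
  if (base_len + suffix_len + 1 > out_size)
    return -1;

  /* Copy "<base>", then "<suffix>" together with its terminating NUL.  */
  memcpy (out, builtin + prefix_len, base_len);
  memcpy (out + base_len, suffix, suffix_len + 1);
  return 0;
}
```

With a 32-byte buffer, mass_name (buf, sizeof buf, "__builtin_pow", "d2")
produces "powd2" and mass_name (buf, sizeof buf, "__builtin_powf", "4")
produces "powf4", which matches the comments next to the two suffix
assignments in the patch.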
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt (revision 163347)
+++ gcc/config/rs6000/rs6000.opt (working copy)
@@ -115,6 +115,10 @@ mpopcntd
 Target Report Mask(POPCNTD)
 Use PowerPC V2.06 popcntd instruction
 
+mmass
+Target Report Var(TARGET_MASS) Init(0)
+Use the Mathematical Acceleration Subsystem library high performance math libraries.
+
 mvsx
 Target Report Mask(VSX)
 Use vector/scalar (VSX) instructions
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 163347)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -989,6 +989,7 @@ static rtx rs6000_emit_stack_reset (rs60
 static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
                                    enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
+static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
 static int rs6000_savres_strategy (rs6000_stack_t *, bool, int, int);
 static void rs6000_restore_saved_cr (rtx, int);
@@ -3602,6 +3603,145 @@ rs6000_parse_fpu_option (const char *opt
   return FPU_NONE;
 }
 
+
+/* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
+   library with vectorized intrinsics.  */
+
+static tree
+rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
+{
+  char name[32];
+  const char *suffix = NULL;
+  tree fntype, new_fndecl, bdecl = NULL_TREE;
+  int n_args = 1;
+  const char *bname;
+  enum machine_mode el_mode, in_mode;
+  int n, in_n;
+
+  /* Libmass is suitable for unsafe math only as it does not correctly support
+     parts of IEEE with the required precision such as denormals.  Only support
+     it if we have VSX to use the simd d2 or f4 functions.
+     XXX: Add variable length support.  */
+  if (!flag_unsafe_math_optimizations || !TARGET_VSX)
+    return NULL_TREE;
+
+  el_mode = TYPE_MODE (TREE_TYPE (type_out));
+  n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  if (el_mode != in_mode
+      || n != in_n)
+    return NULL_TREE;
+
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
+      switch (fn)
+        {
+        case BUILT_IN_ATAN2:
+        case BUILT_IN_HYPOT:
+        case BUILT_IN_POW:
+          n_args = 2;
+          /* fall through */
+
+        case BUILT_IN_ACOS:
+        case BUILT_IN_ACOSH:
+        case BUILT_IN_ASIN:
+        case BUILT_IN_ASINH:
+        case BUILT_IN_ATAN:
+        case BUILT_IN_ATANH:
+        case BUILT_IN_CBRT:
+        case BUILT_IN_COS:
+        case BUILT_IN_COSH:
+        case BUILT_IN_ERF:
+        case BUILT_IN_ERFC:
+        case BUILT_IN_EXP2:
+        case BUILT_IN_EXP:
+        case BUILT_IN_EXPM1:
+        case BUILT_IN_LGAMMA:
+        case BUILT_IN_LOG10:
+        case BUILT_IN_LOG1P:
+        case BUILT_IN_LOG2:
+        case BUILT_IN_LOG:
+        case BUILT_IN_SIN:
+        case BUILT_IN_SINH:
+        case BUILT_IN_SQRT:
+        case BUILT_IN_TAN:
+        case BUILT_IN_TANH:
+          bdecl = implicit_built_in_decls[fn];
+          suffix = "d2";                        /* pow -> powd2 */
+          if (el_mode != DFmode
+              || n != 2)
+            return NULL_TREE;
+          break;
+
+        case BUILT_IN_ATAN2F:
+        case BUILT_IN_HYPOTF:
+        case BUILT_IN_POWF:
+          n_args = 2;
+          /* fall through */
+
+        case BUILT_IN_ACOSF:
+        case BUILT_IN_ACOSHF:
+        case BUILT_IN_ASINF:
+        case BUILT_IN_ASINHF:
+        case BUILT_IN_ATANF:
+        case BUILT_IN_ATANHF:
+        case BUILT_IN_CBRTF:
+        case BUILT_IN_COSF:
+        case BUILT_IN_COSHF:
+        case BUILT_IN_ERFF:
+        case BUILT_IN_ERFCF:
+        case BUILT_IN_EXP2F:
+        case BUILT_IN_EXPF:
+        case BUILT_IN_EXPM1F:
+        case BUILT_IN_LGAMMAF:
+        case BUILT_IN_LOG10F:
+        case BUILT_IN_LOG1PF:
+        case BUILT_IN_LOG2F:
+        case BUILT_IN_LOGF:
+        case BUILT_IN_SINF:
+        case BUILT_IN_SINHF:
+        case BUILT_IN_SQRTF:
+        case BUILT_IN_TANF:
+        case BUILT_IN_TANHF:
+          bdecl = implicit_built_in_decls[fn];
+          suffix = "4";                         /* powf -> powf4 */
+          if (el_mode != SFmode
+              || n != 4)
+            return NULL_TREE;
+          break;
+
+        default:
+          return NULL_TREE;
+        }
+    }
+  else
+    return NULL_TREE;
+
+  gcc_assert (suffix != NULL);
+  bname = IDENTIFIER_POINTER (DECL_NAME (bdecl));
+  strcpy (name, bname + sizeof ("__builtin_") - 1);
+  strcat (name, suffix);
+
+  if (n_args == 1)
+    fntype = build_function_type_list (type_out, type_in, NULL);
+  else if (n_args == 2)
+    fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+  else
+    gcc_unreachable ();
+
+  /* Build a function declaration for the vectorized function.  */
+  new_fndecl = build_decl (BUILTINS_LOCATION,
+                           FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
 /* Returns a function decl for a vectorized version of the builtin function
    with builtin function code FN and the result vector type TYPE, or NULL_TREE
    if it is not available.  */
@@ -3768,6 +3908,10 @@ rs6000_builtin_vectorized_function (tree
         }
     }
 
+  /* Generate calls to libmass if appropriate.  */
+  if (TARGET_MASS)
+    return rs6000_builtin_vectorized_libmass (fndecl, type_out, type_in);
+
   return NULL_TREE;
 }
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 163347)
+++ gcc/doc/invoke.texi (working copy)
@@ -786,7 +786,9 @@ See RS/6000 and PowerPC Options.
 -mprototype -mno-prototype @gol
 -msim -mmvme -mads -myellowknife -memb -msdata @gol
 -msdata=@var{opt} -mvxworks -G @var{num} -pthread @gol
--mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision -mno-recip-precision}
+-mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision
+-mno-recip-precision @gol
+-mmass}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles -m32bit-doubles -fpu -nofpu@gol
@@ -15847,6 +15849,29 @@ automatically selects @option{-mrecip-pr
 precision square root estimate instructions are not generated by default on
 low precision machines, since they do not provide an estimate that converges
 after three steps.
+
+@item -mmass
+@itemx -mno-mass
+@opindex mmass
+Specifies to use IBM's Mathematical Acceleration Subsystem (MASS)
+libraries for vectorizing intrinsics using external libraries.  GCC
+will currently emit calls to @code{acosd2}, @code{acosf4},
+@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4},
+@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4},
+@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4},
+@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4},
+@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4},
+@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4},
+@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4},
+@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4},
+@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4},
+@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4},
+@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2},
+@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2},
+@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code
+for power7.  Both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} have to be enabled.  The MASS
+libraries will have to be specified at link time.
 @end table
 
 @node RX Options
Index: gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c (revision 0)
@@ -0,0 +1,554 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O3 -ftree-vectorize -mcpu=power7 -ffast-math -mmass" } */
+/* { dg-final { scan-assembler "bl atan2d2" } } */
+/* { dg-final { scan-assembler "bl atan2f4" } } */
+/* { dg-final { scan-assembler "bl hypotd2" } } */
+/* { dg-final { scan-assembler "bl hypotf4" } } */
+/* { dg-final { scan-assembler "bl powd2" } } */
+/* { dg-final { scan-assembler "bl powf4" } } */
+/* { dg-final { scan-assembler "bl acosd2" } } */
+/* { dg-final { scan-assembler "bl acosf4" } } */
+/* { dg-final { scan-assembler "bl acoshd2" } } */
+/* { dg-final { scan-assembler "bl acoshf4" } } */
+/* { dg-final { scan-assembler "bl asind2" } } */
+/* { dg-final { scan-assembler "bl asinf4" } } */
+/* { dg-final { scan-assembler "bl asinhd2" } } */
+/* { dg-final { scan-assembler "bl asinhf4" } } */
+/* { dg-final { scan-assembler "bl atand2" } } */
+/* { dg-final { scan-assembler "bl atanf4" } } */
+/* { dg-final { scan-assembler "bl atanhd2" } } */
+/* { dg-final { scan-assembler "bl atanhf4" } } */
+/* { dg-final { scan-assembler "bl cbrtd2" } } */
+/* { dg-final { scan-assembler "bl cbrtf4" } } */
+/* { dg-final { scan-assembler "bl cosd2" } } */
+/* { dg-final { scan-assembler "bl cosf4" } } */
+/* { dg-final { scan-assembler "bl coshd2" } } */
+/* { dg-final { scan-assembler "bl coshf4" } } */
+/* { dg-final { scan-assembler "bl erfd2" } } */
+/* { dg-final { scan-assembler "bl erff4" } } */
+/* { dg-final { scan-assembler "bl erfcd2" } } */
+/* { dg-final { scan-assembler "bl erfcf4" } } */
+/* { dg-final { scan-assembler "bl exp2d2" } } */
+/* { dg-final { scan-assembler "bl exp2f4" } } */
+/* { dg-final { scan-assembler "bl expd2" } } */
+/* { dg-final { scan-assembler "bl expf4" } } */
+/* { dg-final { scan-assembler "bl expm1d2" } } */
+/* { dg-final { scan-assembler "bl expm1f4" } } */
+/* { dg-final { scan-assembler "bl lgamma" } } */
+/* { dg-final { scan-assembler "bl lgammaf" } } */
+/* { dg-final { scan-assembler "bl log10d2" } } */
+/* { dg-final { scan-assembler "bl log10f4" } } */
+/* { dg-final { scan-assembler "bl log1pd2" } } */
+/* { dg-final { scan-assembler "bl log1pf4" } } */
+/* { dg-final { scan-assembler "bl log2d2" } } */
+/* { dg-final { scan-assembler "bl log2f4" } } */
+/* { dg-final { scan-assembler "bl logd2" } } */
+/* { dg-final { scan-assembler "bl logf4" } } */
+/* { dg-final { scan-assembler "bl sind2" } } */
+/* { dg-final { scan-assembler "bl sinf4" } } */
+/* { dg-final { scan-assembler "bl sinhd2" } } */
+/* { dg-final { scan-assembler "bl sinhf4" } } */
+/* { dg-final { scan-assembler "bl tand2" } } */
+/* { dg-final { scan-assembler "bl tanf4" } } */
+/* { dg-final { scan-assembler "bl tanhd2" } } */
+/* { dg-final { scan-assembler "bl tanhf4" } } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+double d1[SIZE] __attribute__((__aligned__(32)));
+double d2[SIZE] __attribute__((__aligned__(32)));
+double d3[SIZE] __attribute__((__aligned__(32)));
+
+float f1[SIZE] __attribute__((__aligned__(32)));
+float f2[SIZE] __attribute__((__aligned__(32)));
+float f3[SIZE] __attribute__((__aligned__(32)));
+
+void
+test_double_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan2 (d2[i], d3[i]);
+}
+
+void
+test_float_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atan2f (f2[i], f3[i]);
+}
+
+void
+test_double_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_hypot (d2[i], d3[i]);
+}
+
+void
+test_float_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_hypotf (f2[i], f3[i]);
+}
+
+void
+test_double_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_pow (d2[i], d3[i]);
+}
+
+void
+test_float_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_powf (f2[i], f3[i]);
+}
+
+void
+test_double_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acos (d2[i]);
+}
+
+void
+test_float_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acosf (f2[i]);
+}
+
+void
+test_double_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acosh (d2[i]);
+}
+
+void
+test_float_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acoshf (f2[i]);
+}
+
+void
+test_double_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asin (d2[i]);
+}
+
+void
+test_float_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinf (f2[i]);
+}
+
+void
+test_double_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asinh (d2[i]);
+}
+
+void
+test_float_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinhf (f2[i]);
+}
+
+void
+test_double_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan (d2[i]);
+}
+
+void
+test_float_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanf (f2[i]);
+}
+
+void
+test_double_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atanh (d2[i]);
+}
+
+void
+test_float_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanhf (f2[i]);
+}
+
+void
+test_double_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cbrt (d2[i]);
+}
+
+void
+test_float_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cbrtf (f2[i]);
+}
+
+void
+test_double_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cos (d2[i]);
+}
+
+void
+test_float_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cosf (f2[i]);
+}
+
+void
+test_double_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cosh (d2[i]);
+}
+
+void
+test_float_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_coshf (f2[i]);
+}
+
+void
+test_double_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erf (d2[i]);
+}
+
+void
+test_float_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erff (f2[i]);
+}
+
+void
+test_double_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erfc (d2[i]);
+}
+
+void
+test_float_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erfcf (f2[i]);
+}
+
+void
+test_double_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp2 (d2[i]);
+}
+
+void
+test_float_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_exp2f (f2[i]);
+}
+
+void
+test_double_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp (d2[i]);
+}
+
+void
+test_float_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expf (f2[i]);
+}
+
+void
+test_double_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_expm1 (d2[i]);
+}
+
+void
+test_float_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expm1f (f2[i]);
+}
+
+void
+test_double_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_lgamma (d2[i]);
+}
+
+void
+test_float_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_lgammaf (f2[i]);
+}
+
+void
+test_double_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log10 (d2[i]);
+}
+
+void
+test_float_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log10f (f2[i]);
+}
+
+void
+test_double_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log1p (d2[i]);
+}
+
+void
+test_float_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log1pf (f2[i]);
+}
+
+void
+test_double_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log2 (d2[i]);
+}
+
+void
+test_float_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log2f (f2[i]);
+}
+
+void
+test_double_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log (d2[i]);
+}
+
+void
+test_float_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_logf (f2[i]);
+}
+
+void
+test_double_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sin (d2[i]);
+}
+
+void
+test_float_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinf (f2[i]);
+}
+
+void
+test_double_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sinh (d2[i]);
+}
+
+void
+test_float_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinhf (f2[i]);
+}
+
+void
+test_double_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sqrt (d2[i]);
+}
+
+void
+test_float_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sqrtf (f2[i]);
+}
+
+void
+test_double_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tan (d2[i]);
+}
+
+void
+test_float_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanf (f2[i]);
+}
+
+void
+test_double_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tanh (d2[i]);
+}
+
+void
+test_float_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanhf (f2[i]);
+}