From patchwork Wed Aug 18 22:04:04 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 62092
Date: Wed, 18 Aug 2010 18:04:04 -0400
From: Michael Meissner
To: Richard Guenther
Cc: Michael Meissner, gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: Re: [PATCH, powerpc] Add -mmass to use XL's MASS vectorization library
Message-ID: <20100818220404.GA27010@hungry-tiger.westford.ibm.com>
References: <20100818203218.GA18478@hungry-tiger.westford.ibm.com>

On Wed, Aug 18, 2010 at 10:36:13PM +0200, Richard Guenther wrote:
> On Wed, Aug 18, 2010 at 10:32 PM, Michael Meissner wrote:
> > This patch was cloned from the i386 -mveclibabi= support, and it adds a
> > new switch (-mmass) that says to vectorize various mathematical functions (sin,
> > cos, etc.) on power7 systems.  This patch greatly speeds up 3 of the Spec 2006
> > floating point benchmarks (tonto, wrf, GemsFDTD) that heavily use the math
> > functions.  I have done bootstraps on my power systems, and comparison tests
> > and there were no regressions.  Is it ok to install in the tree?
>
> In the case that we develop a common library for all archs it would be nice
> to have the same switch for ppc as we have for x86, so why didn't you
> use -mveclibabi=mass?

This revised patch changes the name of the switch to -mveclibabi=mass.  Is it
ok to apply?

[gcc]
2010-08-18  Michael Meissner

	* config/rs6000/rs6000.opt (-mveclibabi=mass): New option to
	enable the compiler to autovectorize mathematical functions for
	power7 using the Mathematical Acceleration Subsystem library.

	* config/rs6000/rs6000.c (rs6000_veclib_handler): New variable to
	handle which vector math library we have.
	(rs6000_override_options): Add -mveclibabi=mass support.
	(rs6000_builtin_vectorized_libmass): New function to handle auto
	vectorizing math functions that are in the MASS library.
	(rs6000_builtin_vectorized_function): Call it.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Document
	-mveclibabi=mass.

[gcc/testsuite]
2010-08-18  Michael Meissner

	* gcc.target/powerpc/vsx-mass-1.c: New file, test -mveclibabi=mass.

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk) (revision 163345)
+++ gcc/doc/invoke.texi (working copy)
@@ -786,7 +786,9 @@ See RS/6000 and PowerPC Options.
 -mprototype -mno-prototype @gol
 -msim -mmvme -mads -myellowknife -memb -msdata @gol
 -msdata=@var{opt} -mvxworks -G @var{num} -pthread @gol
--mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision -mno-recip-precision}
+-mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision
+-mno-recip-precision @gol
+-mveclibabi=@var{type}}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles -m32bit-doubles -fpu -nofpu@gol
@@ -15847,6 +15849,30 @@ automatically selects @option{-mrecip-pr
 precision square root estimate instructions are not generated by
 default on low precision machines, since they do not provide an
 estimate that converges after three steps.
+
+@item -mveclibabi=@var{type}
+@opindex mveclibabi
+Specifies the ABI type to use for vectorizing intrinsics using an
+external library. The only type supported at present is @code{mass},
+which specifies to use IBM's Mathematical Acceleration Subsystem
+(MASS) libraries for vectorizing intrinsics using external libraries.
+GCC will currently emit calls to @code{acosd2}, @code{acosf4},
+@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4},
+@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4},
+@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4},
+@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4},
+@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4},
+@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4},
+@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4},
+@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4},
+@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4},
+@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4},
+@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2},
+@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2},
+@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code
+for power7. Both @option{-ftree-vectorize} and
+@option{-funsafe-math-optimizations} have to be enabled. The MASS
+libraries will have to be specified at link time.
 @end table
 
 @node RX Options
Index: gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-mass-1.c (revision 163355)
@@ -0,0 +1,554 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O3 -ftree-vectorize -mcpu=power7 -ffast-math -mveclibabi=mass" } */
+/* { dg-final { scan-assembler "bl atan2d2" } } */
+/* { dg-final { scan-assembler "bl atan2f4" } } */
+/* { dg-final { scan-assembler "bl hypotd2" } } */
+/* { dg-final { scan-assembler "bl hypotf4" } } */
+/* { dg-final { scan-assembler "bl powd2" } } */
+/* { dg-final { scan-assembler "bl powf4" } } */
+/* { dg-final { scan-assembler "bl acosd2" } } */
+/* { dg-final { scan-assembler "bl acosf4" } } */
+/* { dg-final { scan-assembler "bl acoshd2" } } */
+/* { dg-final { scan-assembler "bl acoshf4" } } */
+/* { dg-final { scan-assembler "bl asind2" } } */
+/* { dg-final { scan-assembler "bl asinf4" } } */
+/* { dg-final { scan-assembler "bl asinhd2" } } */
+/* { dg-final { scan-assembler "bl asinhf4" } } */
+/* { dg-final { scan-assembler "bl atand2" } } */
+/* { dg-final { scan-assembler "bl atanf4" } } */
+/* { dg-final { scan-assembler "bl atanhd2" } } */
+/* { dg-final { scan-assembler "bl atanhf4" } } */
+/* { dg-final { scan-assembler "bl cbrtd2" } } */
+/* { dg-final { scan-assembler "bl cbrtf4" } } */
+/* { dg-final { scan-assembler "bl cosd2" } } */
+/* { dg-final { scan-assembler "bl cosf4" } } */
+/* { dg-final { scan-assembler "bl coshd2" } } */
+/* { dg-final { scan-assembler "bl coshf4" } } */
+/* { dg-final { scan-assembler "bl erfd2" } } */
+/* { dg-final { scan-assembler "bl erff4" } } */
+/* { dg-final { scan-assembler "bl erfcd2" } } */
+/* { dg-final { scan-assembler "bl erfcf4" } } */
+/* { dg-final { scan-assembler "bl exp2d2" } } */
+/* { dg-final { scan-assembler "bl exp2f4" } } */
+/* { dg-final { scan-assembler "bl expd2" } } */
+/* { dg-final { scan-assembler "bl expf4" } } */
+/* { dg-final { scan-assembler "bl expm1d2" } } */
+/* { dg-final { scan-assembler "bl expm1f4" } } */
+/* { dg-final { scan-assembler "bl lgamma" } } */
+/* { dg-final { scan-assembler "bl lgammaf" } } */
+/* { dg-final { scan-assembler "bl log10d2" } } */
+/* { dg-final { scan-assembler "bl log10f4" } } */
+/* { dg-final { scan-assembler "bl log1pd2" } } */
+/* { dg-final { scan-assembler "bl log1pf4" } } */
+/* { dg-final { scan-assembler "bl log2d2" } } */
+/* { dg-final { scan-assembler "bl log2f4" } } */
+/* { dg-final { scan-assembler "bl logd2" } } */
+/* { dg-final { scan-assembler "bl logf4" } } */
+/* { dg-final { scan-assembler "bl sind2" } } */
+/* { dg-final { scan-assembler "bl sinf4" } } */
+/* { dg-final { scan-assembler "bl sinhd2" } } */
+/* { dg-final { scan-assembler "bl sinhf4" } } */
+/* { dg-final { scan-assembler "bl tand2" } } */
+/* { dg-final { scan-assembler "bl tanf4" } } */
+/* { dg-final { scan-assembler "bl tanhd2" } } */
+/* { dg-final { scan-assembler "bl tanhf4" } } */
+
+#ifndef SIZE
+#define SIZE 1024
+#endif
+
+double d1[SIZE] __attribute__((__aligned__(32)));
+double d2[SIZE] __attribute__((__aligned__(32)));
+double d3[SIZE] __attribute__((__aligned__(32)));
+
+float f1[SIZE] __attribute__((__aligned__(32)));
+float f2[SIZE] __attribute__((__aligned__(32)));
+float f3[SIZE] __attribute__((__aligned__(32)));
+
+void
+test_double_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan2 (d2[i], d3[i]);
+}
+
+void
+test_float_atan2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atan2f (f2[i], f3[i]);
+}
+
+void
+test_double_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_hypot (d2[i], d3[i]);
+}
+
+void
+test_float_hypot (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_hypotf (f2[i], f3[i]);
+}
+
+void
+test_double_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_pow (d2[i], d3[i]);
+}
+
+void
+test_float_pow (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_powf (f2[i], f3[i]);
+}
+
+void
+test_double_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acos (d2[i]);
+}
+
+void
+test_float_acos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acosf (f2[i]);
+}
+
+void
+test_double_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_acosh (d2[i]);
+}
+
+void
+test_float_acosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_acoshf (f2[i]);
+}
+
+void
+test_double_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asin (d2[i]);
+}
+
+void
+test_float_asin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinf (f2[i]);
+}
+
+void
+test_double_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_asinh (d2[i]);
+}
+
+void
+test_float_asinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_asinhf (f2[i]);
+}
+
+void
+test_double_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atan (d2[i]);
+}
+
+void
+test_float_atan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanf (f2[i]);
+}
+
+void
+test_double_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_atanh (d2[i]);
+}
+
+void
+test_float_atanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_atanhf (f2[i]);
+}
+
+void
+test_double_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cbrt (d2[i]);
+}
+
+void
+test_float_cbrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cbrtf (f2[i]);
+}
+
+void
+test_double_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cos (d2[i]);
+}
+
+void
+test_float_cos (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_cosf (f2[i]);
+}
+
+void
+test_double_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_cosh (d2[i]);
+}
+
+void
+test_float_cosh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_coshf (f2[i]);
+}
+
+void
+test_double_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erf (d2[i]);
+}
+
+void
+test_float_erf (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erff (f2[i]);
+}
+
+void
+test_double_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_erfc (d2[i]);
+}
+
+void
+test_float_erfc (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_erfcf (f2[i]);
+}
+
+void
+test_double_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp2 (d2[i]);
+}
+
+void
+test_float_exp2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_exp2f (f2[i]);
+}
+
+void
+test_double_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_exp (d2[i]);
+}
+
+void
+test_float_exp (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expf (f2[i]);
+}
+
+void
+test_double_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_expm1 (d2[i]);
+}
+
+void
+test_float_expm1 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_expm1f (f2[i]);
+}
+
+void
+test_double_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_lgamma (d2[i]);
+}
+
+void
+test_float_lgamma (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_lgammaf (f2[i]);
+}
+
+void
+test_double_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log10 (d2[i]);
+}
+
+void
+test_float_log10 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log10f (f2[i]);
+}
+
+void
+test_double_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log1p (d2[i]);
+}
+
+void
+test_float_log1p (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log1pf (f2[i]);
+}
+
+void
+test_double_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log2 (d2[i]);
+}
+
+void
+test_float_log2 (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_log2f (f2[i]);
+}
+
+void
+test_double_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_log (d2[i]);
+}
+
+void
+test_float_log (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_logf (f2[i]);
+}
+
+void
+test_double_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sin (d2[i]);
+}
+
+void
+test_float_sin (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinf (f2[i]);
+}
+
+void
+test_double_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sinh (d2[i]);
+}
+
+void
+test_float_sinh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sinhf (f2[i]);
+}
+
+void
+test_double_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_sqrt (d2[i]);
+}
+
+void
+test_float_sqrt (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_sqrtf (f2[i]);
+}
+
+void
+test_double_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tan (d2[i]);
+}
+
+void
+test_float_tan (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanf (f2[i]);
+}
+
+void
+test_double_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    d1[i] = __builtin_tanh (d2[i]);
+}
+
+void
+test_float_tanh (void)
+{
+  int i;
+
+  for (i = 0; i < SIZE; i++)
+    f1[i] = __builtin_tanhf (f2[i]);
+}
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk) (revision 163345)
+++ gcc/config/rs6000/rs6000.opt (working copy)
@@ -115,6 +115,10 @@ mpopcntd
 Target Report Mask(POPCNTD)
 Use PowerPC V2.06 popcntd instruction
 
+mveclibabi=
+Target RejectNegative Joined Var(rs6000_veclibabi_name)
+Vector library ABI to use
+
 mvsx
 Target Report Mask(VSX)
 Use vector/scalar (VSX) instructions
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk) (revision 163345)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -949,6 +949,9 @@ static const enum rs6000_btc builtin_cla
 #undef RS6000_BUILTIN
 #undef RS6000_BUILTIN_EQUATE
 
+/* Support for -mveclibabi= to control which vector library to use.  */
+static tree (*rs6000_veclib_handler) (tree, tree, tree);
+
 
 static bool rs6000_function_ok_for_sibcall (tree, tree);
 static const char *rs6000_invalid_within_doloop (const_rtx);
@@ -989,6 +992,7 @@ static rtx rs6000_emit_stack_reset (rs60
 static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
                                    enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
+static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
 static int rs6000_savres_strategy (rs6000_stack_t *, bool, int, int);
 static void rs6000_restore_saved_cr (rtx, int);
@@ -2771,6 +2775,15 @@ rs6000_override_options (const char *def
                rs6000_traceback_name);
     }
 
+  if (rs6000_veclibabi_name)
+    {
+      if (strcmp (rs6000_veclibabi_name, "mass") == 0)
+        rs6000_veclib_handler = rs6000_builtin_vectorized_libmass;
+      else
+        error ("unknown vectorization library ABI type (%s) for "
+               "-mveclibabi= switch", rs6000_veclibabi_name);
+    }
+
   if (!rs6000_explicit_options.long_double)
     rs6000_long_double_type_size = RS6000_DEFAULT_LONG_DOUBLE_SIZE;
 
@@ -3602,6 +3615,145 @@ rs6000_parse_fpu_option (const char *opt
   return FPU_NONE;
 }
 
+
+/* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
+   library with vectorized intrinsics.  */
+
+static tree
+rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
+{
+  char name[32];
+  const char *suffix = NULL;
+  tree fntype, new_fndecl, bdecl = NULL_TREE;
+  int n_args = 1;
+  const char *bname;
+  enum machine_mode el_mode, in_mode;
+  int n, in_n;
+
+  /* Libmass is suitable for unsafe math only as it does not correctly support
+     parts of IEEE with the required precision such as denormals.  Only support
+     it if we have VSX to use the simd d2 or f4 functions.
+     XXX: Add variable length support.  */
+  if (!flag_unsafe_math_optimizations || !TARGET_VSX)
+    return NULL_TREE;
+
+  el_mode = TYPE_MODE (TREE_TYPE (type_out));
+  n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  if (el_mode != in_mode
+      || n != in_n)
+    return NULL_TREE;
+
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+    {
+      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
+      switch (fn)
+        {
+        case BUILT_IN_ATAN2:
+        case BUILT_IN_HYPOT:
+        case BUILT_IN_POW:
+          n_args = 2;
+          /* fall through */
+
+        case BUILT_IN_ACOS:
+        case BUILT_IN_ACOSH:
+        case BUILT_IN_ASIN:
+        case BUILT_IN_ASINH:
+        case BUILT_IN_ATAN:
+        case BUILT_IN_ATANH:
+        case BUILT_IN_CBRT:
+        case BUILT_IN_COS:
+        case BUILT_IN_COSH:
+        case BUILT_IN_ERF:
+        case BUILT_IN_ERFC:
+        case BUILT_IN_EXP2:
+        case BUILT_IN_EXP:
+        case BUILT_IN_EXPM1:
+        case BUILT_IN_LGAMMA:
+        case BUILT_IN_LOG10:
+        case BUILT_IN_LOG1P:
+        case BUILT_IN_LOG2:
+        case BUILT_IN_LOG:
+        case BUILT_IN_SIN:
+        case BUILT_IN_SINH:
+        case BUILT_IN_SQRT:
+        case BUILT_IN_TAN:
+        case BUILT_IN_TANH:
+          bdecl = implicit_built_in_decls[fn];
+          suffix = "d2";                        /* pow -> powd2 */
+          if (el_mode != DFmode
+              || n != 2)
+            return NULL_TREE;
+          break;
+
+        case BUILT_IN_ATAN2F:
+        case BUILT_IN_HYPOTF:
+        case BUILT_IN_POWF:
+          n_args = 2;
+          /* fall through */
+
+        case BUILT_IN_ACOSF:
+        case BUILT_IN_ACOSHF:
+        case BUILT_IN_ASINF:
+        case BUILT_IN_ASINHF:
+        case BUILT_IN_ATANF:
+        case BUILT_IN_ATANHF:
+        case BUILT_IN_CBRTF:
+        case BUILT_IN_COSF:
+        case BUILT_IN_COSHF:
+        case BUILT_IN_ERFF:
+        case BUILT_IN_ERFCF:
+        case BUILT_IN_EXP2F:
+        case BUILT_IN_EXPF:
+        case BUILT_IN_EXPM1F:
+        case BUILT_IN_LGAMMAF:
+        case BUILT_IN_LOG10F:
+        case BUILT_IN_LOG1PF:
+        case BUILT_IN_LOG2F:
+        case BUILT_IN_LOGF:
+        case BUILT_IN_SINF:
+        case BUILT_IN_SINHF:
+        case BUILT_IN_SQRTF:
+        case BUILT_IN_TANF:
+        case BUILT_IN_TANHF:
+          bdecl = implicit_built_in_decls[fn];
+          suffix = "4";                         /* powf -> powf4 */
+          if (el_mode != SFmode
+              || n != 4)
+            return NULL_TREE;
+          break;
+
+        default:
+          return NULL_TREE;
+        }
+    }
+  else
+    return NULL_TREE;
+
+  gcc_assert (suffix != NULL);
+  bname = IDENTIFIER_POINTER (DECL_NAME (bdecl));
+  strcpy (name, bname + sizeof ("__builtin_") - 1);
+  strcat (name, suffix);
+
+  if (n_args == 1)
+    fntype = build_function_type_list (type_out, type_in, NULL);
+  else if (n_args == 2)
+    fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+  else
+    gcc_unreachable ();
+
+  /* Build a function declaration for the vectorized function.  */
+  new_fndecl = build_decl (BUILTINS_LOCATION,
+                           FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
 /* Returns a function decl for a vectorized version of the builtin function
    with builtin function code FN and the result vector type TYPE, or NULL_TREE
    if it is not available.  */
@@ -3768,6 +3920,10 @@ rs6000_builtin_vectorized_function (tree
         }
     }
 
+  /* Generate calls to libmass if appropriate.  */
+  if (rs6000_veclib_handler)
+    return rs6000_veclib_handler (fndecl, type_out, type_in);
+
   return NULL_TREE;
 }
 
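
For anyone reading along without a GCC tree handy, the name construction in rs6000_builtin_vectorized_libmass can be tried outside the compiler. What follows is only a rough standalone sketch, not part of the patch: mass_name and its is_float parameter are hypothetical names made up for illustration, and it uses snprintf rather than the strcpy/strcat pair in the patch so that the write into the buffer is bounded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.  Build the MASS entry
   point name for a builtin in the same way the patch does: drop the
   "__builtin_" prefix and append a suffix.  Returns 0 on success, or
   -1 if the prefix is missing or the result does not fit in OUT.  */
static int
mass_name (const char *builtin_name, int is_float, char *out, size_t out_size)
{
  static const char prefix[] = "__builtin_";
  const size_t prefix_len = sizeof (prefix) - 1;
  /* Double builtins get "d2" (pow -> powd2).  Float builtins already
     end in 'f', so only "4" is appended (powf -> powf4).  */
  const char *suffix = is_float ? "4" : "d2";
  int len;

  assert (builtin_name != NULL && out != NULL);
  if (strncmp (builtin_name, prefix, prefix_len) != 0)
    return -1;

  len = snprintf (out, out_size, "%s%s", builtin_name + prefix_len, suffix);
  if (len < 0 || (size_t) len >= out_size)
    return -1;
  return 0;
}
```

With a 32-byte buffer (the same size as name[] in the patch), mass_name ("__builtin_pow", 0, buf, sizeof buf) fills buf with "powd2", and mass_name ("__builtin_powf", 1, buf, sizeof buf) fills it with "powf4". The explicit length check is a design choice of this sketch only; the patch relies on the known builtin names being short enough for the fixed buffer.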