
amdgcn: Enable SIMD vectorization of math functions

Message ID f85990b0-1a7a-bc17-50c1-ef176cbd0dae@codesourcery.com
State: New
Series: amdgcn: Enable SIMD vectorization of math functions

Commit Message

Kwok Cheung Yeung Feb. 28, 2023, 11:01 p.m. UTC
Hello

This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
target hook for the AMD GCN architecture, such that when vectorized, 
calls to builtin standard math functions such as asinf, exp, pow etc. 
are converted to calls to the recently added vectorized math functions 
for GCN in Newlib. The -fno-math-errno flag is required in addition to 
the usual vectorization optimization flags for this to occur, and some 
of the math functions (the larger double-precision ones) require a large 
stack size to function properly.
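
For illustration (not part of the patch), this is the kind of loop the
hook targets: built with -O2 -ftree-vectorize -fno-math-errno, the
vectorizer replaces the scalar call with a call to Newlib's v64sf_sinf.
The function name below is made up for the example.

  /* Illustrative example only: with -O2 -ftree-vectorize -fno-math-errno
     the loop vectorizes to 64 lanes and the sinf call becomes a call to
     the Newlib vector routine v64sf_sinf.  */
  #include <math.h>

  void apply_sinf (float *restrict out, const float *restrict in, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = sinf (in[i]);
  }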

This patch requires the GCN vector math functions in Newlib to function 
- these were included in the recent 4.3.0.20230120 snapshot. As this was 
a minimum requirement starting from the patch 'amdgcn, libgomp: Manually 
allocated stacks', this should not be a problem.

I have added new testcases in the testsuite that compare the output of 
the vectorized math functions against the scalar, passing if they are 
sufficiently close. With the testcase for standalone GCN (without 
libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently 
cannot set the stack size correctly in DejaGnu testing, so I have made 
it a compile test for now - it is still useful to check that calls to 
the correct functions are being made. The runtime correctness is still 
covered by the libgomp test.

Okay for trunk?

Thanks

Kwok
From 69d13dc898ff7c70e80299a92dc895a89a9e679b Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Tue, 28 Feb 2023 14:15:47 +0000
Subject: [PATCH] amdgcn: Enable SIMD vectorization of math functions

Calls to vectorized versions of routines in the math library will now
be inserted when vectorizing code containing supported math functions.

2023-02-28  Kwok Cheung Yeung  <kcy@codesourcery.com>
	    Paul-Antoine Arras  <pa@codesourcery.com>

	gcc/
	* builtins.cc (mathfn_built_in_explicit): New.
	* config/gcn/gcn.cc: Include case-cfn-macros.h.
	(mathfn_built_in_explicit): Add prototype.
	(gcn_vectorize_builtin_vectorized_function): New.
	(gcn_libc_has_function): New.
	(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define.
	(TARGET_LIBC_HAS_FUNCTION): Define.

	gcc/testsuite/
	* gcc.target/gcn/simd-math-1.c: New testcase.

	libgomp/
	* testsuite/libgomp.c/simd-math-1.c: New testcase.
---
 gcc/builtins.cc                            |   8 +
 gcc/config/gcn/gcn.cc                      | 110 +++++++++++
 gcc/testsuite/gcc.target/gcn/simd-math-1.c | 210 ++++++++++++++++++++
 libgomp/testsuite/libgomp.c/simd-math-1.c  | 217 +++++++++++++++++++++
 4 files changed, 545 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/gcn/simd-math-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/simd-math-1.c

Comments

Andrew Pinski Feb. 28, 2023, 11:06 p.m. UTC | #1
On Tue, Feb 28, 2023 at 3:02 PM Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
>
> Hello
>
> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
> target hook for the AMD GCN architecture, such that when vectorized,
> calls to builtin standard math functions such as asinf, exp, pow etc.
> are converted to calls to the recently added vectorized math functions
> for GCN in Newlib. The -fno-math-errno flag is required in addition to
> the usual vectorization optimization flags for this to occur, and some
> of the math functions (the larger double-precision ones) require a large
> stack size to function properly.
>
> This patch requires the GCN vector math functions in Newlib to function
> - these were included in the recent 4.3.0.20230120 snapshot. As this was
> a minimum requirement starting from the patch 'amdgcn, libgomp: Manually
> allocated stacks', this should not be a problem.
>
> I have added new testcases in the testsuite that compare the output of
> the vectorized math functions against the scalar, passing if they are
> sufficiently close. With the testcase for standalone GCN (without
> libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently
> cannot set the stack size correctly in DejaGnu testing, so I have made
> it a compile test for now - it is still useful to check that calls to
> the correct functions are being made. The runtime correctness is still
> covered by the libgomp test.

I thought we were moving towards using the simd attribute instead and
moving away from these kinds of patches.
Though since gcn is a special target where including math.h normally
does not happen for offloading, this might still be useful.


>
> Okay for trunk?

We are in stage 4 of the GCC 13 release cycle, so I suspect we want to wait
until GCC 13 branches off to apply this.

Thanks,
Andrew Pinski

>
> Thanks
>
> Kwok
Richard Biener March 1, 2023, 8:18 a.m. UTC | #2
On Wed, Mar 1, 2023 at 12:07 AM Andrew Pinski via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, Feb 28, 2023 at 3:02 PM Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
> >
> > Hello
> >
> > This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
> > target hook for the AMD GCN architecture, such that when vectorized,
> > calls to builtin standard math functions such as asinf, exp, pow etc.
> > are converted to calls to the recently added vectorized math functions
> > for GCN in Newlib. The -fno-math-errno flag is required in addition to
> > the usual vectorization optimization flags for this to occur, and some
> > of the math functions (the larger double-precision ones) require a large
> > stack size to function properly.
> >
> > This patch requires the GCN vector math functions in Newlib to function
> > - these were included in the recent 4.3.0.20230120 snapshot. As this was
> > a minimum requirement starting from the patch 'amdgcn, libgomp: Manually
> > allocated stacks', this should not be a problem.
> >
> > I have added new testcases in the testsuite that compare the output of
> > the vectorized math functions against the scalar, passing if they are
> > sufficiently close. With the testcase for standalone GCN (without
> > libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently
> > cannot set the stack size correctly in DejaGnu testing, so I have made
> > it a compile test for now - it is still useful to check that calls to
> > the correct functions are being made. The runtime correctness is still
> > covered by the libgomp test.
>
> I thought we were moving towards using the simd attribute instead and
> moving away from these kinds of patches.
> Though since gcn is a special target where including math.h normally
> does not happen for offloading, this might still be useful.

Yes, this particular target hook is considered legacy.  See how, for example,
glibc provides a math-vector-fortran.h file announcing the SIMD variants to the
Fortran compiler, in case you were wondering how to target non-C-family frontends.

Richard.

>
> >
> > Okay for trunk?
>
> We are in stage 4 of the GCC 13 release cycle, so I suspect we want to wait
> until GCC 13 branches off to apply this.
>
> Thanks,
> Andrew Pinski
>
> >
> > Thanks
> >
> > Kwok
Tobias Burnus March 1, 2023, 8:57 a.m. UTC | #3
Hi Richard, hi all,

On 01.03.23 09:18, Richard Biener via Gcc-patches wrote:

>> I thought we were moving towards using the simd attribute instead and
>> moving away from these kinds of patches.
>> Though since gcn is a special target where including math.h normally
>> does not happen for offloading, this might still be useful.
> Yes, this particular target hook is considered legacy.  See how, for example,
> glibc provides a math-vector-fortran.h file announcing the SIMD variants to the
> Fortran compiler, in case you were wondering how to target non-C-family frontends.

[Not having looked in depth at the patch, the Newlib libm patches, and at this suggestion.]

With offloading there is the problem that we parse the C/C++/Fortran code
only once - targeting the host compiler - and then use the intermediate
representation both for generating the code for that system (like x86-64) and
for the offload region (like gcn).

At the moment, I fail to see how to handle this easily. For instance, 'gamma' is not offered
for x86-64 by glibc's libm (as SIMD), but Newlib for gcn has it (according to Kwok's patch).

I could imagine some dance with
  omp declare variant match(device={arch(gcn)})
combined with 'declare simd' for the arch variants to get this working.

Ignoring some potential issues with declare variant, with getting the code active only on
one device and not on the host, and issues surrounding that, we still need to get this
into the compiler while parsing.

I think this means creating some 'math-extra.h' that ships with GCC; this then needs to be
included at parse time into C/C++/Fortran (at least when compiling with -fopenmp or
-fopenacc, if ENABLE_OFFLOADING) and will provide the 'declare variant' + 'declare simd'.
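
To make that concrete, here is a rough, untested sketch of one entry such a
header could contain - __gcn_sinf is a made-up name, purely illustrative:

  /* Sketch only; __gcn_sinf is a made-up name for a GCN-only entry point
     whose vector clones the vectorizer may then call.  */
  #pragma omp declare simd notinbranch
  float __gcn_sinf (float x);

  /* Use the GCN entry point in place of sinf only on a GCN device.  */
  #pragma omp declare variant(__gcn_sinf) match(device={arch(gcn)})
  float sinf (float x);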

At least I don't see any other way - and as we have functions in both GLIBC and Newlib,
I don't see how any header file not maintained by and shipped with GCC would work.

Thoughts on this?


Regarding the current SIMD use, see:

For Fortran, GLIBC has:
   /usr/include/finclude/math-vector-fortran.h
   !GCC$ builtin (pow) attributes simd (notinbranch) if('x86_64')
→ https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/fpu/finclude/math-vector-fortran.h;hb=refs/heads/master

For "math.h" users (C, C++), GLICB has:
   /usr/include/x86_64-linux-gnu/bits/math-vector.h
   #if defined __x86_64__ && defined __FAST_MATH__
...
#  define __DECL_SIMD_x86_64 _Pragma ("omp declare simd notinbranch")
# elif __GNUC_PREREQ (6,0)
/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
#  define __DECL_SIMD_x86_64 __attribute__ ((__simd__ ("notinbranch")))
...
#  define __DECL_SIMD_cos __DECL_SIMD_x86_64

→ https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/fpu/bits/math-vector.h;hb=refs/heads/master


For completeness: a while ago (2-3 years), there were some patches to extend the SIMD support for AMD.


And LLVM, which parses the source files multiple times, partially avoids this issue
not only by including the offload target's header files - it also has some
target-specific .h files of its own, like:
   https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_math.h
   https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_hip_math.h
which call the vendor math functions directly.


Tobias

Andrew Stubbs March 1, 2023, 10:01 a.m. UTC | #4
On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
> Hello
> 
> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
> target hook for the AMD GCN architecture, such that when vectorized, 
> calls to builtin standard math functions such as asinf, exp, pow etc. 
> are converted to calls to the recently added vectorized math functions 
> for GCN in Newlib. The -fno-math-errno flag is required in addition to 
> the usual vectorization optimization flags for this to occur, and some 
> of the math functions (the larger double-precision ones) require a large 
> stack size to function properly.
> 
> This patch requires the GCN vector math functions in Newlib to function 
> - these were included in the recent 4.3.0.20230120 snapshot. As this was 
> a minimum requirement starting from the patch 'amdgcn, libgomp: Manually 
> allocated stacks', this should not be a problem.
> 
> I have added new testcases in the testsuite that compare the output of 
> the vectorized math functions against the scalar, passing if they are 
> sufficiently close. With the testcase for standalone GCN (without 
> libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently 
> cannot set the stack size correctly in DejaGnu testing, so I have made 
> it a compile test for now - it is still useful to check that calls to 
> the correct functions are being made. The runtime correctness is still 
> covered by the libgomp test.
> 
> Okay for trunk?

The main part of the patch is OK, with the small changes below.

Others have pointed out that "omp declare simd" exists, but you and I 
have been all through that verbally, long ago, and as Tobias says the 
offload compiler cannot rely on markup in the host compiler's header 
files to solve this problem.

> @@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl)
>    gcn_simd_clone_compute_vecsize_and_simdlen
>  #undef  TARGET_SIMD_CLONE_USABLE
>  #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
> +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
> +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
> +  gcn_vectorize_builtin_vectorized_function
> +#undef TARGET_LIBC_HAS_FUNCTION
> +#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
>  #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
>  #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
>    gcn_small_register_classes_for_mode_p

Please keep these in alphabetical order.

> +/* Ideally this test should be run, but the math routines require a large
> +   stack and gcn-run currently does not respect the stack-size parameter.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 -fdump-tree-vect" } */

This isn't ideal. The dg-set-target-env-var directive (I think this is 
it?) can set GCN_STACK_SIZE, which gcn-run does honour, but I realise 
that doesn't work with remote test targets (like ours).

I suggest adding an additional test that sets the envvar and #includes 
the code from this one; one test to scan the dumps, one test to run it. 
Like this .... (untested, syntax uncertain).

/* { dg-do run } */
/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
/* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */
#include "simd-math-1.c"

The run test will get skipped in our test environment (and anyone else 
using remote), but the libgomp test should make up for that.

Andrew
Andre Vieira (lists) March 1, 2023, 10:52 a.m. UTC | #5
On 01/03/2023 10:01, Andrew Stubbs wrote:
 > On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
 >> Hello
 >>
 >> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 >> target hook for the AMD GCN architecture, such that when vectorized,
 >> calls to builtin standard math functions such as asinf, exp, pow etc.
 >> are converted to calls to the recently added vectorized math functions
 >> for GCN in Newlib. The -fno-math-errno flag is required in addition to
 >> the usual vectorization optimization flags for this to occur, and some
 >> of the math functions (the larger double-precision ones) require a
 >> large stack size to function properly.
 >>
 >> This patch requires the GCN vector math functions in Newlib to
 >> function - these were included in the recent 4.3.0.20230120 snapshot.
 >> As this was a minimum requirement starting from the patch 'amdgcn,
 >> libgomp: Manually allocated stacks', this should not be a problem.
 >>
 >> I have added new testcases in the testsuite that compare the output of
 >> the vectorized math functions against the scalar, passing if they are
 >> sufficiently close. With the testcase for standalone GCN (without
 >> libgomp) in gcc.target/gcn/, there is a problem since gcn-run
 >> currently cannot set the stack size correctly in DejaGnu testing, so I
 >> have made it a compile test for now - it is still useful to check that
 >> calls to the correct functions are being made. The runtime correctness
 >> is still covered by the libgomp test.
 >>
 >> Okay for trunk?
 >
 > The main part of the patch is OK, with the small changes below.
 >
 > Others have pointed out that "omp declare simd" exists, but you and I
 > have been all through that verbally, long ago, and as Tobias says the
 > offload compiler cannot rely on markup in the host compiler's header
 > files to solve this problem.

For what it's worth, I am currently working on enabling "omp declare 
simd" for SVE and, more importantly, teaching GCC to use "omp declare 
variant"s with simd constructs as simdclones during autovectorization. 
This gives a bit more control over what simdclones you advertise as 
available. I hope to have some RFCs on here soon. I am obviously not 
familiar with your constraints but just wanted to let you know.
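
Roughly the kind of thing I mean - an untested sketch with purely
illustrative names, where a hand-written vector routine is advertised as a
simd clone that the autovectorizer may call:

  typedef float v4sf __attribute__ ((vector_size (16)));

  /* Hand-written 4-lane implementation (illustrative).  */
  v4sf vec_sinf (v4sf x);

  /* Advertise vec_sinf as the simd clone of my_sinf for this shape.  */
  #pragma omp declare variant(vec_sinf) match(construct={simd(notinbranch, simdlen(4))})
  float my_sinf (float x);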
Andrew Stubbs March 1, 2023, 12:13 p.m. UTC | #6
On 01/03/2023 10:52, Andre Vieira (lists) wrote:
> 
> 
> On 01/03/2023 10:01, Andrew Stubbs wrote:
>  > On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
>  >> Hello
>  >>
>  >> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>  >> target hook for the AMD GCN architecture, such that when vectorized,
>  >> calls to builtin standard math functions such as asinf, exp, pow etc.
>  >> are converted to calls to the recently added vectorized math functions
>  >> for GCN in Newlib. The -fno-math-errno flag is required in addition to
>  >> the usual vectorization optimization flags for this to occur, and some
>  >> of the math functions (the larger double-precision ones) require a
>  >> large stack size to function properly.
>  >>
>  >> This patch requires the GCN vector math functions in Newlib to
>  >> function - these were included in the recent 4.3.0.20230120 snapshot.
>  >> As this was a minimum requirement starting from the patch 'amdgcn,
>  >> libgomp: Manually allocated stacks', this should not be a problem.
>  >>
>  >> I have added new testcases in the testsuite that compare the output of
>  >> the vectorized math functions against the scalar, passing if they are
>  >> sufficiently close. With the testcase for standalone GCN (without
>  >> libgomp) in gcc.target/gcn/, there is a problem since gcn-run
>  >> currently cannot set the stack size correctly in DejaGnu testing, so I
>  >> have made it a compile test for now - it is still useful to check that
>  >> calls to the correct functions are being made. The runtime correctness
>  >> is still covered by the libgomp test.
>  >>
>  >> Okay for trunk?
>  >
>  > The main part of the patch is OK, with the small changes below.
>  >
>  > Others have pointed out that "omp declare simd" exists, but you and I
>  > have been all through that verbally, long ago, and as Tobias says the
>  > offload compiler cannot rely on markup in the host compiler's header
>  > files to solve this problem.
> 
> For what it's worth, I am currently working on enabling "omp declare 
> simd" for SVE and more importantly teaching GCC to use "omp declare 
> variant"'s with simd construct's as simdclones during autovect. This 
> gives a bit more control on what simdclones you advertise as available. 
> I hope to have some RFC's on here soon. I obviously am not familiar with 
> your constraints but just wanted to let you know.

We can use "omp declare simd" (or whatever the exact form is) to 
create SIMD clones of user functions, but this doesn't work for libm 
functions (or any library) unless the header files for both the host and 
the offload device have matching markup. Given that the x86_64 Linux host is 
using Glibc and the offload device compiler is using Newlib, this is not 
likely to be the case.

I suppose the variants thing could do something about this, but with this 
patch we don't need it to, for now.

Andrew
Kwok Cheung Yeung March 2, 2023, 3:07 p.m. UTC | #7
Hello

I've made the suggested changes. Should I hold off on committing this 
until GCC 13 has been branched off?

Kwok

On 01/03/2023 10:01 am, Andrew Stubbs wrote:
> On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
>> Hello
>>
>> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
>> target hook for the AMD GCN architecture, such that when vectorized, 
>> calls to builtin standard math functions such as asinf, exp, pow etc. 
>> are converted to calls to the recently added vectorized math functions 
>> for GCN in Newlib. The -fno-math-errno flag is required in addition to 
>> the usual vectorization optimization flags for this to occur, and some 
>> of the math functions (the larger double-precision ones) require a 
>> large stack size to function properly.
>>
>> This patch requires the GCN vector math functions in Newlib to 
>> function - these were included in the recent 4.3.0.20230120 snapshot. 
>> As this was a minimum requirement starting from the patch 'amdgcn, 
>> libgomp: Manually allocated stacks', this should not be a problem.
>>
>> I have added new testcases in the testsuite that compare the output of 
>> the vectorized math functions against the scalar, passing if they are 
>> sufficiently close. With the testcase for standalone GCN (without 
>> libgomp) in gcc.target/gcn/, there is a problem since gcn-run 
>> currently cannot set the stack size correctly in DejaGnu testing, so I 
>> have made it a compile test for now - it is still useful to check that 
>> calls to the correct functions are being made. The runtime correctness 
>> is still covered by the libgomp test.
>>
>> Okay for trunk?
> 
> The main part of the patch is OK, with the small changes below.
> 
> Others have pointed out that "omp declare simd" exists, but you and I 
> have been all through that verbally, long ago, and as Tobias says the 
> offload compiler cannot rely on markup in the host compiler's header 
> files to solve this problem.
> 
>> @@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl)
>>    gcn_simd_clone_compute_vecsize_and_simdlen
>>  #undef  TARGET_SIMD_CLONE_USABLE
>>  #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
>> +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>> +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
>> +  gcn_vectorize_builtin_vectorized_function
>> +#undef TARGET_LIBC_HAS_FUNCTION
>> +#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
>>  #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
>>  #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
>>    gcn_small_register_classes_for_mode_p
> 
> Please keep these in alphabetical order.
> 
>> +/* Ideally this test should be run, but the math routines require a 
>> large
>> +   stack and gcn-run currently does not respect the stack-size 
>> parameter.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-vectorize -fno-math-errno 
>> -mstack-size=3000000 -fdump-tree-vect" } */
> 
> This isn't ideal. The dg-set-target-env-var directive (I think this is 
> it?) can set GCN_STACK_SIZE, which gcn-run does honour, but I realise 
> that doesn't work with remote test targets (like ours).
> 
> I suggest adding an additional test that sets the envvar and #includes 
> the code from this one; one test to scan the dumps, one test to run it. 
> Like this .... (untested, syntax uncertain).
> 
> /* { dg-do run } */
> /* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
> /* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */
> #include "simd-math-1.c"
> 
> The run test will get skipped in our test environment (and anyone else 
> using remote), but the libgomp test should make up for that.
> 
> Andrew
From 0b43ef3c2d6afd4aecfc03fd1d2df675626e017b Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Tue, 28 Feb 2023 14:15:47 +0000
Subject: [PATCH] amdgcn: Enable SIMD vectorization of math functions

Calls to vectorized versions of routines in the math library will now
be inserted when vectorizing code containing supported math functions.

2023-02-28  Kwok Cheung Yeung  <kcy@codesourcery.com>
	    Paul-Antoine Arras  <pa@codesourcery.com>

	gcc/
	* builtins.cc (mathfn_built_in_explicit): New.
	* config/gcn/gcn.cc: Include case-cfn-macros.h.
	(mathfn_built_in_explicit): Add prototype.
	(gcn_vectorize_builtin_vectorized_function): New.
	(gcn_libc_has_function): New.
	(TARGET_LIBC_HAS_FUNCTION): Define.
	(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define.

	gcc/testsuite/
	* gcc.target/gcn/simd-math-1.c: New testcase.
	* gcc.target/gcn/simd-math-2.c: New testcase.

	libgomp/
	* testsuite/libgomp.c/simd-math-1.c: New testcase.
---
 gcc/builtins.cc                            |   8 +
 gcc/config/gcn/gcn.cc                      | 110 +++++++++++
 gcc/testsuite/gcc.target/gcn/simd-math-1.c | 206 +++++++++++++++++++
 gcc/testsuite/gcc.target/gcn/simd-math-2.c |   8 +
 libgomp/testsuite/libgomp.c/simd-math-1.c  | 217 +++++++++++++++++++++
 5 files changed, 549 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/gcn/simd-math-1.c
 create mode 100644 gcc/testsuite/gcc.target/gcn/simd-math-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/simd-math-1.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4d467c8c5c1..305c65c29be 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2089,6 +2089,14 @@ mathfn_built_in (tree type, combined_fn fn)
   return mathfn_built_in_1 (type, fn, /*implicit=*/ 1);
 }
 
+/* Like mathfn_built_in_1, but always use the explicit array.  */
+
+tree
+mathfn_built_in_explicit (tree type, combined_fn fn)
+{
+  return mathfn_built_in_1 (type, fn, /*implicit=*/ 0);
+}
+
 /* Like mathfn_built_in_1, but take a built_in_function and
    always use the implicit array.  */
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 23ab01e75d8..6f0a90a4904 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -53,6 +53,7 @@
 #include "dwarf2.h"
 #include "gimple.h"
 #include "cgraph.h"
+#include "case-cfn-macros.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -5240,6 +5241,110 @@ gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
   return 0;
 }
 
+tree mathfn_built_in_explicit (tree, combined_fn);
+
+/* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
+   Return the function declaration of the vectorized version of the builtin
+   in the math library if available.  */
+
+tree
+gcn_vectorize_builtin_vectorized_function (unsigned int fn, tree type_out,
+					   tree type_in)
+{
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE)
+    return NULL_TREE;
+
+  machine_mode out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  int out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  machine_mode in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  int in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  combined_fn cfn = combined_fn (fn);
+
+  /* Keep this consistent with the list of vectorized math routines.  */
+  int implicit_p;
+  switch (fn)
+    {
+    CASE_CFN_ACOS:
+    CASE_CFN_ACOSH:
+    CASE_CFN_ASIN:
+    CASE_CFN_ASINH:
+    CASE_CFN_ATAN:
+    CASE_CFN_ATAN2:
+    CASE_CFN_ATANH:
+    CASE_CFN_COPYSIGN:
+    CASE_CFN_COS:
+    CASE_CFN_COSH:
+    CASE_CFN_ERF:
+    CASE_CFN_EXP:
+    CASE_CFN_EXP2:
+    CASE_CFN_FINITE:
+    CASE_CFN_FMOD:
+    CASE_CFN_GAMMA:
+    CASE_CFN_HYPOT:
+    CASE_CFN_ISNAN:
+    CASE_CFN_LGAMMA:
+    CASE_CFN_LOG:
+    CASE_CFN_LOG10:
+    CASE_CFN_LOG2:
+    CASE_CFN_POW:
+    CASE_CFN_REMAINDER:
+    CASE_CFN_RINT:
+    CASE_CFN_SIN:
+    CASE_CFN_SINH:
+    CASE_CFN_SQRT:
+    CASE_CFN_TAN:
+    CASE_CFN_TANH:
+    CASE_CFN_TGAMMA:
+      implicit_p = 1;
+      break;
+
+    CASE_CFN_SCALB:
+    CASE_CFN_SIGNIFICAND:
+      implicit_p = 0;
+      break;
+
+    default:
+      return NULL_TREE;
+    }
+
+  tree out_t_node = (out_mode == DFmode) ? double_type_node : float_type_node;
+  tree fndecl = implicit_p ? mathfn_built_in (out_t_node, cfn)
+			   : mathfn_built_in_explicit (out_t_node, cfn);
+
+  const char *bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  char name[20];
+  sprintf (name, out_mode == DFmode ? "v%ddf_%s" : "v%dsf_%s",
+	   out_n, bname + 10);
+
+  unsigned arity = 0;
+  for (tree args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
+    arity++;
+
+  tree fntype = (arity == 1)
+		? build_function_type_list (type_out, type_in, NULL)
+		: build_function_type_list (type_out, type_in, type_in, NULL);
+
+  /* Build a function declaration for the vectorized function.  */
+  tree new_fndecl = build_decl (BUILTINS_LOCATION,
+				FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
+/* Implement TARGET_LIBC_HAS_FUNCTION.  */
+
+bool
+gcn_libc_has_function (enum function_class fn_class,
+		       tree type)
+{
+  return bsd_libc_has_function (fn_class, type);
+}
+
 /* }}}  */
 /* {{{ md_reorg pass.  */
 
@@ -7290,6 +7395,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_ira_change_pseudo_allocno_class
 #undef  TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P gcn_legitimate_constant_p
+#undef  TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
 #undef  TARGET_LRA_P
 #define TARGET_LRA_P hook_bool_void_true
 #undef  TARGET_MACHINE_DEPENDENT_REORG
@@ -7337,6 +7444,9 @@ gcn_dwarf_register_span (rtx rtl)
 #define TARGET_TRULY_NOOP_TRUNCATION gcn_truly_noop_truncation
 #undef  TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST gcn_vectorization_cost
+#undef  TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
+  gcn_vectorize_builtin_vectorized_function
 #undef  TARGET_VECTORIZE_GET_MASK_MODE
 #define TARGET_VECTORIZE_GET_MASK_MODE gcn_vectorize_get_mask_mode
 #undef  TARGET_VECTORIZE_PREFERRED_SIMD_MODE
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-1.c b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
new file mode 100644
index 00000000000..6868ccb2c54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
@@ -0,0 +1,206 @@
+/* Check that calls to the vectorized math functions are actually emitted.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 -fdump-tree-vect" } */
+
+
+#undef PRINT_RESULT
+#define VERBOSE 0
+#define EARLY_EXIT 1
+
+#include <math.h>
+#include <stdlib.h>
+
+#ifdef PRINT_RESULT
+  #include <stdio.h>
+  #define PRINTF printf
+#else
+  static void null_printf (const char *f, ...) { }
+
+  #define PRINTF null_printf
+#endif
+
+#define N 512
+#define EPSILON_float 1e-5
+#define EPSILON_double 1e-10
+
+static int failed = 0;
+
+int deviation_float (float x, float y)
+{
+  union {
+    float f;
+    unsigned u;
+  } u, v;
+
+  u.f = x;
+  v.f = y;
+
+  unsigned mask = 0x80000000U; 
+  int i;
+
+  for (i = 32; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+int deviation_double (double x, double y)
+{
+  union {
+    double d;
+    unsigned long long u;
+  } u, v;
+
+  u.d = x;
+  v.d = y;
+
+  unsigned long long mask = 0x8000000000000000ULL;
+  int i;
+
+  for (i = 64; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+#define TEST_FUN(TFLOAT, LOW, HIGH, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  for (int i = 0; i < N; i++) \
+    a[i] = LOW + ((HIGH - LOW) / N) * i; \
+  for (int i = 0; i < N; i++) \
+    res[i] = FUN (a[i]); \
+  check_##FUN (res, a); \
+}\
+test_##FUN ();
+
+#define TEST_FUN2(TFLOAT, LOW1, HIGH1, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TFLOAT b[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i], b[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f,%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N], b[N]; \
+  for (int i = 0; i < N; i++) { \
+    a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+    b[i] = LOW2 + ((HIGH2 - LOW2) / N) * i; \
+  } \
+  for (int i = 0; i < N; i++) \
+    res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
+int main (void)
+{
+  TEST_FUN (float, -1.1, 1.1, acosf); /* { dg-final { scan-tree-dump "v64sf_acosf" "vect" } }*/
+  TEST_FUN (float, -10, 10, acoshf); /* { dg-final { scan-tree-dump "v64sf_acoshf" "vect" } }*/
+  TEST_FUN (float, -1.1, 1.1, asinf); /* { dg-final { scan-tree-dump "v64sf_asinf" "vect" } }*/
+  TEST_FUN (float, -10, 10, asinhf); /* { dg-final { scan-tree-dump "v64sf_asinhf" "vect" } }*/
+  TEST_FUN (float, -1.1, 1.1, atanf); /* { dg-final { scan-tree-dump "v64sf_atanf" "vect" } }*/
+  TEST_FUN2 (float, -2.0, 2.0, 2.0, -2.0, atan2f); /* { dg-final { scan-tree-dump "v64sf_atan2f" "vect" } }*/
+  TEST_FUN (float, -2.0, 2.0, atanhf); /* { dg-final { scan-tree-dump "v64sf_atanhf" "vect" } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 5.0, -15.0, copysignf); /* { dg-final { scan-tree-dump "v64sf_copysignf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, cosf); /* { dg-final { scan-tree-dump "v64sf_cosf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, coshf); /* { dg-final { scan-tree-dump "v64sf_coshf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, erff);  /* { dg-final { scan-tree-dump "v64sf_erff" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, expf); /* { dg-final { scan-tree-dump "v64sf_expf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, exp2f); /* { dg-final { scan-tree-dump "v64sf_exp2f" "vect" } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf); /* { dg-final { scan-tree-dump "v64sf_fmodf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, gammaf); /* { dg-final { scan-tree-dump "v64sf_gammaf" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0,hypotf); /* { dg-final { scan-tree-dump "v64sf_hypotf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, lgammaf); /* { dg-final { scan-tree-dump "v64sf_lgammaf" "vect" { xfail *-*-*} } }*/
+  TEST_FUN (float, -1.0, 50.0, logf); /* { dg-final { scan-tree-dump "v64sf_logf" "vect" } }*/
+  TEST_FUN (float, -1.0, 500.0, log10f); /* { dg-final { scan-tree-dump "v64sf_log10f" "vect" } }*/
+  TEST_FUN (float, -1.0, 64.0, log2f); /* { dg-final { scan-tree-dump "v64sf_log2f" "vect" } }*/
+  TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf); /* { dg-final { scan-tree-dump "v64sf_powf" "vect" } }*/
+  TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf); /* { dg-final { scan-tree-dump "v64sf_remainderf" "vect" } }*/
+  TEST_FUN (float, -50.0, 50.0, rintf);  /* { dg-final { scan-tree-dump "v64sf_rintf" "vect" } }*/
+  TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf); /* { dg-final { scan-tree-dump "v64sf_scalbf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, __builtin_significandf); /* { dg-final { scan-tree-dump "v64sf_significandf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinf); /* { dg-final { scan-tree-dump "v64sf_sinf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf); /* { dg-final { scan-tree-dump "v64sf_sinhf" "vect" } }*/
+  TEST_FUN (float, -0.1, 10000.0, sqrtf); /* { dg-final { scan-tree-dump "v64sf_sqrtf" "vect" } }*/
+  TEST_FUN (float, -5.0, 5.0, tanf); /* { dg-final { scan-tree-dump "v64sf_tanf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf); /* { dg-final { scan-tree-dump "v64sf_tanhf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, tgammaf); /* { dg-final { scan-tree-dump "v64sf_tgammaf" "vect" } }*/
+
+  TEST_FUN (double, -1.1, 1.1, acos); /* { dg-final { scan-tree-dump "v64df_acos" "vect" } }*/
+  TEST_FUN (double, -10, 10, acosh); /* { dg-final { scan-tree-dump "v64df_acosh" "vect" } }*/
+  TEST_FUN (double, -1.1, 1.1, asin); /* { dg-final { scan-tree-dump "v64df_asin" "vect" } }*/
+  TEST_FUN (double, -10, 10, asinh); /* { dg-final { scan-tree-dump "v64df_asinh" "vect" } }*/
+  TEST_FUN (double, -1.1, 1.1, atan); /* { dg-final { scan-tree-dump "v64df_atan" "vect" } }*/
+  TEST_FUN2 (double, -2.0, 2.0, 2.0, -2.0, atan2); /* { dg-final { scan-tree-dump "v64df_atan2" "vect" } }*/
+  TEST_FUN (double, -2.0, 2.0, atanh); /* { dg-final { scan-tree-dump "v64df_atanh" "vect" } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 5.0, -15.0, copysign); /* { dg-final { scan-tree-dump "v64df_copysign" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cos); /* { dg-final { scan-tree-dump "v64df_cos" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cosh); /* { dg-final { scan-tree-dump "v64df_cosh" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, erf); /* { dg-final { scan-tree-dump "v64df_erf" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, exp); /* { dg-final { scan-tree-dump "v64df_exp" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, exp2); /* { dg-final { scan-tree-dump "v64df_exp2" "vect" } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod); /* { dg-final { scan-tree-dump "v64df_fmod" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, gamma); /* { dg-final { scan-tree-dump "v64df_gamma" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot); /* { dg-final { scan-tree-dump "v64df_hypot" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, lgamma); /* { dg-final { scan-tree-dump "v64df_lgamma" "vect" { xfail *-*-*} } }*/
+  TEST_FUN (double, -1.0, 50.0, log); /* { dg-final { scan-tree-dump "v64df_log" "vect" } }*/
+  TEST_FUN (double, -1.0, 500.0, log10); /* { dg-final { scan-tree-dump "v64df_log10" "vect" } }*/
+  TEST_FUN (double, -1.0, 64.0, log2); /* { dg-final { scan-tree-dump "v64df_log2" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow); /* { dg-final { scan-tree-dump "v64df_pow" "vect" } }*/
+  TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder); /* { dg-final { scan-tree-dump "v64df_remainder" "vect" } }*/
+  TEST_FUN (double, -50.0, 50.0, rint); /* { dg-final { scan-tree-dump "v64df_rint" "vect" } }*/
+  TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb); /* { dg-final { scan-tree-dump "v64df_scalb" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, __builtin_significand); /* { dg-final { scan-tree-dump "v64df_significand" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sin); /* { dg-final { scan-tree-dump "v64df_sin" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sinh); /* { dg-final { scan-tree-dump "v64df_sinh" "vect" } }*/
+  TEST_FUN (double, -0.1, 10000.0, sqrt); /* { dg-final { scan-tree-dump "v64df_sqrt" "vect" } }*/
+  TEST_FUN (double, -5.0, 5.0, tan); /* { dg-final { scan-tree-dump "v64df_tan" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, tanh); /* { dg-final { scan-tree-dump "v64df_tanh" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, tgamma); /* { dg-final { scan-tree-dump "v64df_tgamma" "vect" } }*/
+
+  return failed;
+}
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-2.c b/gcc/testsuite/gcc.target/gcn/simd-math-2.c
new file mode 100644
index 00000000000..375a2ad9263
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-2.c
@@ -0,0 +1,8 @@
+/* Check that the SIMD versions of math routines give the same (or
+   sufficiently close) results as their scalar equivalents.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
+/* { dg-set-target-env-var "GCN_STACK_SIZE" "3000000" } */
+
+#include "simd-math-1.c"
diff --git a/libgomp/testsuite/libgomp.c/simd-math-1.c b/libgomp/testsuite/libgomp.c/simd-math-1.c
new file mode 100644
index 00000000000..947bf606e36
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/simd-math-1.c
@@ -0,0 +1,217 @@
+/* Check that the SIMD versions of math routines give the same (or
+   sufficiently close) results as their scalar equivalents.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
+/* { dg-additional-options -foffload-options=amdgcn-amdhsa=-mstack-size=3000000 { target offload_target_amdgcn } } */
+/* { dg-additional-options -foffload-options=-lm } */
+
+#undef PRINT_RESULT
+#define VERBOSE 0
+#define EARLY_EXIT 1
+
+#include <math.h>
+#include <stdlib.h>
+
+#ifdef PRINT_RESULT
+  #include <stdio.h>
+  #define PRINTF printf
+#else
+  static void null_printf (const char *f, ...) { }
+
+  #define PRINTF null_printf
+#endif
+
+#define N 512
+#define EPSILON_float 1e-5
+#define EPSILON_double 1e-10
+
+static int xfail = 0;
+static int failed = 0;
+
+int deviation_float (float x, float y)
+{
+  union {
+    float f;
+    unsigned u;
+  } u, v;
+
+  u.f = x;
+  v.f = y;
+
+  unsigned mask = 0x80000000U;
+  int i;
+
+  for (i = 32; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+int deviation_double (double x, double y)
+{
+  union {
+    double d;
+    unsigned long long u;
+  } u, v;
+
+  u.d = x;
+  v.d = y;
+
+  unsigned long long mask = 0x8000000000000000ULL;
+  int i;
+
+  for (i = 64; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+#define TEST_FUN_XFAIL(TFLOAT, LOW, HIGH, FUN) \
+  xfail = 1; \
+  TEST_FUN (TFLOAT, LOW, HIGH, FUN); \
+  xfail = 0;
+
+#define TEST_FUN(TFLOAT, LOW, HIGH, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N]) \
+{ \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+	       || isinf (res[i]) != isinf (expected) \
+	       || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+	      a[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    failed |= (fail && !xfail); \
+    if (EARLY_EXIT && failed) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  for (int i = 0; i < N; i++) \
+    a[i] = LOW + ((HIGH - LOW) / N) * i; \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+    for (int i = 0; i < N; i++) \
+      res[i] = FUN (a[i]); \
+  check_##FUN (res, a); \
+}\
+test_##FUN ();
+
+#define TEST_FUN2(TFLOAT, LOW1, HIGH1, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TFLOAT b[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i], b[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+	       || isinf (res[i]) != isinf (expected) \
+	       || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f,%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+	      a[i], b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N], b[N]; \
+  for (int i = 0; i < N; i++) { \
+    a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+    b[i] = LOW2 + ((HIGH2 - LOW2) / N) * i; \
+  } \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+    for (int i = 0; i < N; i++) \
+      res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
+int main (void)
+{
+  TEST_FUN (float, -1.1, 1.1, acosf);
+  TEST_FUN (float, -10, 10, acoshf);
+  TEST_FUN (float, -1.1, 1.1, asinf);
+  TEST_FUN (float, -10, 10, asinhf);
+  TEST_FUN (float, -1.1, 1.1, atanf);
+  TEST_FUN2 (float, -2.0, 2.0, 2.0, -2.0, atan2f);
+  TEST_FUN (float, -2.0, 2.0, atanhf);
+  TEST_FUN2 (float, -10.0, 10.0, 5.0, -15.0, copysignf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, cosf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, coshf);
+  TEST_FUN (float, -10.0, 10.0, erff);
+  TEST_FUN (float, -10.0, 10.0, expf);
+  TEST_FUN (float, -10.0, 10.0, exp2f);
+  TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf);
+  TEST_FUN (float, -10.0, 10.0, gammaf);
+  TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0,hypotf);
+  TEST_FUN (float, -10.0, 10.0, lgammaf);
+  TEST_FUN (float, -1.0, 50.0, logf);
+  TEST_FUN (float, -1.0, 500.0, log10f);
+  TEST_FUN (float, -1.0, 64.0, log2f);
+  TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf);
+  TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf);
+  TEST_FUN (float, -50.0, 50.0, rintf);
+  TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf);
+  TEST_FUN (float, -10.0, 10.0, __builtin_significandf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf);
+  TEST_FUN (float, -0.1, 10000.0, sqrtf);
+  TEST_FUN (float, -5.0, 5.0, tanf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf);
+  /* Newlib's version of tgammaf is known to have poor accuracy.  */
+  TEST_FUN_XFAIL (float, -10.0, 10.0, tgammaf);
+
+  TEST_FUN (double, -1.1, 1.1, acos);
+  TEST_FUN (double, -10, 10, acosh);
+  TEST_FUN (double, -1.1, 1.1, asin);
+  TEST_FUN (double, -10, 10, asinh);
+  TEST_FUN (double, -1.1, 1.1, atan);
+  TEST_FUN2 (double, -2.0, 2.0, 2.0, -2.0, atan2);
+  TEST_FUN (double, -2.0, 2.0, atanh);
+  TEST_FUN2 (double, -10.0, 10.0, 5.0, -15.0, copysign);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cos);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cosh);
+  TEST_FUN (double, -10.0, 10.0, erf);
+  TEST_FUN (double, -10.0, 10.0, exp);
+  TEST_FUN (double, -10.0, 10.0, exp2);
+  TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod);
+  TEST_FUN (double, -10.0, 10.0, gamma);
+  TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot);
+  TEST_FUN (double, -10.0, 10.0, lgamma);
+  TEST_FUN (double, -1.0, 50.0, log);
+  TEST_FUN (double, -1.0, 500.0, log10);
+  TEST_FUN (double, -1.0, 64.0, log2);
+  TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow);
+  TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder);
+  TEST_FUN (double, -50.0, 50.0, rint);
+  TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb);
+  TEST_FUN (double, -10.0, 10.0, __builtin_significand);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sin);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sinh);
+  TEST_FUN (double, -0.1, 10000.0, sqrt);
+  TEST_FUN (double, -5.0, 5.0, tan);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, tanh);
+  /* Newlib's version of tgamma is known to have poor accuracy.  */
+  TEST_FUN_XFAIL (double, -10.0, 10.0, tgamma);
+
+  return failed;
+}
Andrew Stubbs March 2, 2023, 5:20 p.m. UTC | #8
On 02/03/2023 15:07, Kwok Cheung Yeung wrote:
> Hello
> 
> I've made the suggested changes. Should I hold off on committing this 
> until GCC 13 has been branched off?

No need, amdgcn is not a primary target and this stuff won't affect 
anyone else. Please go ahead and commit.

Andrew

Patch

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4d467c8c5c1..305c65c29be 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2089,6 +2089,14 @@  mathfn_built_in (tree type, combined_fn fn)
   return mathfn_built_in_1 (type, fn, /*implicit=*/ 1);
 }
 
+/* Like mathfn_built_in_1, but always use the explicit array.  */
+
+tree
+mathfn_built_in_explicit (tree type, combined_fn fn)
+{
+  return mathfn_built_in_1 (type, fn, /*implicit=*/ 0);
+}
+
 /* Like mathfn_built_in_1, but take a built_in_function and
    always use the implicit array.  */
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 23ab01e75d8..d99bb63d4c0 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -53,6 +53,7 @@ 
 #include "dwarf2.h"
 #include "gimple.h"
 #include "cgraph.h"
+#include "case-cfn-macros.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -5240,6 +5241,110 @@  gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
   return 0;
 }
 
+tree mathfn_built_in_explicit (tree, combined_fn);
+
+/* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
+   Return the function declaration of the vectorized version of the builtin
+   in the math library if available.  */
+
+tree
+gcn_vectorize_builtin_vectorized_function (unsigned int fn, tree type_out,
+					   tree type_in)
+{
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE)
+    return NULL_TREE;
+
+  machine_mode out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  int out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  machine_mode in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  int in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  combined_fn cfn = combined_fn (fn);
+
+  /* Keep this consistent with the list of vectorized math routines.  */
+  int implicit_p;
+  switch (fn)
+    {
+    CASE_CFN_ACOS:
+    CASE_CFN_ACOSH:
+    CASE_CFN_ASIN:
+    CASE_CFN_ASINH:
+    CASE_CFN_ATAN:
+    CASE_CFN_ATAN2:
+    CASE_CFN_ATANH:
+    CASE_CFN_COPYSIGN:
+    CASE_CFN_COS:
+    CASE_CFN_COSH:
+    CASE_CFN_ERF:
+    CASE_CFN_EXP:
+    CASE_CFN_EXP2:
+    CASE_CFN_FINITE:
+    CASE_CFN_FMOD:
+    CASE_CFN_GAMMA:
+    CASE_CFN_HYPOT:
+    CASE_CFN_ISNAN:
+    CASE_CFN_LGAMMA:
+    CASE_CFN_LOG:
+    CASE_CFN_LOG10:
+    CASE_CFN_LOG2:
+    CASE_CFN_POW:
+    CASE_CFN_REMAINDER:
+    CASE_CFN_RINT:
+    CASE_CFN_SIN:
+    CASE_CFN_SINH:
+    CASE_CFN_SQRT:
+    CASE_CFN_TAN:
+    CASE_CFN_TANH:
+    CASE_CFN_TGAMMA:
+      implicit_p = 1;
+      break;
+
+    CASE_CFN_SCALB:
+    CASE_CFN_SIGNIFICAND:
+      implicit_p = 0;
+      break;
+
+    default:
+      return NULL_TREE;
+    }
+
+  tree out_t_node = (out_mode == DFmode) ? double_type_node : float_type_node;
+  tree fndecl = implicit_p ? mathfn_built_in (out_t_node, cfn)
+			   : mathfn_built_in_explicit (out_t_node, cfn);
+
+  const char *bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  char name[20];
+  sprintf (name, out_mode == DFmode ? "v%ddf_%s" : "v%dsf_%s",
+	   out_n, bname + 10);
+
+  unsigned arity = 0;
+  for (tree args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
+    arity++;
+
+  tree fntype = (arity == 1)
+		? build_function_type_list (type_out, type_in, NULL)
+		: build_function_type_list (type_out, type_in, type_in, NULL);
+
+  /* Build a function declaration for the vectorized function.  */
+  tree new_fndecl = build_decl (BUILTINS_LOCATION,
+				FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
+/* Implement TARGET_LIBC_HAS_FUNCTION.  */
+
+bool
+gcn_libc_has_function (enum function_class fn_class,
+		       tree type)
+{
+  return bsd_libc_has_function (fn_class, type);
+}
+
 /* }}}  */
 /* {{{ md_reorg pass.  */
 
@@ -7324,6 +7429,11 @@  gcn_dwarf_register_span (rtx rtl)
   gcn_simd_clone_compute_vecsize_and_simdlen
 #undef  TARGET_SIMD_CLONE_USABLE
 #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
+#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
+  gcn_vectorize_builtin_vectorized_function
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
 #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
   gcn_small_register_classes_for_mode_p
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-1.c b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
new file mode 100644
index 00000000000..54e8761f720
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
@@ -0,0 +1,210 @@ 
+/* Check that the SIMD versions of math routines give the same (or
+   sufficiently close) results as their scalar equivalents, and that the
+   calls to the vectorized math functions are actually emitted.  */
+
+/* Ideally this test should be run, but the math routines require a large
+   stack and gcn-run currently does not respect the stack-size parameter.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 -fdump-tree-vect" } */
+
+
+#undef PRINT_RESULT
+#define VERBOSE 0
+#define EARLY_EXIT 1
+
+#include <math.h>
+#include <stdlib.h>
+
+#ifdef PRINT_RESULT
+  #include <stdio.h>
+  #define PRINTF printf
+#else
+  static void null_printf (const char *f, ...) { }
+
+  #define PRINTF null_printf
+#endif
+
+#define N 512
+#define EPSILON_float 1e-5
+#define EPSILON_double 1e-10
+
+static int failed = 0;
+
+int deviation_float (float x, float y)
+{
+  union {
+    float f;
+    unsigned u;
+  } u, v;
+
+  u.f = x;
+  v.f = y;
+
+  unsigned mask = 0x80000000U; 
+  int i;
+
+  for (i = 32; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+int deviation_double (double x, double y)
+{
+  union {
+    double d;
+    unsigned long long u;
+  } u, v;
+
+  u.d = x;
+  v.d = y;
+
+  unsigned long long mask = 0x8000000000000000ULL;
+  int i;
+
+  for (i = 64; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+#define TEST_FUN(TFLOAT, LOW, HIGH, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  for (int i = 0; i < N; i++) \
+    a[i] = LOW + ((HIGH - LOW) / N) * i; \
+  for (int i = 0; i < N; i++) \
+    res[i] = FUN (a[i]); \
+  check_##FUN (res, a); \
+}\
+test_##FUN ();
+
+#define TEST_FUN2(TFLOAT, LOW1, HIGH1, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TFLOAT b[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i], b[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f,%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N], b[N]; \
+  for (int i = 0; i < N; i++) { \
+    a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+    b[i] = LOW2 + ((HIGH2 - LOW2) / N) * i; \
+  } \
+  for (int i = 0; i < N; i++) \
+    res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
+int main (void)
+{
+  TEST_FUN (float, -1.1, 1.1, acosf); /* { dg-final { scan-tree-dump "v64sf_acosf" "vect" } }*/
+  TEST_FUN (float, -10, 10, acoshf); /* { dg-final { scan-tree-dump "v64sf_acoshf" "vect" } }*/
+  TEST_FUN (float, -1.1, 1.1, asinf); /* { dg-final { scan-tree-dump "v64sf_asinf" "vect" } }*/
+  TEST_FUN (float, -10, 10, asinhf); /* { dg-final { scan-tree-dump "v64sf_asinhf" "vect" } }*/
+  TEST_FUN (float, -1.1, 1.1, atanf); /* { dg-final { scan-tree-dump "v64sf_atanf" "vect" } }*/
+  TEST_FUN2 (float, -2.0, 2.0, 2.0, -2.0, atan2f); /* { dg-final { scan-tree-dump "v64sf_atan2f" "vect" } }*/
+  TEST_FUN (float, -2.0, 2.0, atanhf); /* { dg-final { scan-tree-dump "v64sf_atanhf" "vect" } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 5.0, -15.0, copysignf); /* { dg-final { scan-tree-dump "v64sf_copysignf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, cosf); /* { dg-final { scan-tree-dump "v64sf_cosf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, coshf); /* { dg-final { scan-tree-dump "v64sf_coshf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, erff);  /* { dg-final { scan-tree-dump "v64sf_erff" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, expf); /* { dg-final { scan-tree-dump "v64sf_expf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, exp2f); /* { dg-final { scan-tree-dump "v64sf_exp2f" "vect" } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf); /* { dg-final { scan-tree-dump "v64sf_fmodf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, gammaf); /* { dg-final { scan-tree-dump "v64sf_gammaf" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0, hypotf); /* { dg-final { scan-tree-dump "v64sf_hypotf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, lgammaf); /* { dg-final { scan-tree-dump "v64sf_lgammaf" "vect" { xfail *-*-*} } }*/
+  TEST_FUN (float, -1.0, 50.0, logf); /* { dg-final { scan-tree-dump "v64sf_logf" "vect" } }*/
+  TEST_FUN (float, -1.0, 500.0, log10f); /* { dg-final { scan-tree-dump "v64sf_log10f" "vect" } }*/
+  TEST_FUN (float, -1.0, 64.0, log2f); /* { dg-final { scan-tree-dump "v64sf_log2f" "vect" } }*/
+  TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf); /* { dg-final { scan-tree-dump "v64sf_powf" "vect" } }*/
+  TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf); /* { dg-final { scan-tree-dump "v64sf_remainderf" "vect" } }*/
+  TEST_FUN (float, -50.0, 50.0, rintf);  /* { dg-final { scan-tree-dump "v64sf_rintf" "vect" } }*/
+  TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf); /* { dg-final { scan-tree-dump "v64sf_scalbf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, __builtin_significandf); /* { dg-final { scan-tree-dump "v64sf_significandf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinf); /* { dg-final { scan-tree-dump "v64sf_sinf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf); /* { dg-final { scan-tree-dump "v64sf_sinhf" "vect" } }*/
+  TEST_FUN (float, -0.1, 10000.0, sqrtf); /* { dg-final { scan-tree-dump "v64sf_sqrtf" "vect" } }*/
+  TEST_FUN (float, -5.0, 5.0, tanf); /* { dg-final { scan-tree-dump "v64sf_tanf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf); /* { dg-final { scan-tree-dump "v64sf_tanhf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, tgammaf); /* { dg-final { scan-tree-dump "v64sf_tgammaf" "vect" } }*/
+
+  TEST_FUN (double, -1.1, 1.1, acos); /* { dg-final { scan-tree-dump "v64df_acos" "vect" } }*/
+  TEST_FUN (double, -10, 10, acosh); /* { dg-final { scan-tree-dump "v64df_acosh" "vect" } }*/
+  TEST_FUN (double, -1.1, 1.1, asin); /* { dg-final { scan-tree-dump "v64df_asin" "vect" } }*/
+  TEST_FUN (double, -10, 10, asinh); /* { dg-final { scan-tree-dump "v64df_asinh" "vect" } }*/
+  TEST_FUN (double, -1.1, 1.1, atan); /* { dg-final { scan-tree-dump "v64df_atan" "vect" } }*/
+  TEST_FUN2 (double, -2.0, 2.0, 2.0, -2.0, atan2); /* { dg-final { scan-tree-dump "v64df_atan2" "vect" } }*/
+  TEST_FUN (double, -2.0, 2.0, atanh); /* { dg-final { scan-tree-dump "v64df_atanh" "vect" } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 5.0, -15.0, copysign); /* { dg-final { scan-tree-dump "v64df_copysign" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cos); /* { dg-final { scan-tree-dump "v64df_cos" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cosh); /* { dg-final { scan-tree-dump "v64df_cosh" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, erf); /* { dg-final { scan-tree-dump "v64df_erf" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, exp); /* { dg-final { scan-tree-dump "v64df_exp" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, exp2); /* { dg-final { scan-tree-dump "v64df_exp2" "vect" } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod); /* { dg-final { scan-tree-dump "v64df_fmod" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, gamma); /* { dg-final { scan-tree-dump "v64df_gamma" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot); /* { dg-final { scan-tree-dump "v64df_hypot" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, lgamma); /* { dg-final { scan-tree-dump "v64df_lgamma" "vect" { xfail *-*-*} } }*/
+  TEST_FUN (double, -1.0, 50.0, log); /* { dg-final { scan-tree-dump "v64df_log" "vect" } }*/
+  TEST_FUN (double, -1.0, 500.0, log10); /* { dg-final { scan-tree-dump "v64df_log10" "vect" } }*/
+  TEST_FUN (double, -1.0, 64.0, log2); /* { dg-final { scan-tree-dump "v64df_log2" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow); /* { dg-final { scan-tree-dump "v64df_pow" "vect" } }*/
+  TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder); /* { dg-final { scan-tree-dump "v64df_remainder" "vect" } }*/
+  TEST_FUN (double, -50.0, 50.0, rint); /* { dg-final { scan-tree-dump "v64df_rint" "vect" } }*/
+  TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb); /* { dg-final { scan-tree-dump "v64df_scalb" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, __builtin_significand); /* { dg-final { scan-tree-dump "v64df_significand" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sin); /* { dg-final { scan-tree-dump "v64df_sin" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sinh); /* { dg-final { scan-tree-dump "v64df_sinh" "vect" } }*/
+  TEST_FUN (double, -0.1, 10000.0, sqrt); /* { dg-final { scan-tree-dump "v64df_sqrt" "vect" } }*/
+  TEST_FUN (double, -5.0, 5.0, tan); /* { dg-final { scan-tree-dump "v64df_tan" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, tanh); /* { dg-final { scan-tree-dump "v64df_tanh" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, tgamma); /* { dg-final { scan-tree-dump "v64df_tgamma" "vect" } }*/
+
+  return failed;
+}
diff --git a/libgomp/testsuite/libgomp.c/simd-math-1.c b/libgomp/testsuite/libgomp.c/simd-math-1.c
new file mode 100644
index 00000000000..947bf606e36
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/simd-math-1.c
@@ -0,0 +1,217 @@ 
+/* Check that the SIMD versions of math routines give the same (or
+   sufficiently close) results as their scalar equivalents.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
+/* { dg-additional-options -foffload-options=amdgcn-amdhsa=-mstack-size=3000000 { target offload_target_amdgcn } } */
+/* { dg-additional-options -foffload-options=-lm } */
+
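+/* PRINT_RESULT is left undefined so that PRINTF below expands to a no-op;
+   define it (and drop the #undef) to print per-element diagnostics.  */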
+#undef PRINT_RESULT
+#define VERBOSE 0
+#define EARLY_EXIT 1
+
+#include <math.h>
+#include <stdlib.h>
+
+#ifdef PRINT_RESULT
+  #include <stdio.h>
+  #define PRINTF printf
+#else
+  static void null_printf (const char *f, ...) { }
+
+  #define PRINTF null_printf
+#endif
+
+#define N 512
+#define EPSILON_float 1e-5
+#define EPSILON_double 1e-10
+
+static int xfail = 0;
+static int failed = 0;
+
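+/* Return the 1-based position of the most significant bit that differs
+   between the binary representations of X and Y, or 0 if they are
+   identical.  Small values mean the results agree in all but the
+   low-order mantissa bits.  */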
+int deviation_float (float x, float y)
+{
+  union {
+    float f;
+    unsigned u;
+  } u, v;
+
+  u.f = x;
+  v.f = y;
+
+  unsigned mask = 0x80000000U;
+  int i;
+
+  for (i = 32; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
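+/* As deviation_float, but for the 64-bit representation of doubles.  */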
+int deviation_double (double x, double y)
+{
+  union {
+    double d;
+    unsigned long long u;
+  } u, v;
+
+  u.d = x;
+  v.d = y;
+
+  unsigned long long mask = 0x8000000000000000ULL;
+  int i;
+
+  for (i = 64; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
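+/* Expected-failure wrapper: failures reported while xfail is set are still
+   printed but do not contribute to the exit status of the test.  */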
+#define TEST_FUN_XFAIL(TFLOAT, LOW, HIGH, FUN) \
+  xfail = 1; \
+  TEST_FUN (TFLOAT, LOW, HIGH, FUN); \
+  xfail = 0;
+
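+/* TEST_FUN(TYPE, LOW, HIGH, FUN) emits a check_FUN routine (compiled without
+   vectorization or unsafe math) that recomputes FUN scalarly on the host,
+   and a test_FUN driver that evaluates FUN over N points in [LOW, HIGH) in
+   an 'omp target parallel for simd' region.  An element fails if its NaN/Inf
+   classification differs from the scalar result, or if the two results
+   differ by more than EPSILON and in more than the 10 low-order bits of
+   their representation.  */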
+#define TEST_FUN(TFLOAT, LOW, HIGH, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N]) \
+{ \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+	       || isinf (res[i]) != isinf (expected) \
+	       || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+	      a[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    failed |= (fail && !xfail); \
+    if (EARLY_EXIT && failed) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  for (int i = 0; i < N; i++) \
+    a[i] = LOW + ((HIGH - LOW) / N) * i; \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+    for (int i = 0; i < N; i++) \
+      res[i] = FUN (a[i]); \
+  check_##FUN (res, a); \
+}\
+test_##FUN ();
+
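+/* As TEST_FUN, but for two-argument functions, with the first argument
+   swept over [LOW1, HIGH1) and the second over [LOW2, HIGH2).  */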
+#define TEST_FUN2(TFLOAT, LOW1, HIGH1, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TFLOAT b[N]) \
+{ \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i], b[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+	       || isinf (res[i]) != isinf (expected) \
+	       || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f,%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+	      a[i], b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N], b[N]; \
+  for (int i = 0; i < N; i++) { \
+    a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+    b[i] = LOW2 + ((HIGH2 - LOW2) / N) * i; \
+  } \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+    for (int i = 0; i < N; i++) \
+      res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
+int main (void)
+{
+  TEST_FUN (float, -1.1, 1.1, acosf);
+  TEST_FUN (float, -10, 10, acoshf);
+  TEST_FUN (float, -1.1, 1.1, asinf);
+  TEST_FUN (float, -10, 10, asinhf);
+  TEST_FUN (float, -1.1, 1.1, atanf);
+  TEST_FUN2 (float, -2.0, 2.0, 2.0, -2.0, atan2f);
+  TEST_FUN (float, -2.0, 2.0, atanhf);
+  TEST_FUN2 (float, -10.0, 10.0, 5.0, -15.0, copysignf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, cosf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, coshf);
+  TEST_FUN (float, -10.0, 10.0, erff);
+  TEST_FUN (float, -10.0, 10.0, expf);
+  TEST_FUN (float, -10.0, 10.0, exp2f);
+  TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf);
+  TEST_FUN (float, -10.0, 10.0, gammaf);
+  TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0, hypotf);
+  TEST_FUN (float, -10.0, 10.0, lgammaf);
+  TEST_FUN (float, -1.0, 50.0, logf);
+  TEST_FUN (float, -1.0, 500.0, log10f);
+  TEST_FUN (float, -1.0, 64.0, log2f);
+  TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf);
+  TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf);
+  TEST_FUN (float, -50.0, 50.0, rintf);
+  TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf);
+  TEST_FUN (float, -10.0, 10.0, __builtin_significandf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf);
+  TEST_FUN (float, -0.1, 10000.0, sqrtf);
+  TEST_FUN (float, -5.0, 5.0, tanf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf);
+  /* Newlib's version of tgammaf is known to have poor accuracy.  */
+  TEST_FUN_XFAIL (float, -10.0, 10.0, tgammaf);
+
+  TEST_FUN (double, -1.1, 1.1, acos);
+  TEST_FUN (double, -10, 10, acosh);
+  TEST_FUN (double, -1.1, 1.1, asin);
+  TEST_FUN (double, -10, 10, asinh);
+  TEST_FUN (double, -1.1, 1.1, atan);
+  TEST_FUN2 (double, -2.0, 2.0, 2.0, -2.0, atan2);
+  TEST_FUN (double, -2.0, 2.0, atanh);
+  TEST_FUN2 (double, -10.0, 10.0, 5.0, -15.0, copysign);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cos);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cosh);
+  TEST_FUN (double, -10.0, 10.0, erf);
+  TEST_FUN (double, -10.0, 10.0, exp);
+  TEST_FUN (double, -10.0, 10.0, exp2);
+  TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod);
+  TEST_FUN (double, -10.0, 10.0, gamma);
+  TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot);
+  TEST_FUN (double, -10.0, 10.0, lgamma);
+  TEST_FUN (double, -1.0, 50.0, log);
+  TEST_FUN (double, -1.0, 500.0, log10);
+  TEST_FUN (double, -1.0, 64.0, log2);
+  TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow);
+  TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder);
+  TEST_FUN (double, -50.0, 50.0, rint);
+  TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb);
+  TEST_FUN (double, -10.0, 10.0, __builtin_significand);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sin);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sinh);
+  TEST_FUN (double, -0.1, 10000.0, sqrt);
+  TEST_FUN (double, -5.0, 5.0, tan);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, tanh);
+  /* Newlib's version of tgamma is known to have poor accuracy.  */
+  TEST_FUN_XFAIL (double, -10.0, 10.0, tgamma);
+
+  return failed;
+}