From patchwork Tue Feb 28 23:01:20 2023
X-Patchwork-Submitter: Kwok Cheung Yeung
X-Patchwork-Id: 1749707
Date: Tue, 28 Feb 2023 23:01:20 +0000
From: Kwok Cheung Yeung
Subject: [PATCH] amdgcn: Enable SIMD vectorization of math functions
To: gcc-patches

Hello

This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
target hook for the AMD GCN architecture, so that, when vectorized, calls
to builtin standard math functions such as asinf, exp, pow etc. are
converted to calls to the recently added vectorized math functions for
GCN in Newlib. The -fno-math-errno flag is required in addition to the
usual vectorization optimization flags for this to occur, and some of the
math functions (the larger double-precision ones) require a large stack
size to function properly.

This patch requires the GCN vector math functions in Newlib to function -
these were included in the recent 4.3.0.20230120 snapshot. As that
snapshot has been the minimum requirement since the patch 'amdgcn,
libgomp: Manually allocated stacks', this should not be a problem.
I have added new testcases in the testsuite that compare the output of
the vectorized math functions against the scalar, passing if they are
sufficiently close. With the testcase for standalone GCN (without
libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently
cannot set the stack size correctly in DejaGnu testing, so I have made it
a compile test for now - it is still useful to check that calls to the
correct functions are being made. The runtime correctness is still
covered by the libgomp test.

Okay for trunk?

Thanks

Kwok

From 69d13dc898ff7c70e80299a92dc895a89a9e679b Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung
Date: Tue, 28 Feb 2023 14:15:47 +0000
Subject: [PATCH] amdgcn: Enable SIMD vectorization of math functions

Calls to vectorized versions of routines in the math library will now
be inserted when vectorizing code containing supported math functions.

2023-02-28  Kwok Cheung Yeung
	    Paul-Antoine Arras

	gcc/
	* builtins.cc (mathfn_built_in_explicit): New.
	* config/gcn/gcn.cc: Include case-cfn-macros.h.
	(mathfn_built_in_explicit): Add prototype.
	(gcn_vectorize_builtin_vectorized_function): New.
	(gcn_libc_has_function): New.
	(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define.
	(TARGET_LIBC_HAS_FUNCTION): Define.

	gcc/testsuite/
	* gcc.target/gcn/simd-math-1.c: New testcase.

	libgomp/
	* testsuite/libgomp.c/simd-math-1.c: New testcase.
---
 gcc/builtins.cc                            |   8 +
 gcc/config/gcn/gcn.cc                      | 110 +++++++++++
 gcc/testsuite/gcc.target/gcn/simd-math-1.c | 210 ++++++++++++++++++++
 libgomp/testsuite/libgomp.c/simd-math-1.c  | 217 +++++++++++++++++++++
 4 files changed, 545 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/gcn/simd-math-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/simd-math-1.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4d467c8c5c1..305c65c29be 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2089,6 +2089,14 @@ mathfn_built_in (tree type, combined_fn fn)
   return mathfn_built_in_1 (type, fn, /*implicit=*/ 1);
 }
 
+/* Like mathfn_built_in_1, but always use the explicit array.  */
+
+tree
+mathfn_built_in_explicit (tree type, combined_fn fn)
+{
+  return mathfn_built_in_1 (type, fn, /*implicit=*/ 0);
+}
+
 /* Like mathfn_built_in_1, but take a built_in_function and
    always use the implicit array.  */
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 23ab01e75d8..d99bb63d4c0 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -53,6 +53,7 @@
 #include "dwarf2.h"
 #include "gimple.h"
 #include "cgraph.h"
+#include "case-cfn-macros.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -5240,6 +5241,110 @@ gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
   return 0;
 }
 
+tree mathfn_built_in_explicit (tree, combined_fn);
+
+/* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
+   Return the function declaration of the vectorized version of the builtin
+   in the math library if available.  */
+
+tree
+gcn_vectorize_builtin_vectorized_function (unsigned int fn, tree type_out,
+                                           tree type_in)
+{
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE)
+    return NULL_TREE;
+
+  machine_mode out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  int out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  machine_mode in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  int in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  combined_fn cfn = combined_fn (fn);
+
+  /* Keep this consistent with the list of vectorized math routines.  */
+  int implicit_p;
+  switch (fn)
+    {
+    CASE_CFN_ACOS:
+    CASE_CFN_ACOSH:
+    CASE_CFN_ASIN:
+    CASE_CFN_ASINH:
+    CASE_CFN_ATAN:
+    CASE_CFN_ATAN2:
+    CASE_CFN_ATANH:
+    CASE_CFN_COPYSIGN:
+    CASE_CFN_COS:
+    CASE_CFN_COSH:
+    CASE_CFN_ERF:
+    CASE_CFN_EXP:
+    CASE_CFN_EXP2:
+    CASE_CFN_FINITE:
+    CASE_CFN_FMOD:
+    CASE_CFN_GAMMA:
+    CASE_CFN_HYPOT:
+    CASE_CFN_ISNAN:
+    CASE_CFN_LGAMMA:
+    CASE_CFN_LOG:
+    CASE_CFN_LOG10:
+    CASE_CFN_LOG2:
+    CASE_CFN_POW:
+    CASE_CFN_REMAINDER:
+    CASE_CFN_RINT:
+    CASE_CFN_SIN:
+    CASE_CFN_SINH:
+    CASE_CFN_SQRT:
+    CASE_CFN_TAN:
+    CASE_CFN_TANH:
+    CASE_CFN_TGAMMA:
+      implicit_p = 1;
+      break;
+
+    CASE_CFN_SCALB:
+    CASE_CFN_SIGNIFICAND:
+      implicit_p = 0;
+      break;
+
+    default:
+      return NULL_TREE;
+    }
+
+  tree out_t_node = (out_mode == DFmode) ? double_type_node : float_type_node;
+  tree fndecl = implicit_p ? mathfn_built_in (out_t_node, cfn)
+                           : mathfn_built_in_explicit (out_t_node, cfn);
+
+  const char *bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  char name[20];
+  sprintf (name, out_mode == DFmode ? "v%ddf_%s" : "v%dsf_%s",
+           out_n, bname + 10);
+
+  unsigned arity = 0;
+  for (tree args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
+    arity++;
+
+  tree fntype = (arity == 1)
+                ? build_function_type_list (type_out, type_in, NULL)
+                : build_function_type_list (type_out, type_in, type_in, NULL);
+
+  /* Build a function declaration for the vectorized function.  */
+  tree new_fndecl = build_decl (BUILTINS_LOCATION,
+                                FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
+/* Implement TARGET_LIBC_HAS_FUNCTION.  */
+
+bool
+gcn_libc_has_function (enum function_class fn_class,
+                       tree type)
+{
+  return bsd_libc_has_function (fn_class, type);
+}
+
 /* }}}  */
 /* {{{ md_reorg pass.  */
 
@@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_simd_clone_compute_vecsize_and_simdlen
 #undef  TARGET_SIMD_CLONE_USABLE
 #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
+#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
+  gcn_vectorize_builtin_vectorized_function
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
 #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
   gcn_small_register_classes_for_mode_p
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-1.c b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
new file mode 100644
index 00000000000..54e8761f720
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
@@ -0,0 +1,210 @@
+/* Check that the SIMD versions of math routines give the same (or
+   sufficiently close) results as their scalar equivalents, and that the
+   calls to the vectorized math functions are actually emitted.  */
+
+/* Ideally this test should be run, but the math routines require a large
+   stack and gcn-run currently does not respect the stack-size parameter.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 -fdump-tree-vect" } */
+
+
+#undef PRINT_RESULT
+#define VERBOSE 0
+#define EARLY_EXIT 1
+
+#include <math.h>
+#include <stdlib.h>
+
+#ifdef PRINT_RESULT
+  #include <stdio.h>
+  #define PRINTF printf
+#else
+  static void null_printf (const char *f, ...) { }
+
+  #define PRINTF null_printf
+#endif
+
+#define N 512
+#define EPSILON_float 1e-5
+#define EPSILON_double 1e-10
+
+static int failed = 0;
+
+int deviation_float (float x, float y)
+{
+  union {
+    float f;
+    unsigned u;
+  } u, v;
+
+  u.f = x;
+  v.f = y;
+
+  unsigned mask = 0x80000000U;
+  int i;
+
+  for (i = 32; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+int deviation_double (double x, double y)
+{
+  union {
+    double d;
+    unsigned long long u;
+  } u, v;
+
+  u.d = x;
+  v.d = y;
+
+  unsigned long long mask = 0x8000000000000000ULL;
+  int i;
+
+  for (i = 64; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+#define TEST_FUN(TFLOAT, LOW, HIGH, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  for (int i = 0; i < N; i++) \
+    a[i] = LOW + ((HIGH - LOW) / N) * i; \
+  for (int i = 0; i < N; i++) \
+    res[i] = FUN (a[i]); \
+  check_##FUN (res, a); \
+}\
+test_##FUN ();
+
+#define TEST_FUN2(TFLOAT, LOW1, HIGH1, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TFLOAT b[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i], b[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f,%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N], b[N]; \
+  for (int i = 0; i < N; i++) { \
+    a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+    b[i] = LOW2 + ((HIGH2 - LOW2) / N) * i; \
+  } \
+  for (int i = 0; i < N; i++) \
+    res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
+int main (void)
+{
+  TEST_FUN (float, -1.1, 1.1, acosf); /* { dg-final { scan-tree-dump "v64sf_acosf" "vect" } }*/
+  TEST_FUN (float, -10, 10, acoshf); /* { dg-final { scan-tree-dump "v64sf_acoshf" "vect" } }*/
+  TEST_FUN (float, -1.1, 1.1, asinf); /* { dg-final { scan-tree-dump "v64sf_asinf" "vect" } }*/
+  TEST_FUN (float, -10, 10, asinhf); /* { dg-final { scan-tree-dump "v64sf_asinhf" "vect" } }*/
+  TEST_FUN (float, -1.1, 1.1, atanf); /* { dg-final { scan-tree-dump "v64sf_atanf" "vect" } }*/
+  TEST_FUN2 (float, -2.0, 2.0, 2.0, -2.0, atan2f); /* { dg-final { scan-tree-dump "v64sf_atan2f" "vect" } }*/
+  TEST_FUN (float, -2.0, 2.0, atanhf); /* { dg-final { scan-tree-dump "v64sf_atanhf" "vect" } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 5.0, -15.0, copysignf); /* { dg-final { scan-tree-dump "v64sf_copysignf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, cosf); /* { dg-final { scan-tree-dump "v64sf_cosf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, coshf); /* { dg-final { scan-tree-dump "v64sf_coshf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, erff); /* { dg-final { scan-tree-dump "v64sf_erff" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, expf); /* { dg-final { scan-tree-dump "v64sf_expf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, exp2f); /* { dg-final { scan-tree-dump "v64sf_exp2f" "vect" } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf); /* { dg-final { scan-tree-dump "v64sf_fmodf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, gammaf); /* { dg-final { scan-tree-dump "v64sf_gammaf" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0,hypotf); /* { dg-final { scan-tree-dump "v64sf_hypotf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, lgammaf); /* { dg-final { scan-tree-dump "v64sf_lgammaf" "vect" { xfail *-*-*} } }*/
+  TEST_FUN (float, -1.0, 50.0, logf); /* { dg-final { scan-tree-dump "v64sf_logf" "vect" } }*/
+  TEST_FUN (float, -1.0, 500.0, log10f); /* { dg-final { scan-tree-dump "v64sf_log10f" "vect" } }*/
+  TEST_FUN (float, -1.0, 64.0, log2f); /* { dg-final { scan-tree-dump "v64sf_log2f" "vect" } }*/
+  TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf); /* { dg-final { scan-tree-dump "v64sf_powf" "vect" } }*/
+  TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf); /* { dg-final { scan-tree-dump "v64sf_remainderf" "vect" } }*/
+  TEST_FUN (float, -50.0, 50.0, rintf); /* { dg-final { scan-tree-dump "v64sf_rintf" "vect" } }*/
+  TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf); /* { dg-final { scan-tree-dump "v64sf_scalbf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, __builtin_significandf); /* { dg-final { scan-tree-dump "v64sf_significandf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinf); /* { dg-final { scan-tree-dump "v64sf_sinf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf); /* { dg-final { scan-tree-dump "v64sf_sinhf" "vect" } }*/
+  TEST_FUN (float, -0.1, 10000.0, sqrtf); /* { dg-final { scan-tree-dump "v64sf_sqrtf" "vect" } }*/
+  TEST_FUN (float, -5.0, 5.0, tanf); /* { dg-final { scan-tree-dump "v64sf_tanf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf); /* { dg-final { scan-tree-dump "v64sf_tanhf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, tgammaf); /* { dg-final { scan-tree-dump "v64sf_tgammaf" "vect" } }*/
+
+  TEST_FUN (double, -1.1, 1.1, acos); /* { dg-final { scan-tree-dump "v64df_acos" "vect" } }*/
+  TEST_FUN (double, -10, 10, acosh); /* { dg-final { scan-tree-dump "v64df_acosh" "vect" } }*/
+  TEST_FUN (double, -1.1, 1.1, asin); /* { dg-final { scan-tree-dump "v64df_asin" "vect" } }*/
+  TEST_FUN (double, -10, 10, asinh); /* { dg-final { scan-tree-dump "v64df_asinh" "vect" } }*/
+  TEST_FUN (double, -1.1, 1.1, atan); /* { dg-final { scan-tree-dump "v64df_atan" "vect" } }*/
+  TEST_FUN2 (double, -2.0, 2.0, 2.0, -2.0, atan2); /* { dg-final { scan-tree-dump "v64df_atan2" "vect" } }*/
+  TEST_FUN (double, -2.0, 2.0, atanh); /* { dg-final { scan-tree-dump "v64df_atanh" "vect" } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 5.0, -15.0, copysign); /* { dg-final { scan-tree-dump "v64df_copysign" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cos); /* { dg-final { scan-tree-dump "v64df_cos" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cosh); /* { dg-final { scan-tree-dump "v64df_cosh" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, erf); /* { dg-final { scan-tree-dump "v64df_erf" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, exp); /* { dg-final { scan-tree-dump "v64df_exp" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, exp2); /* { dg-final { scan-tree-dump "v64df_exp2" "vect" } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod); /* { dg-final { scan-tree-dump "v64df_fmod" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, gamma); /* { dg-final { scan-tree-dump "v64df_gamma" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot); /* { dg-final { scan-tree-dump "v64df_hypot" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, lgamma); /* { dg-final { scan-tree-dump "v64df_lgamma" "vect" { xfail *-*-*} } }*/
+  TEST_FUN (double, -1.0, 50.0, log); /* { dg-final { scan-tree-dump "v64df_log" "vect" } }*/
+  TEST_FUN (double, -1.0, 500.0, log10); /* { dg-final { scan-tree-dump "v64df_log10" "vect" } }*/
+  TEST_FUN (double, -1.0, 64.0, log2); /* { dg-final { scan-tree-dump "v64df_log2" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow); /* { dg-final { scan-tree-dump "v64df_pow" "vect" } }*/
+  TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder); /* { dg-final { scan-tree-dump "v64df_remainder" "vect" } }*/
+  TEST_FUN (double, -50.0, 50.0, rint); /* { dg-final { scan-tree-dump "v64df_rint" "vect" } }*/
+  TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb); /* { dg-final { scan-tree-dump "v64df_scalb" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, __builtin_significand); /* { dg-final { scan-tree-dump "v64df_significand" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sin); /* { dg-final { scan-tree-dump "v64df_sin" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sinh); /* { dg-final { scan-tree-dump "v64df_sinh" "vect" } }*/
+  TEST_FUN (double, -0.1, 10000.0, sqrt); /* { dg-final { scan-tree-dump "v64df_sqrt" "vect" } }*/
+  TEST_FUN (double, -5.0, 5.0, tan); /* { dg-final { scan-tree-dump "v64df_tan" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, tanh); /* { dg-final { scan-tree-dump "v64df_tanh" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, tgamma); /* { dg-final { scan-tree-dump "v64df_tgamma" "vect" } }*/
+
+  return failed;
+}
diff --git a/libgomp/testsuite/libgomp.c/simd-math-1.c b/libgomp/testsuite/libgomp.c/simd-math-1.c
new file mode 100644
index 00000000000..947bf606e36
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/simd-math-1.c
@@ -0,0 +1,217 @@
+/* Check that the SIMD versions of math routines give the same (or
+   sufficiently close) results as their scalar equivalents.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
+/* { dg-additional-options -foffload-options=amdgcn-amdhsa=-mstack-size=3000000 { target offload_target_amdgcn } } */
+/* { dg-additional-options -foffload-options=-lm } */
+
+#undef PRINT_RESULT
+#define VERBOSE 0
+#define EARLY_EXIT 1
+
+#include <math.h>
+#include <stdlib.h>
+
+#ifdef PRINT_RESULT
+  #include <stdio.h>
+  #define PRINTF printf
+#else
+  static void null_printf (const char *f, ...) { }
+
+  #define PRINTF null_printf
+#endif
+
+#define N 512
+#define EPSILON_float 1e-5
+#define EPSILON_double 1e-10
+
+static int xfail = 0;
+static int failed = 0;
+
+int deviation_float (float x, float y)
+{
+  union {
+    float f;
+    unsigned u;
+  } u, v;
+
+  u.f = x;
+  v.f = y;
+
+  unsigned mask = 0x80000000U;
+  int i;
+
+  for (i = 32; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+int deviation_double (double x, double y)
+{
+  union {
+    double d;
+    unsigned long long u;
+  } u, v;
+
+  u.d = x;
+  v.d = y;
+
+  unsigned long long mask = 0x8000000000000000ULL;
+  int i;
+
+  for (i = 64; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+#define TEST_FUN_XFAIL(TFLOAT, LOW, HIGH, FUN) \
+  xfail = 1; \
+  TEST_FUN (TFLOAT, LOW, HIGH, FUN); \
+  xfail = 0;
+
+#define TEST_FUN(TFLOAT, LOW, HIGH, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N]) \
+{ \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    failed |= (fail && !xfail); \
+    if (EARLY_EXIT && failed) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  for (int i = 0; i < N; i++) \
+    a[i] = LOW + ((HIGH - LOW) / N) * i; \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+    for (int i = 0; i < N; i++) \
+      res[i] = FUN (a[i]); \
+  check_##FUN (res, a); \
+}\
+test_##FUN ();
+
+#define TEST_FUN2(TFLOAT, LOW1, HIGH1, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TFLOAT b[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i], b[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f,%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N], b[N]; \
+  for (int i = 0; i < N; i++) { \
+    a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+    b[i] = LOW2 + ((HIGH2 - LOW2) / N) * i; \
+  } \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+    for (int i = 0; i < N; i++) \
+      res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
+int main (void)
+{
+  TEST_FUN (float, -1.1, 1.1, acosf);
+  TEST_FUN (float, -10, 10, acoshf);
+  TEST_FUN (float, -1.1, 1.1, asinf);
+  TEST_FUN (float, -10, 10, asinhf);
+  TEST_FUN (float, -1.1, 1.1, atanf);
+  TEST_FUN2 (float, -2.0, 2.0, 2.0, -2.0, atan2f);
+  TEST_FUN (float, -2.0, 2.0, atanhf);
+  TEST_FUN2 (float, -10.0, 10.0, 5.0, -15.0, copysignf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, cosf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, coshf);
+  TEST_FUN (float, -10.0, 10.0, erff);
+  TEST_FUN (float, -10.0, 10.0, expf);
+  TEST_FUN (float, -10.0, 10.0, exp2f);
+  TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf);
+  TEST_FUN (float, -10.0, 10.0, gammaf);
+  TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0,hypotf);
+  TEST_FUN (float, -10.0, 10.0, lgammaf);
+  TEST_FUN (float, -1.0, 50.0, logf);
+  TEST_FUN (float, -1.0, 500.0, log10f);
+  TEST_FUN (float, -1.0, 64.0, log2f);
+  TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf);
+  TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf);
+  TEST_FUN (float, -50.0, 50.0, rintf);
+  TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf);
+  TEST_FUN (float, -10.0, 10.0, __builtin_significandf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf);
+  TEST_FUN (float, -0.1, 10000.0, sqrtf);
+  TEST_FUN (float, -5.0, 5.0, tanf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf);
+  /* Newlib's version of tgammaf is known to have poor accuracy.  */
+  TEST_FUN_XFAIL (float, -10.0, 10.0, tgammaf);
+
+  TEST_FUN (double, -1.1, 1.1, acos);
+  TEST_FUN (double, -10, 10, acosh);
+  TEST_FUN (double, -1.1, 1.1, asin);
+  TEST_FUN (double, -10, 10, asinh);
+  TEST_FUN (double, -1.1, 1.1, atan);
+  TEST_FUN2 (double, -2.0, 2.0, 2.0, -2.0, atan2);
+  TEST_FUN (double, -2.0, 2.0, atanh);
+  TEST_FUN2 (double, -10.0, 10.0, 5.0, -15.0, copysign);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cos);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cosh);
+  TEST_FUN (double, -10.0, 10.0, erf);
+  TEST_FUN (double, -10.0, 10.0, exp);
+  TEST_FUN (double, -10.0, 10.0, exp2);
+  TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod);
+  TEST_FUN (double, -10.0, 10.0, gamma);
+  TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot);
+  TEST_FUN (double, -10.0, 10.0, lgamma);
+  TEST_FUN (double, -1.0, 50.0, log);
+  TEST_FUN (double, -1.0, 500.0, log10);
+  TEST_FUN (double, -1.0, 64.0, log2);
+  TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow);
+  TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder);
+  TEST_FUN (double, -50.0, 50.0, rint);
+  TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb);
+  TEST_FUN (double, -10.0, 10.0, __builtin_significand);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sin);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sinh);
+  TEST_FUN (double, -0.1, 10000.0, sqrt);
+  TEST_FUN (double, -5.0, 5.0, tan);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, tanh);
+  /* Newlib's version of tgamma is known to have poor accuracy.  */
+  TEST_FUN_XFAIL (double, -10.0, 10.0, tgamma);
+
+  return failed;
+}
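As a reading aid for the "sufficiently close" rule in the two testcases, the float variant of the check can be restated outside the macros roughly as follows. This is only a sketch: highest_differing_bit and results_differ are hypothetical names that do not appear in the patch, and memcpy is used here in place of the union in deviation_float. The thresholds (1e-5 and 10) are the float values from the testcases.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical restatement of deviation_float: returns the position
   (1 = lowest bit, 32 = sign bit) of the highest bit in which the two
   representations differ, or 0 if they are bit-identical.  */
int
highest_differing_bit (float x, float y)
{
  uint32_t ux, uy;
  memcpy (&ux, &x, sizeof ux);
  memcpy (&uy, &y, sizeof uy);

  uint32_t diff = ux ^ uy;
  int pos = 0;
  while (diff)
    {
      pos++;
      diff >>= 1;
    }
  return pos;
}

/* Hypothetical restatement of the fail condition in check_##FUN for
   float: a result is rejected if NaN-ness or infinity-ness disagree, or
   if it is off by more than 1e-5 in absolute terms AND the mismatch
   reaches above the 10 lowest bits.  */
int
results_differ (float expected, float actual)
{
  float diff = expected - actual;
  if (diff < 0)
    diff = -diff;

  return (isnan (actual) != 0) != (isnan (expected) != 0)
         || (isinf (actual) != 0) != (isinf (expected) != 0)
         || (diff > 1e-5f && highest_differing_bit (expected, actual) > 10);
}
```

Requiring both conditions means a large-magnitude result that is off by one unit in the last place is accepted even though its absolute error exceeds the epsilon, while a small absolute error is accepted regardless of which bits differ.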