From patchwork Thu Apr 26 11:49:40 2018
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 905027
From: Uros Bizjak
Date: Thu, 26 Apr 2018 13:49:40 +0200
Subject: [PATCH] Cleanup FAST_MATH misc functions from sysdeps/x86/fpu/bits/mathinline.h
To: libc-alpha@sourceware.org
Cc: "Joseph S. Myers"

Hello!

Attached patch cleans up FAST_MATH misc functions from
sysdeps/x86/fpu/bits/mathinline.h.

The patch uses the x87 __builtin functions, which are always available
with gcc-4.6+, and uses __builtin_copysign instead of the homegrown
__sgn1l function to avoid partial memory stalls. Since we are in
FAST_MATH, we don't need to handle the sign of zero, so the patch removes
"return __temp ? __temp : __x" from the definition of expm1.

The patch was tested on x86_64-linux-gnu, and the resulting mathinline.h
was used to calculate the test results for all changed functions.

Please note I have no write access to the glibc repository.

Uros.

2018-04-26  Uros Bizjak

	* sysdeps/x86/fpu/bits/mathinline.h [__FAST_MATH__] (__expm1_code):
	Remove define and undefine.
	[__FAST_MATH__] (__expm1l): Remove inline function.
	[__FAST_MATH__] (__exp_code): Remove define and undefine.
	[__FAST_MATH__] (exp): Remove inline function.
	[__FAST_MATH__] (__expl): Remove inline function.
	[__FAST_MATH__] (__libc_sqrtl): Remove define.
	(fabs): Remove inline function.
	(fabsf): Remove inline function.
	(fabsl): Remove inline function.
	(__fabsl): Remove inline function.
	(__sgn1l): Remove inline function.
	[__FAST_MATH__] (sinh): Rewrite using __builtin functions.
	Use __builtin_copysign to calculate sign of the result.
	[__FAST_MATH__] (cosh): Rewrite using __builtin functions.
	[__FAST_MATH__] (tanh): Rewrite using __builtin functions.
	Use __builtin_copysign to calculate sign of the result.
	[__USE_ISOC99 && __FAST_MATH__] (expm1): Remove inline function.
	[__USE_ISOC99 && __FAST_MATH__] (asinh): Rewrite using __builtin
	functions.  Use __builtin_copysign to calculate sign of the result.
	[__USE_ISOC99 && __FAST_MATH__] (acosh): Rewrite using __builtin
	functions.
	[__USE_ISOC99 && __FAST_MATH__] (atanh): Rewrite using __builtin
	functions.  Use __builtin_copysign to calculate sign of the result.
	[__USE_ISOC99 && __FAST_MATH__] (hypot): Rewrite using __builtin
	functions.

Signed-off-by: Uros Bizjak

diff --git a/sysdeps/x86/fpu/bits/mathinline.h b/sysdeps/x86/fpu/bits/mathinline.h
index 91ece8dfb87..124004494cd 100644
--- a/sysdeps/x86/fpu/bits/mathinline.h
+++ b/sysdeps/x86/fpu/bits/mathinline.h
@@ -170,149 +170,60 @@
 # if !defined __NO_MATH_INLINES && defined __OPTIMIZE__
+
 /* Miscellaneous functions */
 /* __FAST_MATH__ is defined by gcc -ffast-math. */
 # ifdef __FAST_MATH__
+
 /* Optimized inline implementation, sometimes with reduced precision and/or argument range. */
-# if __GNUC_PREREQ (3, 5)
-# define __expm1_code \
-  register long double __temp; \
-  __temp = __builtin_expm1l (__x); \
-  return __temp ? __temp : __x
-# else
-# define __expm1_code \
-  register long double __value; \
-  register long double __exponent; \
-  register long double __temp; \
-  __asm __volatile__ \
-    ("fldl2e # e^x - 1 = 2^(x * log2(e)) - 1\n\t" \
-     "fmul %%st(1) # x * log2(e)\n\t" \
-     "fst %%st(1)\n\t" \
-     "frndint # int(x * log2(e))\n\t" \
-     "fxch\n\t" \
-     "fsub %%st(1) # fract(x * log2(e))\n\t" \
-     "f2xm1 # 2^(fract(x * log2(e))) - 1\n\t" \
-     "fscale # 2^(x * log2(e)) - 2^(int(x * log2(e)))\n\t" \
-     : "=t" (__value), "=u" (__exponent) : "0" (__x)); \
-  __asm __volatile__ \
-    ("fscale # 2^int(x * log2(e))\n\t" \
-     : "=t" (__temp) : "0" (1.0), "u" (__exponent)); \
-  __temp -= 1.0; \
-  __temp += __value; \
-  return __temp ? __temp : __x
-# endif
-__inline_mathcodeNP_ (long double, __expm1l, __x, __expm1_code)
-
-# if __GNUC_PREREQ (3, 4)
-__inline_mathcodeNP_ (long double, __expl, __x, return __builtin_expl (__x))
-# else
-# define __exp_code \
-  register long double __value; \
-  register long double __exponent; \
-  __asm __volatile__ \
-    ("fldl2e # e^x = 2^(x * log2(e))\n\t" \
-     "fmul %%st(1) # x * log2(e)\n\t" \
-     "fst %%st(1)\n\t" \
-     "frndint # int(x * log2(e))\n\t" \
-     "fxch\n\t" \
-     "fsub %%st(1) # fract(x * log2(e))\n\t" \
-     "f2xm1 # 2^(fract(x * log2(e))) - 1\n\t" \
-     : "=t" (__value), "=u" (__exponent) : "0" (__x)); \
-  __value += 1.0; \
-  __asm __volatile__ \
-    ("fscale" \
-     : "=t" (__value) : "0" (__value), "u" (__exponent)); \
-  return __value
-__inline_mathcodeNP (exp, __x, __exp_code)
-__inline_mathcodeNP_ (long double, __expl, __x, __exp_code)
-# endif
-# endif /* __FAST_MATH__ */
-
-
-# ifdef __FAST_MATH__
-# if !__GNUC_PREREQ (3,3)
-__inline_mathopNP (sqrt, "fsqrt")
-__inline_mathopNP_ (long double, __sqrtl, "fsqrt")
-# define __libc_sqrtl(n) __sqrtl (n)
-# else
-# define __libc_sqrtl(n) __builtin_sqrtl (n)
-# endif
-# endif
-
-# if __GNUC_PREREQ (2, 8)
-__inline_mathcodeNP_ (double, fabs, __x, return __builtin_fabs (__x))
-# ifdef __USE_ISOC99
-__inline_mathcodeNP_ (float, fabsf, __x, return __builtin_fabsf (__x))
-__inline_mathcodeNP_ (long double, fabsl, __x, return __builtin_fabsl (__x))
-# endif
-__inline_mathcodeNP_ (long double, __fabsl, __x, return __builtin_fabsl (__x))
-# else
-__inline_mathop (fabs, "fabs")
-__inline_mathop_ (long double, __fabsl, "fabs")
-# endif
-
-__inline_mathcode_ (long double, __sgn1l, __x, \
-  __extension__ union { long double __xld; unsigned int __xi[3]; } __n = \
-    { __xld: __x }; \
-  __n.__xi[2] = (__n.__xi[2] & 0x8000) | 0x3fff; \
-  __n.__xi[1] = 0x80000000; \
-  __n.__xi[0] = 0; \
-  return __n.__xld)
-
-
-# ifdef __FAST_MATH__
 /* The argument range of the inline version of sinhl is slightly reduced. */
-__inline_mathcodeNP (sinh, __x, \
-  register long double __exm1 = __expm1l (__fabsl (__x)); \
-  return 0.5 * (__exm1 / (__exm1 + 1.0) + __exm1) * __sgn1l (__x))
+__inline_mathcodeNP (sinh, __x, \
+  long double __exm1 = __builtin_expm1l (__builtin_fabsl (__x)); \
+  long double __temp = 0.5l * (__exm1 / (__exm1 + 1.0l) + __exm1); \
+  return __builtin_copysignl (__temp, __x))

 __inline_mathcodeNP (cosh, __x, \
-  register long double __ex = __expl (__x); \
-  return 0.5 * (__ex + 1.0 / __ex))
+  long double __ex = __builtin_expl (__x); \
+  return 0.5l * (__ex + 1.0l / __ex))

 __inline_mathcodeNP (tanh, __x, \
-  register long double __exm1 = __expm1l (-__fabsl (__x + __x)); \
-  return __exm1 / (__exm1 + 2.0) * __sgn1l (-__x))
-# endif
+  long double __exm1 = __builtin_expm1l (-__builtin_fabsl (__x + __x)); \
+  long double __temp = __exm1 / (__exm1 + 2.0l); \
+  return __builtin_copysignl (__temp, __x))

-/* Optimized versions for some non-standardized functions. */
-# ifdef __USE_ISOC99
-
-# ifdef __FAST_MATH__
-__inline_mathcodeNP (expm1, __x, __expm1_code)
+# ifdef __USE_ISOC99

 /* The argument range of the inline version of asinhl is slightly reduced. */
 __inline_mathcodeNP (asinh, __x, \
-  register long double __y = __fabsl (__x); \
-  return (log1pl (__y * __y / (__libc_sqrtl (__y * __y + 1.0) + 1.0) + __y) \
-          * __sgn1l (__x)))
+  long double __y = __builtin_fabsl (__x); \
+  long double __y2 = __y * __y; \
+  long double __temp \
+    = __builtin_log1pl (__y2 / (__builtin_sqrtl (__y2 + 1.0l) + 1.0l) + __y); \
+  return __builtin_copysignl (__temp, __x))

 __inline_mathcodeNP (acosh, __x, \
-  return logl (__x + __libc_sqrtl (__x - 1.0) * __libc_sqrtl (__x + 1.0)))
+  long double __temp \
+    = __builtin_sqrtl (__x - 1.0l) * __builtin_sqrtl (__x + 1.0l); \
+  return __builtin_logl (__x + __temp))

 __inline_mathcodeNP (atanh, __x, \
-  register long double __y = __fabsl (__x); \
-  return -0.5 * log1pl (-(__y + __y) / (1.0 + __y)) * __sgn1l (__x))
+  long double __y = __builtin_fabsl (__x); \
+  long double __temp = -0.5l * __builtin_log1pl (-(__y + __y) / (1.0l + __y)); \
+  return __builtin_copysignl (__temp, __x))

 /* The argument range of the inline version of hypotl is slightly reduced. */
-__inline_mathcodeNP2 (hypot, __x, __y,
-  return __libc_sqrtl (__x * __x + __y * __y))
+__inline_mathcodeNP2 (hypot, __x, __y, \
+  return __builtin_sqrtl (__x * __x + __y * __y))

-# endif
-# endif
+# endif /* __USE_ISOC99 */

-
-/* Undefine some of the large macros which are not used anymore. */
-# ifdef __FAST_MATH__
-# undef __expm1_code
-# undef __exp_code
 # endif /* __FAST_MATH__ */
-# endif /* __NO_MATH_INLINES */
+# endif /* __NO_MATH_INLINES */

 /* This code is used internally in the GNU libc. */