From patchwork Tue Apr 25 09:35:59 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arnaud Charlet X-Patchwork-Id: 754664 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3wByl61KrVz9s82 for ; Tue, 25 Apr 2017 19:36:18 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="tbUiMHyB"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; q=dns; s=default; b=oeogiI52PJTZQFBfzliAvQ2uQIjDiCkP+0LQup9RrQ1WDGj2g3 GTcAcz/9iGsLysIrqjpBRld4+Oi4jqPozNi1APnIJABQOv8AeyuBFh6UbHE8j/Jn idAFZqheSoqQt0NjnwQNSTNEuyrNuzR9/Gia+eEak8es/nMZP8RYoJbgM= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; s= default; bh=jvDAEIET0qN71EfGsFhAAxBz7KA=; b=tbUiMHyB6AbYMLgl6QyH 0jPHiuYcB0B1ztd8CUmvx3TR4cesOe8mu5ntOc0vHQB9OCOhXSx4FdrHNiqPQBfC 49FezajYNPxN+kcVtH3zne4dzJ4ud3kHtJ/xfhqevHoL25Eet72dnGxnGH3eByea nMLbBhrQZqSR7ob5FG2rgXo= Received: (qmail 32499 invoked by alias); 25 Apr 2017 09:36:01 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 32396 invoked by uid 89); 25 Apr 2017 09:36:01 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-10.1 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=H*Ad:U*ebotcazou, processors, nan, NaN X-HELO: rock.gnat.com Received: from rock.gnat.com (HELO rock.gnat.com) (205.232.38.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 25 Apr 2017 09:35:59 +0000 Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id C93D0355A; Tue, 25 Apr 2017 05:35:59 -0400 (EDT) Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id B8AKGlQQdc0Z; Tue, 25 Apr 2017 05:35:59 -0400 (EDT) Received: from tron.gnat.com (tron.gnat.com [205.232.38.10]) by rock.gnat.com (Postfix) with ESMTP id B557B3534; Tue, 25 Apr 2017 05:35:59 -0400 (EDT) Received: by tron.gnat.com (Postfix, from userid 4192) id B136C521; Tue, 25 Apr 2017 05:35:59 -0400 (EDT) Date: Tue, 25 Apr 2017 05:35:59 -0400 From: Arnaud Charlet To: gcc-patches@gcc.gnu.org Cc: Eric Botcazou Subject: [Ada] Reduce rounding overhead in sin/cos/tan functions on x86 Message-ID: <20170425093559.GA100959@adacore.com> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) The trigonometric functions of children of Ada.Numerics are implemented by inline assembly statements on the x86 architecture, and for sin/cos/tan a special range reduction algorithm is used to avoid a loss of accuracy in range reduction implemented in hardware on x86 processors. This algorithm contains a rounding step and it was implemented inefficiently by a call to a routine of the runtime. This patch changes it to using a more efficient inline sequence of machine instructions instead. The same change is applied to the range reduction algorithm used prior to calling the libc routines on PowerPC/Darwin. The patch also changes the definition of the Double type used to interface the libc routines in other implementations so as to use the built-in type corresponding to the C type (Long_Float, except for Long_Long_Float on x86). Compiling the following package at -O2 -gnatpgn must yield no calls to the rounding routines of the runtime on x86: with Ada.Numerics.Generic_Elementary_Functions; package P is new Ada.Numerics.Generic_Elementary_Functions (Long_Float); Tested on x86_64-pc-linux-gnu, committed on trunk 2017-04-25 Eric Botcazou * a-numaux.ads: Fix description of a-numaux-darwin and a-numaux-x86. (Double): Define to Long_Float. * a-numaux-vxworks.ads (Double): Likewise. * a-numaux-darwin.ads (Double): Likewise. * a-numaux-libc-x86.ads (Double): Define to Long_Long_Float. * a-numaux-x86.ads: Fix package description. * a-numaux-x86.adb (Is_Nan): Minor tweak. (Reduce): Adjust and complete description. Call Is_Nan instead of testing manually. Use an integer temporary to hold rounded value. * a-numaux-darwin.adb (Reduce): Likewise. (Is_Nan): New function. Index: a-numaux-vxworks.ads =================================================================== --- a-numaux-vxworks.ads (revision 247135) +++ a-numaux-vxworks.ads (working copy) @@ -36,7 +36,7 @@ package Ada.Numerics.Aux is pragma Pure; - type Double is digits 15; + type Double is new Long_Float; -- Type Double is the type used to call the C routines -- We import these functions directly from C. Note that we label them Index: a-numaux-x86.adb =================================================================== --- a-numaux-x86.adb (revision 247135) +++ a-numaux-x86.adb (working copy) @@ -49,8 +49,11 @@ -- for values of Y in the open interval (-0.25, 0.25) procedure Reduce (X : in out Double; Q : out Natural); - -- Implements reduction of X by Pi/2. Q is the quadrant of the final - -- result in the range 0 .. 3. The absolute value of X is at most Pi. + -- Implement reduction of X by Pi/2. Q is the quadrant of the final + -- result in the range 0..3. The absolute value of X is at most Pi/4. + -- It is needed to avoid a loss of accuracy for sin near Pi and cos + -- near Pi/2 due to the use of an insufficiently precise value of Pi + -- in the range reduction. pragma Inline (Is_Nan); pragma Inline (Reduce); @@ -117,7 +120,7 @@ begin -- The IEEE NaN values are the only ones that do not equal themselves - return not (X = X); + return X /= X; end Is_Nan; --------- @@ -154,32 +157,36 @@ P5 : constant Double := Double'Leading_Part (Half_Pi - P1 - P2 - P3 - P4, HM); P6 : constant Double := Double'Model (Half_Pi - P1 - P2 - P3 - P4 - P5); - K : Double := X * Two_Over_Pi; + K : Double; + R : Integer; + begin - -- For X < 2.0**32, all products below are computed exactly. + -- For X < 2.0**HM, all products below are computed exactly. -- Due to cancellation effects all subtractions are exact as well. -- As no double extended floating-point number has more than 75 -- zeros after the binary point, the result will be the correctly -- rounded result of X - K * (Pi / 2.0). + K := X * Two_Over_Pi; while abs K >= 2.0**HM loop K := K * M - (K * M - K); - X := (((((X - K * P1) - K * P2) - K * P3) - - K * P4) - K * P5) - K * P6; + X := + (((((X - K * P1) - K * P2) - K * P3) - K * P4) - K * P5) - K * P6; K := X * Two_Over_Pi; end loop; - if K /= K then + -- If K is not a number (because X was not finite) raise exception - -- K is not a number, because X was not finite - + if Is_Nan (K) then raise Constraint_Error; end if; - K := Double'Rounding (K); - Q := Integer (K) mod 4; - X := (((((X - K * P1) - K * P2) - K * P3) - - K * P4) - K * P5) - K * P6; + -- Go through an integer temporary so as to use machine instructions + + R := Integer (Double'Rounding (K)); + Q := R mod 4; + K := Double (R); + X := (((((X - K * P1) - K * P2) - K * P3) - K * P4) - K * P5) - K * P6; end Reduce; ---------- Index: a-numaux-x86.ads =================================================================== --- a-numaux-x86.ads (revision 247135) +++ a-numaux-x86.ads (working copy) @@ -30,7 +30,8 @@ -- -- ------------------------------------------------------------------------------ --- Version for the x86, using 64-bit IEEE format with inline asm statements +-- This version is for the x86 using the 80-bit x86 long double format with +-- inline asm statements. package Ada.Numerics.Aux is pragma Pure; Index: a-numaux-darwin.adb =================================================================== --- a-numaux-darwin.adb (revision 247135) +++ a-numaux-darwin.adb (working copy) @@ -36,11 +36,17 @@ -- Local subprograms -- ----------------------- + function Is_Nan (X : Double) return Boolean; + -- Return True iff X is a IEEE NaN value + procedure Reduce (X : in out Double; Q : out Natural); - -- Implements reduction of X by Pi/2. Q is the quadrant of the final - -- result in the range 0 .. 3. The absolute value of X is at most Pi/4. + -- Implement reduction of X by Pi/2. Q is the quadrant of the final + -- result in the range 0..3. The absolute value of X is at most Pi/4. + -- It is needed to avoid a loss of accuracy for sin near Pi and cos + -- near Pi/2 due to the use of an insufficiently precise value of Pi + -- in the range reduction. - -- The following three functions implement Chebishev approximations + -- The following two functions implement Chebishev approximations -- of the trigonometric functions in their reduced domain. -- These approximations have been computed using Maple. @@ -51,6 +57,10 @@ pragma Inline (Sine_Approx); pragma Inline (Cosine_Approx); + ------------------- + -- Cosine_Approx -- + ------------------- + function Cosine_Approx (X : Double) return Double is XX : constant Double := X * X; begin @@ -63,6 +73,10 @@ - 16#3.655E64869ECCE#E-14 + 1.0; end Cosine_Approx; + ----------------- + -- Sine_Approx -- + ----------------- + function Sine_Approx (X : Double) return Double is XX : constant Double := X * X; begin @@ -75,6 +89,17 @@ end Sine_Approx; ------------ + -- Is_Nan -- + ------------ + + function Is_Nan (X : Double) return Boolean is + begin + -- The IEEE NaN values are the only ones that do not equal themselves + + return X /= X; + end Is_Nan; + + ------------ -- Reduce -- ------------ @@ -92,6 +117,7 @@ - P4, HM); P6 : constant Double := Double'Model (Half_Pi - P1 - P2 - P3 - P4 - P5); K : Double; + R : Integer; begin -- For X < 2.0**HM, all products below are computed exactly. @@ -101,7 +127,7 @@ -- rounded result of X - K * (Pi / 2.0). K := X * Two_Over_Pi; - while abs K >= 2.0 ** HM loop + while abs K >= 2.0**HM loop K := K * M - (K * M - K); X := (((((X - K * P1) - K * P2) - K * P3) - K * P4) - K * P5) - K * P6; @@ -110,14 +136,16 @@ -- If K is not a number (because X was not finite) raise exception - if K /= K then + if Is_Nan (K) then raise Constraint_Error; end if; - K := Double'Rounding (K); - Q := Integer (K) mod 4; - X := (((((X - K * P1) - K * P2) - K * P3) - - K * P4) - K * P5) - K * P6; + -- Go through an integer temporary so as to use machine instructions + + R := Integer (Double'Rounding (K)); + Q := R mod 4; + K := Double (R); + X := (((((X - K * P1) - K * P2) - K * P3) - K * P4) - K * P5) - K * P6; end Reduce; --------- Index: a-numaux-darwin.ads =================================================================== --- a-numaux-darwin.ads (revision 247135) +++ a-numaux-darwin.ads (working copy) @@ -39,7 +39,7 @@ pragma Linker_Options ("-lm"); - type Double is digits 15; + type Double is new Long_Float; -- Type Double is the type used to call the C routines -- The following functions have been implemented in Ada, since Index: a-numaux.ads =================================================================== --- a-numaux.ads (revision 247135) +++ a-numaux.ads (working copy) @@ -40,9 +40,10 @@ -- This version here is for use with normal Unix math functions. Alternative -- versions are provided for special situations: --- a-numaux-darwin For OS/X (special handling of sin/cos for accuracy) +-- a-numaux-darwin For PowerPC/Darwin (special handling of sin/cos) -- a-numaux-libc-x86 For the x86, using 80-bit long double format --- a-numaux-x86 For the x86, using 64-bit IEEE (inline asm statements) +-- a-numaux-x86 For the x86, using 80-bit long double format with +-- inline asm statements -- a-numaux-vxworks For use on VxWorks (where we have no libm.a library) package Ada.Numerics.Aux is @@ -50,7 +51,7 @@ pragma Linker_Options ("-lm"); - type Double is digits 15; + type Double is new Long_Float; -- Type Double is the type used to call the C routines -- We import these functions directly from C. Note that we label them Index: a-numaux-libc-x86.ads =================================================================== --- a-numaux-libc-x86.ads (revision 247135) +++ a-numaux-libc-x86.ads (working copy) @@ -37,7 +37,7 @@ pragma Linker_Options ("-lm"); - type Double is digits 18; + type Double is new Long_Long_Float; -- We import these functions directly from C. Note that we label them -- all as pure functions, because indeed all of them are in fact pure.