From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH V3] Practical Improvement to libgcc Complex Divide
Date: Wed, 1 Jul 2020 16:30:12 +0000
Message-Id: <1593621012-15058-1-git-send-email-patrick.mcgehearty@oracle.com>
(Version 3)

(Added in version 3)
Support for half, float, extended, and long double precision has been
added to the prior work for double precision. Since half precision is
computed with float precision, as per current libgcc practice, the
enhanced underflow/overflow tests provide no benefit for half
precision and would only cost performance. Therefore half precision is
left unchanged.

The existing constants for each precision
(float: FLT_MAX, FLT_MIN; double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN)
are used to avoid the more common overflow/underflow cases.

Additional investigation showed that when both parts of the
denominator have exponents small enough that any subnormal values can
be shifted into the normal range, all input values can be scaled up
without risking overflow, giving a clear improvement in accuracy. The
test and scaling values used all fit within the exponent range the C
standard requires for each precision.

The remaining number of troubling results in version 3 is measurably
smaller than in versions 1 and 2. The timing and precision tables
below have been revised to match the algorithms used in this version
for double precision, and additional tables have been added with
results for the other precisions.

In prior versions, I omitted mention of the bug report that started me
on this project:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714
"complex division is surprising on targets with FMA" (was: on aarch64)
With the proposed method, whether FMA is used or not, dividing
1.0+3.0i by 1.0+3.0i correctly returns 1.0+0.0i. A minimal reproducer
for the reported behavior is sketched at the end of these version
notes. I have also added a reference to Beebe's "The
Mathematical-Function Computation Handbook" [4], which was my starting
point for research into better complex divide methods.

(Added for Version 2)
In my initial research, I missed Elen Kalda's proposed patch
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg01629.html [3]
Thanks to Joseph Myers for providing me with the pointer. This version
includes performance and accuracy comparisons between Elen's proposed
patch and my latest patch version for double precision.
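As promised above, here is a minimal reproducer sketch for the
behavior described in PR 59714. It is my own illustration, not part of
the patch; whether the tiny imaginary residue appears depends on
whether the compiler contracts the multiply-add inside __divdc3 into
an FMA on the target.

  #include <complex.h>
  #include <stdio.h>

  int main (void)
  {
    /* volatile keeps the compiler from folding the division at compile
       time, so the runtime __divdc3 routine is actually exercised.  */
    volatile double complex num = 1.0 + 3.0 * I;
    volatile double complex den = 1.0 + 3.0 * I;
    double complex z = num / den;

    /* The exact answer is 1 + 0i; on targets where the multiply-add in
       __divdc3 is contracted to an FMA, the current method can return a
       tiny nonzero imaginary part instead.  */
    printf ("%.17g %+.17gi\n", creal (z), cimag (z));
    return 0;
  }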
(from earlier versions)

The following patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
and long double precisions) when dealing with very large or very small
exponents.

The current code correctly implements Smith's method (1962) [1],
further modified by C99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost in performance.

Differences between the current gcc methods and the new method are
described first; accuracy and performance differences are discussed
after that.

NOTATION

For all of the following, the notation is:
Input complex values:
  a+bi  (a = real part, b = imaginary part)
  c+di
Output complex value:
  e+fi = (a+bi)/(c+di)

For the result tables:
  current = current method (SMITH)
  b1div   = method proposed by Elen Kalda
  b2div   = alternate method considered by Elen Kalda
  new1    = new method using 1 divide and 2 multiplies
  new     = new method proposed by this patch

DESCRIPTIONS of different complex divide methods:

NAIVE COMPUTATION (-fcx-limited-range):
  e = (a*c + b*d)/(c*c + d*d)
  f = (b*c - a*d)/(c*c + d*d)

Note that c*c and d*d will overflow or underflow if either c or d is
outside the range 2^-538 to 2^512.

This method is available in gcc when the switch -fcx-limited-range is
used. That switch is also enabled by -ffast-math. Only someone with a
clear understanding of the maximum range of intermediate values
generated by a computation should consider using this switch.

SMITH's METHOD (current libgcc):
  if (fabs(c) < fabs(d))
    {
      r = c/d;
      denom = (c*r) + d;
      e = (a*r + b) / denom;
      f = (b*r - a) / denom;
    }
  else
    {
      r = d/c;
      denom = c + (d*r);
      e = (a + b*r) / denom;
      f = (b - a*r) / denom;
    }

NEW METHOD (this patch):
  if ((FABS (d) > RBIG) || (FABS (a) > RBIG) || (FABS (b) > RBIG))
    {
      a = a * 0.5;
      b = b * 0.5;
      c = c * 0.5;
      d = d * 0.5;
    }
  /* minimize overflow/underflow issues when c and d are small */
  else if (FABS (d) < RMIN2)
    {
      a = a * RMINSCAL;
      b = b * RMINSCAL;
      c = c * RMINSCAL;
      d = d * RMINSCAL;
    }
  r = c/d;
  denom = (c*r) + d;
  if (r > RMIN)
    {
      e = (a*r + b) / denom;
      f = (b*r - a) / denom;
    }
  else
    {
      e = (c * (a/d) + b) / denom;
      f = (c * (b/d) - a) / denom;
    }

  [ only presenting the fabs(c) < fabs(d) case here; the full code is
    in the patch. ]

Before any computation of the answer, the code checks for any input
values near maximum to allow down scaling to avoid overflow. These
scalings almost never harm the accuracy since they are by a power of
2. Values over RBIG are relatively rare, but it is easy to test for
them, and doing so allows overflows to be avoided.

Testing for RMIN2 reveals when both c and d are less than 2^-53 (for
double precision; see the patch for the other values). By scaling all
values by 2^51, the code avoids many underflows in intermediate
computations that otherwise might occur. If scaling a and b by 2^51
causes either to overflow, then the computation will overflow
whatever method is used. A worked example of the effect of the RBIG
down-scaling is shown below.
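The following small program (a sketch of mine, not part of the patch)
illustrates that down-scaling step with concrete numbers. With both
parts of the denominator near DBL_MAX, Smith's intermediate value
denom overflows to infinity and the quotient collapses to 0 - 0i even
though the true result is representable; halving all four inputs
first, as the RBIG test does, keeps denom finite.

  #include <complex.h>
  #include <float.h>
  #include <stdio.h>

  int main (void)
  {
    double a = 2.0, b = 1.0;
    double c = 0.7 * DBL_MAX, d = 0.7 * DBL_MAX;

    /* Current method (else branch, since fabs(c) is not less than fabs(d)). */
    double r = d / c;                  /* 1.0 */
    double denom = c + d * r;          /* overflows to +inf */
    printf ("unscaled: %g %+gi\n", (a + b * r) / denom, (b - a * r) / denom);

    /* Same computation after the RBIG down-scaling by 0.5.  */
    double a2 = a * 0.5, b2 = b * 0.5, c2 = c * 0.5, d2 = d * 0.5;
    r = d2 / c2;
    denom = c2 + d2 * r;               /* finite: 0.7 * DBL_MAX */
    printf ("scaled:   %g %+gi\n",
            (a2 + b2 * r) / denom,     /* ~1.19e-308, the correct value */
            (b2 - a2 * r) / denom);    /* ~-3.97e-309 */
    return 0;
  }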
Next, r (the ratio of c to d) is checked for being near zero. Baudin
and Smith checked r for zero; checking instead for values less than
DBL_MIN (subnormal) covers roughly 10 times as many cases and improves
overall accuracy. If r is too small, then when it is used in a
multiplication, there is a high chance that the result will underflow
to zero, losing significant accuracy. That underflow is avoided by
reordering the computation: when r is subnormal, the code replaces a*r
(= a*(c/d)) with ((a/d)*c), which is mathematically the same but
avoids the unnecessary underflow.

TEST Data

Two sets of data are presented to test these methods. Both sets
contain 10 million pairs of complex values. The exponents and
mantissas are generated using multiple calls to random() and then
combining the results. Only values whose complex divide results are
representable in the appropriate precision after being computed in
quad precision are used.

The first data set is labeled "moderate exponents". The exponent range
is limited to -DBL_MAX_EXP/2 to DBL_MAX_EXP/2 for double precision
(FLT_MAX_EXP and LDBL_MAX_EXP are used correspondingly for the other
precisions). The second data set is labeled "full exponents". The
exponent range for these cases is the full exponent range for the
precision.

ACCURACY Test results:

Note: The following accuracy tests are based on IEEE-754 arithmetic.

Note: All results are based on use of fused multiply-add. If fused
multiply-add is not used, the error rate increases slightly for the 1
and 2 ulp cases for both the current and new complex divide.
Differences at 8 ulp are barely measurable.

The complex divide methods are evaluated by determining what
percentage of values exceed different ulp (units in the last place)
levels. If a "2 ulp" row shows 1%, that means 1% of the 10,000,000
values (100,000) have either a real or imaginary part that differs
from the quad precision result by 2 bits or more. Results are reported
for differences greater than or equal to 1 ulp, 2 ulp, 8 ulp, 16 ulp,
24 ulp, and 52 ulp for double precision.
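For readers who want to reproduce this style of measurement, the ulp
comparison can be sketched roughly as follows. This is only an
illustration; the actual test harness is not part of this message, the
helper name is made up, and long double stands in here for the quad
precision reference (link with -lm).

  #include <math.h>
  #include <stdio.h>

  /* Approximate ulp distance of a double result from a higher-precision
     reference, in units of the spacing of doubles near the reference.  */
  static double ulp_diff (double val, long double ref)
  {
    double r = (double) ref;     /* reference rounded to double precision */
    double spacing = nextafter (fabs (r), INFINITY) - fabs (r);
    return (double) fabsl ((long double) val - ref) / spacing;
  }

  int main (void)
  {
    /* A value three units in the last place above 1.0 reports about 3 ulp. */
    double x = 1.0 + 3.0 * 0x1.0p-52;
    printf ("%.2f ulp\n", ulp_diff (x, 1.0L));
    return 0;
  }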
Even when the patch avoids overflows and underflows, some input values
are expected to have errors due to the potential for catastrophic
roundoff from floating point subtraction. For example, when b*c and
a*d are nearly equal, the result of the subtraction may lose several
places of accuracy. This patch does not attempt to detect or minimize
this type of error, but neither does it increase it.

Results for Elen Kalda's method (with both 1 and 2 divides) and for
the new method with only 1 divide are shown only in the double
precision tables. In the following charts, lower values are better.

current - current complex divide in libgcc
b1div   - Elen Kalda's method from Baudin & Smith with one divide
b2div   - Elen Kalda's method from Baudin & Smith with two divides
new1    - this patch except with 1 divide and 2 multiplies
new     - this patch, which uses 2 divides

===============================================================
            Errors   Moderate Dataset
gtr eq     current    b1div      b2div      new1       new
======     ========   ========   ========   ========   ========
 1 ulp     0.24707%   0.92986%   0.24707%   0.92986%   0.24707%
 2 ulp     0.01762%   0.01770%   0.01762%   0.01770%   0.01762%
 8 ulp     0.00026%   0.00026%   0.00026%   0.00026%   0.00026%
16 ulp     0.00000%   0.00000%   0.00000%   0.00000%   0.00000%
24 ulp     0%         0%         0%         0%         0%
52 ulp     0%         0%         0%         0%         0%
===============================================================
Table 1: Errors with Moderate Dataset (double)

Note in Table 1 that both the old and new methods give identical error
rates for data with moderate exponents. Errors exceeding 16 ulp are
exceedingly rare. There are substantial increases in the 1 ulp error
rates for the 1 divide/2 multiply methods (b1div, new1) as compared to
the 2 divide methods; those differences are minimal for 2 ulp and
larger error measurements. I chose the more accurate method of two
divides to avoid any loss of accuracy relative to current behavior in
cases where current behavior is not at risk of under/overflow.

===============================================================
            Errors   Full Dataset
gtr eq     current    b1div      b2div      new1       new
======     ========   ========   ========   ========   ========
 1 ulp     2.05%      1.23842%   0.67130%   0.71774%   0.17100%
 2 ulp     1.88%      0.51615%   0.50354%   0.02052%   0.01319%
 8 ulp     1.79%      0.43227%   0.42531%   0.00358%   0.00350%
16 ulp     1.69%      0.34503%   0.33542%   0.00212%   0.00212%
24 ulp     1.59%      0.26367%   0.25189%   0.00138%   0.00138%
52 ulp     1.28%      0.01914%   0.00378%   0.00001%   0.00001%
===============================================================
Table 2: Errors with Full Dataset (double)

Table 2 shows significant differences in error rates, especially
between the current method and the new method. With the current code,
1.13% of test values have results where no bits of the mantissa match
the correct answer. That level of error is extremely rare with the new
code.

Consider the results for errors greater than or equal to 24 ulp. The
current code sees those errors at a rate of 1.6%. Most would agree
that such answers are "substantially different" from the answers
calculated using extended precision. The b1div and b2div methods still
show errors at about one-sixth of that rate. The new code reduces
errors at that level by more than a factor of 1000. These improvements
are important for serious computation with complex numbers.

Correctness for float

=============================
            Errors   Moderate Dataset
gtr eq     current       new
======     =========  =========
 1 ulp     28.68070%  28.68070%
 2 ulp      0.64386%   0.64386%
 8 ulp      0.00403%   0.00403%
16 ulp      0.00001%   0.00001%
24 ulp      0%         0%
=============================
Table 3: Errors with Moderate Dataset (float)

As in Table 1, we see in Table 3 that there is no change in accuracy
between the current and new methods when the input data does not
contain very large or very small exponents with potential for
under/overflow.

=============================
            Errors   Full Dataset
gtr eq     current       new
======     =========  =========
 1 ulp     19.97%     17.85193%
 2 ulp      3.20%      0.43153%
 8 ulp      2.16%      0.04139%
16 ulp      1.40%      0.00818%   (99.4% better)
24 ulp      0.98%      0.00016%
=============================
Table 4: Errors with Full Dataset (float)

In Table 4, as in Table 2, we see that when extreme exponents are
encountered, the new method provides a substantial reduction in
unacceptable answers. The improvements are not quite as large as for
double precision due to the inherent limitations of 32-bit floats.

There is no change in accuracy for half precision since the code is
unchanged. libgcc computes half-precision functions in float
precision, allowing the existing methods to avoid overflow/underflow
issues for the allowed range of half-precision exponents.

Extended precision (using the x87 80-bit format on x86) and long
double (using IEEE-754 128-bit on x86 and aarch64) both have 15-bit
exponents, as compared to 11-bit exponents in double precision. We
note that the C standard also allows long double to be implemented
with only the range of double. The RMIN2 and RMINSCAL constants are
selected to work within the double range as well as with the extended
and 128-bit ranges.
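For reference, these are the constants the patch selects for each
supported precision (excerpted from the patch at the end of this
message):

  #if defined(L_divsc3)                     /* float */
  #define RBIG     ((FLT_MAX) / 2.0)
  #define RMIN     (FLT_MIN)
  #define RMIN2    (0x1.0p-21)
  #define RMINSCAL (0x1.0p+19)
  #endif

  #if defined(L_divdc3)                     /* double */
  #define RBIG     ((DBL_MAX) / 2.0)
  #define RMIN     (DBL_MIN)
  #define RMIN2    (0x1.0p-53)
  #define RMINSCAL (0x1.0p+51)
  #endif

  #if (defined(L_divxc3) || defined(L_divtc3))   /* extended / long double */
  #define RBIG     ((LDBL_MAX) / 2.0)
  #define RMIN     (LDBL_MIN)
  #define RMIN2    (0x1.0p-53)
  #define RMINSCAL (0x1.0p+51)
  #endif

Because the 80-bit and 128-bit cases reuse the double-precision RMIN2
and RMINSCAL values, the tests remain valid even when long double has
only the range of double.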
We will limit our performance and accuracy discussion here to the
80-bit and 128-bit formats as seen on x86. The extended and long
double precision investigations were more limited. Aarch64 does not
support extended precision but does support the software
implementation of 128-bit long double precision. For x86, long double
defaults to the 80-bit precision, but the -mlong-double-128 flag
switches to the software implementation of 128-bit precision. Both the
80-bit and 128-bit formats have the same exponent range, but the
128-bit format has a wider mantissa.

Since this change is only aimed at avoiding underflow/overflow for
extreme exponents, I studied the extended precision results on x86 for
100,000 values. The limited exponent dataset showed no differences.
For the dataset with the full exponent range, the current and new
values showed major differences (greater than 32 ulp) in 567 cases out
of 100,000 (0.57%). In every one of these cases, the ratio c/d or d/c
(as appropriate) was zero or subnormal, indicating the advantage of
the new method and its continued correctness where needed.

PERFORMANCE Test results

In order for a library change to be practical, it is necessary to show
that the slowdown is tolerable. The slowdowns observed are much less
than would be seen by, for example, switching from hardware double
precision to software quad precision, which on the tested machines
causes a slowdown of around 100x.

The actual slowdown depends on the machine architecture. It also
depends on the nature of the input data. If underflow/overflow is
rare, then implementations with strong branch prediction will only
slow down by a few cycles. If underflow/overflow is common, then the
branch predictors will be less successful and the cost will be higher.

Results from two machines are presented as examples of the overhead of
the new method. The one labeled x86 is a 5-year-old Intel x86
processor and the one labeled aarch64 is a 3-year-old arm64 processor.

In the following chart, the times are averaged over a one million
value data set. All values are scaled to set the time of the current
method to 1.0. Lower values are better. A value of less than 1.0 would
be faster than the current method and a value greater than 1.0 would
be slower than the current method.

================================================
               Moderate set       full set
              x86    aarch64     x86    aarch64
===========   ===============    ===============
float         1.15    1.15       1.96    1.96
double        1.07    1.40       1.31    1.78
long double   1.08    1.23       1.08    1.24
================================================
Table 5: Performance Comparisons (ratio new/current)

The above table omits the timing comparison between the 1 divide and 2
multiply approach and the 2 divide approach. I consider that
optimization-versus-accuracy issue separate from the rest of this
proposal. Roughly speaking, the 1 divide and 2 multiply approach
provides a performance gain of 10 to 15% at a cost of 1 or 2 ulp of
additional error about 0.5% of the time for double precision. The loss
of accuracy is somewhat greater for float than for double precision.

For the proposed change, the moderate dataset shows less overhead for
the new method than the full dataset. That is because the moderate
dataset never takes the new branches which protect against
under/overflow. The better the branch predictor, the lower the cost of
these untaken branches. Both platforms are somewhat dated, with the
x86 having a better branch predictor, which reduces the cost of the
additional branches in the new code.
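For anyone who wants a rough sense of the overhead on their own
hardware, a micro-benchmark along the following lines can be used.
This is a sketch of mine under simple assumptions, not the harness
behind Table 5; at default options every complex divide below goes
through __divdc3.

  #include <complex.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N 1000000

  static double complex num[N], den[N], out[N];

  int main (void)
  {
    /* Fill the arrays with arbitrary nonzero values.  */
    for (int i = 0; i < N; i++)
      {
        num[i] = (double) random () + (double) random () * I;
        den[i] = ((double) random () + 1.0) + ((double) random () + 1.0) * I;
      }

    struct timespec t0, t1;
    clock_gettime (CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
      out[i] = num[i] / den[i];        /* each division calls __divdc3 */
    clock_gettime (CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf ("%.2f ns per divide (check %g)\n", ns / N, creal (out[N / 2]));
    return 0;
  }

Compiling it once against an unpatched libgcc and once against a
patched build gives a direct before/after comparison for a given input
distribution.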
Of course, the relative slowdown may be greater for some
architectures, especially those with limited branch prediction
combined with a high cost of misprediction. This cost is claimed to be
tolerable on the grounds that:

(a) most applications will only spend a small fraction of their time
    calculating complex divide;
(b) it is much less than the cost of extended precision;
(c) users are not forced to use it (as described below);
(d) the cost is worthwhile considering the accuracy improvements shown
    in Tables 2 and 4.

Those users who find this degree of slowdown unsatisfactory may use
the switch -fcx-fortran-rules, which does not use the library routine
and instead inlines Smith's method without the C99 requirement for
dealing with NaN results. The proposed patch for libgcc complex divide
does not affect the code generated by -fcx-fortran-rules.

As a side note, double precision is not slower than float for the
aarch64 and x86 platforms tested. A future change might investigate
the performance of promoting float complex divide to double precision
and using the current Smith's method. Using double precision for float
values would avoid the roundoff effects, and the larger mantissa would
give better answers in cases of catastrophic subtraction. Better
accuracy with no loss of performance seems a win all around, but it
would require performance testing on a wide range of platforms to
confirm the "no loss of performance" observed on the two
implementations tested so far.

SUMMARY

When the input data to complex divide has exponents whose absolute
value is less than half of *_MAX_EXP, this patch makes no changes in
accuracy and has only a modest effect on performance. When the input
data contains values outside those ranges, the patch eliminates more
than 99.5% of major errors with a tolerable cost in performance.

In comparison to Elen Kalda's method, this patch introduces more
performance overhead but reduces the remaining overflow/underflow
errors by more than a factor of 100. There is an intermediate design
choice which would catch about 13x fewer errors while introducing
roughly 20% less overhead. That could be analyzed in depth if the
overhead of the current proposal is unacceptable.

REFERENCES

[1] Robert L. Smith. Algorithm 116: Complex division. Commun. ACM,
    5(8):435, 1962.

[2] Michael Baudin and Robert L. Smith. "A robust complex division in
    Scilab," October 2012, available at http://arxiv.org/abs/1210.4539

[3] Elen Kalda: Complex division improvements in libgcc
    https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg01629.html

[4] Nelson H.F. Beebe, "The Mathematical-Function Computation
    Handbook," Springer International Publishing AG, 2017.

---
 libgcc/ChangeLog |   6 ++++
 libgcc/libgcc2.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 107 insertions(+), 4 deletions(-)

diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index d6367e2..d1db004 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,9 @@
+2020-06-24  Patrick McGehearty  <patrick.mcgehearty@oracle.com>
+
+	* libgcc2.c (__divdc3): Enhance accuracy of complex divide by
+	avoiding underflow/overflow when input values have very large
+	or very small exponents.
+
 2020-06-25  Martin Liska  <mliska@suse.cz>
 
 	* libgcov-driver.c (merge_summary): Remove function as its name
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index e0a9fd7..feffb13 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -2036,13 +2036,35 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
 CTYPE
 CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
 {
+#if defined(L_divsc3)
+#define RBIG	((FLT_MAX)/2.0)
+#define RMIN	(FLT_MIN)
+#define RMIN2	(0x1.0p-21)
+#define RMINSCAL (0x1.0p+19)
+#endif
+
+#if defined(L_divdc3)
+#define RBIG	((DBL_MAX)/2.0)
+#define RMIN	(DBL_MIN)
+#define RMIN2	(0x1.0p-53)
+#define RMINSCAL (0x1.0p+51)
+#endif
+
+#if (defined(L_divxc3) || defined(L_divtc3))
+#define RBIG	((LDBL_MAX)/2.0)
+#define RMIN	(LDBL_MIN)
+#define RMIN2	(0x1.0p-53)
+#define RMINSCAL (0x1.0p+51)
+#endif
+
   MTYPE denom, ratio, x, y;
   CTYPE res;
 
-  /* ??? We can get better behavior from logarithmic scaling instead of
-     the division.  But that would mean starting to link libgcc against
-     libm.  We could implement something akin to ldexp/frexp as gcc builtins
-     fairly easily...  */
+#if defined(L_divhc3)
+  /* half precision is handled with float precision.
+     no extra measures are needed to avoid overflow/underflow */
+
+  /* Scale by max(c,d) to reduce chances of denominator overflowing */
   if (FABS (c) < FABS (d))
     {
       ratio = c / d;
@@ -2057,6 +2079,81 @@ CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
       x = ((b * ratio) + a) / denom;
       y = (b - (a * ratio)) / denom;
     }
+#else
+  /* float, double, extended, long double have significant potential
+     underflow/overflow errors than can be greatly reduced with
+     a limited number of adjustments */
+
+  /* Scale by max(c,d) to reduce chances of denominator overflowing */
+  if (FABS (c) < FABS (d))
+    {
+      /* prevent underflow when denominator is near max representable */
+      if (FABS (d) >= RBIG)
+	{
+	  a = a * 0.5;
+	  b = b * 0.5;
+	  c = c * 0.5;
+	  d = d * 0.5;
+	}
+      /* avoid overflow/underflow issues when c and d are small */
+      /* scaling up helps avoid some underflows */
+      /* No new overflow possible since c&d < RMIN2 */
+      if (FABS (d) < RMIN2)
+	{
+	  a = a * RMINSCAL;
+	  b = b * RMINSCAL;
+	  c = c * RMINSCAL;
+	  d = d * RMINSCAL;
+	}
+      ratio = c / d;
+      denom = (c * ratio) + d;
+      /* choose alternate order of computation if ratio is subnormal */
+      if (FABS (ratio) > RMIN)
+	{
+	  x = ((a * ratio) + b) / denom;
+	  y = ((b * ratio) - a) / denom;
+	}
+      else
+	{
+	  x = ((c * (a / d)) + b) / denom;
+	  y = ((c * (b / d)) - a) / denom;
+	}
+    }
+  else
+    {
+      /* prevent underflow when denominator is near max representable */
+      if (FABS (c) >= RBIG)
+	{
+	  a = a * 0.5;
+	  b = b * 0.5;
+	  c = c * 0.5;
+	  d = d * 0.5;
+	}
+      /* avoid overflow/underflow issues when both c and d are small */
+      /* scaling up helps avoid some underflows */
+      /* No new overflow possible since both c&d are less than RMIN2 */
+      if (FABS (c) < RMIN2)
+	{
+	  a = a * RMINSCAL;
+	  b = b * RMINSCAL;
+	  c = c * RMINSCAL;
+	  d = d * RMINSCAL;
+	}
+      ratio = d / c;
+      denom = (d * ratio) + c;
+      /* choose alternate order of computation if ratio is subnormal */
+      if (FABS (ratio) > RMIN)
+	{
+	  x = ((b * ratio) + a) / denom;
+	  y = (b - (a * ratio)) / denom;
+	}
+      else
+	{
+	  x = (a + (d * (b / c))) / denom;
+	  y = (b - (d * (a / c))) / denom;
+	}
+    }
+#endif
   /* Recover infinities and zeros that computed as NaN+iNaN;
      the only cases are nonzero/zero, infinite/finite, and
      finite/infinite.  */