From patchwork Mon Sep 28 22:12:15 2015
X-Patchwork-Submitter: Joseph Myers
X-Patchwork-Id: 523621
Delivered-To: mailing list libc-alpha@sourceware.org
Date: Mon, 28 Sep 2015 22:12:15 +0000
From: Joseph Myers
Subject: Fix clog, clog10 inaccuracy (bug 19016) [committed]

For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large
errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids
cancellation error and then using log1p.

However, the thresholds for using that approach still result in log
being used on arguments as large as sqrt(13/16) > 0.9, leading to
significant errors, in some cases above the 9ulp maximum allowed in
glibc libm.  This patch arranges for the approach using log1p to be
used in any cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the
existing allowance for cases where one of X and Y is very small),
adjusting the __x2y2m1 functions to work with the wider range of
inputs.  This way, log only gets used on arguments below sqrt(1/2) (or
substantially above 1), where the error involved is much less.

Tested for x86_64, x86, mips64 and powerpc.  For the ulps regeneration
I removed the existing clog and clog10 ulps before regenerating to
allow any reduced ulps to appear.  Tests added include those found by
random test generation to produce large ulps either before or after
the patch, and some found by trying inputs close to the (0.75, 0.5)
threshold where the potential errors from using log are largest.

Committed.

(auto-libm-test-out diffs omitted below.)

2015-09-28  Joseph Myers

        [BZ #19016]
        * sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
        allow more cases with X^2 + Y^2 >= 0.5.
        * sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise.  Add -1 as
        normal element in sum instead of special-casing based on values of
        arguments.
        * sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
        * sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise.  Add
        -1 as normal element in sum instead of special-casing based on
        values of arguments.
        * sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
        * sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
        (__x2y2m1): Update comment.
        * sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise.  Add -1
        as normal element in sum instead of special-casing based on values
        of arguments.
        * math/s_clog.c (__clog): Handle more cases using log1p without
        hypot.
        * math/s_clog10.c (__clog10): Likewise.
        * math/s_clog10f.c (__clog10f): Likewise.
        * math/s_clog10l.c (__clog10l): Likewise.
        * math/s_clogf.c (__clogf): Likewise.
        * math/s_clogl.c (__clogl): Likewise.
        * math/auto-libm-test-in: Add more tests of clog and clog10.
        * math/auto-libm-test-out: Regenerated.
        * sysdeps/i386/fpu/libm-test-ulps: Update.
        * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.

diff --git a/math/auto-libm-test-in b/math/auto-libm-test-in
index e86be23..5648965 100644
--- a/math/auto-libm-test-in
+++ b/math/auto-libm-test-in
@@ -660,6 +660,51 @@ clog -0xa.7ac41a0b417cb8fp-4 -0x6.c5a32eaeedd4p-4
 clog 0x3.c16p-136 0x8p-152
 clog -0x1.0a69de710590dp+0 -0x7.bc7e121e2b0d1088p-4
 
+clog -0x2.7bdep-4 0x5.ab7a4p-4
+clog -0xb.e1d3d0ff44358p-4 -0x7.54785e1b143f8p-4
+clog 0x3.ba473p+0 0x7.eea9ap-4
+clog 0x9.d02220baee4ep+36 0x2.b9a29cp+0
+clog -0x5.1a5cf8p-4 -0xb.73012p-4
+clog -0xa.ff292a609dbb8p-4 0x6.f73d4cp-4
+clog -0x5.1a5cfc2301114p-4 -0xb.730118p-4
+clog 0xb.ffffcp-4 0x7.ffff1p-4
+clog 0xb.ffffp-4 0x7.ffffap-4
+clog 0xb.ffffp-4 0x7.fffff8p-4
+clog 0xb.ffffp-4 0x7.ffffp-4
+clog 0xb.fffffp-4 0x7.ffff68p-4
+clog 0xb.fffffp-4 0x7.ffffp-4
+clog 0xb.ffff8p-4 0x7.ffffcp-4
+clog 0xb.ffffp-4 0x7.ffffcp-4
+clog 0xb.ffffp-4 0x7.ffffb8p-4
+clog 0xb.ffffp-4 0x7.ffff7p-4
+clog 0xb.ffffp-4 0x7.ffff5p-4
+clog 0xb.fffffffffff7p-4 0x7.fffff8p-4
+clog 0xb.fffffffffff08p-4 0x7.fffffffffffdp-4
+clog 0xb.fffffffffff08p-4 0x7.fffffffffff9p-4
+clog 0xb.fffffffffffp-4 0x7.fffffffffffdcp-4
+clog 0xb.fffffp-4 0x7.ffffffffffff4p-4
+clog 0xb.fffffffffffp-4 0x7.fffffffffffecp-4
+clog 0xb.fffffffffff8p-4 0x7.fffff8p-4
+clog 0x8p-152 -0x1.10233ap+0
+clog 0xa.03634p-4 -0x4.7bb918p-20
+clog -0x5.e23d2p-4 0x8.525df889c21ap-4
+clog 0x9.8ce58p-4 -0x8p-152
+clog 0x8p-152 0x9.2af75p-4
+clog 0x9.97a15de8e59d8p-4 -0
+clog -0x4.74556ec92eb4746p-4 0x1.1e7aa1d936f6efe6p+0
+clog 0x9.97a15de8e59d8p-4 -0
+clog -0x9.7f1d7p-64 0x9.db37dp-4
+clog -0x8.5efc4p-4 -0x5.40310cp-4
+clog -0x9.0b459p-4 0
+clog -0x6.a9419e9b30e68p-4 -0x6.262c7p-4
+clog 0x5.2767cdfdfbf2p-4 0x7.69ee98p-4
+clog -0x9.f5563cb3227d8p-4 0
+clog -0x9.5a284p-4 0x6.899578p-8
+clog 0xa.3e62bp-4 0x1.18c03p-100
+clog 0 -0x9.22a99p-4
+clog 0 0x9.7915bp-4
+clog 0x3.00d1ap-12 0x1.23ff6ap+0
+
 clog 0x1.fffffep+127 0x1.fffffep+127
 clog 0x1.fffffep+127 1.0
 clog 0x1p-149 0x1p-149
@@ -808,6 +853,51 @@ clog10 -0xa.7ac41a0b417cb8fp-4 -0x6.c5a32eaeedd4p-4
 clog10 0x3.c16p-136 0x8p-152
 clog10 -0x1.0a69de710590dp+0 -0x7.bc7e121e2b0d1088p-4
 
+clog10 -0x2.7bdep-4 0x5.ab7a4p-4
+clog10 -0xb.e1d3d0ff44358p-4 -0x7.54785e1b143f8p-4
+clog10 0x3.ba473p+0 0x7.eea9ap-4
+clog10 0x9.d02220baee4ep+36 0x2.b9a29cp+0
+clog10 -0x5.1a5cf8p-4 -0xb.73012p-4
+clog10 -0xa.ff292a609dbb8p-4 0x6.f73d4cp-4
+clog10 -0x5.1a5cfc2301114p-4 -0xb.730118p-4
+clog10 0xb.ffffcp-4 0x7.ffff1p-4
+clog10 0xb.ffffp-4 0x7.ffffap-4
+clog10 0xb.ffffp-4 0x7.fffff8p-4
+clog10 0xb.ffffp-4 0x7.ffffp-4
+clog10 0xb.fffffp-4 0x7.ffff68p-4
+clog10 0xb.fffffp-4 0x7.ffffp-4
+clog10 0xb.ffff8p-4 0x7.ffffcp-4
+clog10 0xb.ffffp-4 0x7.ffffcp-4
+clog10 0xb.ffffp-4 0x7.ffffb8p-4
+clog10 0xb.ffffp-4 0x7.ffff7p-4
+clog10 0xb.ffffp-4 0x7.ffff5p-4
+clog10 0xb.fffffffffff7p-4 0x7.fffff8p-4
+clog10 0xb.fffffffffff08p-4 0x7.fffffffffffdp-4
+clog10 0xb.fffffffffff08p-4 0x7.fffffffffff9p-4
+clog10 0xb.fffffffffffp-4 0x7.fffffffffffdcp-4
+clog10 0xb.fffffp-4 0x7.ffffffffffff4p-4
+clog10 0xb.fffffffffffp-4 0x7.fffffffffffecp-4
+clog10 0xb.fffffffffff8p-4 0x7.fffff8p-4
+clog10 0x8p-152 -0x1.10233ap+0
+clog10 0xa.03634p-4 -0x4.7bb918p-20
+clog10 -0x5.e23d2p-4 0x8.525df889c21ap-4
+clog10 0x9.8ce58p-4 -0x8p-152
+clog10 0x8p-152 0x9.2af75p-4
+clog10 0x9.97a15de8e59d8p-4 -0
+clog10 -0x4.74556ec92eb4746p-4 0x1.1e7aa1d936f6efe6p+0
+clog10 0x9.97a15de8e59d8p-4 -0
+clog10 -0x9.7f1d7p-64 0x9.db37dp-4
+clog10 -0x8.5efc4p-4 -0x5.40310cp-4
+clog10 -0x9.0b459p-4 0
+clog10 -0x6.a9419e9b30e68p-4 -0x6.262c7p-4
+clog10 0x5.2767cdfdfbf2p-4 0x7.69ee98p-4
+clog10 -0x9.f5563cb3227d8p-4 0
+clog10 -0x9.5a284p-4 0x6.899578p-8
+clog10 0xa.3e62bp-4 0x1.18c03p-100
+clog10 0 -0x9.22a99p-4
+clog10 0 0x9.7915bp-4
+clog10 0x3.00d1ap-12 0x1.23ff6ap+0
+
 clog10 0x1.fffffep+127 0x1.fffffep+127
 clog10 0x1.fffffep+127 1.0
 clog10 0x1p-149 0x1p-149
diff --git a/math/s_clog.c b/math/s_clog.c
index b010e89..2ca8ca4 100644
--- a/math/s_clog.c
+++ b/math/s_clog.c
@@ -76,14 +76,17 @@ __clog (__complex__ double x)
           __real__ result = __log1p (d2m1) / 2.0;
         }
       else if (absx < 1.0
-               && absx >= 0.75
+               && absx >= 0.5
                && absy < DBL_EPSILON / 2.0
                && scale == 0)
         {
           double d2m1 = (absx - 1.0) * (absx + 1.0);
           __real__ result = __log1p (d2m1) / 2.0;
         }
-      else if (absx < 1.0 && (absx >= 0.75 || absy >= 0.5) && scale == 0)
+      else if (absx < 1.0
+               && absx >= 0.5
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5)
         {
           double d2m1 = __x2y2m1 (absx, absy);
           __real__ result = __log1p (d2m1) / 2.0;
diff --git a/math/s_clog10.c b/math/s_clog10.c
index b6a4342..d8f52af 100644
--- a/math/s_clog10.c
+++ b/math/s_clog10.c
@@ -82,14 +82,17 @@ __clog10 (__complex__ double x)
           __real__ result = __log1p (d2m1) * (M_LOG10E / 2.0);
         }
       else if (absx < 1.0
-               && absx >= 0.75
+               && absx >= 0.5
                && absy < DBL_EPSILON / 2.0
                && scale == 0)
         {
           double d2m1 = (absx - 1.0) * (absx + 1.0);
           __real__ result = __log1p (d2m1) * (M_LOG10E / 2.0);
         }
-      else if (absx < 1.0 && (absx >= 0.75 || absy >= 0.5) && scale == 0)
+      else if (absx < 1.0
+               && absx >= 0.5
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5)
         {
           double d2m1 = __x2y2m1 (absx, absy);
           __real__ result = __log1p (d2m1) * (M_LOG10E / 2.0);
diff --git a/math/s_clog10f.c b/math/s_clog10f.c
index b77a849..1e6ebbb 100644
--- a/math/s_clog10f.c
+++ b/math/s_clog10f.c
@@ -82,14 +82,17 @@ __clog10f (__complex__ float x)
           __real__ result = __log1pf (d2m1) * ((float) M_LOG10E / 2.0f);
         }
       else if (absx < 1.0f
-               && absx >= 0.75f
+               && absx >= 0.5f
                && absy < FLT_EPSILON / 2.0f
                && scale == 0)
         {
           float d2m1 = (absx - 1.0f) * (absx + 1.0f);
           __real__ result = __log1pf (d2m1) * ((float) M_LOG10E / 2.0f);
         }
-      else if (absx < 1.0f && (absx >= 0.75f || absy >= 0.5f) && scale == 0)
+      else if (absx < 1.0f
+               && absx >= 0.5f
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5f)
         {
           float d2m1 = __x2y2m1f (absx, absy);
           __real__ result = __log1pf (d2m1) * ((float) M_LOG10E / 2.0f);
diff --git a/math/s_clog10l.c b/math/s_clog10l.c
index 86ec512..d3da399 100644
--- a/math/s_clog10l.c
+++ b/math/s_clog10l.c
@@ -89,14 +89,17 @@ __clog10l (__complex__ long double x)
           __real__ result = __log1pl (d2m1) * (M_LOG10El / 2.0L);
         }
       else if (absx < 1.0L
-               && absx >= 0.75L
+               && absx >= 0.5L
                && absy < LDBL_EPSILON / 2.0L
                && scale == 0)
         {
           long double d2m1 = (absx - 1.0L) * (absx + 1.0L);
           __real__ result = __log1pl (d2m1) * (M_LOG10El / 2.0L);
         }
-      else if (absx < 1.0L && (absx >= 0.75L || absy >= 0.5L) && scale == 0)
+      else if (absx < 1.0L
+               && absx >= 0.5L
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5L)
         {
           long double d2m1 = __x2y2m1l (absx, absy);
           __real__ result = __log1pl (d2m1) * (M_LOG10El / 2.0L);
diff --git a/math/s_clogf.c b/math/s_clogf.c
index ffec7ce..c351cab 100644
--- a/math/s_clogf.c
+++ b/math/s_clogf.c
@@ -76,14 +76,17 @@ __clogf (__complex__ float x)
           __real__ result = __log1pf (d2m1) / 2.0f;
         }
       else if (absx < 1.0f
-               && absx >= 0.75f
+               && absx >= 0.5f
                && absy < FLT_EPSILON / 2.0f
                && scale == 0)
         {
           float d2m1 = (absx - 1.0f) * (absx + 1.0f);
           __real__ result = __log1pf (d2m1) / 2.0f;
         }
-      else if (absx < 1.0f && (absx >= 0.75f || absy >= 0.5f) && scale == 0)
+      else if (absx < 1.0f
+               && absx >= 0.5f
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5f)
         {
           float d2m1 = __x2y2m1f (absx, absy);
           __real__ result = __log1pf (d2m1) / 2.0f;
diff --git a/math/s_clogl.c b/math/s_clogl.c
index 6325df4..4ec4c80 100644
--- a/math/s_clogl.c
+++ b/math/s_clogl.c
@@ -83,14 +83,17 @@ __clogl (__complex__ long double x)
           __real__ result = __log1pl (d2m1) / 2.0L;
         }
       else if (absx < 1.0L
-               && absx >= 0.75L
+               && absx >= 0.5L
                && absy < LDBL_EPSILON / 2.0L
                && scale == 0)
         {
           long double d2m1 = (absx - 1.0L) * (absx + 1.0L);
           __real__ result = __log1pl (d2m1) / 2.0L;
         }
-      else if (absx < 1.0L && (absx >= 0.75L || absy >= 0.5L) && scale == 0)
+      else if (absx < 1.0L
+               && absx >= 0.5L
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5L)
         {
           long double d2m1 = __x2y2m1l (absx, absy);
           __real__ result = __log1pl (d2m1) / 2.0L;
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h
index a8f1a8e..cf1865d 100644
--- a/sysdeps/generic/math_private.h
+++ b/sysdeps/generic/math_private.h
@@ -365,8 +365,8 @@ extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 extern float __x2y2m1f (float x, float y);
 extern double __x2y2m1 (double x, double y);
 extern long double __x2y2m1l (long double x, long double y);
diff --git a/sysdeps/i386/fpu/libm-test-ulps b/sysdeps/i386/fpu/libm-test-ulps
index 32f24d0..438a390 100644
--- a/sysdeps/i386/fpu/libm-test-ulps
+++ b/sysdeps/i386/fpu/libm-test-ulps
@@ -836,12 +836,12 @@ ildouble: 3
 ldouble: 3
 
 Function: Real part of "clog":
-double: 3
-float: 2
-idouble: 3
-ifloat: 2
-ildouble: 4
-ldouble: 4
+double: 2
+float: 1
+idouble: 2
+ifloat: 1
+ildouble: 3
+ldouble: 3
 
 Function: Imaginary part of "clog":
 double: 1
@@ -864,10 +864,10 @@ ildouble: 2
 ldouble: 2
 
 Function: Real part of "clog10_downward":
-double: 5
-float: 4
-idouble: 5
-ifloat: 4
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
 ildouble: 8
 ldouble: 8
 
@@ -876,14 +876,14 @@ double: 1
 float: 1
 idouble: 1
 ifloat: 1
-ildouble: 2
-ldouble: 2
+ildouble: 3
+ldouble: 3
 
 Function: Real part of "clog10_towardzero":
-double: 5
-float: 4
-idouble: 5
-ifloat: 4
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
 ildouble: 8
 ldouble: 8
 
@@ -896,12 +896,12 @@ ildouble: 3
 ldouble: 3
 
 Function: Real part of "clog10_upward":
-double: 5
-float: 5
-idouble: 5
-ifloat: 5
-ildouble: 6
-ldouble: 6
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
+ildouble: 7
+ldouble: 7
 
 Function: Imaginary part of "clog10_upward":
 double: 1
@@ -912,12 +912,12 @@ ildouble: 3
 ldouble: 3
 
 Function: Real part of "clog_downward":
-double: 5
-float: 5
-idouble: 5
-ifloat: 5
-ildouble: 7
-ldouble: 7
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
+ildouble: 5
+ldouble: 5
 
 Function: Imaginary part of "clog_downward":
 double: 1
@@ -928,12 +928,12 @@ ildouble: 1
 ldouble: 1
 
 Function: Real part of "clog_towardzero":
-double: 5
-float: 5
-idouble: 5
-ifloat: 5
-ildouble: 8
-ldouble: 8
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
+ildouble: 5
+ldouble: 5
 
 Function: Imaginary part of "clog_towardzero":
 double: 1
@@ -944,12 +944,12 @@ ildouble: 1
 ldouble: 1
 
 Function: Real part of "clog_upward":
-double: 5
-float: 5
-idouble: 5
-ifloat: 5
-ildouble: 6
-ldouble: 6
+double: 2
+float: 3
+idouble: 2
+ifloat: 3
+ildouble: 4
+ldouble: 4
 
 Function: Imaginary part of "clog_upward":
 double: 1
diff --git a/sysdeps/ieee754/dbl-64/x2y2m1.c b/sysdeps/ieee754/dbl-64/x2y2m1.c
index c96dae5..b040097 100644
--- a/sysdeps/ieee754/dbl-64/x2y2m1.c
+++ b/sysdeps/ieee754/dbl-64/x2y2m1.c
@@ -80,32 +80,26 @@ compare (const void *p, const void *q)
 }
 
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 
 double
 __x2y2m1 (double x, double y)
 {
-  double vals[4];
+  double vals[5];
   SET_RESTORE_ROUND (FE_TONEAREST);
   mul_split (&vals[1], &vals[0], x, x);
   mul_split (&vals[3], &vals[2], y, y);
-  if (x >= 0.75)
-    vals[1] -= 1.0;
-  else
-    {
-      vals[1] -= 0.5;
-      vals[3] -= 0.5;
-    }
-  qsort (vals, 4, sizeof (double), compare);
+  vals[4] = -1.0;
+  qsort (vals, 5, sizeof (double), compare);
   /* Add up the values so that each element of VALS has absolute value
      at most equal to the last set bit of the next nonzero
      element.  */
-  for (size_t i = 0; i <= 2; i++)
+  for (size_t i = 0; i <= 3; i++)
     {
       add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
-      qsort (vals + i + 1, 3 - i, sizeof (double), compare);
+      qsort (vals + i + 1, 4 - i, sizeof (double), compare);
     }
   /* Now any error from this addition will be small.  */
-  return vals[3] + vals[2] + vals[1] + vals[0];
+  return vals[4] + vals[3] + vals[2] + vals[1] + vals[0];
 }
diff --git a/sysdeps/ieee754/dbl-64/x2y2m1f.c b/sysdeps/ieee754/dbl-64/x2y2m1f.c
index 43a8acf..835f6a0 100644
--- a/sysdeps/ieee754/dbl-64/x2y2m1f.c
+++ b/sysdeps/ieee754/dbl-64/x2y2m1f.c
@@ -21,8 +21,8 @@
 #include
 
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 
 float
 __x2y2m1f (float x, float y)
diff --git a/sysdeps/ieee754/ldbl-128/x2y2m1l.c b/sysdeps/ieee754/ldbl-128/x2y2m1l.c
index 11757c6..a0498c3 100644
--- a/sysdeps/ieee754/ldbl-128/x2y2m1l.c
+++ b/sysdeps/ieee754/ldbl-128/x2y2m1l.c
@@ -80,32 +80,26 @@ compare (const void *p, const void *q)
 }
 
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 
 long double
 __x2y2m1l (long double x, long double y)
 {
-  long double vals[4];
+  long double vals[5];
   SET_RESTORE_ROUNDL (FE_TONEAREST);
   mul_split (&vals[1], &vals[0], x, x);
   mul_split (&vals[3], &vals[2], y, y);
-  if (x >= 0.75L)
-    vals[1] -= 1.0L;
-  else
-    {
-      vals[1] -= 0.5L;
-      vals[3] -= 0.5L;
-    }
-  qsort (vals, 4, sizeof (long double), compare);
+  vals[4] = -1.0L;
+  qsort (vals, 5, sizeof (long double), compare);
   /* Add up the values so that each element of VALS has absolute value
      at most equal to the last set bit of the next nonzero
      element.  */
-  for (size_t i = 0; i <= 2; i++)
+  for (size_t i = 0; i <= 3; i++)
     {
       add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
-      qsort (vals + i + 1, 3 - i, sizeof (long double), compare);
+      qsort (vals + i + 1, 4 - i, sizeof (long double), compare);
     }
   /* Now any error from this addition will be small.  */
-  return vals[3] + vals[2] + vals[1] + vals[0];
+  return vals[4] + vals[3] + vals[2] + vals[1] + vals[0];
 }
diff --git a/sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c b/sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c
index a001b58..081fb98 100644
--- a/sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c
+++ b/sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c
@@ -80,13 +80,13 @@ compare (const void *p, const void *q)
 }
 
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 
 long double
 __x2y2m1l (long double x, long double y)
 {
-  double vals[12];
+  double vals[13];
   SET_RESTORE_ROUND (FE_TONEAREST);
   union ibm_extended_long_double xu, yu;
   xu.ld = x;
@@ -105,25 +105,19 @@ __x2y2m1l (long double x, long double y)
   vals[8] *= 2.0;
   vals[9] *= 2.0;
   mul_split (&vals[11], &vals[10], yu.d[1].d, yu.d[1].d);
-  if (xu.d[0].d >= 0.75)
-    vals[1] -= 1.0;
-  else
-    {
-      vals[1] -= 0.5;
-      vals[7] -= 0.5;
-    }
-  qsort (vals, 12, sizeof (double), compare);
+  vals[12] = -1.0;
+  qsort (vals, 13, sizeof (double), compare);
   /* Add up the values so that each element of VALS has absolute value
      at most equal to the last set bit of the next nonzero
      element.  */
-  for (size_t i = 0; i <= 10; i++)
+  for (size_t i = 0; i <= 11; i++)
     {
       add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
-      qsort (vals + i + 1, 11 - i, sizeof (double), compare);
+      qsort (vals + i + 1, 12 - i, sizeof (double), compare);
     }
   /* Now any error from this addition will be small.  */
-  long double retval = (long double) vals[11];
-  for (size_t i = 10; i != (size_t) -1; i--)
+  long double retval = (long double) vals[12];
+  for (size_t i = 11; i != (size_t) -1; i--)
     retval += (long double) vals[i];
   return retval;
 }
diff --git a/sysdeps/ieee754/ldbl-96/x2y2m1.c b/sysdeps/ieee754/ldbl-96/x2y2m1.c
index a6cc82c..2f6b0be 100644
--- a/sysdeps/ieee754/ldbl-96/x2y2m1.c
+++ b/sysdeps/ieee754/ldbl-96/x2y2m1.c
@@ -27,8 +27,8 @@
 #else
 
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 
 double
 __x2y2m1 (double x, double y)
diff --git a/sysdeps/ieee754/ldbl-96/x2y2m1l.c b/sysdeps/ieee754/ldbl-96/x2y2m1l.c
index 11757c6..a0498c3 100644
--- a/sysdeps/ieee754/ldbl-96/x2y2m1l.c
+++ b/sysdeps/ieee754/ldbl-96/x2y2m1l.c
@@ -80,32 +80,26 @@ compare (const void *p, const void *q)
 }
 
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 
 long double
 __x2y2m1l (long double x, long double y)
 {
-  long double vals[4];
+  long double vals[5];
   SET_RESTORE_ROUNDL (FE_TONEAREST);
   mul_split (&vals[1], &vals[0], x, x);
   mul_split (&vals[3], &vals[2], y, y);
-  if (x >= 0.75L)
-    vals[1] -= 1.0L;
-  else
-    {
-      vals[1] -= 0.5L;
-      vals[3] -= 0.5L;
-    }
-  qsort (vals, 4, sizeof (long double), compare);
+  vals[4] = -1.0L;
+  qsort (vals, 5, sizeof (long double), compare);
   /* Add up the values so that each element of VALS has absolute value
      at most equal to the last set bit of the next nonzero
      element.  */
-  for (size_t i = 0; i <= 2; i++)
+  for (size_t i = 0; i <= 3; i++)
     {
       add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
-      qsort (vals + i + 1, 3 - i, sizeof (long double), compare);
+      qsort (vals + i + 1, 4 - i, sizeof (long double), compare);
     }
   /* Now any error from this addition will be small.  */
-  return vals[3] + vals[2] + vals[1] + vals[0];
+  return vals[4] + vals[3] + vals[2] + vals[1] + vals[0];
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 12c3dd1..ef3ab70 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -869,11 +869,11 @@ ldouble: 3
 
 Function: Real part of "clog":
 double: 3
-float: 2
+float: 3
 idouble: 3
-ifloat: 2
-ildouble: 4
-ldouble: 4
+ifloat: 3
+ildouble: 3
+ldouble: 3
 
 Function: Imaginary part of "clog":
 float: 1
@@ -883,9 +883,9 @@ ldouble: 1
 
 Function: Real part of "clog10":
 double: 3
-float: 3
+float: 4
 idouble: 3
-ifloat: 3
+ifloat: 4
 ildouble: 4
 ldouble: 4
 
@@ -898,10 +898,10 @@ ildouble: 2
 ldouble: 2
 
 Function: Real part of "clog10_downward":
-double: 6
-float: 6
-idouble: 6
-ifloat: 6
+double: 5
+float: 4
+idouble: 5
+ifloat: 4
 ildouble: 8
 ldouble: 8
 
@@ -910,14 +910,14 @@ double: 2
 float: 4
 idouble: 2
 ifloat: 4
-ildouble: 2
-ldouble: 2
+ildouble: 3
+ldouble: 3
 
 Function: Real part of "clog10_towardzero":
 double: 5
-float: 4
+float: 5
 idouble: 5
-ifloat: 4
+ifloat: 5
 ildouble: 8
 ldouble: 8
 
@@ -930,28 +930,28 @@ ildouble: 3
 ldouble: 3
 
 Function: Real part of "clog10_upward":
-double: 8
+double: 6
 float: 5
-idouble: 8
+idouble: 6
 ifloat: 5
-ildouble: 6
-ldouble: 6
+ildouble: 7
+ldouble: 7
 
 Function: Imaginary part of "clog10_upward":
 double: 2
-float: 3
+float: 4
 idouble: 2
-ifloat: 3
+ifloat: 4
 ildouble: 3
 ldouble: 3
 
 Function: Real part of "clog_downward":
-double: 7
-float: 5
-idouble: 7
-ifloat: 5
-ildouble: 7
-ldouble: 7
+double: 4
+float: 3
+idouble: 4
+ifloat: 3
+ildouble: 5
+ldouble: 5
 
 Function: Imaginary part of "clog_downward":
 double: 1
@@ -962,28 +962,28 @@ ildouble: 1
 ldouble: 1
 
 Function: Real part of "clog_towardzero":
-double: 7
-float: 5
-idouble: 7
-ifloat: 5
-ildouble: 8
-ldouble: 8
+double: 4
+float: 4
+idouble: 4
+ifloat: 4
+ildouble: 5
+ldouble: 5
 
 Function: Imaginary part of "clog_towardzero":
 double: 1
-float: 2
+float: 3
 idouble: 1
-ifloat: 2
+ifloat: 3
 ildouble: 1
 ldouble: 1
 
 Function: Real part of "clog_upward":
-double: 8
-float: 5
-idouble: 8
-ifloat: 5
-ildouble: 6
-ldouble: 6
+double: 4
+float: 3
+idouble: 4
+ifloat: 3
+ildouble: 4
+ldouble: 4
 
 Function: Imaginary part of "clog_upward":
 double: 1
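
Illustration only, not part of the patch: the sketch below restates the
old and new branch conditions from the math/s_clog.c hunk as two
standalone predicates, so the effect of the threshold change can be
checked for a given input. The function names old_uses_x2y2m1 and
new_uses_x2y2m1 are hypothetical. The sketch assumes, as the __x2y2m1
comment requires, that absx >= absy >= 0, and it assumes scale == 0; the
separate branch for very small absy is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not glibc code: the pre-patch condition for
   taking the __x2y2m1 + log1p path.  Assumes absx >= absy >= 0 and
   scale == 0.  */
static bool
old_uses_x2y2m1 (double absx, double absy)
{
  assert (absx >= absy && absy >= 0.0);
  return absx < 1.0 && (absx >= 0.75 || absy >= 0.5);
}

/* Hypothetical helper, not glibc code: the post-patch condition, under
   the same assumptions as above.  */
static bool
new_uses_x2y2m1 (double absx, double absy)
{
  assert (absx >= absy && absy >= 0.0);
  return absx < 1.0
         && absx >= 0.5
         && absx * absx + absy * absy >= 0.5;
}
```

For example, an input just inside the old (0.75, 0.5) threshold, such as
absx = 0.74 and absy = 0.49, has a modulus of roughly 0.89. The old
predicate rejects it, so log would be applied to that value; the new
predicate accepts it, so the log1p path is taken instead.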