From patchwork Thu Oct 5 13:05:49 2023
X-Patchwork-Submitter: Georg-Johann Lay
X-Patchwork-Id: 1843897
Message-ID: <8bf79b39-f852-747b-7a35-60a74e15b4e8@gjlay.de>
Date: Thu, 5 Oct 2023 15:05:49 +0200
To: gcc-patches@gcc.gnu.org
From: Georg-Johann Lay
Subject: [avr,committed] Use monic denominator polynomials to save a multiplication.

This is a small tweak in LibF7 to save one multiplication in computation
of denominator polynomials.  The polynomials are monic now, and f7_horner
needs one multiplication less.

Johann

---

LibF7: Use monic denominator polynomials to save a multiplication.

libgcc/config/avr/libf7/
	* libf7.h (F7_FLAGNO_plusx, F7_FLAG_plusx): New macros.
	* libf7.c (f7_horner): Handle F7_FLAG_plusx in highest coefficient.
	* libf7-const.def [F7MOD_atan_]: Denominator: Set F7_FLAG_plusx
	and omit highest term.
	[F7MOD_asinacos_]: Use rational function with normalized denominator.

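For readers who want to see where the saved multiplication comes from, here is a minimal sketch of the idea in plain C. It is not the LibF7 code: `double` stands in for `f7_t`, and the names `horner_full` and `horner_monic` are hypothetical. For a monic polynomial of degree n, the first Horner step would be `1 * x + c[n-1]`, so starting from `c[n-1] + x` drops one multiplication and one stored coefficient.

```c
#include <stddef.h>

/* Plain Horner scheme.  coeff[i] is the coefficient of x^i and all
   n_coeff coefficients are stored.  Needs n_coeff - 1 multiplications.  */
static double horner_full (double x, size_t n_coeff, const double *coeff)
{
  const double *p = coeff + n_coeff - 1;
  double y = *p;

  while (p != coeff)
    y = y * x + *--p;

  return y;
}

/* Monic variant (illustration only).  The polynomial has degree n_stored
   and an implicit leading coefficient of 1; only the n_stored lower
   coefficients are stored.  Starting with coeff[n_stored - 1] + x replaces
   the multiplication by 1, so n_stored - 1 multiplications are needed
   instead of n_stored.  */
static double horner_monic (double x, size_t n_stored, const double *coeff)
{
  const double *p = coeff + n_stored - 1;
  double y = *p + x;

  while (p != coeff)
    y = y * x + *--p;

  return y;
}
```

With the coefficients {4, 3, 2, 1} for `horner_full` and {4, 3, 2} for `horner_monic`, both evaluate x^3 + 2x^2 + 3x + 4. In the patch, the same effect is obtained by flagging the highest stored coefficient with F7_FLAG_plusx instead of adding a second function.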
diff --git a/libgcc/config/avr/libf7/libf7-const.def b/libgcc/config/avr/libf7/libf7-const.def
index 8764c81ffa4..0e4c4d8701e 100644
--- a/libgcc/config/avr/libf7/libf7-const.def
+++ b/libgcc/config/avr/libf7/libf7-const.def
@@ -121,8 +121,7 @@ F7_CONST_DEF (X, 0, 0xd6,0xa5,0x2d,0x73,0x34,0xd8,0x60, 11)
 F7_CONST_DEF (X, 0, 0xe5,0x08,0xb8,0x24,0x20,0x81,0xe7, 11)
 F7_CONST_DEF (X, 0, 0xe3,0xb3,0x35,0xfa,0xbf,0x1f,0x81, 10)
 F7_CONST_DEF (X, 0, 0xd3,0x89,0x2b,0xb6,0x3e,0x2e,0x05, 8)
-F7_CONST_DEF (X, 0, 0x9f,0xab,0xe9,0xd9,0x35,0xed,0x27, 5)
-F7_CONST_DEF (X, 0, 0x80,0x00,0x00,0x00,0x00,0x00,0x00, 0)
+F7_CONST_DEF (X, 8, 0x9f,0xab,0xe9,0xd9,0x35,0xed,0x27, 5)
 #endif
 
 #elif defined (SWIFT_3_4)
@@ -147,24 +146,22 @@ F7_CONST_DEF (pi_6, 0, 0x86,0x0a,0x91,0xc1,0x6b,0x9b,0x2c, -1)
 #endif // which MiniMax
 
 #elif defined (F7MOD_asinacos_)
-// Relative error < 5.6E-18, quality = 1.00000037 (ideal = 1).
+// f(x) = asin(w) / w, w = sqrt(x/2), w in [0, 0.5].
+// Relative error < 4.9E-18, Q10 = 21.7
 #if defined (FOR_NUMERATOR)
-// 0.99999999999999999442491073135027586203 - 1.035234033892197627842731209x + 0.35290206232981519813422591897720574012x^2 - 0.04333483170641685705612351801x^3 + 0.0012557428614630796315205218507940285622x^4 + 0.0000084705471128435769021718764878041684288x^5
-// p = Poly ([Decimal('0.99999999999999999442491073135027586203'), Decimal('-1.0352340338921976278427312087167692142'), Decimal('0.35290206232981519813422591897720574012'), Decimal('-0.043334831706416857056123518013656946650'), Decimal('0.0012557428614630796315205218507940285622'), Decimal('0.0000084705471128435769021718764878041684288')])
-F7_CONST_DEF (X, 0, 0x80,0x00,0x00,0x00,0x00,0x00,0x00, 0)
-F7_CONST_DEF (X, 1, 0x84,0x82,0x8c,0x7f,0xa2,0xf6,0x65, 0)
-F7_CONST_DEF (X, 0, 0xb4,0xaf,0x94,0x40,0xcb,0x86,0x69, -2)
-F7_CONST_DEF (X, 1, 0xb1,0x7f,0xdd,0x4f,0x4e,0xbe,0x1d, -5)
-F7_CONST_DEF (X, 0, 0xa4,0x97,0xbd,0x0b,0x59,0xc9,0x25, -10)
-F7_CONST_DEF (X, 0, 0x8e,0x1c,0xb9,0x0b,0x50,0x6c,0xce, -17)
+// -41050.4389591195072042579 + 43293.8985171424974364797 x - 15230.0535110759003163511 x^2 + 1996.35047839480810448269 x^3 - 72.2973010025603956782375 x^4
+F7_CONST_DEF (X, 1, 0xa0,0x5a,0x70,0x5f,0x9f,0xf6,0x90, 15)
+F7_CONST_DEF (X, 0, 0xa9,0x1d,0xe6,0x05,0x38,0x2d,0xec, 15)
+F7_CONST_DEF (X, 1, 0xed,0xf8,0x36,0xcb,0x9b,0x83,0xdd, 13)
+F7_CONST_DEF (X, 0, 0xf9,0x8b,0x37,0x1e,0x77,0x74,0xf9, 10)
+F7_CONST_DEF (X, 1, 0x90,0x98,0x37,0xd6,0x46,0x21,0x3c, 6)
 #elif defined (FOR_DENOMINATOR)
-// 1 - 1.118567367225532923662371649x + 0.42736600959872448854098334016758333519x^2 - 0.06355588484963171659942148390x^3 + 0.0028820878185134035637440105959294542908x^4
-// q = Poly ([Decimal('1'), Decimal('-1.1185673672255329236623716486696411533'), Decimal('0.42736600959872448854098334016758333519'), Decimal('-0.063555884849631716599421483898013782858'), Decimal('0.0028820878185134035637440105959294542908')])
-F7_CONST_DEF (X, 0, 0x80,0x00,0x00,0x00,0x00,0x00,0x00, 0)
-F7_CONST_DEF (X, 1, 0x8f,0x2d,0x37,0x2a,0x4d,0xa1,0x57, 0)
-F7_CONST_DEF (X, 0, 0xda,0xcf,0xb7,0xb5,0x4c,0x0d,0xee, -2)
-F7_CONST_DEF (X, 1, 0x82,0x29,0x96,0x77,0x2e,0x19,0xc7, -4)
-F7_CONST_DEF (X, 0, 0xbc,0xe1,0x68,0xec,0xba,0x20,0x29, -9)
+// -41050.4389591195074048679 + 46714.7684304025268691353 x - 18353.2551497967388796235 x^2 + 2878.9626098308300020834 x^3 - 150.822900775648362380508 x^4 + x^5
+F7_CONST_DEF (X, 1, 0xa0,0x5a,0x70,0x5f,0x9f,0xf6,0x91, 15)
+F7_CONST_DEF (X, 0, 0xb6,0x7a,0xc4,0xb7,0xda,0xd8,0x1b, 15)
+F7_CONST_DEF (X, 1, 0x8f,0x62,0x82,0xa2,0xfe,0x81,0x26, 14)
+F7_CONST_DEF (X, 0, 0xb3,0xef,0x66,0xd9,0x90,0xe3,0x91, 11)
+F7_CONST_DEF (X, 9, 0x96,0xd2,0xa9,0xa0,0x0f,0x43,0x44, 7)
 #endif
 
 #elif defined (F7MOD_sincos_)
diff --git a/libgcc/config/avr/libf7/libf7.c b/libgcc/config/avr/libf7/libf7.c
index 8fb57ef90cc..373a8a55d90 100644
--- a/libgcc/config/avr/libf7/libf7.c
+++ b/libgcc/config/avr/libf7/libf7.c
@@ -1527,6 +1527,9 @@ void f7_horner (f7_t *cc, const f7_t *xx, uint8_t n_coeff, const f7_t *coeff,
 
   f7_copy_flash (yy, pcoeff);
 
+  if (yy->flags & F7_FLAG_plusx)
+    f7_Iadd (yy, xx);
+
   while (1)
     {
       --pcoeff;
diff --git a/libgcc/config/avr/libf7/libf7.h b/libgcc/config/avr/libf7/libf7.h
index 03fe6abe839..3f81b5f1f88 100644
--- a/libgcc/config/avr/libf7/libf7.h
+++ b/libgcc/config/avr/libf7/libf7.h
@@ -47,6 +47,11 @@
    -- f7_t.is_nan (NaN)
    -- f7_t.is_inf (+Inf or -Inf)
    -- f7_t.sign (negative or -Inf).
+   -- _plusx: This flag is used by f7_horner.  Is is set in some
+      polynomial coefficients from libf7-const.def to indicate that
+      the respective polynomial has a leading coefficient of 1.
+      The flag is set in the second-highest coefficient, and the leading
+      coefficient is omitted.
 
    B) The flags that are returned by f7_classify().  This are the
    flags from A) together with
@@ -56,6 +61,7 @@
 #define F7_FLAGNO_sign  0
 #define F7_FLAGNO_zero  1
 #define F7_FLAGNO_nan   2
+#define F7_FLAGNO_plusx 3
 #define F7_FLAGNO_inf   7
 
 #define F7_HAVE_Inf 1
@@ -64,6 +70,7 @@
 #define F7_FLAG_sign  (1 << F7_FLAGNO_sign)
 #define F7_FLAG_zero  (1 << F7_FLAGNO_zero)
 #define F7_FLAG_nan   (1 << F7_FLAGNO_nan)
+#define F7_FLAG_plusx (1 << F7_FLAGNO_plusx)
 #define F7_FLAG_inf   (F7_HAVE_Inf << F7_FLAGNO_inf)
 
 // Flags that might be set in f7_t.flags.
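
The flag values 8 and 9 that now appear in the second argument of F7_CONST_DEF follow from the bit numbers above: 8 is F7_FLAG_plusx alone, 9 is F7_FLAG_plusx together with F7_FLAG_sign. The sketch below decodes such an entry on a host machine. It is an illustration under assumptions, not LibF7 code: the helper `f7_const_to_double` is hypothetical, and it assumes that the seven mantissa bytes are listed most significant first and that the value is mantissa * 2^(expo - 55). That reading is consistent with the comment in the asinacos denominator, where the flagged entry corresponds to -150.8229...

```c
#include <stdint.h>

/* Bit positions as defined in the libf7.h hunk of this patch.  */
#define FLAG_SIGN  (1u << 0)
#define FLAG_PLUSX (1u << 3)

/* Hypothetical host-side helper: turn one F7_CONST_DEF entry into a double.
   Assumptions: mantissa bytes are given most significant first, and the
   value is mantissa * 2^(expo - 55).  The plusx flag does not change the
   stored value; it only tells f7_horner to add x once.  */
static double f7_const_to_double (uint8_t flags, const uint8_t mant[7],
                                  int expo)
{
  uint64_t m = 0;

  for (int i = 0; i < 7; i++)
    m = (m << 8) | mant[i];

  double v = (double) m;
  int e = expo - 55;

  /* Scale by 2^e without relying on libm.  */
  for (; e > 0; e--)
    v *= 2.0;
  for (; e < 0; e++)
    v /= 2.0;

  return (flags & FLAG_SIGN) ? -v : v;
}
```

Applied to the entry `F7_CONST_DEF (X, 9, 0x96,0xd2,0xa9,0xa0,0x0f,0x43,0x44, 7)`, this gives a value close to -150.8229, the x^4 coefficient of the new denominator, with the x^5 term implied by the plusx bit.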