From patchwork Wed Mar 19 19:33:02 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 331853
Message-ID: <1395257582.17148.18.camel@gnopaine>
Subject: [4.8, PATCH 16/26] Backport Power8 and LE support: PR56843
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Wed, 19 Mar 2014 14:33:02 -0500

Hi,

This patch (diff-pr56843) backports the fix for PR56843.

Thanks,
Bill


[gcc]

2014-03-19  Bill Schmidt

	Backport from mainline
	2013-04-05  Bill Schmidt

	PR target/56843
	* config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
	(rs6000_emit_swdiv_low_precision): Remove.
	(rs6000_emit_swdiv): Rewrite to handle between one and four
	iterations of Newton-Raphson generally; modify required number of
	iterations for some cases.
	* config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

[gcc/testsuite]

2014-03-19  Bill Schmidt

	Backport from mainline
	2013-04-05  Bill Schmidt

	PR target/56843
	* gcc.target/powerpc/recip-1.c: Modify expected output.
	* gcc.target/powerpc/recip-3.c: Likewise.
	* gcc.target/powerpc/recip-4.c: Likewise.
	* gcc.target/powerpc/recip-5.c: Add expected output for iterations.


Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
@@ -29417,54 +29417,26 @@ rs6000_emit_nmsub (rtx dst, rtx m1, rtx
   emit_insn (gen_rtx_SET (VOIDmode, dst, r));
 }
 
-/* Newton-Raphson approximation of floating point divide with just 2 passes
-   (either single precision floating point, or newer machines with higher
-   accuracy estimates). Support both scalar and vector divide. Assumes no
-   trapping math and finite arguments. */
+/* Newton-Raphson approximation of floating point divide DST = N/D. If NOTE_P,
+   add a reg_note saying that this was a division. Support both scalar and
+   vector divide. Assumes no trapping math and finite arguments. */
 
-static void
-rs6000_emit_swdiv_high_precision (rtx dst, rtx n, rtx d)
+void
+rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool note_p)
 {
   enum machine_mode mode = GET_MODE (dst);
-  rtx x0, e0, e1, y1, u0, v0;
-  enum insn_code code = optab_handler (smul_optab, mode);
-  insn_gen_fn gen_mul = GEN_FCN (code);
-  rtx one = rs6000_load_constant_and_splat (mode, dconst1);
-
-  gcc_assert (code != CODE_FOR_nothing);
-
-  /* x0 = 1./d estimate */
-  x0 = gen_reg_rtx (mode);
-  emit_insn (gen_rtx_SET (VOIDmode, x0,
-                          gen_rtx_UNSPEC (mode, gen_rtvec (1, d),
-                                          UNSPEC_FRES)));
-
-  e0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (e0, d, x0, one);    /* e0 = 1. - (d * x0) */
-
-  e1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (e1, e0, e0, e0);     /* e1 = (e0 * e0) + e0 */
-
-  y1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y1, e1, x0, x0);     /* y1 = (e1 * x0) + x0 */
-
-  u0 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (u0, n, y1));       /* u0 = n * y1 */
-
-  v0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (v0, d, u0, n);      /* v0 = n - (d * u0) */
-
-  rs6000_emit_madd (dst, v0, y1, u0);    /* dst = (v0 * y1) + u0 */
-}
+  rtx one, x0, e0, x1, xprev, eprev, xnext, enext, u, v;
+  int i;
 
-/* Newton-Raphson approximation of floating point divide that has a low
-   precision estimate. Assumes no trapping math and finite arguments. */
+  /* Low precision estimates guarantee 5 bits of accuracy. High
+     precision estimates guarantee 14 bits of accuracy. SFmode
+     requires 23 bits of accuracy. DFmode requires 52 bits of
+     accuracy. Each pass at least doubles the accuracy, leading
+     to the following. */
+  int passes = (TARGET_RECIP_PRECISION) ? 1 : 3;
+  if (mode == DFmode || mode == V2DFmode)
+    passes++;
 
-static void
-rs6000_emit_swdiv_low_precision (rtx dst, rtx n, rtx d)
-{
-  enum machine_mode mode = GET_MODE (dst);
-  rtx x0, e0, e1, e2, y1, y2, y3, u0, v0, one;
   enum insn_code code = optab_handler (smul_optab, mode);
   insn_gen_fn gen_mul = GEN_FCN (code);
 
@@ -29478,46 +29450,44 @@ rs6000_emit_swdiv_low_precision (rtx dst
                           gen_rtx_UNSPEC (mode, gen_rtvec (1, d),
                                           UNSPEC_FRES)));
 
-  e0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (e0, d, x0, one);    /* e0 = 1. - d * x0 */
-
-  y1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y1, e0, x0, x0);     /* y1 = x0 + e0 * x0 */
-
-  e1 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (e1, e0, e0));      /* e1 = e0 * e0 */
-
-  y2 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y2, e1, y1, y1);     /* y2 = y1 + e1 * y1 */
-
-  e2 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (e2, e1, e1));      /* e2 = e1 * e1 */
-
-  y3 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y3, e2, y2, y2);     /* y3 = y2 + e2 * y2 */
-
-  u0 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (u0, n, y3));       /* u0 = n * y3 */
-
-  v0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (v0, d, u0, n);      /* v0 = n - d * u0 */
-
-  rs6000_emit_madd (dst, v0, y3, u0);    /* dst = u0 + v0 * y3 */
-}
+  /* Each iteration but the last calculates x_(i+1) = x_i * (2 - d * x_i). */
+  if (passes > 1) {
 
-/* Newton-Raphson approximation of floating point divide DST = N/D. If NOTE_P,
-   add a reg_note saying that this was a division. Support both scalar and
-   vector divide. Assumes no trapping math and finite arguments. */
+    /* e0 = 1. - d * x0 */
+    e0 = gen_reg_rtx (mode);
+    rs6000_emit_nmsub (e0, d, x0, one);
+
+    /* x1 = x0 + e0 * x0 */
+    x1 = gen_reg_rtx (mode);
+    rs6000_emit_madd (x1, e0, x0, x0);
+
+    for (i = 0, xprev = x1, eprev = e0; i < passes - 2;
+         ++i, xprev = xnext, eprev = enext) {
+
+      /* enext = eprev * eprev */
+      enext = gen_reg_rtx (mode);
+      emit_insn (gen_mul (enext, eprev, eprev));
+
+      /* xnext = xprev + enext * xprev */
+      xnext = gen_reg_rtx (mode);
+      rs6000_emit_madd (xnext, enext, xprev, xprev);
+    }
+
+  } else
+    xprev = x0;
+
+  /* The last iteration calculates x_(i+1) = n * x_i * (2 - d * x_i). */
+
+  /* u = n * xprev */
+  u = gen_reg_rtx (mode);
+  emit_insn (gen_mul (u, n, xprev));
+
+  /* v = n - (d * u) */
+  v = gen_reg_rtx (mode);
+  rs6000_emit_nmsub (v, d, u, n);
 
-void
-rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool note_p)
-{
-  enum machine_mode mode = GET_MODE (dst);
-
-  if (RS6000_RECIP_HIGH_PRECISION_P (mode))
-    rs6000_emit_swdiv_high_precision (dst, n, d);
-  else
-    rs6000_emit_swdiv_low_precision (dst, n, d);
+  /* dst = (v * xprev) + u */
+  rs6000_emit_madd (dst, v, xprev, u);
 
   if (note_p)
     add_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_DIV (mode, n, d));
@@ -29532,7 +29502,16 @@ rs6000_emit_swrsqrt (rtx dst, rtx src)
   enum machine_mode mode = GET_MODE (src);
   rtx x0 = gen_reg_rtx (mode);
   rtx y = gen_reg_rtx (mode);
-  int passes = (TARGET_RECIP_PRECISION) ? 2 : 3;
+
+  /* Low precision estimates guarantee 5 bits of accuracy. High
+     precision estimates guarantee 14 bits of accuracy. SFmode
+     requires 23 bits of accuracy. DFmode requires 52 bits of
+     accuracy. Each pass at least doubles the accuracy, leading
+     to the following. */
+  int passes = (TARGET_RECIP_PRECISION) ? 1 : 3;
+  if (mode == DFmode || mode == V2DFmode)
+    passes++;
+
   REAL_VALUE_TYPE dconst3_2;
   int i;
   rtx halfthree;
Index: gcc-4_8-test/gcc/config/rs6000/rs6000.h
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.h
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.h
@@ -673,9 +673,6 @@ extern unsigned char rs6000_recip_bits[]
 #define RS6000_RECIP_AUTO_RSQRTE_P(MODE) \
   (rs6000_recip_bits[(int)(MODE)] & RS6000_RECIP_MASK_AUTO_RSQRTE)
 
-#define RS6000_RECIP_HIGH_PRECISION_P(MODE) \
-  ((MODE) == SFmode || (MODE) == V4SFmode || TARGET_RECIP_PRECISION)
-
 /* The default CPU for TARGET_OPTION_OVERRIDE. */
 #define OPTION_TARGET_CPU_DEFAULT TARGET_CPU_DEFAULT
 
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/recip-1.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/recip-1.c
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/recip-1.c
@@ -3,8 +3,8 @@
 /* { dg-options "-O2 -mrecip -ffast-math -mcpu=power6" } */
 /* { dg-final { scan-assembler-times "frsqrte" 2 } } */
 /* { dg-final { scan-assembler-times "fmsub" 2 } } */
-/* { dg-final { scan-assembler-times "fmul" 8 } } */
-/* { dg-final { scan-assembler-times "fnmsub" 4 } } */
+/* { dg-final { scan-assembler-times "fmul" 6 } } */
+/* { dg-final { scan-assembler-times "fnmsub" 3 } } */
 
 double
 rsqrt_d (double a)
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/recip-3.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/recip-3.c
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/recip-3.c
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler-times "xsnmsub.dp\|fnmsub\ " 2 } } */
 /* { dg-final { scan-assembler-times "xsrsqrtesp\|frsqrtes" 1 } } */
 /* { dg-final { scan-assembler-times "xsmsub.sp\|fmsubs" 1 } } */
-/* { dg-final { scan-assembler-times "xsmulsp\|fmuls" 4 } } */
-/* { dg-final { scan-assembler-times "xsnmsub.sp\|fnmsubs" 2 } } */
+/* { dg-final { scan-assembler-times "xsmulsp\|fmuls" 2 } } */
+/* { dg-final { scan-assembler-times "xsnmsub.sp\|fnmsubs" 1 } } */
 
 double
 rsqrt_d (double a)
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/recip-4.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/recip-4.c
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/recip-4.c
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler-times "xvnmsub.dp" 2 } } */
 /* { dg-final { scan-assembler-times "xvrsqrtesp" 1 } } */
 /* { dg-final { scan-assembler-times "xvmsub.sp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmulsp" 4 } } */
-/* { dg-final { scan-assembler-times "xvnmsub.sp" 2 } } */
+/* { dg-final { scan-assembler-times "xvmulsp" 2 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.sp" 1 } } */
 
 #define SIZE 1024
 
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/recip-5.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/recip-5.c
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/recip-5.c
@@ -6,6 +6,14 @@
 /* { dg-final { scan-assembler-times "xvresp" 5 } } */
 /* { dg-final { scan-assembler-times "xsredp\|fre\ " 2 } } */
 /* { dg-final { scan-assembler-times "xsresp\|fres" 2 } } */
+/* { dg-final { scan-assembler-times "xsmulsp\|fmuls" 2 } } */
+/* { dg-final { scan-assembler-times "xsnmsub.sp\|fnmsubs" 2 } } */
+/* { dg-final { scan-assembler-times "xsmuldp\|fmul\ " 2 } } */
+/* { dg-final { scan-assembler-times "xsnmsub.dp\|fnmsub\ " 4 } } */
+/* { dg-final { scan-assembler-times "xvmulsp" 7 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.sp" 5 } } */
+/* { dg-final { scan-assembler-times "xvmuldp" 6 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.dp" 8 } } */
 
 #include
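
Note on the pass counts: the values assigned to `passes` in rs6000_emit_swdiv follow from the doubling argument in the new comment (5 or 14 bits from the estimate, 23 or 52 bits required). The sketch below is not part of the patch. It is a stand-alone C model, under the assumption that each pass exactly doubles the number of correct bits. The names passes_needed and swdiv_model are hypothetical, and plain double arithmetic stands in for the fused multiply-add RTL that the compiler actually emits, so rounding will differ from real hardware.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: number of Newton-Raphson passes
   needed to grow ESTIMATE_BITS of accuracy to at least REQUIRED_BITS,
   assuming each pass exactly doubles the number of correct bits.  */
int
passes_needed (int estimate_bits, int required_bits)
{
  int passes = 0;
  int bits;

  assert (estimate_bits > 0);
  for (bits = estimate_bits; bits < required_bits; bits *= 2)
    passes++;
  return passes;
}

/* Hypothetical scalar model of the sequence emitted by the rewritten
   rs6000_emit_swdiv.  X0 stands in for the hardware reciprocal estimate
   of 1/D.  Separate multiplies and adds are used here instead of fused
   multiply-add, so this only illustrates the structure of the passes.  */
double
swdiv_model (double n, double d, double x0, int passes)
{
  double xprev = x0;
  double eprev, u, v;
  int i;

  assert (passes >= 1);
  if (passes > 1)
    {
      eprev = 1.0 - d * x0;       /* e0 = 1. - d * x0 */
      xprev = x0 + eprev * x0;    /* x1 = x0 + e0 * x0 */
      for (i = 0; i < passes - 2; ++i)
        {
          eprev = eprev * eprev;            /* enext = eprev * eprev */
          xprev = xprev + eprev * xprev;    /* xnext = xprev + enext * xprev */
        }
    }

  /* The last pass folds in the numerator: dst = u + (n - d * u) * xprev.  */
  u = n * xprev;
  v = n - d * u;
  return u + v * xprev;
}
```

Under that doubling assumption, passes_needed (5, 23), passes_needed (5, 52), passes_needed (14, 23) and passes_needed (14, 52) give 3, 4, 1 and 2, which are the values the patch assigns to `passes` for the low and high precision estimates in SFmode and DFmode. Calling swdiv_model (1.0, 3.0, 0.34, 4) starts from a deliberately crude estimate of 1/3 and refines it over four passes.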