From patchwork Fri Jan 15 16:07:57 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Edelsohn
X-Patchwork-Id: 568196
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Date: Fri, 15 Jan 2016 11:07:57 -0500
Subject: PR68609
From: David Edelsohn
To: GCC Patches

My initial implementation of software sqrt based on the estimate was
fragile for denormal inputs.  This revised version converts both sqrt
and rsqrt to use Goldschmidt's Algorithm and calculates sqrt through an
iterative correction to a sqrt estimate.  Because software sqrt is
profitable only with a single iteration, this patch also restricts
swsqrt to processors that generate a high precision estimate.

Bootstrapped on powerpc-ibm-aix7.1.0.0 and powerpc64le-linux.

Thanks, David

        PR target/68609
        * config/rs6000/rs6000.c (rs6000_emit_msub): Delete.
        (rs6000_emit_swsqrt): Convert to Goldschmidt's Algorithm.
        * config/rs6000/rs6000.md (sqrt<mode>2): Limit swsqrt to high
        precision estimate.

Index: rs6000.c
===================================================================
--- rs6000.c (revision 232326)
+++ rs6000.c (working copy)
@@ -32769,29 +32769,6 @@
     emit_move_insn (target, dst);
 }
 
-/* Generate a FMSUB instruction: dst = fma(m1, m2, -a).  */
-
-static void
-rs6000_emit_msub (rtx target, rtx m1, rtx m2, rtx a)
-{
-  machine_mode mode = GET_MODE (target);
-  rtx dst;
-
-  /* Altivec does not support fms directly;
-     generate in terms of fma in that case.  */
-  if (optab_handler (fms_optab, mode) != CODE_FOR_nothing)
-    dst = expand_ternary_op (mode, fms_optab, m1, m2, a, target, 0);
-  else
-    {
-      a = expand_unop (mode, neg_optab, a, NULL_RTX, 0);
-      dst = expand_ternary_op (mode, fma_optab, m1, m2, a, target, 0);
-    }
-  gcc_assert (dst != NULL);
-
-  if (dst != target)
-    emit_move_insn (target, dst);
-}
-
 /* Generate a FNMSUB instruction: dst = -fma(m1, m2, -a).  */
 
 static void
@@ -32890,15 +32867,16 @@
   add_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_DIV (mode, n, d));
 }
 
-/* Newton-Raphson approximation of single/double-precision floating point
-   rsqrt.  Assumes no trapping math and finite arguments.  */
+/* Goldschmidt's Algorithm for single/double-precision floating point
+   sqrt and rsqrt.  Assumes no trapping math and finite arguments.  */
 
 void
 rs6000_emit_swsqrt (rtx dst, rtx src, bool recip)
 {
   machine_mode mode = GET_MODE (src);
-  rtx x0 = gen_reg_rtx (mode);
-  rtx y = gen_reg_rtx (mode);
+  rtx e = gen_reg_rtx (mode);
+  rtx g = gen_reg_rtx (mode);
+  rtx h = gen_reg_rtx (mode);
 
   /* Low precision estimates guarantee 5 bits of accuracy.  High
      precision estimates guarantee 14 bits of accuracy.  SFmode
@@ -32909,55 +32887,68 @@
   if (mode == DFmode || mode == V2DFmode)
     passes++;
 
-  REAL_VALUE_TYPE dconst3_2;
   int i;
-  rtx halfthree;
+  rtx mhalf;
   enum insn_code code = optab_handler (smul_optab, mode);
   insn_gen_fn gen_mul = GEN_FCN (code);
 
   gcc_assert (code != CODE_FOR_nothing);
 
-  /* Load up the constant 1.5 either as a scalar, or as a vector.  */
-  real_from_integer (&dconst3_2, VOIDmode, 3, SIGNED);
-  SET_REAL_EXP (&dconst3_2, REAL_EXP (&dconst3_2) - 1);
+  mhalf = rs6000_load_constant_and_splat (mode, dconsthalf);
 
-  halfthree = rs6000_load_constant_and_splat (mode, dconst3_2);
+  /* e = rsqrt estimate */
+  emit_insn (gen_rtx_SET (e, gen_rtx_UNSPEC (mode, gen_rtvec (1, src),
+                                             UNSPEC_RSQRT)));
 
-  /* x0 = rsqrt estimate */
-  emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, src),
-                                              UNSPEC_RSQRT)));
-
   /* If (src == 0.0) filter infinity to prevent NaN for sqrt(0.0).  */
   if (!recip)
     {
       rtx zero = force_reg (mode, CONST0_RTX (mode));
-      rtx target = emit_conditional_move (x0, GT, src, zero, mode,
-                                          x0, zero, mode, 0);
-      if (target != x0)
-        emit_move_insn (x0, target);
+      rtx target = emit_conditional_move (e, GT, src, zero, mode,
+                                          e, zero, mode, 0);
+      if (target != e)
+        emit_move_insn (e, target);
     }
 
-  /* y = 0.5 * src = 1.5 * src - src -> fewer constants */
-  rs6000_emit_msub (y, src, halfthree, src);
+  /* g = sqrt estimate.  */
+  emit_insn (gen_mul (g, e, src));
+  /* h = 1/(2*sqrt) estimate.  */
+  emit_insn (gen_mul (h, e, mhalf));
 
-  for (i = 0; i < passes; i++)
+  if (recip)
     {
-      rtx x1 = gen_reg_rtx (mode);
-      rtx u = gen_reg_rtx (mode);
-      rtx v = gen_reg_rtx (mode);
+      if (passes == 1)
+        {
+          rtx t = gen_reg_rtx (mode);
+          rs6000_emit_nmsub (t, g, h, mhalf);
+          /* Apply correction directly to 1/rsqrt estimate.  */
+          rs6000_emit_madd (dst, e, t, e);
+        }
+      else
+        {
+          for (i = 0; i < passes; i++)
+            {
+              rtx t1 = gen_reg_rtx (mode);
+              rtx g1 = gen_reg_rtx (mode);
+              rtx h1 = gen_reg_rtx (mode);
 
-      /* x1 = x0 * (1.5 - y * (x0 * x0)) */
-      emit_insn (gen_mul (u, x0, x0));
-      rs6000_emit_nmsub (v, y, u, halfthree);
-      emit_insn (gen_mul (x1, x0, v));
-      x0 = x1;
+              rs6000_emit_nmsub (t1, g, h, mhalf);
+              rs6000_emit_madd (g1, g, t1, g);
+              rs6000_emit_madd (h1, h, t1, h);
+
+              g = g1;
+              h = h1;
+            }
+          /* Multiply by 2 for 1/rsqrt.  */
+          emit_insn (gen_add3_insn (dst, h, h));
+        }
     }
-
-  /* If not reciprocal, multiply by src to produce sqrt.  */
-  if (!recip)
-    emit_insn (gen_mul (dst, src, x0));
   else
-    emit_move_insn (dst, x0);
+    {
+      rtx t = gen_reg_rtx (mode);
+      rs6000_emit_nmsub (t, g, h, mhalf);
+      rs6000_emit_madd (dst, g, t, g);
+    }
 
   return;
 }
Index: rs6000.md
===================================================================
--- rs6000.md (revision 232326)
+++ rs6000.md (working copy)
@@ -4444,6 +4444,7 @@
    && (TARGET_PPC_GPOPT || (<MODE>mode == SFmode && TARGET_XILINX_FPU))"
 {
   if (<MODE>mode == SFmode
+      && TARGET_RECIP_PRECISION
       && RS6000_RECIP_HAVE_RSQRTE_P (<MODE>mode)
       && !optimize_function_for_size_p (cfun)
       && flag_finite_math_only && !flag_trapping_math