From patchwork Wed Mar 23 16:24:09 2016
X-Patchwork-Submitter: Evandro Menezes
X-Patchwork-Id: 601281
From: Evandro Menezes
Subject: Re: [AArch64] Emit division using the Newton series
To: GCC Patches
References: <56EB0EDF.3060401@samsung.com>
Cc: James Greenhalgh, Wilco Dijkstra, Andrew Pinski
Message-id: <56F2C329.10405@samsung.com>
Date: Wed, 23 Mar 2016 11:24:09 -0500
In-reply-to: <56EB0EDF.3060401@samsung.com>

On 03/17/16 15:09, Evandro Menezes wrote:
> This patch implements FP division by an approximation using the Newton
> series.
>
> With this patch, DF division is sped up by over 100% and SF division,
> zilch, both on A57 and on M1.
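
For readers unfamiliar with the scheme: the sequence refines an estimate x of 1/d with x' = x * (2 - d * x) and then multiplies by the numerator.  Below is a minimal C sketch of that arithmetic only.  The names approx_div and crude_recip_estimate are hypothetical, and the truncated reciprocal is merely a stand-in for the hardware estimate, which I assume to be good to roughly 8 bits; it does not model the emitted instructions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the hardware reciprocal estimate: the exact
   reciprocal truncated to 8 fraction bits.  Assumes D is finite and
   nonzero and that 1/D fits in the normal range of float.  */
static double
crude_recip_estimate (double d)
{
  float r = (float) (1.0 / d);
  uint32_t bits;

  memcpy (&bits, &r, sizeof bits);
  bits &= 0xffff8000u;	/* Keep sign, exponent and top 8 fraction bits.  */
  memcpy (&r, &bits, sizeof bits);
  return r;
}

/* Approximate NUM / DEN using ITERATIONS Newton steps.  */
static double
approx_div (double num, double den, int iterations)
{
  double x = crude_recip_estimate (den);

  while (iterations--)
    {
      double step = 2.0 - den * x;	/* The reciprocal-step term.  */
      x = x * step;			/* Refined reciprocal.  */
    }

  return num * x;
}
```

Each step squares the relative error of x, so an estimate good to about 8 bits reaches roughly 16, 32 and 64 bits after one, two and three steps.  Under that assumption, two steps cover the 24-bit SF significand and three cover the 53-bit DF one.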

    gcc/
            * config/aarch64/aarch64-tuning-flags.def
            (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
            * config/aarch64/aarch64-protos.h
            (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
            (aarch64_emit_approx_div): Declare new function.
            * config/aarch64/aarch64.c
            (aarch64_emit_approx_div): Define new function.
            * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
            * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.

This version of the patch cleans up the changes to the MD files and
optimizes the division when the numerator is 1.0.

Again, I look forward to your feedback.

Thank you,

From 5cd2a628086af3656b3242f0c4f41784646f52b1 Mon Sep 17 00:00:00 2001
From: Evandro Menezes
Date: Thu, 17 Mar 2016 14:44:55 -0500
Subject: [PATCH] [AArch64] Emit division using the Newton series

2016-03-17  Evandro Menezes

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
	* config/aarch64/aarch64-protos.h
	(AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
	(aarch64_emit_approx_div): Declare new function.
	* config/aarch64/aarch64.c
	(aarch64_emit_approx_div): Define new function.
	* config/aarch64/aarch64.md ("div<mode>3"): New expansion.
	* config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h         |  4 ++
 gcc/config/aarch64/aarch64-simd.md          | 14 +++++-
 gcc/config/aarch64/aarch64-tuning-flags.def |  3 +-
 gcc/config/aarch64/aarch64.c                | 73 +++++++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md               | 19 ++++++--
 5 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index dced209..52c4838 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags
 };
 #undef AARCH64_EXTRA_TUNING_OPTION
 
+#define AARCH64_EXTRA_TUNE_APPROX_DIV \
+  (AARCH64_EXTRA_TUNE_APPROX_DIV_DF | AARCH64_EXTRA_TUNE_APPROX_DIV_SF)
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
@@ -362,6 +365,7 @@ void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 void aarch64_save_restore_target_globals (tree);
 void aarch64_emit_approx_rsqrt (rtx, rtx);
+bool aarch64_emit_approx_div (rtx, rtx, rtx);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..99be92e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1509,7 +1509,19 @@
   [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
 )
 
-(define_insn "div<mode>3"
+(define_expand "div<mode>3"
+ [(set (match_operand:VDQF 0 "register_operand")
+       (div:VDQF (match_operand:VDQF 1 "general_operand")
+		 (match_operand:VDQF 2 "register_operand")))]
+ "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+    DONE;
+
+  operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "*div<mode>3"
  [(set (match_operand:VDQF 0 "register_operand" "=w")
        (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
 		 (match_operand:VDQF 2 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..ececdc1 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,5 @@
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("approx_div", APPROX_DIV_DF)
+AARCH64_EXTRA_TUNING_OPTION ("approx_divf", APPROX_DIV_SF)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12e498d..2c878ce 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7540,6 +7540,79 @@ aarch64_emit_approx_rsqrt (rtx dst, rtx src)
   emit_move_insn (dst, x0);
 }
 
+/* Emit the instruction sequence to compute the approximation for a reciprocal.  */
+
+bool
+aarch64_emit_approx_div (rtx quo, rtx num, rtx div)
+{
+  machine_mode mode = GET_MODE (quo);
+
+  if (!flag_finite_math_only
+      || flag_trapping_math
+      || !flag_unsafe_math_optimizations
+      || optimize_function_for_size_p (cfun)
+      || ((GET_MODE_INNER (mode) != SFmode
+	   || !(aarch64_tune_params.extra_tuning_flags
+		& AARCH64_EXTRA_TUNE_APPROX_DIV_SF))
+	  && (GET_MODE_INNER (mode) != DFmode
+	      || !(aarch64_tune_params.extra_tuning_flags
+		   & AARCH64_EXTRA_TUNE_APPROX_DIV_DF))))
+    return false;
+
+  /* Estimate the approximate reciprocal.  */
+  rtx xrcp = gen_reg_rtx (mode);
+  switch (mode)
+    {
+      case SFmode:
+	emit_insn (gen_aarch64_frecpesf (xrcp, div)); break;
+      case V2SFmode:
+	emit_insn (gen_aarch64_frecpev2sf (xrcp, div)); break;
+      case V4SFmode:
+	emit_insn (gen_aarch64_frecpev4sf (xrcp, div)); break;
+      case DFmode:
+	emit_insn (gen_aarch64_frecpedf (xrcp, div)); break;
+      case V2DFmode:
+	emit_insn (gen_aarch64_frecpev2df (xrcp, div)); break;
+      default:
+	gcc_unreachable ();
+    }
+
+  /* Iterate over the series twice for SF and thrice for DF.  */
+  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
+
+  rtx xtmp = gen_reg_rtx (mode);
+  while (iterations--)
+    {
+      switch (mode)
+	{
+	  case SFmode:
+	    emit_insn (gen_aarch64_frecpssf (xtmp, xrcp, div)); break;
+	  case V2SFmode:
+	    emit_insn (gen_aarch64_frecpsv2sf (xtmp, xrcp, div)); break;
+	  case V4SFmode:
+	    emit_insn (gen_aarch64_frecpsv4sf (xtmp, xrcp, div)); break;
+	  case DFmode:
+	    emit_insn (gen_aarch64_frecpsdf (xtmp, xrcp, div)); break;
+	  case V2DFmode:
+	    emit_insn (gen_aarch64_frecpsv2df (xtmp, xrcp, div)); break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
+    }
+
+  if (num != CONST1_RTX (mode))
+    {
+      rtx xnum = force_reg (mode, num);
+      emit_set_insn (quo, gen_rtx_MULT (mode, xnum, xrcp));
+    }
+  else
+    emit_move_insn (quo, xrcp);
+
+  return true;
+}
+
 /* Return the number of instructions that can be issued per cycle.  */
 static int
 aarch64_sched_issue_rate (void)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 68676c9..985915e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4647,11 +4647,22 @@
   [(set_attr "type" "fmul<s>")]
 )
 
-(define_insn "div<mode>3"
+(define_expand "div<mode>3"
+ [(set (match_operand:GPF 0 "register_operand")
+       (div:GPF (match_operand:GPF 1 "general_operand")
+		(match_operand:GPF 2 "register_operand")))]
+ "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+    DONE;
+
+  operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "*div<mode>3"
   [(set (match_operand:GPF 0 "register_operand" "=w")
-        (div:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+        (div:GPF (match_operand:GPF 1 "register_operand" "w")
+		 (match_operand:GPF 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fdiv\\t%<s>0, %<s>1, %<s>2"
   [(set_attr "type" "fdiv<s>")]
-- 
1.9.1
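
A note on the guard at the top of aarch64_emit_approx_div: it may be easier to read with the negations pushed inward.  The approximation is used only when -ffinite-math-only and -funsafe-math-optimizations are in effect, trapping math is off, the function is not being optimized for size, and the tuning flag matching the element mode is set.  The following is a small stand-alone C model of that condition, not GCC code; struct math_flags, enum elt_mode, approx_div_enabled and the TUNE_APPROX_DIV_* bits are hypothetical names standing in for the GCC globals and tuning flags.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC state consulted by the guard; none
   of these names exist in GCC.  */
enum elt_mode { ELT_SF, ELT_DF, ELT_OTHER };

#define TUNE_APPROX_DIV_DF (1u << 0)
#define TUNE_APPROX_DIV_SF (1u << 1)

struct math_flags
{
  bool finite_math_only;		/* -ffinite-math-only  */
  bool trapping_math;			/* -ftrapping-math  */
  bool unsafe_math_optimizations;	/* -funsafe-math-optimizations  */
  bool optimize_for_size;		/* Function optimized for size.  */
  unsigned tuning;			/* Mask of TUNE_APPROX_DIV_* bits.  */
};

/* Return true when the approximation would be emitted, i.e. the logical
   negation of the early-return condition in the patch.  */
static bool
approx_div_enabled (const struct math_flags *f, enum elt_mode elt)
{
  bool sf_ok = elt == ELT_SF && (f->tuning & TUNE_APPROX_DIV_SF) != 0;
  bool df_ok = elt == ELT_DF && (f->tuning & TUNE_APPROX_DIV_DF) != 0;

  return f->finite_math_only
	 && !f->trapping_math
	 && f->unsafe_math_optimizations
	 && !f->optimize_for_size
	 && (sf_ok || df_ok);
}
```

Since -ffast-math implies -ffinite-math-only, -funsafe-math-optimizations and -fno-trapping-math, under that option only the size check and the per-mode tuning bits remain to decide.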