From patchwork Wed Mar 23 16:24:09 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Evandro Menezes <e.menezes@samsung.com>
X-Patchwork-Id: 601281
Return-Path: 
 <gcc-patches-return-423769-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3qVZf51yh9z9s9Z
	for <incoming@patchwork.ozlabs.org>;
	Thu, 24 Mar 2016 03:24:45 +1100 (AEDT)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b=xvu1vz6O; dkim-atps=neutral
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:subject:to:references:cc:message-id:date:mime-version
	:in-reply-to:content-type; q=dns; s=default; b=QMWQQ9cnbmz/zGkE3
	DGvzB/DtX+E6X+IsRGVFkS6TF8IRJHp3Fg+0XAP+nXPytQ5avbUIyD1kbZmgY7vO
	pXg7MsOUoyH3mykiT7QJHxcSmLJxtu27Oi8sd9sy8FzS1ZQtwpx8RQDXqbyYWtQF
	G5XmlNV2G4QBwfLWITSbXdgyIE=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:subject:to:references:cc:message-id:date:mime-version
	:in-reply-to:content-type; s=default; bh=yQcACrcWGnSonc42hkVLelU
	MQZ8=; b=xvu1vz6ORRkaDv8Q7+FTkY2Oxjy9+aMYI7kr03pqTUWkPGExyiRDb8f
	Z4ywEFCsZ0btMussIFElLsQDWLHD0f9ij/ju5sysPl/Xdj0d+jcxmDty+HJrMv46
	mwXF690+dFb5Z0ECySM3lFpm55C8wqESu6rraeXAC+MnOX85qhcI=
Received: (qmail 42619 invoked by alias); 23 Mar 2016 16:24:31 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 41682 invoked by uid 89); 23 Mar 2016 16:24:23 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.4 required=5.0 tests=AWL, BAYES_00,
	KAM_LAZY_DOMAIN_SECURITY,
	T_RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=newton,
	Newton, zilch, host_wide_int
X-HELO: usmailout4.samsung.com
Received: from mailout4.w2.samsung.com (HELO usmailout4.samsung.com)
	(211.189.100.14) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted)
	ESMTPS; Wed, 23 Mar 2016 16:24:12 +0000
Received: from uscpsbgm2.samsung.com (u115.gpu85.samsung.co.kr
	[203.254.195.115]) by usmailout4.samsung.com (Oracle
	Communications Messaging Server 7.0.5.31.0 64bit (built May 5
	2014)) with ESMTP id <0O4I008M82WAWY20@usmailout4.samsung.com> for
	gcc-patches@gcc.gnu.org; Wed, 23 Mar 2016 12:24:10 -0400 (EDT)
Received: from ussync1.samsung.com ( [203.254.195.81])	by
	uscpsbgm2.samsung.com (USCPMTA) with SMTP id
	6C.C6.07641.A23C2F65; Wed, 23 Mar 2016 12:24:10 -0400 (EDT)
Received: from [172.31.207.194] ([105.140.31.10]) by ussync1.samsung.com
	(Oracle Communications Messaging Server 7.0.5.31.0 64bit
	(built May 5 2014)) with ESMTPA id
	<0O4I0038M2W93I00@ussync1.samsung.com>;
	Wed, 23 Mar 2016 12:24:10 -0400 (EDT)
From: Evandro Menezes <e.menezes@samsung.com>
Subject: Re: [AArch64] Emit division using the Newton series
To: GCC Patches <gcc-patches@gcc.gnu.org>
References: <56EB0EDF.3060401@samsung.com>
Cc: James Greenhalgh <james.greenhalgh@arm.com>,
	Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	Andrew Pinski <pinskia@gmail.com>
Message-id: <56F2C329.10405@samsung.com>
Date: Wed, 23 Mar 2016 11:24:09 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-version: 1.0
In-reply-to: <56EB0EDF.3060401@samsung.com>
Content-type: multipart/mixed; boundary=------------070801040602010401030102
X-IsSubscribed: yes

On 03/17/16 15:09, Evandro Menezes wrote:
> This patch implements FP division by an approximation using the Newton
> series.
>
> With this patch, DF division is sped up by over 100% and SF division,
> zilch, both on A57 and on M1.

         gcc/
             * config/aarch64/aarch64-tuning-flags.def
             (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
             * config/aarch64/aarch64-protos.h
             (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
             (aarch64_emit_approx_div): Declare new function.
             * config/aarch64/aarch64.c
             (aarch64_emit_approx_div): Define new function.
             * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
             * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.


This version of the patch cleans up the changes to the MD files and
optimizes the division when the numerator is 1.0, in which case the
final multiplication by the numerator is skipped.

Again, I look forward to your feedback.

Thank you,

From 5cd2a628086af3656b3242f0c4f41784646f52b1 Mon Sep 17 00:00:00 2001
From: Evandro Menezes <e.menezes@samsung.com>
Date: Thu, 17 Mar 2016 14:44:55 -0500
Subject: [PATCH] [AArch64] Emit division using the Newton series

2016-03-17  Evandro Menezes  <e.menezes@samsung.com>

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
	* config/aarch64/aarch64-protos.h
	(AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
	(aarch64_emit_approx_div): Declare new function.
	* config/aarch64/aarch64.c
	(aarch64_emit_approx_div): Define new function.
	* config/aarch64/aarch64.md ("div<mode>3"): New expansion.
	* config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h         |  4 ++
 gcc/config/aarch64/aarch64-simd.md          | 14 +++++-
 gcc/config/aarch64/aarch64-tuning-flags.def |  3 +-
 gcc/config/aarch64/aarch64.c                | 73 +++++++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md               | 19 ++++++--
 5 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index dced209..52c4838 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags
 };
 #undef AARCH64_EXTRA_TUNING_OPTION
 
+#define AARCH64_EXTRA_TUNE_APPROX_DIV \
+        (AARCH64_EXTRA_TUNE_APPROX_DIV_DF | AARCH64_EXTRA_TUNE_APPROX_DIV_SF)
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
@@ -362,6 +365,7 @@ void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 void aarch64_save_restore_target_globals (tree);
 void aarch64_emit_approx_rsqrt (rtx, rtx);
+bool aarch64_emit_approx_div (rtx, rtx, rtx);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..99be92e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1509,7 +1509,19 @@
   [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
 )
 
-(define_insn "div<mode>3"
+(define_expand "div<mode>3"
+ [(set (match_operand:VDQF 0 "register_operand")
+       (div:VDQF (match_operand:VDQF 1 "general_operand")
+		 (match_operand:VDQF 2 "register_operand")))]
+ "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+    DONE;
+
+  operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "*div<mode>3"
  [(set (match_operand:VDQF 0 "register_operand" "=w")
        (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
 		 (match_operand:VDQF 2 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..ececdc1 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,5 @@
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("approx_div", APPROX_DIV_DF)
+AARCH64_EXTRA_TUNING_OPTION ("approx_divf", APPROX_DIV_SF)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12e498d..2c878ce 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7540,6 +7540,79 @@ aarch64_emit_approx_rsqrt (rtx dst, rtx src)
   emit_move_insn (dst, x0);
 }
 
+/* Emit the instruction sequence to approximate QUO = NUM / DIV.  */
+
+bool
+aarch64_emit_approx_div (rtx quo, rtx num, rtx div)
+{
+  machine_mode mode = GET_MODE (quo);
+
+  if (!flag_finite_math_only
+      || flag_trapping_math
+      || !flag_unsafe_math_optimizations
+      || optimize_function_for_size_p (cfun)
+      || ((GET_MODE_INNER (mode) != SFmode
+           || !(aarch64_tune_params.extra_tuning_flags
+                & AARCH64_EXTRA_TUNE_APPROX_DIV_SF))
+          && (GET_MODE_INNER (mode) != DFmode
+              || !(aarch64_tune_params.extra_tuning_flags
+                   & AARCH64_EXTRA_TUNE_APPROX_DIV_DF))))
+    return false;
+
+  /* Estimate the approximate reciprocal.  */
+  rtx xrcp = gen_reg_rtx (mode);
+  switch (mode)
+    {
+      case SFmode:
+	emit_insn (gen_aarch64_frecpesf (xrcp, div)); break;
+      case V2SFmode:
+	emit_insn (gen_aarch64_frecpev2sf (xrcp, div)); break;
+      case V4SFmode:
+	emit_insn (gen_aarch64_frecpev4sf (xrcp, div)); break;
+      case DFmode:
+	emit_insn (gen_aarch64_frecpedf (xrcp, div)); break;
+      case V2DFmode:
+	emit_insn (gen_aarch64_frecpev2df (xrcp, div)); break;
+      default:
+	gcc_unreachable ();
+    }
+
+  /* Iterate over the series twice for SF and thrice for DF.  */
+  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
+
+  rtx xtmp = gen_reg_rtx (mode);
+  while (iterations--)
+    {
+      switch (mode)
+        {
+	  case SFmode:
+	    emit_insn (gen_aarch64_frecpssf (xtmp, xrcp, div)); break;
+	  case V2SFmode:
+	    emit_insn (gen_aarch64_frecpsv2sf (xtmp, xrcp, div)); break;
+	  case V4SFmode:
+	    emit_insn (gen_aarch64_frecpsv4sf (xtmp, xrcp, div)); break;
+	  case DFmode:
+	    emit_insn (gen_aarch64_frecpsdf (xtmp, xrcp, div)); break;
+	  case V2DFmode:
+	    emit_insn (gen_aarch64_frecpsv2df (xtmp, xrcp, div)); break;
+	  default:
+	    gcc_unreachable ();
+        }
+
+      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
+    }
+
+  if (num != CONST1_RTX (mode))
+    {
+      rtx xnum = force_reg (mode, num);
+      emit_set_insn (quo, gen_rtx_MULT (mode, xnum, xrcp));
+    }
+  else
+    emit_move_insn (quo, xrcp);
+
+  return true;
+}
+
 /* Return the number of instructions that can be issued per cycle.  */
 static int
 aarch64_sched_issue_rate (void)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 68676c9..985915e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4647,11 +4647,22 @@
   [(set_attr "type" "fmul<s>")]
 )
 
-(define_insn "div<mode>3"
+(define_expand "div<mode>3"
+ [(set (match_operand:GPF 0 "register_operand")
+       (div:GPF (match_operand:GPF 1 "general_operand")
+		(match_operand:GPF 2 "register_operand")))]
+ "TARGET_FLOAT"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+    DONE;
+
+  operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "*div<mode>3"
   [(set (match_operand:GPF 0 "register_operand" "=w")
-        (div:GPF
-         (match_operand:GPF 1 "register_operand" "w")
-         (match_operand:GPF 2 "register_operand" "w")))]
+        (div:GPF (match_operand:GPF 1 "register_operand" "w")
+	         (match_operand:GPF 2 "register_operand" "w")))]
   "TARGET_FLOAT"
   "fdiv\\t%<s>0, %<s>1, %<s>2"
   [(set_attr "type" "fdiv<s>")]
-- 
1.9.1