From patchwork Fri Feb 21 10:29:54 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 1242030
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [committed] aarch64: Add SVE support for -mlow-precision-div
Date: Fri, 21 Feb 2020 10:29:54 +0000
Message-ID: <mptsgj4w7vh.fsf@arm.com>

SVE was missing support for -mlow-precision-div, which meant that
-march=armv8.2-a+sve -mlow-precision-div could cause a performance
regression compared to -march=armv8.2-a -mlow-precision-div.
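As a rough illustration only, here is a scalar C model of the sequence that aarch64_emit_approx_div builds. It is not the code GCC emits. frecpe_model, frecps_model and approx_div are hypothetical names. frecpe_model is an assumed stand-in that returns the exact reciprocal with a fixed relative error of 2^-8, which is roughly the accuracy of a hardware reciprocal estimate. frecps_model computes 2 - a*b, the Newton-Raphson correction factor. Each refinement roughly squares the relative error.

```c
#include <assert.h>

/* Hypothetical stand-in for FRECPE (assumption): the exact reciprocal
   perturbed by a relative error of 2^-8.  Real hardware uses a table.  */
double
frecpe_model (double d)
{
  return (1.0 / d) * (1.0 + 1.0 / 256.0);
}

/* Model of FRECPS: the Newton-Raphson correction factor 2 - a*b for a
   reciprocal estimate A of 1/B.  */
double
frecps_model (double a, double b)
{
  return 2.0 - a * b;
}

/* Approximate NUM / DEN using STEPS Newton-Raphson steps, following the
   order of operations in aarch64_emit_approx_div: intermediate steps
   update the reciprocal, while the last correction factor is folded
   into the final multiplication.  */
double
approx_div (double num, double den, int steps)
{
  assert (steps >= 1);
  double xrcp = frecpe_model (den);
  double xtmp = 1.0;
  while (steps--)
    {
      xtmp = frecps_model (xrcp, den);
      if (steps > 0)
	xrcp = xrcp * xtmp;	/* intermediate refinement */
    }
  if (num != 1.0)
    xrcp = xrcp * num;		/* skipped for 1/x */
  return xrcp * xtmp;		/* final refinement and result */
}
```

With STEPS = 1 (the float case), computing 1/x takes one FRECPS and one multiply. With STEPS = 2 (the double case), it takes two of each. This matches the fmul and frecps counts that recip_1.c expects. For a general NUM / DEN, each case needs one extra multiply, as recip_2.c expects.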

I ended up doing this much later than originally intended, sorry...

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Richard


2020-02-21  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_emit_mult): New function.
	(aarch64_emit_approx_div): Add SVE support.  Use aarch64_emit_mult
	instead of emitting multiplication instructions directly.
	* config/aarch64/iterators.md (SVE_COND_FP_BINARY_OPTAB): New iterator.
	* config/aarch64/aarch64-sve.md (div<mode>3, @aarch64_frecpe<mode>)
	(@aarch64_frecps<mode>): New expanders.

gcc/testsuite/
	* gcc.target/aarch64/sve/recip_1.c: New test.
	* gcc.target/aarch64/sve/recip_1_run.c: Likewise.
	* gcc.target/aarch64/sve/recip_2.c: Likewise.
	* gcc.target/aarch64/sve/recip_2_run.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md             | 44 ++++++++++++++++++-
 gcc/config/aarch64/aarch64.c                  | 29 ++++++++++--
 gcc/config/aarch64/iterators.md               | 11 +++++
 .../gcc.target/aarch64/sve/recip_1.c          | 27 ++++++++++++
 .../gcc.target/aarch64/sve/recip_1_run.c      | 27 ++++++++++++
 .../gcc.target/aarch64/sve/recip_2.c          | 27 ++++++++++++
 .../gcc.target/aarch64/sve/recip_2_run.c      | 30 +++++++++++++
 7 files changed, 191 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index fa3852992e1..e3b1da89c1a 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -99,6 +99,7 @@
 ;; ---- [FP] Subtraction
 ;; ---- [FP] Absolute difference
 ;; ---- [FP] Multiplication
+;; ---- [FP] Division
 ;; ---- [FP] Binary logical operations
 ;; ---- [FP] Sign copying
 ;; ---- [FP] Maximum and minimum
@@ -4719,7 +4720,7 @@ (define_expand "<optab><mode>3"
 	   (const_int SVE_RELAXED_GP)
 	   (match_operand:SVE_FULL_F 1 "<sve_pred_fp_rhs1_operand>")
 	   (match_operand:SVE_FULL_F 2 "<sve_pred_fp_rhs2_operand>")]
-	  SVE_COND_FP_BINARY))]
+	  SVE_COND_FP_BINARY_OPTAB))]
   "TARGET_SVE"
   {
     operands[3] = aarch64_ptrue_reg (<VPRED>mode);
@@ -5455,6 +5456,47 @@ (define_insn "@aarch64_mul_lane_<mode>"
   "fmul\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>[%3]"
 )
 
+;; -------------------------------------------------------------------------
+;; ---- [FP] Division
+;; -------------------------------------------------------------------------
+;; The patterns in this section are synthetic.
+;; -------------------------------------------------------------------------
+
+(define_expand "div<mode>3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 3)
+	   (const_int SVE_RELAXED_GP)
+	   (match_operand:SVE_FULL_F 1 "nonmemory_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")]
+	  UNSPEC_COND_FDIV))]
+  "TARGET_SVE"
+  {
+    if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+      DONE;
+
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+    operands[3] = aarch64_ptrue_reg (<VPRED>mode);
+  }
+)
+
+(define_expand "@aarch64_frecpe<mode>"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_operand:SVE_FULL_F 1 "register_operand")]
+	  UNSPEC_FRECPE))]
+  "TARGET_SVE"
+)
+
+(define_expand "@aarch64_frecps<mode>"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")]
+	  UNSPEC_FRECPS))]
+  "TARGET_SVE"
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [FP] Binary logical operations
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0acaa06b91c..c1bbc4917c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12739,6 +12739,25 @@ aarch64_builtin_reciprocal (tree fndecl)
   gcc_unreachable ();
 }
 
+/* Emit code to perform the floating-point operation:
+
+     DST = SRC1 * SRC2
+
+   where all three operands are already known to be registers.
+   If the operation is an SVE one, PTRUE is a suitable all-true
+   predicate.  */
+
+static void
+aarch64_emit_mult (rtx dst, rtx ptrue, rtx src1, rtx src2)
+{
+  if (ptrue)
+    emit_insn (gen_aarch64_pred (UNSPEC_COND_FMUL, GET_MODE (dst),
+				 dst, ptrue, src1, src2,
+				 gen_int_mode (SVE_RELAXED_GP, SImode)));
+  else
+    emit_set_insn (dst, gen_rtx_MULT (GET_MODE (dst), src1, src2));
+}
+
 /* Emit instruction sequence to compute either the approximate square root
    or its approximate reciprocal, depending on the flag RECP, and return
    whether the sequence was emitted or not.  */
@@ -12857,6 +12876,10 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
   if (!TARGET_SIMD && VECTOR_MODE_P (mode))
     return false;
 
+  rtx pg = NULL_RTX;
+  if (aarch64_sve_mode_p (mode))
+    pg = aarch64_ptrue_reg (aarch64_sve_pred_mode (mode));
+
   /* Estimate the approximate reciprocal.  */
   rtx xrcp = gen_reg_rtx (mode);
   emit_insn (gen_aarch64_frecpe (mode, xrcp, den));
@@ -12876,7 +12899,7 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
       emit_insn (gen_aarch64_frecps (mode, xtmp, xrcp, den));
 
       if (iterations > 0)
-	emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
+	aarch64_emit_mult (xrcp, pg, xrcp, xtmp);
     }
 
   if (num != CONST1_RTX (mode))
@@ -12884,11 +12907,11 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
       /* As the approximate reciprocal of DEN is already calculated, only
 	 calculate the approximate division when NUM is not 1.0.  */
       rtx xnum = force_reg (mode, num);
-      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
+      aarch64_emit_mult (xrcp, pg, xrcp, xnum);
     }
 
   /* Finalize the approximation.  */
-  emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
+  aarch64_emit_mult (quo, pg, xrcp, xtmp);
   return true;
 }
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d17d79a30da..548ee0f51e8 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2291,6 +2291,17 @@ (define_int_iterator SVE_COND_FP_BINARY [UNSPEC_COND_FADD
 					 UNSPEC_COND_FMULX
 					 UNSPEC_COND_FSUB])
 
+;; Same as SVE_COND_FP_BINARY, but without codes that have a dedicated
+;; <optab><mode>3 expander.
+(define_int_iterator SVE_COND_FP_BINARY_OPTAB [UNSPEC_COND_FADD
+					       UNSPEC_COND_FMAX
+					       UNSPEC_COND_FMAXNM
+					       UNSPEC_COND_FMIN
+					       UNSPEC_COND_FMINNM
+					       UNSPEC_COND_FMUL
+					       UNSPEC_COND_FMULX
+					       UNSPEC_COND_FSUB])
+
 (define_int_iterator SVE_COND_FP_BINARY_INT [UNSPEC_COND_FSCALE])
 
 (define_int_iterator SVE_COND_FP_ADD [UNSPEC_COND_FADD])
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/recip_1.c b/gcc/testsuite/gcc.target/aarch64/sve/recip_1.c
new file mode 100644
index 00000000000..c9d470f5c03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/recip_1.c
@@ -0,0 +1,27 @@
+/* { dg-options "-Ofast -mlow-precision-div" } */
+
+#define DEF_LOOP(TYPE)			\
+  void					\
+  test_##TYPE (TYPE *x, int n)		\
+  {					\
+    for (int i = 0; i < n; ++i)		\
+      x[i] = (TYPE) 1 / x[i];		\
+  }
+
+#define TEST_ALL(T)	\
+  T (_Float16)		\
+  T (float)		\
+  T (double)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tfrecpe\tz[0-9]+\.h} } } */
+/* { dg-final { scan-assembler-not {\tfrecps\tz[0-9]+\.h} } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecpe\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecps\tz[0-9]+\.s} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d} 2 } } */
+/* { dg-final { scan-assembler-times {\tfrecpe\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecps\tz[0-9]+\.d} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c b/gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c
new file mode 100644
index 00000000000..b232b88530a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-Ofast -mlow-precision-div" } */
+
+#include "recip_1.c"
+
+#define N 77
+
+#define TEST_LOOP(TYPE)				\
+  {						\
+    TYPE a[N];					\
+    for (int i = 0; i < N; ++i)			\
+      a[i] = i + 1;				\
+    test_##TYPE (a, N);				\
+    for (int i = 0; i < N; ++i)			\
+      {						\
+	double diff = a[i] - 1.0 / (i + 1);	\
+	if (__builtin_fabs (diff) > 0x1.0p-8)	\
+	  __builtin_abort ();			\
+      }						\
+  }
+
+int
+main (void)
+{
+  TEST_ALL (TEST_LOOP);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/recip_2.c b/gcc/testsuite/gcc.target/aarch64/sve/recip_2.c
new file mode 100644
index 00000000000..f308a6b7874
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/recip_2.c
@@ -0,0 +1,27 @@
+/* { dg-options "-Ofast -mlow-precision-div" } */
+
+#define DEF_LOOP(TYPE)						\
+  void								\
+  test_##TYPE (TYPE *restrict x, TYPE *restrict y, int n)	\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      x[i] /= y[i];						\
+  }
+
+#define TEST_ALL(T)	\
+  T (_Float16)		\
+  T (float)		\
+  T (double)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tfrecpe\tz[0-9]+\.h} } } */
+/* { dg-final { scan-assembler-not {\tfrecps\tz[0-9]+\.h} } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\tfrecpe\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecps\tz[0-9]+\.s} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d} 3 } } */
+/* { dg-final { scan-assembler-times {\tfrecpe\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecps\tz[0-9]+\.d} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c b/gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c
new file mode 100644
index 00000000000..25a31e11f55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-Ofast -mlow-precision-div" } */
+
+#include "recip_2.c"
+
+#define N 77
+
+#define TEST_LOOP(TYPE)					\
+  {							\
+    TYPE a[N], b[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	a[i] = i + 11;					\
+	b[i] = i + 1;					\
+      }							\
+    test_##TYPE (a, b, N);				\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double diff = a[i] - (i + 11.0) / (i + 1);	\
+	if (__builtin_fabs (diff) > 0x1.0p-8)		\
+	  __builtin_abort ();				\
+      }							\
+  }
+
+int
+main (void)
+{
+  TEST_ALL (TEST_LOOP);
+  return 0;
+}