From patchwork Fri Dec  7 15:01:50 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 1009503
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [AArch64][SVE] Remove unnecessary PTRUEs from FP arithmetic
Date: Fri, 07 Dec 2018 15:01:50 +0000
Message-ID: <87lg51wfc1.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0

When using the unpredicated all-register forms of FADD, FSUB and FMUL,
the RTL patterns still carried the predicate operand that we created for
the other (predicated) forms, so an otherwise unnecessary PTRUE was kept.
This patch splits the patterns after reload to drop the predicate, as we
already do for WHILE.
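
For illustration only (not part of the patch as applied), the loops that
the new test generates for float can be written out by hand as below.
The names add_float, sub_float and mult_float are what TEST_TYPE (float)
expands to in pred_elim_1.c; check_fp_loops is a hypothetical helper added
here just to show that the functions behave as plain element-wise loops.
With SVE vectorization, the test expects each loop to use one unpredicated
FADD, FSUB or FMUL, and no PTRUE anywhere in the output.

```c
#include <assert.h>

/* Hand expansion of TEST_OP (add, float, +) from pred_elim_1.c.  */
void
add_float (float *restrict a, float *restrict b, float *restrict c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] + c[i];
}

/* Hand expansion of TEST_OP (sub, float, -).  */
void
sub_float (float *restrict a, float *restrict b, float *restrict c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] - c[i];
}

/* Hand expansion of TEST_OP (mult, float, *).  */
void
mult_float (float *restrict a, float *restrict b, float *restrict c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] * c[i];
}

/* Hypothetical self-check, not in the patch: the inputs are chosen so
   that every result is exactly representable in float.  */
void
check_fp_loops (void)
{
  enum { N = 5 };
  float b[N] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };
  float c[N] = { 0.5f, 0.25f, 2.0f, 1.0f, 4.0f };
  const float sum[N] = { 1.5f, 2.25f, 5.0f, 5.0f, 9.0f };
  const float diff[N] = { 0.5f, 1.75f, 1.0f, 3.0f, 1.0f };
  const float prod[N] = { 0.5f, 0.5f, 6.0f, 4.0f, 20.0f };
  float a[N];

  add_float (a, b, c, N);
  for (int i = 0; i < N; ++i)
    assert (a[i] == sum[i]);

  sub_float (a, b, c, N);
  for (int i = 0; i < N; ++i)
    assert (a[i] == diff[i]);

  mult_float (a, b, c, N);
  for (int i = 0; i < N; ++i)
    assert (a[i] == prod[i]);
}
```

The test itself only scans the generated assembly. Compiling a file like
this with the test's options (-O2 -ftree-vectorize) for an SVE-enabled
target, for example with -march=armv8.2-a+sve (an assumption; the patch
does not spell out the target flags), is one way to inspect the result
by hand.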

Tested on aarch64-linux-gnu and applied.

Richard


2018-12-07  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* config/aarch64/iterators.md (SVE_UNPRED_FP_BINARY): New code
	iterator.
	(sve_fp_op): Handle minus and mult.
	* config/aarch64/aarch64-sve.md (*add<mode>3, *sub<mode>3)
	(*mul<mode>3): Split the patterns after reload if we don't
	need the predicate operand.
	(*post_ra_<sve_fp_op><mode>3): New pattern.

gcc/testsuite/
	* gcc.target/aarch64/sve/pred_elim_1.c: New test.

Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2018-12-05 08:33:40.970920085 +0000
+++ gcc/config/aarch64/iterators.md	2018-12-07 14:59:39.875208953 +0000
@@ -1220,6 +1220,9 @@ (define_code_iterator SVE_INT_BINARY [pl
 ;; SVE integer binary division operations.
 (define_code_iterator SVE_INT_BINARY_SD [div udiv])
 
+;; SVE floating-point operations with an unpredicated all-register form.
+(define_code_iterator SVE_UNPRED_FP_BINARY [plus minus mult])
+
 ;; SVE integer comparisons.
 (define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu])
 
@@ -1423,6 +1426,8 @@ (define_code_attr sve_int_op_rev [(plus
 
 ;; The floating-point SVE instruction that implements an rtx code.
 (define_code_attr sve_fp_op [(plus "fadd")
+			     (minus "fsub")
+			     (mult "fmul")
 			     (neg "fneg")
 			     (abs "fabs")
 			     (sqrt "fsqrt")])
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-07-18 18:44:56.000000000 +0100
+++ gcc/config/aarch64/aarch64-sve.md	2018-12-07 14:59:39.875208953 +0000
@@ -2194,7 +2194,7 @@ (define_expand "add<mode>3"
 )
 
 ;; Floating-point addition predicated with a PTRUE.
-(define_insn "*add<mode>3"
+(define_insn_and_split "*add<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w")
 	(unspec:SVE_F
 	  [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl")
@@ -2206,7 +2206,12 @@ (define_insn "*add<mode>3"
   "@
    fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
    fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
-   fadd\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[3], <MODE>mode)"
+  [(set (match_dup 0) (plus:SVE_F (match_dup 2) (match_dup 3)))]
 )
 
 ;; Unpredicated floating-point subtraction.
@@ -2225,7 +2230,7 @@ (define_expand "sub<mode>3"
 )
 
 ;; Floating-point subtraction predicated with a PTRUE.
-(define_insn "*sub<mode>3"
+(define_insn_and_split "*sub<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w")
 	(unspec:SVE_F
 	  [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl, Upl")
@@ -2240,7 +2245,13 @@ (define_insn "*sub<mode>3"
    fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
    fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
    fsubr\t%0.<Vetype>, %1/m, %0.<Vetype>, #%2
-   fsub\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[2], <MODE>mode)
+   && register_operand (operands[3], <MODE>mode)"
+  [(set (match_dup 0) (minus:SVE_F (match_dup 2) (match_dup 3)))]
 )
 
 ;; Unpredicated floating-point multiplication.
@@ -2259,7 +2270,7 @@ (define_expand "mul<mode>3"
 )
 
 ;; Floating-point multiplication predicated with a PTRUE.
-(define_insn "*mul<mode>3"
+(define_insn_and_split "*mul<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
 	(unspec:SVE_F
 	  [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
@@ -2270,8 +2281,24 @@ (define_insn "*mul<mode>3"
   "TARGET_SVE"
   "@
    fmul\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
-   fmul\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
-)
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[3], <MODE>mode)"
+  [(set (match_dup 0) (mult:SVE_F (match_dup 2) (match_dup 3)))]
+)
+
+;; Unpredicated floating-point binary operations (post-RA only).
+;; These are generated by splitting a predicated instruction whose
+;; predicate is unused.
+(define_insn "*post_ra_<sve_fp_op><mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+	(SVE_UNPRED_FP_BINARY:SVE_F
+	  (match_operand:SVE_F 1 "register_operand" "w")
+	  (match_operand:SVE_F 2 "register_operand" "w")))]
+  "TARGET_SVE && reload_completed"
+  "<sve_fp_op>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>")
 
 ;; Unpredicated fma (%0 = (%1 * %2) + %3).
 (define_expand "fma<mode>4"
Index: gcc/testsuite/gcc.target/aarch64/sve/pred_elim_1.c
===================================================================
--- /dev/null	2018-11-29 13:15:04.463550658 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/pred_elim_1.c	2018-12-07 14:59:39.875208953 +0000
@@ -0,0 +1,23 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define TEST_OP(NAME, TYPE, OP)				\
+  void							\
+  NAME##_##TYPE (TYPE *restrict a, TYPE *restrict b,	\
+		 TYPE *restrict c, int n)		\
+  {							\
+    for (int i = 0; i < n; ++i)				\
+      a[i] = b[i] OP c[i];				\
+  }
+
+#define TEST_TYPE(TYPE) \
+  TEST_OP (add, TYPE, +) \
+  TEST_OP (sub, TYPE, -) \
+  TEST_OP (mult, TYPE, *) \
+
+TEST_TYPE (float)
+TEST_TYPE (double)
+
+/* { dg-final { scan-assembler-times {\tfadd\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tfsub\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmul\t} 2 } } */
+/* { dg-final { scan-assembler-not {\tptrue\t} } } */