From patchwork Fri Dec 7 15:01:50 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1009503
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [AArch64][SVE] Remove unnecessary PTRUEs from FP arithmetic
Date: Fri, 07 Dec 2018 15:01:50 +0000
Message-ID: <87lg51wfc1.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0

When using the unpredicated all-register forms of FADD, FSUB and FMUL,
the RTL patterns would still have the predicate operand that we created
for the other forms.  This patch splits the patterns after reload in
order to get rid of the predicate, like we already do for WHILE.

Tested on aarch64-linux-gnu and applied.

Richard


2018-12-07  Richard Sandiford

gcc/
        * config/aarch64/iterators.md (SVE_UNPRED_FP_BINARY): New code
        iterator.
        (sve_fp_op): Handle minus and mult.
        * config/aarch64/aarch64-sve.md (*add<mode>3, *sub<mode>3)
        (*mul<mode>3): Split the patterns after reload if we don't
        need the predicate operand.
        (*post_ra_<sve_fp_op><mode>3): New pattern.

gcc/testsuite/
        * gcc.target/aarch64/sve/pred_elim_1.c: New test.

Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2018-12-05 08:33:40.970920085 +0000
+++ gcc/config/aarch64/iterators.md	2018-12-07 14:59:39.875208953 +0000
@@ -1220,6 +1220,9 @@ (define_code_iterator SVE_INT_BINARY [pl
 ;; SVE integer binary division operations.
 (define_code_iterator SVE_INT_BINARY_SD [div udiv])
 
+;; SVE floating-point operations with an unpredicated all-register form.
+(define_code_iterator SVE_UNPRED_FP_BINARY [plus minus mult])
+
 ;; SVE integer comparisons.
 (define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu])
 
@@ -1423,6 +1426,8 @@ (define_code_attr sve_int_op_rev [(plus
 
 ;; The floating-point SVE instruction that implements an rtx code.
 (define_code_attr sve_fp_op [(plus "fadd")
+                             (minus "fsub")
+                             (mult "fmul")
                              (neg "fneg")
                              (abs "fabs")
                              (sqrt "fsqrt")])
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-07-18 18:44:56.000000000 +0100
+++ gcc/config/aarch64/aarch64-sve.md	2018-12-07 14:59:39.875208953 +0000
@@ -2194,7 +2194,7 @@ (define_expand "add<mode>3"
 )
 
 ;; Floating-point addition predicated with a PTRUE.
-(define_insn "*add<mode>3"
+(define_insn_and_split "*add<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w")
         (unspec:SVE_F
           [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl")
@@ -2206,7 +2206,12 @@ (define_insn "*add<mode>3"
   "@
    fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
    fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
-   fadd\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[3], <MODE>mode)"
+  [(set (match_dup 0) (plus:SVE_F (match_dup 2) (match_dup 3)))]
 )
 
 ;; Unpredicated floating-point subtraction.
@@ -2225,7 +2230,7 @@ (define_expand "sub<mode>3"
 )
 
 ;; Floating-point subtraction predicated with a PTRUE.
-(define_insn "*sub<mode>3"
+(define_insn_and_split "*sub<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w")
         (unspec:SVE_F
           [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl, Upl")
@@ -2240,7 +2245,13 @@ (define_insn "*sub<mode>3"
    fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
    fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
    fsubr\t%0.<Vetype>, %1/m, %0.<Vetype>, #%2
-   fsub\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[2], <MODE>mode)
+   && register_operand (operands[3], <MODE>mode)"
+  [(set (match_dup 0) (minus:SVE_F (match_dup 2) (match_dup 3)))]
 )
 
 ;; Unpredicated floating-point multiplication.
@@ -2259,7 +2270,7 @@ (define_expand "mul<mode>3"
 )
 
 ;; Floating-point multiplication predicated with a PTRUE.
-(define_insn "*mul<mode>3"
+(define_insn_and_split "*mul<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
         (unspec:SVE_F
           [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
@@ -2270,8 +2281,24 @@ (define_insn "*mul<mode>3"
   "TARGET_SVE"
   "@
    fmul\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
-   fmul\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
-)
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[3], <MODE>mode)"
+  [(set (match_dup 0) (mult:SVE_F (match_dup 2) (match_dup 3)))]
+)
+
+;; Unpredicated floating-point binary operations (post-RA only).
+;; These are generated by splitting a predicated instruction whose
+;; predicate is unused.
+(define_insn "*post_ra_<sve_fp_op><mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+        (SVE_UNPRED_FP_BINARY:SVE_F
+          (match_operand:SVE_F 1 "register_operand" "w")
+          (match_operand:SVE_F 2 "register_operand" "w")))]
+  "TARGET_SVE && reload_completed"
+  "<sve_fp_op>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>")
 
 ;; Unpredicated fma (%0 = (%1 * %2) + %3).
 (define_expand "fma<mode>4"
Index: gcc/testsuite/gcc.target/aarch64/sve/pred_elim_1.c
===================================================================
--- /dev/null	2018-11-29 13:15:04.463550658 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/pred_elim_1.c	2018-12-07 14:59:39.875208953 +0000
@@ -0,0 +1,23 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define TEST_OP(NAME, TYPE, OP) \
+  void \
+  NAME##_##TYPE (TYPE *restrict a, TYPE *restrict b, \
+                 TYPE *restrict c, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      a[i] = b[i] OP c[i]; \
+  }
+
+#define TEST_TYPE(TYPE) \
+  TEST_OP (add, TYPE, +) \
+  TEST_OP (sub, TYPE, -) \
+  TEST_OP (mult, TYPE, *) \
+
+TEST_TYPE (float)
+TEST_TYPE (double)
+
+/* { dg-final { scan-assembler-times {\tfadd\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tfsub\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmul\t} 2 } } */
+/* { dg-final { scan-assembler-not {\tptrue\t} } } */
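
The new test only scans the assembly output, so it does not by itself show that the loops still compute the right values once the predicate has been dropped. The following is a minimal run-time sketch of such a check, not part of the patch: it reuses the loop macros from pred_elim_1.c, while N, CHECK_TYPE, check_float, check_double and run_checks are hypothetical names introduced only for this illustration. The inputs are small integers, so every sum, difference and product is exactly representable and can be compared with ==.

```c
/* Hypothetical run-time companion to pred_elim_1.c; not part of the patch.  */
#include <assert.h>

/* Arbitrary odd element count (an assumption of this sketch), so that the
   array length is not a multiple of any even number of vector lanes.  */
#define N 37

/* Loop macros reused from pred_elim_1.c.  */
#define TEST_OP(NAME, TYPE, OP)                          \
  void                                                   \
  NAME##_##TYPE (TYPE *restrict a, TYPE *restrict b,     \
                 TYPE *restrict c, int n)                \
  {                                                      \
    for (int i = 0; i < n; ++i)                          \
      a[i] = b[i] OP c[i];                               \
  }

#define TEST_TYPE(TYPE)      \
  TEST_OP (add, TYPE, +)     \
  TEST_OP (sub, TYPE, -)     \
  TEST_OP (mult, TYPE, *)

TEST_TYPE (float)
TEST_TYPE (double)

/* Define check_<TYPE>, which returns 1 if the add, sub and mult loops for
   TYPE all match an element-by-element scalar computation, else 0.  */
#define CHECK_TYPE(TYPE)                                 \
  int                                                    \
  check_##TYPE (void)                                    \
  {                                                      \
    TYPE a[N], b[N], c[N];                               \
    for (int i = 0; i < N; ++i)                          \
      {                                                  \
        b[i] = (TYPE) i;                                 \
        c[i] = (TYPE) (i % 5 + 1);                       \
      }                                                  \
    add_##TYPE (a, b, c, N);                             \
    for (int i = 0; i < N; ++i)                          \
      if (a[i] != b[i] + c[i])                           \
        return 0;                                        \
    sub_##TYPE (a, b, c, N);                             \
    for (int i = 0; i < N; ++i)                          \
      if (a[i] != b[i] - c[i])                           \
        return 0;                                        \
    mult_##TYPE (a, b, c, N);                            \
    for (int i = 0; i < N; ++i)                          \
      if (a[i] != b[i] * c[i])                           \
        return 0;                                        \
    return 1;                                            \
  }

CHECK_TYPE (float)
CHECK_TYPE (double)

/* Abort via assert if any of the six loops produced a wrong element.  */
void
run_checks (void)
{
  assert (check_float ());
  assert (check_double ());
}
```

Calling run_checks from main and building with the same options as the test (-O2 -ftree-vectorize, plus SVE enabled for the target) would exercise the vectorized loops; the code is plain C and also runs on any other target, where it only checks the scalar arithmetic.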