From patchwork Wed Aug 14 08:46:38 2019
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1146848
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [committed][AArch64] Use "x" predication for SVE integer arithmetic patterns
Date: Wed, 14 Aug 2019 09:46:38 +0100

The SVE patterns used an UNSPEC_MERGE_PTRUE unspec to attach a predicate
to an otherwise unpredicated integer arithmetic operation. As its name
suggests, this was designed to be a wrapper used for merging instructions
in which the predicate is known to be a PTRUE. This unspec dates from the
very early days of the port and nothing has ever taken advantage of the
PTRUE guarantee for arithmetic (as opposed to comparisons).
This patch replaces it with the less stringent guarantee that:

(a) the values of inactive lanes don't matter and
(b) it is valid to make extra lanes active if there's a specific benefit

Doing this makes the patterns suitable for the ACLE _x functions, which
have the above semantics. See the block comment in the patch for more
details.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274425.

Richard


2019-08-14  Richard Sandiford

gcc/
	* config/aarch64/aarch64.md (UNSPEC_PRED_X): New unspec.
	* config/aarch64/aarch64-sve.md: Add a section describing it.
	(@aarch64_pred_mov, @aarch64_pred_mov)
	(2, *2)
	(aarch64_abd_3, mul3, *mul3)
	(mul3_highpart, *mul3_highpart)
	(3, *3)
	(*bic3, v3, *v3)
	(3, *3, *madd)
	(*msub3, *aarch64_sve_rev64)
	(*aarch64_sve_rev32, *aarch64_sve_rev16vnx16qi): Use UNSPEC_PRED_X
	instead of UNSPEC_MERGE_PTRUE.
	* config/aarch64/aarch64-sve2.md (avg3_floor)
	(avg3_ceil, *h): Likewise.
	* config/aarch64/aarch64.c (aarch64_split_sve_subreg_move)
	(aarch64_evpc_rev_local): Update accordingly.
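Guarantees (a) and (b) above can be illustrated with a small scalar model. This is only a sketch and is not part of the patch: the lane count and the `model_*` names are hypothetical, and plain arrays stand in for SVE registers. It shows that a merging add under a partial predicate and the same add with every lane made active agree on all active lanes, which is the only property a consumer of an "_x" result may rely on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model only: real SVE vectors have a run-time length.
   NLANES is a hypothetical fixed lane count used for illustration.  */
enum { NLANES = 4 };

/* Model of a merging predicated add (the "/m" instruction form):
   inactive lanes keep the value of the first input.  */
void
model_add_merging (uint32_t *dst, const bool *pred,
                   const uint32_t *a, const uint32_t *b)
{
  for (size_t i = 0; i < NLANES; i++)
    dst[i] = pred[i] ? a[i] + b[i] : a[i];
}

/* Model of the same add with every lane made active, i.e. with the
   predicate replaced by an all-true one.  */
void
model_add_all_lanes (uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
  for (size_t i = 0; i < NLANES; i++)
    dst[i] = a[i] + b[i];
}

/* Return true if X and Y agree on every lane that PRED makes active.
   Under "_x" semantics this is the only comparison that matters.  */
bool
model_same_on_active_lanes (const uint32_t *x, const uint32_t *y,
                            const bool *pred)
{
  for (size_t i = 0; i < NLANES; i++)
    if (pred[i] && x[i] != y[i])
      return false;
  return true;
}
```

In this model the two results differ only in lanes where the predicate is false, so either is an acceptable implementation of an "_x" operation; which one is preferable is a code-quality question rather than a correctness one.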
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md 2019-08-14 09:34:05.509786440 +0100
+++ gcc/config/aarch64/aarch64.md 2019-08-14 09:43:23.977659217 +0100
@@ -220,6 +220,7 @@ (define_c_enum "unspec" [
     UNSPEC_LD1_GATHER
     UNSPEC_ST1_SCATTER
     UNSPEC_MERGE_PTRUE
+    UNSPEC_PRED_X
     UNSPEC_PTEST
     UNSPEC_UNPACKSHI
     UNSPEC_UNPACKUHI
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:39:44.323282457 +0100
+++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:43:23.973659247 +0100
@@ -24,6 +24,7 @@
 ;; == General notes
 ;; ---- Note on the handling of big-endian SVE
 ;; ---- Description of UNSPEC_PTEST
+;; ---- Note on predicated integer arithemtic and UNSPEC_PRED_X
 ;; ---- Note on predicated FP arithmetic patterns and GP "strictness"
 ;;
 ;; == Moves
@@ -230,6 +231,63 @@
 ;; - OP is the predicate we want to test, of the same mode as CAST_GP.
 ;;
 ;; -------------------------------------------------------------------------
+;; ---- Note on predicated integer arithemtic and UNSPEC_PRED_X
+;; -------------------------------------------------------------------------
+;;
+;; Many SVE integer operations are predicated. We can generate them
+;; from four sources:
+;;
+;; (1) Using normal unpredicated optabs. In this case we need to create
+;;     an all-true predicate register to act as the governing predicate
+;;     for the SVE instruction. There are no inactive lanes, and thus
+;;     the values of inactive lanes don't matter.
+;;
+;; (2) Using _x ACLE functions. In this case the function provides a
+;;     specific predicate and some lanes might be inactive. However,
+;;     as for (1), the values of the inactive lanes don't matter.
+;;     We can make extra lanes active without changing the behavior
+;;     (although for code-quality reasons we should avoid doing so
+;;     needlessly).
+;;
+;; (3) Using cond_* optabs that correspond to IFN_COND_* internal functions.
+;;     These optabs have a predicate operand that specifies which lanes are
+;;     active and another operand that provides the values of inactive lanes.
+;;
+;; (4) Using _m and _z ACLE functions. These functions map to the same
+;;     patterns as (3), with the _z functions setting inactive lanes to zero
+;;     and the _m functions setting the inactive lanes to one of the function
+;;     arguments.
+;;
+;; For (1) and (2) we need a way of attaching the predicate to a normal
+;; unpredicated integer operation. We do this using:
+;;
+;;   (unspec:M [pred (code:M (op0 op1 ...))] UNSPEC_PRED_X)
+;;
+;; where (code:M (op0 op1 ...)) is the normal integer operation and PRED
+;; is a predicate of mode . PRED might or might not be a PTRUE;
+;; it always is for (1), but might not be for (2).
+;;
+;; The unspec as a whole has the same value as (code:M ...) when PRED is
+;; all-true. It is always semantically valid to replace PRED with a PTRUE,
+;; but as noted above, we should only do so if there's a specific benefit.
+;;
+;; (The "_X" in the unspec is named after the ACLE functions in (2).)
+;;
+;; For (3) and (4) we can simply use the SVE port's normal representation
+;; of a predicate-based select:
+;;
+;;   (unspec:M [pred (code:M (op0 op1 ...)) inactive] UNSPEC_SEL)
+;;
+;; where INACTIVE specifies the values of inactive lanes.
+;;
+;; We can also use the UNSPEC_PRED_X wrapper in the UNSPEC_SEL rather
+;; than inserting the integer operation directly. This is mostly useful
+;; if we want the combine pass to merge an integer operation with an explicit
+;; vcond_mask (in other words, with a following SEL instruction). However,
+;; it's generally better to merge such operations at the gimple level
+;; using (3).
+;;
+;; -------------------------------------------------------------------------
 ;; ---- Note on predicated FP arithmetic patterns and GP "strictness"
 ;; -------------------------------------------------------------------------
 ;;
@@ -430,7 +488,7 @@ (define_insn_and_split "@aarch64_pred_mo
         (unspec:SVE_ALL
           [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
            (match_operand:SVE_ALL 2 "nonimmediate_operand" "w, m, w")]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE
    && (register_operand (operands[0], mode)
        || register_operand (operands[2], mode))"
@@ -578,7 +636,7 @@ (define_insn_and_split "@aarch64_pred_mo
         (unspec:SVE_STRUCT
           [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
            (match_operand:SVE_STRUCT 2 "aarch64_sve_struct_nonimmediate_operand" "w, Utx, w")]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE
    && (register_operand (operands[0], mode)
        || register_operand (operands[2], mode))"
@@ -1327,7 +1385,7 @@ (define_expand "2"
         (unspec:SVE_I
           [(match_dup 2)
            (SVE_INT_UNARY:SVE_I (match_operand:SVE_I 1 "register_operand"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
     operands[2] = aarch64_ptrue_reg (mode);
@@ -1341,7 +1399,7 @@ (define_insn "*2"
           [(match_operand: 1 "register_operand" "Upl")
            (SVE_INT_UNARY:SVE_I
              (match_operand:SVE_I 2 "register_operand" "w"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "\t%0., %1/m, %2."
 )
@@ -1600,7 +1658,7 @@ (define_insn "aarch64_abd_3"
              (:SVE_I
                (match_dup 2)
                (match_dup 3)))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    abd\t%0., %1/m, %0., %3.
@@ -1623,7 +1681,7 @@ (define_expand "mul3"
            (mult:SVE_I
              (match_operand:SVE_I 1 "register_operand")
              (match_operand:SVE_I 2 "aarch64_sve_mul_operand"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
     operands[3] = aarch64_ptrue_reg (mode);
@@ -1641,7 +1699,7 @@ (define_insn_and_split "*mul3"
            (mult:SVE_I
              (match_operand:SVE_I 2 "register_operand" "%0, 0, w")
              (match_operand:SVE_I 3 "aarch64_sve_mul_operand" "vsm, w, w"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    #
@@ -1686,7 +1744,7 @@ (define_expand "mul3_highpart"
            (unspec:SVE_I [(match_operand:SVE_I 1 "register_operand")
                           (match_operand:SVE_I 2 "register_operand")]
                          MUL_HIGHPART)]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
     operands[3] = aarch64_ptrue_reg (mode);
@@ -1701,7 +1759,7 @@ (define_insn "*mul3_highpart"
            (unspec:SVE_I [(match_operand:SVE_I 2 "register_operand" "%0, w")
                           (match_operand:SVE_I 3 "register_operand" "w, w")]
                          MUL_HIGHPART)]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    mulh\t%0., %1/m, %0., %3.
@@ -1727,7 +1785,7 @@ (define_expand "3"
            (SVE_INT_BINARY_SD:SVE_SDI
              (match_operand:SVE_SDI 1 "register_operand")
              (match_operand:SVE_SDI 2 "register_operand"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
     operands[3] = aarch64_ptrue_reg (mode);
@@ -1742,7 +1800,7 @@ (define_insn "*3"
            (SVE_INT_BINARY_SD:SVE_SDI
              (match_operand:SVE_SDI 2 "register_operand" "0, w, w")
              (match_operand:SVE_SDI 3 "aarch64_sve_mul_operand" "w, 0, w"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    \t%0., %1/m, %0., %3.
@@ -1864,7 +1922,7 @@ (define_insn_and_rewrite "*bic3"
           (unspec:SVE_I
             [(match_operand 3)
              (not:SVE_I (match_operand:SVE_I 2 "register_operand" "w"))]
-            UNSPEC_MERGE_PTRUE)
+            UNSPEC_PRED_X)
           (match_operand:SVE_I 1 "register_operand" "w")))]
   "TARGET_SVE"
   "bic\t%0.d, %1.d, %2.d"
@@ -1918,7 +1976,7 @@ (define_expand "v3"
            (ASHIFT:SVE_I
              (match_operand:SVE_I 1 "register_operand")
              (match_operand:SVE_I 2 "aarch64_sve_shift_operand"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
     operands[3] = aarch64_ptrue_reg (mode);
@@ -1936,7 +1994,7 @@ (define_insn_and_split "*v3
            (ASHIFT:SVE_I
              (match_operand:SVE_I 2 "register_operand" "w, 0, w")
              (match_operand:SVE_I 3 "aarch64_sve_shift_operand" "D, w, w"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    #
@@ -1978,7 +2036,7 @@ (define_expand "3"
           [(match_dup 3)
            (MAXMIN:SVE_I (match_operand:SVE_I 1 "register_operand")
                          (match_operand:SVE_I 2 "register_operand"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
     operands[3] = aarch64_ptrue_reg (mode);
@@ -1992,7 +2050,7 @@ (define_insn "*3"
           [(match_operand: 1 "register_operand" "Upl, Upl")
            (MAXMIN:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
                          (match_operand:SVE_I 3 "register_operand" "w, w"))]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "@
    \t%0., %1/m, %0., %3.
@@ -2549,7 +2607,7 @@ (define_insn "*madd"
             [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
              (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w, w")
                          (match_operand:SVE_I 3 "register_operand" "w, w, w"))]
-            UNSPEC_MERGE_PTRUE)
+            UNSPEC_PRED_X)
           (match_operand:SVE_I 4 "register_operand" "w, 0, w")))]
   "TARGET_SVE"
   "@
@@ -2576,7 +2634,7 @@ (define_insn "*msub3"
             [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
              (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w, w")
                          (match_operand:SVE_I 3 "register_operand" "w, w, w"))]
-            UNSPEC_MERGE_PTRUE)))]
+            UNSPEC_PRED_X)))]
   "TARGET_SVE"
   "@
    msb\t%0., %1/m, %3., %4.
@@ -3485,7 +3543,7 @@ (define_insn "*aarch64_sve_rev64"
           [(match_operand:VNx2BI 1 "register_operand" "Upl")
            (unspec:SVE_BHS [(match_operand:SVE_BHS 2 "register_operand" "w")]
                            UNSPEC_REV64)]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "rev\t%0.d, %1/m, %2.d"
 )
@@ -3497,7 +3555,7 @@ (define_insn "*aarch64_sve_rev32"
           [(match_operand:VNx4BI 1 "register_operand" "Upl")
            (unspec:SVE_BH [(match_operand:SVE_BH 2 "register_operand" "w")]
                           UNSPEC_REV32)]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "rev\t%0.s, %1/m, %2.s"
 )
@@ -3509,7 +3567,7 @@ (define_insn "*aarch64_sve_rev16vnx16qi"
           [(match_operand:VNx8BI 1 "register_operand" "Upl")
            (unspec:VNx16QI [(match_operand:VNx16QI 2 "register_operand" "w")]
                            UNSPEC_REV16)]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE"
   "revb\t%0.h, %1/m, %2.h"
 )
Index: gcc/config/aarch64/aarch64-sve2.md
===================================================================
--- gcc/config/aarch64/aarch64-sve2.md 2019-05-30 18:34:35.946485479 +0100
+++ gcc/config/aarch64/aarch64-sve2.md 2019-08-14 09:43:23.973659247 +0100
@@ -26,7 +26,7 @@ (define_expand "avg3_floor"
            (unspec:SVE_I [(match_operand:SVE_I 1 "register_operand")
                           (match_operand:SVE_I 2 "register_operand")]
                          HADD)]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE2"
   {
     operands[3] = force_reg (mode, CONSTM1_RTX (mode));
@@ -41,7 +41,7 @@ (define_expand "avg3_ceil"
            (unspec:SVE_I [(match_operand:SVE_I 1 "register_operand")
                           (match_operand:SVE_I 2 "register_operand")]
                          RHADD)]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE2"
   {
     operands[3] = force_reg (mode, CONSTM1_RTX (mode));
@@ -56,10 +56,10 @@ (define_insn "*h"
            (unspec:SVE_I [(match_operand:SVE_I 2 "register_operand" "%0, w")
                           (match_operand:SVE_I 3 "register_operand" "w, w")]
                          HADDSUB)]
-          UNSPEC_MERGE_PTRUE))]
+          UNSPEC_PRED_X))]
   "TARGET_SVE2"
   "@
    h\t%0., %1/m, %0., %3.
    movprfx\t%0, %2\;h\t%0., %1/m, %0., %3."
   [(set_attr "movprfx" "*,yes")]
-)
\ No newline at end of file
+)
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2019-08-14 09:29:52.867653714 +0100
+++ gcc/config/aarch64/aarch64.c 2019-08-14 09:43:23.977659217 +0100
@@ -4097,8 +4097,7 @@ aarch64_split_sve_subreg_move (rtx dest,
   /* Emit:
-       (set DEST (unspec [PTRUE (unspec [SRC] UNSPEC_REV)]
-                         UNSPEC_MERGE_PTRUE))
+       (set DEST (unspec [PTRUE (unspec [SRC] UNSPEC_REV)] UNSPEC_PRED_X))
      with the appropriate modes. */
   ptrue = gen_lowpart (pred_mode, ptrue);
@@ -4106,7 +4105,7 @@ aarch64_split_sve_subreg_move (rtx dest,
   src = aarch64_replace_reg_mode (src, mode_with_narrower_elts);
   src = gen_rtx_UNSPEC (mode_with_narrower_elts, gen_rtvec (1, src), unspec);
   src = gen_rtx_UNSPEC (mode_with_narrower_elts, gen_rtvec (2, ptrue, src),
-                        UNSPEC_MERGE_PTRUE);
+                        UNSPEC_PRED_X);
   emit_insn (gen_rtx_SET (dest, src));
 }
@@ -17434,7 +17433,7 @@ aarch64_evpc_rev_local (struct expand_ve
     {
       rtx pred = aarch64_ptrue_reg (pred_mode);
       src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (2, pred, src),
-                            UNSPEC_MERGE_PTRUE);
+                            UNSPEC_PRED_X);
     }
   emit_set_insn (d->target, src);
   return true;
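
For contrast with UNSPEC_PRED_X, the block comment this patch adds to aarch64-sve.md describes cases (3) and (4) as a predicate-based select in which the values of inactive lanes are specified. The following scalar sketch is not part of the patch and only models that description: the `model_*` names are hypothetical, arrays stand in for SVE registers, and the choice of the first operand as the "_m" fallback value is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one lane of a predicate-based select: the result is ACTIVE
   when P is set and INACTIVE otherwise.  */
uint32_t
model_sel_lane (bool p, uint32_t active, uint32_t inactive)
{
  return p ? active : inactive;
}

/* "_m"-style add: inactive lanes take a value from a function argument.
   Using the first operand A here is an assumption for illustration.  */
void
model_add_m (uint32_t *dst, const bool *pred, const uint32_t *a,
             const uint32_t *b, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; i++)
    dst[i] = model_sel_lane (pred[i], a[i] + b[i], a[i]);
}

/* "_z"-style add: inactive lanes are set to zero.  */
void
model_add_z (uint32_t *dst, const bool *pred, const uint32_t *a,
             const uint32_t *b, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; i++)
    dst[i] = model_sel_lane (pred[i], a[i] + b[i], 0);
}
```

Unlike the "_x" case, the inactive lanes here are observable, so in this model it would not be valid to widen the predicate.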