From patchwork Wed Aug 14 09:19:08 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1146863
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Cc: Kugan Vivekanandarajah
Subject: [committed][AArch64] Make more use of SVE conditional constant moves
Date: Wed, 14 Aug 2019 10:19:08 +0100

This patch extends the SVE UNSPEC_SEL patterns so that they can use:

(1) MOV /M of a duplicated integer constant
(2) MOV /M of a duplicated floating-point constant bitcast to an integer,
    accepting the same constants as (1)
(3) FMOV /M of a duplicated floating-point constant
(4) MOV /Z of a duplicated integer constant
(5) MOV /Z of a duplicated floating-point constant bitcast to an integer,
    accepting the same constants as (4)
(6) MOVPRFXed FMOV /M of a duplicated floating-point constant

We already handled (4) with a special pattern; the rest are new.
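For orientation, here is a minimal sketch of the three loop shapes that the new tests (vcond_18.c, vcond_19.c and vcond_20.c) use to reach these forms. The function names are hypothetical and are not part of the patch. The instruction actually chosen depends on the element type, the constant, the compiler version and the options (the tests use -O2 -ftree-vectorize), so the comments only say what the tests expect.

```c
#include <stddef.h>

/* "False" value is zero: the tests expect a /Z form here, either
   MOV /Z of the bit pattern or MOVPRFX /Z followed by FMOV /M.  */
void
select_or_zero (float *restrict x, const float *restrict pred, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    x[i] = pred[i] > 0 ? 2.0f : 0;
}

/* "False" value is already in a register (the loaded input): the
   tests expect a plain MOV /M or FMOV /M with no MOVPRFX.  */
void
select_or_input (float *restrict x, const float *restrict pred, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    x[i] = pred[i] > 0 ? 2.0f : pred[i];
}

/* "False" value is a second constant: the tests expect an unpredicated
   MOVPRFX from the register holding it, then MOV /M or FMOV /M.  */
void
select_or_const (float *restrict x, const float *restrict pred, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    x[i] = pred[i] > 0 ? 2.0f : 12.0f;
}
```

The scalar semantics are the same on any target; only the SVE code generation differs between the three shapes.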
Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r274441.

Richard


2019-08-14  Richard Sandiford
            Kugan Vivekanandarajah

gcc/
	* config/aarch64/aarch64.c (aarch64_bit_representation): New function.
	(aarch64_print_vector_float_operand): Also handle 8-bit floats.
	(aarch64_print_operand): Add support for %I.
	(aarch64_sve_dup_immediate_p): Handle scalars as well as vectors.
	Bitcast floating-point constants to the corresponding integer constant.
	(aarch64_float_const_representable_p): Handle vectors as well
	as scalars.
	(aarch64_expand_sve_vcond): Make sure that the operands are valid
	for the new vcond_mask_ expander.
	* config/aarch64/predicates.md (aarch64_sve_dup_immediate): Also
	test aarch64_float_const_representable_p.
	(aarch64_sve_reg_or_dup_imm): New predicate.
	* config/aarch64/aarch64-sve.md (vec_extract): Use
	gen_vcond_mask_ instead of gen_aarch64_sve_dup_const.
	(vcond_mask_): Turn into a define_expand that accepts
	aarch64_sve_reg_or_dup_imm and aarch64_simd_reg_or_zero for
	operands 1 and 2 respectively. Force operand 2 into a register
	if operand 1 is a register. Fold old define_insn...
	(aarch64_sve_dup_const): ...and this define_insn...
	(*vcond_mask_): ...into this new pattern. Handle floating-point
	constants that can be moved as integers. Add alternatives for
	MOV /M and FMOV /M.
	(vcond, vcondu)
	(vcond): Accept nonmemory_operand for operands 1 and 2 respectively.
	* config/aarch64/constraints.md (Ufc): Handle vectors as well
	as scalars.
	(vss): New constraint.

gcc/testsuite/
	* gcc.target/aarch64/sve/vcond_18.c: New test.
	* gcc.target/aarch64/sve/vcond_18_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_19.c: Likewise.
	* gcc.target/aarch64/sve/vcond_19_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_20.c: Likewise.
	* gcc.target/aarch64/sve/vcond_20_run.c: Likewise.

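The aarch64_sve_dup_immediate_p change is easier to follow with numbers. Below is a standalone sketch, not the GCC code: it mirrors the range test in that function (a signed 8-bit value, optionally shifted left by 8 bits) and applies it to the bit patterns of floating-point constants. The helper names are hypothetical, float is assumed to be IEEE binary32, and the half-precision patterns are written out by hand. Patterns are sign-extended because the expected assembly in the new tests prints negative immediates such as #-15360.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the range test in
   aarch64_sve_dup_immediate_p: accept a signed 8-bit value, or a
   signed 8-bit value shifted left by 8 bits.  */
bool
dup_immediate_ok (int64_t val)
{
  if (val & 0xff)
    return val >= -0x80 && val <= 0x7f;
  return val >= -0x8000 && val <= 0x7f00;
}

/* Bit pattern of a float as a sign-extended integer (assumes IEEE
   binary32).  */
int64_t
float_bits (float f)
{
  int32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Sign-extend a hand-written 16-bit (half-precision) bit pattern.  */
int64_t
half_bits (uint16_t pattern)
{
  return pattern >= 0x8000 ? (int64_t) pattern - 0x10000 : (int64_t) pattern;
}
```

Under these assumptions, half-precision 2.0 (0x4000, printed as #16384) and -4.0 (0xC400, printed as #-15360) pass the test, while 32.25 (0x5008) does not because its low byte is nonzero. The binary32 pattern of 2.0 (0x40000000) is out of range, which is consistent with the tests expecting FMOV rather than MOV for the .s and .d cases.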
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2019-08-14 09:54:30.816741891 +0100
+++ gcc/config/aarch64/aarch64.c 2019-08-14 10:16:30.671052843 +0100
@@ -1482,6 +1482,16 @@ aarch64_dbx_register_number (unsigned re
    return DWARF_FRAME_REGISTERS;
 }

+/* If X is a CONST_DOUBLE, return its bit representation as a constant
+   integer, otherwise return X unmodified. */
+static rtx
+aarch64_bit_representation (rtx x)
+{
+  if (CONST_DOUBLE_P (x))
+    x = gen_lowpart (int_mode_for_mode (GET_MODE (x)).require (), x);
+  return x;
+}
+
 /* Return true if MODE is any of the Advanced SIMD structure modes. */
 static bool
 aarch64_advsimd_struct_mode_p (machine_mode mode)
@@ -8275,7 +8285,8 @@ aarch64_print_vector_float_operand (FILE
   if (negate)
     r = real_value_negate (&r);

-  /* We only handle the SVE single-bit immediates here. */
+  /* Handle the SVE single-bit immediates specially, since they have a
+     fixed form in the assembly syntax. */
   if (real_equal (&r, &dconst0))
     asm_fprintf (f, "0.0");
   else if (real_equal (&r, &dconst1))
@@ -8283,7 +8294,13 @@ aarch64_print_vector_float_operand (FILE
   else if (real_equal (&r, &dconsthalf))
     asm_fprintf (f, "0.5");
   else
-    return false;
+    {
+      const int buf_size = 20;
+      char float_buf[buf_size] = {'\0'};
+      real_to_decimal_for_mode (float_buf, &r, buf_size, buf_size,
+                                1, GET_MODE (elt));
+      asm_fprintf (f, "%s", float_buf);
+    }

   return true;
 }
@@ -8312,6 +8329,11 @@ sizetochar (int size)
                        and print it as an unsigned integer, in decimal.
      'e':              Print the sign/zero-extend size as a character 8->b,
                        16->h, 32->w.
+     'I':              If the operand is a duplicated vector constant,
+                       replace it with the duplicated scalar. If the
+                       operand is then a floating-point constant, replace
+                       it with the integer bit representation. Print the
+                       transformed constant as a signed decimal number.
      'p':              Prints N such that 2^N == X (X must be power of 2 and
                        const int).
      'P':              Print the number of non-zero bits in X (a const_int).
@@ -8444,6 +8466,19 @@ aarch64_print_operand (FILE *f, rtx x, i
       asm_fprintf (f, "%s", reg_names [REGNO (x) + 1]);
       break;

+    case 'I':
+      {
+        x = aarch64_bit_representation (unwrap_const_vec_duplicate (x));
+        if (CONST_INT_P (x))
+          asm_fprintf (f, "%wd", INTVAL (x));
+        else
+          {
+            output_operand_lossage ("invalid operand for '%%%c'", code);
+            return;
+          }
+        break;
+      }
+
     case 'M':
     case 'm':
       {
@@ -15116,13 +15151,11 @@ aarch64_sve_bitmask_immediate_p (rtx x)
 bool
 aarch64_sve_dup_immediate_p (rtx x)
 {
-  rtx elt;
-
-  if (!const_vec_duplicate_p (x, &elt)
-      || !CONST_INT_P (elt))
+  x = aarch64_bit_representation (unwrap_const_vec_duplicate (x));
+  if (!CONST_INT_P (x))
     return false;

-  HOST_WIDE_INT val = INTVAL (elt);
+  HOST_WIDE_INT val = INTVAL (x);
   if (val & 0xff)
     return IN_RANGE (val, -0x80, 0x7f);
   return IN_RANGE (val, -0x8000, 0x7f00);
@@ -16965,6 +16998,7 @@ aarch64_float_const_representable_p (rtx
   REAL_VALUE_TYPE r, m;
   bool fail;

+  x = unwrap_const_vec_duplicate (x);
   if (!CONST_DOUBLE_P (x))
     return false;

@@ -18086,6 +18120,13 @@ aarch64_expand_sve_vcond (machine_mode d
   else
     aarch64_expand_sve_vec_cmp_int (pred, GET_CODE (ops[3]), ops[4], ops[5]);

+  if (!aarch64_sve_reg_or_dup_imm (ops[1], data_mode))
+    ops[1] = force_reg (data_mode, ops[1]);
+  /* The "false" value can only be zero if the "true" value is a constant. */
+  if (register_operand (ops[1], data_mode)
+      || !aarch64_simd_reg_or_zero (ops[2], data_mode))
+    ops[2] = force_reg (data_mode, ops[2]);
+
   rtvec vec = gen_rtvec (3, pred, ops[1], ops[2]);
   emit_set_insn (ops[0], gen_rtx_UNSPEC (data_mode, vec, UNSPEC_SEL));
 }
Index: gcc/config/aarch64/predicates.md
===================================================================
--- gcc/config/aarch64/predicates.md 2019-08-14 10:14:27.899953691 +0100
+++ gcc/config/aarch64/predicates.md 2019-08-14 10:16:30.671052843 +0100
@@ -629,7 +629,8 @@ (define_predicate "aarch64_sve_vsm_immed

 (define_predicate "aarch64_sve_dup_immediate"
   (and (match_code "const,const_vector")
-       (match_test "aarch64_sve_dup_immediate_p (op)")))
+       (ior (match_test "aarch64_sve_dup_immediate_p (op)")
+            (match_test "aarch64_float_const_representable_p (op)"))))

 (define_predicate "aarch64_sve_cmp_vsc_immediate"
   (and (match_code "const,const_vector")
@@ -689,6 +690,10 @@ (define_predicate "aarch64_sve_vsm_opera
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "aarch64_sve_vsm_immediate")))

+(define_predicate "aarch64_sve_reg_or_dup_imm"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_dup_immediate")))
+
 (define_predicate "aarch64_sve_cmp_vsc_operand"
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "aarch64_sve_cmp_vsc_immediate")))
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:14:27.899953691 +0100
+++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:16:30.667052873 +0100
@@ -1404,9 +1404,9 @@ (define_expand "vec_extract"
   "TARGET_SVE"
   {
     rtx tmp = gen_reg_rtx (mode);
-    emit_insn (gen_aarch64_sve_dup_const (tmp, operands[1],
-                                          CONST1_RTX (mode),
-                                          CONST0_RTX (mode)));
+    emit_insn (gen_vcond_mask_ (tmp, operands[1],
+                                CONST1_RTX (mode),
+                                CONST0_RTX (mode)));
     emit_insn (gen_vec_extract (operands[0], tmp, operands[2]));
     DONE;
   }
@@ -3023,6 +3023,7 @@ (define_insn_and_rewrite "*cond_<
 ;; ---- [INT,FP] Select based on predicates
 ;; -------------------------------------------------------------------------
 ;; Includes merging patterns for:
+;; - FMOV
 ;; - MOV
 ;; - SEL
 ;; -------------------------------------------------------------------------
@@ -3030,27 +3031,43 @@ (define_insn_and_rewrite "*cond_<
 ;; vcond_mask operand order: true, false, mask
 ;; UNSPEC_SEL operand order: mask, true, false (as for VEC_COND_EXPR)
 ;; SEL operand order: mask, true, false
-(define_insn "vcond_mask_"
-  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+(define_expand "vcond_mask_"
+  [(set (match_operand:SVE_ALL 0 "register_operand")
         (unspec:SVE_ALL
-          [(match_operand: 3 "register_operand" "Upa")
-           (match_operand:SVE_ALL 1 "register_operand" "w")
-           (match_operand:SVE_ALL 2 "register_operand" "w")]
+          [(match_operand: 3 "register_operand")
+           (match_operand:SVE_ALL 1 "aarch64_sve_reg_or_dup_imm")
+           (match_operand:SVE_ALL 2 "aarch64_simd_reg_or_zero")]
           UNSPEC_SEL))]
   "TARGET_SVE"
-  "sel\t%0., %3, %1., %2."
+  {
+    if (register_operand (operands[1], mode))
+      operands[2] = force_reg (mode, operands[2]);
+  }
 )

-;; Selects between a duplicated immediate and zero.
-(define_insn "aarch64_sve_dup_const"
-  [(set (match_operand:SVE_I 0 "register_operand" "=w")
-        (unspec:SVE_I
-          [(match_operand: 1 "register_operand" "Upl")
-           (match_operand:SVE_I 2 "aarch64_sve_dup_immediate")
-           (match_operand:SVE_I 3 "aarch64_simd_imm_zero")]
+;; Selects between:
+;; - two registers
+;; - a duplicated immediate and a register
+;; - a duplicated immediate and zero
+(define_insn "*vcond_mask_"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w, w, w, ?w, ?&w, ?&w")
+        (unspec:SVE_ALL
+          [(match_operand: 3 "register_operand" "Upa, Upa, Upa, Upa, Upl, Upl, Upl")
+           (match_operand:SVE_ALL 1 "aarch64_sve_reg_or_dup_imm" "w, vss, vss, Ufc, Ufc, vss, Ufc")
+           (match_operand:SVE_ALL 2 "aarch64_simd_reg_or_zero" "w, 0, Dz, 0, Dz, w, w")]
           UNSPEC_SEL))]
-  "TARGET_SVE"
-  "mov\t%0., %1/z, #%2"
+  "TARGET_SVE
+   && (!register_operand (operands[1], mode)
+       || register_operand (operands[2], mode))"
+  "@
+   sel\t%0., %3, %1., %2.
+   mov\t%0., %3/m, #%I1
+   mov\t%0., %3/z, #%I1
+   fmov\t%0., %3/m, #%1
+   movprfx\t%0., %3/z, %0.\;fmov\t%0., %3/m, #%1
+   movprfx\t%0, %2\;mov\t%0., %3/m, #%I1
+   movprfx\t%0, %2\;fmov\t%0., %3/m, #%1"
+  [(set_attr "movprfx" "*,*,*,*,yes,yes,yes")]
 )

 ;; -------------------------------------------------------------------------
@@ -3067,8 +3084,8 @@ (define_expand "vcond
           (match_operator 3 "comparison_operator"
             [(match_operand: 4 "register_operand")
              (match_operand: 5 "nonmemory_operand")])
-          (match_operand:SVE_ALL 1 "register_operand")
-          (match_operand:SVE_ALL 2 "register_operand")))]
+          (match_operand:SVE_ALL 1 "nonmemory_operand")
+          (match_operand:SVE_ALL 2 "nonmemory_operand")))]
   "TARGET_SVE"
   {
     aarch64_expand_sve_vcond (mode, mode, operands);
@@ -3084,8 +3101,8 @@ (define_expand "vcondu 4 "register_operand")
              (match_operand: 5 "nonmemory_operand")])
-          (match_operand:SVE_ALL 1 "register_operand")
-          (match_operand:SVE_ALL 2 "register_operand")))]
+          (match_operand:SVE_ALL 1 "nonmemory_operand")
+          (match_operand:SVE_ALL 2 "nonmemory_operand")))]
   "TARGET_SVE"
   {
     aarch64_expand_sve_vcond (mode, mode, operands);
@@ -3101,8 +3118,8 @@ (define_expand "vcond"
           (match_operator 3 "comparison_operator"
             [(match_operand: 4 "register_operand")
              (match_operand: 5 "aarch64_simd_reg_or_zero")])
-          (match_operand:SVE_HSD 1 "register_operand")
-          (match_operand:SVE_HSD 2 "register_operand")))]
+          (match_operand:SVE_HSD 1 "nonmemory_operand")
+          (match_operand:SVE_HSD 2 "nonmemory_operand")))]
   "TARGET_SVE"
   {
     aarch64_expand_sve_vcond (mode, mode, operands);
Index: gcc/config/aarch64/constraints.md
===================================================================
--- gcc/config/aarch64/constraints.md 2019-08-14 10:14:27.899953691 +0100
+++ gcc/config/aarch64/constraints.md 2019-08-14 10:16:30.671052843 +0100
@@ -293,7 +293,7 @@ (define_memory_constraint "Utx"
 (define_constraint "Ufc"
   "A floating point constant which can be used with an\
    FMOV immediate operation."
-  (and (match_code "const_double")
+  (and (match_code "const_double,const_vector")
        (match_test "aarch64_float_const_representable_p (op)")))

 (define_constraint "Uvi"
@@ -400,6 +400,12 @@ (define_constraint "vsc"
    CMP instructions."
  (match_operand 0 "aarch64_sve_cmp_vsc_immediate"))

+(define_constraint "vss"
+  "@internal
+   A constraint that matches a signed immediate operand valid for SVE
+   DUP instructions."
+ (match_test "aarch64_sve_dup_immediate_p (op)"))
+
 (define_constraint "vsd"
   "@internal
    A constraint that matches an unsigned immediate operand valid for SVE
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_18.c
===================================================================
--- /dev/null 2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_18.c 2019-08-14 10:16:30.671052843 +0100
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define DEF_LOOP(TYPE, NAME, CONST) \
+  void \
+  test_##TYPE##_##NAME (TYPE *restrict x, \
+                        TYPE *restrict pred, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      x[i] = pred[i] > 0 ? CONST : 0; \
+  }
+
+#define TEST_TYPE(T, TYPE) \
+  T (TYPE, 2, 2.0) \
+  T (TYPE, 1p25, 1.25) \
+  T (TYPE, 32p25, 32.25) \
+  T (TYPE, m4, -4.0) \
+  T (TYPE, m2p5, -2.5) \
+  T (TYPE, m64p5, -64.5)
+
+#define TEST_ALL(T) \
+  TEST_TYPE (T, _Float16) \
+  TEST_TYPE (T, float) \
+  TEST_TYPE (T, double)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/z, #16384\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/z, #15616\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/z, #-15360\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/z, #-16128\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.h, p[0-7], z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+
+/* { dg-final { scan-assembler {\tmovprfx\t(z[0-9]+\.s), (p[0-7])/z, \1\n\tfmov\t\1, \2/m, #2\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tmovprfx\t(z[0-9]+\.s), (p[0-7])/z, \1\n\tfmov\t\1, \2/m, #1\.25(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tmovprfx\t(z[0-9]+\.s), (p[0-7])/z, \1\n\tfmov\t\1, \2/m, #-4\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tmovprfx\t(z[0-9]+\.s), (p[0-7])/z, \1\n\tfmov\t\1, \2/m, #-2\.5(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s, p[0-7], z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+
+/* { dg-final { scan-assembler {\tmovprfx\t(z[0-9]+\.d), (p[0-7])/z, \1\n\tfmov\t\1, \2/m, #2\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tmovprfx\t(z[0-9]+\.d), (p[0-7])/z, \1\n\tfmov\t\1, \2/m, #1\.25(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tmovprfx\t(z[0-9]+\.d), (p[0-7])/z, \1\n\tfmov\t\1, \2/m, #-4\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tmovprfx\t(z[0-9]+\.d), (p[0-7])/z, \1\n\tfmov\t\1, \2/m, #-2\.5(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.d, p[0-7], z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_18_run.c
===================================================================
--- /dev/null 2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_18_run.c 2019-08-14 10:16:30.671052843 +0100
@@ -0,0 +1,30 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_18.c"
+
+#define N 97
+
+#define TEST_LOOP(TYPE, NAME, CONST) \
+  { \
+    TYPE x[N], pred[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        pred[i] = i % 5 <= i % 6; \
+        asm volatile ("" ::: "memory"); \
+      } \
+    test_##TYPE##_##NAME (x, pred, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        if (x[i] != (TYPE) (pred[i] > 0 ? CONST : 0)) \
+          __builtin_abort (); \
+        asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int __attribute__ ((optimize (1)))
+main (int argc, char **argv)
+{
+  TEST_ALL (TEST_LOOP)
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_19.c
===================================================================
--- /dev/null 2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_19.c 2019-08-14 10:16:30.671052843 +0100
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define DEF_LOOP(TYPE, NAME, CONST) \
+  void \
+  test_##TYPE##_##NAME (TYPE *restrict x, \
+                        TYPE *restrict pred, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      x[i] = pred[i] > 0 ? CONST : pred[i]; \
+  }
+
+#define TEST_TYPE(T, TYPE) \
+  T (TYPE, 2, 2.0) \
+  T (TYPE, 1p25, 1.25) \
+  T (TYPE, 32p25, 32.25) \
+  T (TYPE, m4, -4.0) \
+  T (TYPE, m2p5, -2.5) \
+  T (TYPE, m64p5, -64.5)
+
+#define TEST_ALL(T) \
+  TEST_TYPE (T, _Float16) \
+  TEST_TYPE (T, float) \
+  TEST_TYPE (T, double)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/m, #16384\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/m, #15616\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/m, #-15360\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/m, #-16128\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.h, p[0-7], z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.s), p[0-7]/m, #2\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.s), p[0-7]/m, #1\.25(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.s), p[0-7]/m, #-4\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.s), p[0-7]/m, #-2\.5(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s, p[0-7], z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.d), p[0-7]/m, #2\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.d), p[0-7]/m, #1\.25(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.d), p[0-7]/m, #-4\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.d), p[0-7]/m, #-2\.5(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.d, p[0-7], z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_19_run.c
===================================================================
--- /dev/null 2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_19_run.c 2019-08-14 10:16:30.671052843 +0100
@@ -0,0 +1,30 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_19.c"
+
+#define N 97
+
+#define TEST_LOOP(TYPE, NAME, CONST) \
+  { \
+    TYPE x[N], pred[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        pred[i] = i % 5 <= i % 6 ? i : 0; \
+        asm volatile ("" ::: "memory"); \
+      } \
+    test_##TYPE##_##NAME (x, pred, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        if (x[i] != (TYPE) (pred[i] > 0 ? CONST : pred[i])) \
+          __builtin_abort (); \
+        asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int __attribute__ ((optimize (1)))
+main (int argc, char **argv)
+{
+  TEST_ALL (TEST_LOOP)
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_20.c
===================================================================
--- /dev/null 2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_20.c 2019-08-14 10:16:30.671052843 +0100
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define DEF_LOOP(TYPE, NAME, CONST) \
+  void \
+  test_##TYPE##_##NAME (TYPE *restrict x, \
+                        TYPE *restrict pred, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      x[i] = pred[i] > 0 ? CONST : 12.0; \
+  }
+
+#define TEST_TYPE(T, TYPE) \
+  T (TYPE, 2, 2.0) \
+  T (TYPE, 1p25, 1.25) \
+  T (TYPE, 32p25, 32.25) \
+  T (TYPE, m4, -4.0) \
+  T (TYPE, m2p5, -2.5) \
+  T (TYPE, m64p5, -64.5)
+
+#define TEST_ALL(T) \
+  TEST_TYPE (T, _Float16) \
+  TEST_TYPE (T, float) \
+  TEST_TYPE (T, double)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/m, #16384\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/m, #15616\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/m, #-15360\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]/m, #-16128\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.h, p[0-7], z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.s), p[0-7]/m, #2\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.s), p[0-7]/m, #1\.25(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.s), p[0-7]/m, #-4\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.s), p[0-7]/m, #-2\.5(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s, p[0-7], z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.d), p[0-7]/m, #2\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.d), p[0-7]/m, #1\.25(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.d), p[0-7]/m, #-4\.0(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.d), p[0-7]/m, #-2\.5(?:e[+]0)?\n} } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.d, p[0-7], z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 12 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_20_run.c
===================================================================
--- /dev/null 2019-07-30 08:53:31.317691683 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_20_run.c 2019-08-14 10:16:30.671052843 +0100
@@ -0,0 +1,30 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_20.c"
+
+#define N 97
+
+#define TEST_LOOP(TYPE, NAME, CONST) \
+  { \
+    TYPE x[N], pred[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        pred[i] = i % 5 <= i % 6; \
+        asm volatile ("" ::: "memory"); \
+      } \
+    test_##TYPE##_##NAME (x, pred, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        if (x[i] != (TYPE) (pred[i] > 0 ? CONST : 12.0)) \
+          __builtin_abort (); \
+        asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int __attribute__ ((optimize (1)))
+main (int argc, char **argv)
+{
+  TEST_ALL (TEST_LOOP)
+  return 0;
+}