From patchwork Sat Nov 16 15:39:06 2019
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1196132
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Add optabs for accelerating RAW and WAR alias checks
Date: Sat, 16 Nov 2019 15:39:06 +0000

This patch adds optabs that check whether a read followed by a write
or a write followed by a read can be divided into interleaved byte
accesses without changing the dependencies between the bytes. This is
one of the uses of the SVE2 WHILERW and WHILEWR instructions.
(The instructions can also be used to limit the VF at runtime,
but that's future work.)

This applies on top of:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00787.html

Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?

Richard


2019-11-16  Richard Sandiford

gcc/
	* doc/sourcebuild.texi (vect_check_ptrs): Document.
	* optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs.
	* doc/md.texi: Document them.
	* internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New
	internal functions.
	* internal-fn.h (internal_check_ptrs_fn_supported_p): Declare.
	* internal-fn.c (check_ptrs_direct): New macro.
	(expand_check_ptrs_optab_fn): Likewise.
	(direct_check_ptrs_optab_supported_p): Likewise.
	(internal_check_ptrs_fn_supported_p): New function.
	* tree-data-ref.c: Include internal-fn.h.
	(create_ifn_alias_checks): New function.
	(create_intersect_range_checks): Use it.
	* config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator.
	(optab, cmp_op): Handle it.
	(raw_war, unspec): New int attributes.
	* config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New
	constants.
	* config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand):
	New predicate.
	* config/aarch64/aarch64-sve2.md (check__ptrs): New expander.
	(@aarch64_sve2_while_ptest): New pattern.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_check_ptrs):
	New procedure.
	* gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be
	used, if available.
	* gcc.dg/vect/vect-alias-check-15.c: Likewise.
	* gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW.
	* gcc.target/aarch64/sve2/whilerw_1.c: New test.
	* gcc.target/aarch64/sve2/whilewr_1.c: Likewise.
	* gcc.target/aarch64/sve2/whilewr_2.c: Likewise.

Index: gcc/doc/sourcebuild.texi
===================================================================
--- gcc/doc/sourcebuild.texi	2019-11-16 15:33:44.000000000 +0000
+++ gcc/doc/sourcebuild.texi	2019-11-16 15:33:44.726783462 +0000
@@ -1487,6 +1487,10 @@ Target supports hardware vectors of @cod
 @item vect_long_long
 Target supports hardware vectors of @code{long long}.
 
+@item vect_check_ptrs
+Target supports the @code{check_raw_ptrs} and @code{check_war_ptrs}
+optabs on vectors.
+
 @item vect_fully_masked
 Target supports fully-masked (also known as fully-predicated) loops,
 so that vector loops can handle partial as well as full vectors.
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2019-11-16 15:33:44.000000000 +0000
+++ gcc/optabs.def	2019-11-16 15:33:44.730783434 +0000
@@ -429,6 +429,9 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
 OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
 OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
 
+OPTAB_D (check_raw_ptrs_optab, "check_raw_ptrs$a")
+OPTAB_D (check_war_ptrs_optab, "check_war_ptrs$a")
+
 OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
 OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2019-11-16 15:33:44.000000000 +0000
+++ gcc/doc/md.texi	2019-11-16 15:33:44.726783462 +0000
@@ -5076,6 +5076,37 @@ for (i = 1; i < GET_MODE_NUNITS (@var{n}
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{check_raw_ptrs@var{m}} instruction pattern
+@item @samp{check_raw_ptrs@var{m}}
+Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
+a write of @var{len} bytes at @var{a} followed by a read of @var{len} bytes
+at @var{b} can be split into interleaved byte accesses
+@samp{@var{a}[0], @var{b}[0], @var{a}[1], @var{b}[1], @dots{}}
+without affecting the dependencies between the bytes.  Set operand 0
+to true if the split is possible and false otherwise.
+
+Operands 1, 2 and 3 provide the values of @var{a}, @var{b} and @var{len}
+respectively.  Operand 4 is a constant integer that provides the known
+common alignment of @var{a} and @var{b}.  All inputs have mode @var{m}.
+
+This split is possible if:
+
+@smallexample
+@var{a} == @var{b} || @var{a} + @var{len} <= @var{b} || @var{b} + @var{len} <= @var{a}
+@end smallexample
+
+You should only define this pattern if the target has a way of accelerating
+the test without having to do the individual comparisons.
+
+@cindex @code{check_war_ptrs@var{m}} instruction pattern
+@item @samp{check_war_ptrs@var{m}}
+Like @samp{check_raw_ptrs@var{m}}, but with the read and write swapped round.
+The split is possible in this case if:
+
+@smallexample
+@var{b} <= @var{a} || @var{a} + @var{len} <= @var{b}
+@end smallexample
+
 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
 @item @samp{vec_cmp@var{m}@var{n}}
 Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2019-11-16 15:33:44.000000000 +0000
+++ gcc/internal-fn.def	2019-11-16 15:33:44.730783434 +0000
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.
    - cond_ternary: a conditional ternary optab, such as cond_fma_rev
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
+   - check_ptrs: used for check_{raw,war}_ptrs
 
    DEF_INTERNAL_SIGNED_OPTAB_FN defines an internal function that
    maps to one of two optabs, depending on the signedness of an input.
@@ -136,6 +137,10 @@ DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES,
                        vec_mask_store_lanes, mask_store_lanes)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
+                       check_raw_ptrs, check_ptrs)
+DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
+                       check_war_ptrs, check_ptrs)
 
 DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
                        vec_shl_insert, binary)
Index: gcc/internal-fn.h
===================================================================
--- gcc/internal-fn.h	2019-11-16 15:33:44.000000000 +0000
+++ gcc/internal-fn.h	2019-11-16 15:33:44.730783434 +0000
@@ -221,6 +221,8 @@ extern int internal_fn_mask_index (inter
 extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
                                                     tree, tree, int);
+extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
+                                                poly_uint64, unsigned int);
 
 extern void expand_internal_call (gcall *);
 extern void expand_internal_call (internal_fn, gcall *);
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2019-11-16 15:33:44.000000000 +0000
+++ gcc/internal-fn.c	2019-11-16 15:33:44.726783462 +0000
@@ -118,6 +118,7 @@ #define while_direct { 0, 2, false }
 #define fold_extract_direct { 2, 2, false }
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
+#define check_ptrs_direct { 0, 0, false }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
@@ -3006,6 +3007,9 @@ #define expand_fold_left_optab_fn(FN, ST
 #define expand_mask_fold_left_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 3)
 
+#define expand_check_ptrs_optab_fn(FN, STMT, OPTAB) \
+  expand_direct_optab_fn (FN, STMT, OPTAB, 4)
+
 /* RETURN_TYPE and ARGS are a return type and argument list that are
    in principle compatible with FN (which satisfies direct_internal_fn_p).
    Return the types that should be used to determine whether the
@@ -3095,6 +3099,7 @@ #define direct_while_optab_supported_p c
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
 #define direct_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
+#define direct_check_ptrs_optab_supported_p direct_optab_supported_p
 
 /* Return the optab used by internal function FN.  */
 
@@ -3572,6 +3577,24 @@ internal_gather_scatter_fn_supported_p (
           && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
 }
 
+/* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
+   for pointers of type TYPE when the accesses have LENGTH bytes and their
+   common byte alignment is ALIGN.  */
+
+bool
+internal_check_ptrs_fn_supported_p (internal_fn ifn, tree type,
+                                    poly_uint64 length, unsigned int align)
+{
+  machine_mode mode = TYPE_MODE (type);
+  optab optab = direct_internal_fn_optab (ifn);
+  insn_code icode = direct_optab_handler (optab, mode);
+  if (icode == CODE_FOR_nothing)
+    return false;
+  rtx length_rtx = immed_wide_int_const (length, mode);
+  return (insn_operand_matches (icode, 3, length_rtx)
+          && insn_operand_matches (icode, 4, GEN_INT (align)));
+}
+
 /* Expand STMT as though it were a call to internal function FN.  */
 
 void
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-16 15:33:44.000000000 +0000
+++ gcc/tree-data-ref.c	2019-11-16 15:33:44.734783405 +0000
@@ -96,6 +96,7 @@ Software Foundation; either version 3, o
 #include "builtins.h"
 #include "tree-eh.h"
 #include "ssa.h"
+#include "internal-fn.h"
 
 static struct datadep_stats
 {
@@ -1719,6 +1720,80 @@ prune_runtime_alias_test_list (vec_ptrs"
+  [(match_operand:GPI 0 "register_operand")
+   (unspec:VNx16BI
+     [(match_operand:GPI 1 "register_operand")
+      (match_operand:GPI 2 "register_operand")
+      (match_operand:GPI 3 "aarch64_bytes_per_sve_vector_operand")
+      (match_operand:GPI 4 "const_int_operand")]
+     SVE2_WHILE_PTR)]
+  "TARGET_SVE2"
+{
+  /* Use the widest predicate mode we can.  */
+  unsigned int align = INTVAL (operands[4]);
+  if (align > 8)
+    align = 8;
+  machine_mode pred_mode = aarch64_sve_pred_mode (align).require ();
+
+  /* Emit a WHILERW or WHILEWR, setting the condition codes based on
+     the result.  */
+  emit_insn (gen_aarch64_sve2_while_ptest
+             (, mode, pred_mode,
+              gen_rtx_SCRATCH (pred_mode), operands[1], operands[2],
+              CONSTM1_RTX (VNx16BImode), CONSTM1_RTX (pred_mode)));
+
+  /* Set operand 0 to true if the last bit of the predicate result is set,
+     i.e. if all elements are free of dependencies.  */
+  rtx cc_reg = gen_rtx_REG (CC_NZCmode, CC_REGNUM);
+  rtx cmp = gen_rtx_LTU (mode, cc_reg, const0_rtx);
+  emit_insn (gen_aarch64_cstore (operands[0], cmp, cc_reg));
+  DONE;
+})
+
+;; A WHILERW or WHILEWR in which only the flags result is interesting.
+(define_insn_and_rewrite "@aarch64_sve2_while_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+        (unspec:CC_NZC
+          [(match_operand 3)
+           (match_operand 4)
+           (const_int SVE_KNOWN_PTRUE)
+           (unspec:PRED_ALL
+             [(match_operand:GPI 1 "register_operand" "r")
+              (match_operand:GPI 2 "register_operand" "r")]
+             SVE2_WHILE_PTR)]
+          UNSPEC_PTEST))
+   (clobber (match_scratch:PRED_ALL 0 "=Upa"))]
+  "TARGET_SVE2"
+  "while\t%0., %x1, %x2"
+  ;; Force the compiler to drop the unused predicate operand, so that we
+  ;; don't have an unnecessary PTRUE.
+  "&& (!CONSTANT_P (operands[3]) || !CONSTANT_P (operands[4]))"
+  {
+    operands[3] = CONSTM1_RTX (VNx16BImode);
+    operands[4] = CONSTM1_RTX (mode);
+  }
+)
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2019-11-16 15:33:44.000000000 +0000
+++ gcc/testsuite/lib/target-supports.exp	2019-11-16 15:33:44.730783434 +0000
@@ -6459,6 +6459,13 @@ proc check_effective_target_vect_natural
     return $et_vect_natural_alignment
 }
 
+# Return true if the target supports the check_raw_ptrs and check_war_ptrs
+# optabs on vectors.
+
+proc check_effective_target_vect_check_ptrs { } {
+    return [check_effective_target_aarch64_sve2]
+}
+
 # Return true if fully-masked loops are supported.
 
 proc check_effective_target_vect_fully_masked { } {
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-16 15:33:36.862838590 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-16 15:33:44.730783434 +0000
@@ -60,5 +60,6 @@ main (void)
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
 /* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
-/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" { target { ! vect_check_ptrs } } } } */
+/* { dg-final { scan-tree-dump "using an IFN_CHECK_WAR_PTRS test" "vect" { target vect_check_ptrs } } } */
 /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-16 15:33:36.862838590 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-16 15:33:44.730783434 +0000
@@ -57,5 +57,6 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" { target { ! vect_check_ptrs } } } } */
+/* { dg-final { scan-tree-dump "using an IFN_CHECK_WAR_PTRS test" "vect" { target vect_check_ptrs } } } */
 /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c	2019-11-16 15:33:44.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c	2019-11-16 15:33:44.730783434 +0000
@@ -62,5 +62,6 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *RAW\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" { target { ! vect_check_ptrs } } } } */
+/* { dg-final { scan-tree-dump "using an IFN_CHECK_RAW_PTRS test" "vect" { target vect_check_ptrs } } } */
 /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.target/aarch64/sve2/whilerw_1.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve2/whilerw_1.c	2019-11-16 15:33:44.730783434 +0000
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-require-effective-target lp64 } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE) \
+  TYPE \
+  test_##TYPE (TYPE *dst, TYPE *src, int n) \
+  { \
+    TYPE res = 0; \
+    for (int i = 0; i < n; ++i) \
+      { \
+        dst[i] += 1; \
+        res += src[i]; \
+      } \
+    return res; \
+  }
+
+TEST_LOOP (int8_t);
+TEST_LOOP (int16_t);
+TEST_LOOP (int32_t);
+TEST_LOOP (int64_t);
+
+/* { dg-final { scan-assembler-times {\twhilerw\t} 4 } } */
+/* { dg-final { scan-assembler-times {\twhilerw\tp[0-9]+\.b, x0, x1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilerw\tp[0-9]+\.h, x0, x1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilerw\tp[0-9]+\.s, x0, x1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilerw\tp[0-9]+\.d, x[0-9]+, x1\n} 1 } } */
+/* { dg-final { scan-assembler-not {\twhilewr\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve2/whilewr_1.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve2/whilewr_1.c	2019-11-16 15:33:44.730783434 +0000
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-require-effective-target lp64 } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE) \
+  void \
+  test_##TYPE (TYPE *dst, TYPE *src1, TYPE *src2, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      dst[i] = src1[i] + src2[i]; \
+  }
+
+TEST_LOOP (int8_t);
+TEST_LOOP (int16_t);
+TEST_LOOP (int32_t);
+TEST_LOOP (int64_t);
+
+/* { dg-final { scan-assembler-times {\twhilewr\t} 8 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.b, x1, x0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.b, x2, x0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.h, x1, x0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.h, x2, x0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.s, x1, x0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.s, x2, x0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.d, x1, x0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.d, x2, x0\n} 1 } } */
+/* { dg-final { scan-assembler-not {\twhilerw\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve2/whilewr_2.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve2/whilewr_2.c	2019-11-16 15:33:44.730783434 +0000
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */
+/* { dg-require-effective-target lp64 } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(TYPE) \
+  void \
+  test_##TYPE (TYPE *dst1, TYPE *dst2, TYPE *dst3, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      { \
+        dst1[i] = 1; \
+        dst2[i] = 2; \
+        dst3[i] = 3; \
+      } \
+  }
+
+TEST_LOOP (int8_t);
+TEST_LOOP (int16_t);
+TEST_LOOP (int32_t);
+TEST_LOOP (int64_t);
+
+/* { dg-final { scan-assembler-times {\twhilewr\t} 12 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.b, x0, x1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.b, x0, x2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.b, x1, x2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.h, x0, x1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.h, x0, x2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.h, x1, x2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.s, x0, x1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.s, x0, x2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.s, x1, x2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.d, x0, x1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.d, x0, x2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\twhilewr\tp[0-9]+\.d, x1, x2\n} 1 } } */
+/* { dg-final { scan-assembler-not {\twhilerw\t} } } */
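
For reference, the two conditions given in the md.texi documentation for
check_raw_ptrs and check_war_ptrs can be modelled in scalar C roughly as
follows. This is only an illustrative sketch and not part of the patch:
the function names model_check_raw_ptrs and model_check_war_ptrs are made
up for the example, the alignment operand is ignored, and the code assumes
that a + len and b + len do not wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of check_raw_ptrs: a write of LEN bytes at A
   followed by a read of LEN bytes at B.  Return true if the two accesses
   can be split into interleaved byte accesses without changing the
   dependencies between the bytes.  */
bool
model_check_raw_ptrs (uintptr_t a, uintptr_t b, uintptr_t len)
{
  /* Assumption of this sketch: the end addresses do not wrap.  */
  assert (a + len >= a && b + len >= b);
  return a == b || a + len <= b || b + len <= a;
}

/* Hypothetical scalar model of check_war_ptrs: a read of LEN bytes at A
   followed by a write of LEN bytes at B.  */
bool
model_check_war_ptrs (uintptr_t a, uintptr_t b, uintptr_t len)
{
  /* Assumption of this sketch: the end addresses do not wrap.  */
  assert (a + len >= a && b + len >= b);
  return b <= a || a + len <= b;
}
```

With len == 16, model_check_raw_ptrs (100, 108, 16) is false because the
two ranges partially overlap, whereas model_check_war_ptrs (108, 100, 16)
is true: a write at the lower address can only reach bytes that the
interleaved sequence has already read.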