From patchwork Fri Nov 17 15:11:10 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 839042
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Allow the number of iterations to be smaller than VF
Date: Fri, 17 Nov 2017 15:11:10 +0000
Message-ID: <87d14hym7l.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

Fully-masked loops can be profitable even if the iteration count is
smaller than the vectorisation factor.  In this case we're effectively
doing a complete unroll followed by SLP.

The documentation for min-vect-loop-bound says that the default value
is 0, but actually the default and minimum were 1.  We need it to be 0
for this case since the parameter counts a whole number of vector
iterations.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu.  OK to install?

Richard


2017-11-17  Richard Sandiford
	    Alan Hayward
	    David Sherwood

gcc/
	* doc/sourcebuild.texi (vect_fully_masked): Document.
	* params.def (PARAM_MIN_VECT_LOOP_BOUND): Change minimum and
	default value to 0.
	* tree-vect-loop.c (vect_analyze_loop_costing): New function,
	split out from...
	(vect_analyze_loop_2): ...here.  Don't check the vectorization
	factor against the number of loop iterations if the loop is
	fully-masked.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_fully_masked):
	New proc.
	* gcc.dg/vect/slp-3.c: Expect all loops to be vectorized if
	vect_fully_masked.
	* gcc.target/aarch64/sve_loop_add_4.c: New test.
	* gcc.target/aarch64/sve_loop_add_4_run.c: Likewise.
	* gcc.target/aarch64/sve_loop_add_5.c: Likewise.
	* gcc.target/aarch64/sve_loop_add_5_run.c: Likewise.
	* gcc.target/aarch64/sve_miniloop_1.c: Likewise.
	* gcc.target/aarch64/sve_miniloop_2.c: Likewise.

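
To illustrate why the parameter's default has to drop to 0, here is a
minimal C sketch of the threshold arithmetic in
vect_analyze_loop_costing.  It is not GCC code: the struct and the
function name sketch_loop_costing are hypothetical, it models only the
case where the iteration count is known, and it leaves out the
estimated-iteration check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the loop_vec_info fields the check reads.  */
struct loop_cost_inputs
{
  bool fully_masked;         /* models LOOP_VINFO_FULLY_MASKED_P */
  long niters;               /* known iteration count, or -1 if unknown */
  long assumed_vf;           /* vectorization factor assumed for costing */
  long min_profitable_iters; /* cost-model result; < 0 means never profitable */
  long min_vect_loop_bound;  /* models --param min-vect-loop-bound */
};

/* Return 1 to vectorize, 0 to reject, -1 if a retry is worthwhile.  */
int
sketch_loop_costing (const struct loop_cost_inputs *in)
{
  assert (in->assumed_vf > 0);

  /* Only loops that are not fully masked need at least VF iterations.  */
  if (!in->fully_masked && in->niters != -1 && in->niters < in->assumed_vf)
    return 0;

  if (in->min_profitable_iters < 0)
    return -1;

  /* The parameter counts whole vector iterations, so scale it by VF.  */
  long min_scalar_loop_bound = in->min_vect_loop_bound * in->assumed_vf;
  long th = (min_scalar_loop_bound > in->min_profitable_iters
             ? min_scalar_loop_bound : in->min_profitable_iters);

  if (in->niters != -1 && in->niters < th)
    return 0;

  return 1;
}
```

With niters = 3 and assumed_vf = 4, a fully-masked loop is accepted
when the bound is 0, whereas the old default of 1 gives a threshold of
4 and rejects it.
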
Index: gcc/doc/sourcebuild.texi =================================================================== --- gcc/doc/sourcebuild.texi 2017-11-17 15:09:28.740330131 +0000 +++ gcc/doc/sourcebuild.texi 2017-11-17 15:09:28.967330125 +0000 @@ -1403,6 +1403,10 @@ Target supports hardware vectors of @cod @item vect_long_long Target supports hardware vectors of @code{long long}. +@item vect_fully_masked +Target supports fully-masked (also known as fully-predicated) loops, +so that vector loops can handle partial as well as full vectors. + @item vect_masked_store Target supports vector masked stores. Index: gcc/params.def =================================================================== --- gcc/params.def 2017-11-17 15:09:28.740330131 +0000 +++ gcc/params.def 2017-11-17 15:09:28.967330125 +0000 @@ -139,7 +139,7 @@ DEFPARAM (PARAM_MAX_VARIABLE_EXPANSIONS, DEFPARAM (PARAM_MIN_VECT_LOOP_BOUND, "min-vect-loop-bound", "If -ftree-vectorize is used, the minimal loop bound of a loop to be considered for vectorization.", - 1, 1, 0) + 0, 0, 0) /* The maximum number of instructions to consider when looking for an instruction to fill a delay slot. If more than this arbitrary Index: gcc/tree-vect-loop.c =================================================================== --- gcc/tree-vect-loop.c 2017-11-17 15:09:28.740330131 +0000 +++ gcc/tree-vect-loop.c 2017-11-17 15:09:28.969330125 +0000 @@ -1893,6 +1893,101 @@ vect_analyze_loop_operations (loop_vec_i return true; } +/* Analyze the cost of the loop described by LOOP_VINFO. Decide if it + is worthwhile to vectorize. Return 1 if definitely yes, 0 if + definitely no, or -1 if it's worth retrying. */ + +static int +vect_analyze_loop_costing (loop_vec_info loop_vinfo) +{ + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo); + + /* Only fully-masked loops can have iteration counts less than the + vectorization factor. 
*/ + if (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) + { + HOST_WIDE_INT max_niter; + + if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) + max_niter = LOOP_VINFO_INT_NITERS (loop_vinfo); + else + max_niter = max_stmt_executions_int (loop); + + if (max_niter != -1 + && (unsigned HOST_WIDE_INT) max_niter < assumed_vf) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: iteration count smaller than " + "vectorization factor.\n"); + return 0; + } + } + + int min_profitable_iters, min_profitable_estimate; + vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters, + &min_profitable_estimate); + + if (min_profitable_iters < 0) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: vectorization not profitable.\n"); + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: vector version will never be " + "profitable.\n"); + return -1; + } + + int min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND) + * assumed_vf); + + /* Use the cost model only if it is more conservative than user specified + threshold. 
*/ + unsigned int th = (unsigned) MAX (min_scalar_loop_bound, + min_profitable_iters); + + LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th; + + if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) + && LOOP_VINFO_INT_NITERS (loop_vinfo) < th) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: vectorization not profitable.\n"); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "not vectorized: iteration count smaller than user " + "specified loop bound parameter or minimum profitable " + "iterations (whichever is more conservative).\n"); + return 0; + } + + HOST_WIDE_INT estimated_niter = estimated_stmt_executions_int (loop); + if (estimated_niter == -1) + estimated_niter = likely_max_stmt_executions_int (loop); + if (estimated_niter != -1 + && ((unsigned HOST_WIDE_INT) estimated_niter + < MAX (th, (unsigned) min_profitable_estimate))) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: estimated iteration count too " + "small.\n"); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "not vectorized: estimated iteration count smaller " + "than specified loop bound parameter or minimum " + "profitable iterations (whichever is more " + "conservative).\n"); + return -1; + } + + return 1; +} + /* Function vect_analyze_loop_2. @@ -1903,6 +1998,7 @@ vect_analyze_loop_operations (loop_vec_i vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal) { bool ok; + int res; unsigned int max_vf = MAX_VECTORIZATION_FACTOR; poly_uint64 min_vf = 2; unsigned int n_stmts = 0; @@ -2060,9 +2156,7 @@ vect_analyze_loop_2 (loop_vec_info loop_ vect_compute_single_scalar_iteration_cost (loop_vinfo); poly_uint64 saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); - HOST_WIDE_INT estimated_niter; unsigned th; - int min_scalar_loop_bound; /* Check the SLP opportunities in the loop, analyze and build SLP trees. 
*/ ok = vect_analyze_slp (loop_vinfo, n_stmts); @@ -2092,7 +2186,6 @@ vect_analyze_loop_2 (loop_vec_info loop_ /* Now the vectorization factor is final. */ poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); gcc_assert (must_ne (vectorization_factor, 0U)); - unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo); if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && dump_enabled_p ()) { @@ -2105,17 +2198,6 @@ vect_analyze_loop_2 (loop_vec_info loop_ HOST_WIDE_INT max_niter = likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo)); - if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) - && (LOOP_VINFO_INT_NITERS (loop_vinfo) < assumed_vf)) - || (max_niter != -1 - && (unsigned HOST_WIDE_INT) max_niter < assumed_vf)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "not vectorized: iteration count smaller than " - "vectorization factor.\n"); - return false; - } /* Analyze the alignment of the data-refs in the loop. Fail if a data reference is found that cannot be vectorized. */ @@ -2229,65 +2311,16 @@ vect_analyze_loop_2 (loop_vec_info loop_ } } - /* Analyze cost. Decide if worth while to vectorize. */ - int min_profitable_estimate, min_profitable_iters; - vect_estimate_min_profitable_iters (loop_vinfo, &min_profitable_iters, - &min_profitable_estimate); - - if (min_profitable_iters < 0) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "not vectorized: vectorization not profitable.\n"); - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "not vectorized: vector version will never be " - "profitable.\n"); - goto again; - } - - min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND) - * assumed_vf); - - /* Use the cost model only if it is more conservative than user specified - threshold. 
*/ - th = (unsigned) MAX (min_scalar_loop_bound, min_profitable_iters); - - LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th; - - if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) - && LOOP_VINFO_INT_NITERS (loop_vinfo) < th) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "not vectorized: vectorization not profitable.\n"); - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "not vectorized: iteration count smaller than user " - "specified loop bound parameter or minimum profitable " - "iterations (whichever is more conservative).\n"); - goto again; - } - - estimated_niter - = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo)); - if (estimated_niter == -1) - estimated_niter = max_niter; - if (estimated_niter != -1 - && ((unsigned HOST_WIDE_INT) estimated_niter - < MAX (th, (unsigned) min_profitable_estimate))) + /* Check the costings of the loop make vectorizing worthwhile. */ + res = vect_analyze_loop_costing (loop_vinfo); + if (res < 0) + goto again; + if (!res) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "not vectorized: estimated iteration count too " - "small.\n"); - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "not vectorized: estimated iteration count smaller " - "than specified loop bound parameter or minimum " - "profitable iterations (whichever is more " - "conservative).\n"); - goto again; + "Loop costings not worthwhile.\n"); + return false; } /* Decide whether we need to create an epilogue loop to handle @@ -3869,7 +3902,6 @@ vect_estimate_min_profitable_iters (loop * assumed_vf - vec_inside_cost * peel_iters_prologue - vec_inside_cost * peel_iters_epilogue); - if (min_profitable_iters <= 0) min_profitable_iters = 0; else Index: gcc/testsuite/lib/target-supports.exp =================================================================== --- gcc/testsuite/lib/target-supports.exp 2017-11-17 15:09:28.740330131 +0000 +++ 
gcc/testsuite/lib/target-supports.exp 2017-11-17 15:09:28.968330125 +0000 @@ -6434,6 +6434,12 @@ proc check_effective_target_vect_natural return $et_vect_natural_alignment } +# Return true if fully-masked loops are supported. + +proc check_effective_target_vect_fully_masked { } { + return [check_effective_target_aarch64_sve] +} + # Return 1 if the target doesn't prefer any alignment beyond element # alignment during vectorization. Index: gcc/testsuite/gcc.dg/vect/slp-3.c =================================================================== --- gcc/testsuite/gcc.dg/vect/slp-3.c 2017-11-17 15:09:28.740330131 +0000 +++ gcc/testsuite/gcc.dg/vect/slp-3.c 2017-11-17 15:09:28.967330125 +0000 @@ -141,6 +141,8 @@ int main (void) return 0; } -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { ! vect_fully_masked } } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target vect_fully_masked } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { ! 
vect_fully_masked } } } }*/ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target vect_fully_masked } } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_loop_add_4.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_loop_add_4.c 2017-11-17 15:09:28.967330125 +0000 @@ -0,0 +1,96 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable" } */ + +#include <stdint.h> + +#define LOOP(TYPE, NAME, STEP) \ + __attribute__((noinline, noclone)) \ + void \ + test_##TYPE##_##NAME (TYPE *dst, TYPE base, int count) \ + { \ + for (int i = 0; i < count; ++i, base += STEP) \ + dst[i] += base; \ + } + +#define TEST_TYPE(T, TYPE) \ + T (TYPE, m17, -17) \ + T (TYPE, m16, -16) \ + T (TYPE, m15, -15) \ + T (TYPE, m1, -1) \ + T (TYPE, 1, 1) \ + T (TYPE, 15, 15) \ + T (TYPE, 16, 16) \ + T (TYPE, 17, 17) + +#define TEST_ALL(T) \ + TEST_TYPE (T, int8_t) \ + TEST_TYPE (T, int16_t) \ + TEST_TYPE (T, int32_t) \ + TEST_TYPE (T, int64_t) + +TEST_ALL (LOOP) + +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #1\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, w[0-9]+\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]+/z, \[x[0-9]+, x[0-9]+\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7]+, \[x[0-9]+, x[0-9]+\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tincb\tx[0-9]+\n} 8 } } */ + +/* { dg-final { scan-assembler-not {\tdecb\tz[0-9]+\.b} } } */ +/* We don't need to increment the vector IV for steps -16 and 16, since the + increment is always a multiple 
of 256. */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 14 } } */ + +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #1\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, w[0-9]+\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tincb\tx[0-9]+\n} 8 } } */ + +/* { dg-final { scan-assembler-times {\tdech\tz[0-9]+\.h, all, mul #16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tdech\tz[0-9]+\.h, all, mul #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tdech\tz[0-9]+\.h\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tinch\tz[0-9]+\.h\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tinch\tz[0-9]+\.h, all, mul #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tinch\tz[0-9]+\.h, all, mul #16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 10 } } */ + +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #-16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #-15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #1\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, w[0-9]+\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7]+, 
\[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tincw\tx[0-9]+\n} 8 } } */ + +/* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s, all, mul #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tdecw\tz[0-9]+\.s\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tincw\tz[0-9]+\.s\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tincw\tz[0-9]+\.s, all, mul #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tincw\tz[0-9]+\.s, all, mul #16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 10 } } */ + +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #-16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #-15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #1\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, x[0-9]+\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tincd\tx[0-9]+\n} 8 } } */ + +/* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d, all, mul #16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d, all, mul #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tdecd\tz[0-9]+\.d\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tincd\tz[0-9]+\.d\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tincd\tz[0-9]+\.d, all, mul #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tincd\tz[0-9]+\.d, all, mul #16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 10 } } */ Index: 
gcc/testsuite/gcc.target/aarch64/sve_loop_add_4_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_loop_add_4_run.c 2017-11-17 15:09:28.967330125 +0000 @@ -0,0 +1,30 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */ + +#include "sve_loop_add_4.c" + +#define N 131 +#define BASE 41 + +#define TEST_LOOP(TYPE, NAME, STEP) \ + { \ + TYPE a[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + a[i] = i * i + i % 5; \ + asm volatile ("" ::: "memory"); \ + } \ + test_##TYPE##_##NAME (a, BASE, N); \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE expected = i * i + i % 5 + BASE + i * STEP; \ + if (a[i] != expected) \ + __builtin_abort (); \ + } \ + } + +int __attribute__ ((optimize (1))) +main (void) +{ + TEST_ALL (TEST_LOOP) +} Index: gcc/testsuite/gcc.target/aarch64/sve_loop_add_5.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_loop_add_5.c 2017-11-17 15:09:28.967330125 +0000 @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=256" } */ + +#include "sve_loop_add_4.c" + +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-16\n} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-15\n} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #1\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #15\n} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, w[0-9]+\n} 3 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]+/z, \[x[0-9]+, x[0-9]+\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7]+, \[x[0-9]+, 
x[0-9]+\]} 8 } } */ + +/* The induction vector is invariant for steps of -16 and 16. */ +/* { dg-final { scan-assembler-not {\tsub\tz[0-9]+\.b, z[0-9]+\.b, #} } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, #} 6 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 8 } } */ + +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-16\n} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-15\n} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #1\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, w[0-9]+\n} 3 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */ + +/* The (-)17 * 16 is out of range. 
*/ +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.h, z[0-9]+\.h, #} 2 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, #} 4 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 10 } } */ + +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #-16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #-15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #1\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.s, w[0-9]+, w[0-9]+\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 2\]} 8 } } */ + +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.s, z[0-9]+\.s, #} 4 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, #} 4 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 8 } } */ + +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #-16\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #-15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #1\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, #15\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.d, x[0-9]+, x[0-9]+\n} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]+/z, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */ +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7]+, \[x[0-9]+, x[0-9]+, lsl 3\]} 8 } } */ + +/* { dg-final { scan-assembler-times {\tsub\tz[0-9]+\.d, z[0-9]+\.d, #} 4 } } */ +/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, #} 4 } } */ +/* { dg-final { 
scan-assembler-times {\tadd\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_loop_add_5_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_loop_add_5_run.c 2017-11-17 15:09:28.967330125 +0000 @@ -0,0 +1,5 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=256" { target aarch64_sve256_hw } } */ + +#include "sve_loop_add_4_run.c" Index: gcc/testsuite/gcc.target/aarch64/sve_miniloop_1.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_miniloop_1.c 2017-11-17 15:09:28.967330125 +0000 @@ -0,0 +1,23 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +void loop (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c, + int * __restrict__ d, int * __restrict__ e, int * __restrict__ f, + int * __restrict__ g, int * __restrict__ h) +{ + int i = 0; + for (i = 0; i < 3; i++) + { + a[i] += i; + b[i] += i; + c[i] += i; + d[i] += i; + e[i] += i; + f[i] += a[i] + 7; + g[i] += b[i] - 3; + h[i] += c[i] + 3; + } +} + +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, } 8 } } */ +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, } 8 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_miniloop_2.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_miniloop_2.c 2017-11-17 15:09:28.967330125 +0000 @@ -0,0 +1,7 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps -msve-vector-bits=256" } */ + +#include "sve_miniloop_1.c" + +/* { dg-final { 
scan-assembler-times {\tld1w\tz[0-9]+\.s, } 8 } } */ +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, } 8 } } */
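
For readers unfamiliar with fully-masked loops, the following scalar C
sketch emulates what a predicated vector loop does when the iteration
count is below the vector length, as in the miniloop tests above.  It
is only an illustration: LANES and masked_add are hypothetical names,
the fixed lane count stands in for the real (possibly scalable) vector
length, and the per-lane bounds test plays the role of the loop
predicate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector length in elements.  */
enum { LANES = 8 };

/* Emulate a fully-masked vector loop computing dst[i] += base + i * step
   for i in [0, count).  Each outer iteration stands for one vector
   iteration; lanes past COUNT are inactive and touch no memory.  */
void
masked_add (int *dst, int base, int step, size_t count)
{
  assert (dst != NULL || count == 0);

  for (size_t start = 0; start < count; start += LANES)
    for (size_t lane = 0; lane < LANES; ++lane)
      {
        size_t i = start + lane;
        /* Predicate: the lane is active only while it is in bounds.  */
        if (i < count)
          dst[i] += base + (int) i * step;
      }
}
```

With count = 3 the outer loop runs exactly once with three active
lanes, which is why the whole loop can be handled by a single vector
iteration even though the count is smaller than the vectorization
factor.
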