From patchwork Wed May 16 10:18:48 2018
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 914517
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Implement SLP of internal functions
Date: Wed, 16 May 2018 11:18:48 +0100
Message-ID: <87muwzoqd3.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0

SLP of calls was previously restricted to built-in functions.
This patch extends it to internal functions.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-05-16  Richard Sandiford

gcc/
	* internal-fn.h (vectorizable_internal_fn_p): New function.
	* tree-vect-slp.c (compatible_calls_p): Likewise.
	(vect_build_slp_tree_1): Remove nops argument.  Handle calls
	to internal functions.
	(vect_build_slp_tree_2): Update call to vect_build_slp_tree_1.

gcc/testsuite/
	* gcc.target/aarch64/sve/cond_arith_4.c: New test.
	* gcc.target/aarch64/sve/cond_arith_4_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_arith_5.c: Likewise.
	* gcc.target/aarch64/sve/cond_arith_5_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_14.c: Likewise.
	* gcc.target/aarch64/sve/slp_14_run.c: Likewise.

Index: gcc/internal-fn.h
===================================================================
--- gcc/internal-fn.h	2018-05-16 11:06:14.513574219 +0100
+++ gcc/internal-fn.h	2018-05-16 11:12:11.872116220 +0100
@@ -158,6 +158,17 @@ direct_internal_fn_p (internal_fn fn)
   return direct_internal_fn_array[fn].type0 >= -1;
 }
 
+/* Return true if FN is a direct internal function that can be vectorized by
+   converting the return type and all argument types to vectors of the same
+   number of elements.  E.g. we can vectorize an IFN_SQRT on floats as an
+   IFN_SQRT on vectors of N floats.  */
+
+inline bool
+vectorizable_internal_fn_p (internal_fn fn)
+{
+  return direct_internal_fn_array[fn].vectorizable;
+}
+
 /* Return optab information about internal function FN.  Only meaningful
    if direct_internal_fn_p (FN).  */
 
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-05-16 11:02:46.262494712 +0100
+++ gcc/tree-vect-slp.c	2018-05-16 11:12:11.873116180 +0100
@@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v
   return 0;
 }
 
+/* Return true if call statements CALL1 and CALL2 are similar enough
+   to be combined into the same SLP group.  */
+
+static bool
+compatible_calls_p (gcall *call1, gcall *call2)
+{
+  unsigned int nargs = gimple_call_num_args (call1);
+  if (nargs != gimple_call_num_args (call2))
+    return false;
+
+  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
+    return false;
+
+  if (gimple_call_internal_p (call1))
+    {
+      if (TREE_TYPE (gimple_call_lhs (call1))
+          != TREE_TYPE (gimple_call_lhs (call2)))
+        return false;
+      for (unsigned int i = 0; i < nargs; ++i)
+        if (TREE_TYPE (gimple_call_arg (call1, i))
+            != TREE_TYPE (gimple_call_arg (call2, i)))
+          return false;
+    }
+  else
+    {
+      if (!operand_equal_p (gimple_call_fn (call1),
+                            gimple_call_fn (call2), 0))
+        return false;
+
+      if (gimple_call_fntype (call1) != gimple_call_fntype (call2))
+        return false;
+    }
+  return true;
+}
+
 /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is
    the caller's attempt to find the vector type in STMT with the narrowest
    element type.  Return true if VECTYPE is nonnull and if it is valid
@@ -625,8 +660,8 @@ vect_record_max_nunits (vec_info *vinfo,
 static bool
 vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
                        vec<gimple *> stmts, unsigned int group_size,
-                       unsigned nops, poly_uint64 *max_nunits,
-                       bool *matches, bool *two_operators)
+                       poly_uint64 *max_nunits, bool *matches,
+                       bool *two_operators)
 {
   unsigned int i;
   gimple *first_stmt = stmts[0], *stmt = stmts[0];
@@ -698,7 +733,9 @@ vect_build_slp_tree_1 (vec_info *vinfo,
       if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
         {
           rhs_code = CALL_EXPR;
-          if (gimple_call_internal_p (call_stmt)
+          if ((gimple_call_internal_p (call_stmt)
+               && (!vectorizable_internal_fn_p
+                   (gimple_call_internal_fn (call_stmt))))
               || gimple_call_tail_p (call_stmt)
               || gimple_call_noreturn_p (call_stmt)
               || !gimple_call_nothrow_p (call_stmt)
@@ -833,11 +870,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
           if (rhs_code == CALL_EXPR)
             {
               gimple *first_stmt = stmts[0];
-              if (gimple_call_num_args (stmt) != nops
-                  || !operand_equal_p (gimple_call_fn (first_stmt),
-                                       gimple_call_fn (stmt), 0)
-                  || gimple_call_fntype (first_stmt)
-                     != gimple_call_fntype (stmt))
+              if (!compatible_calls_p (as_a <gcall *> (first_stmt),
+                                       as_a <gcall *> (stmt)))
                 {
                   if (dump_enabled_p ())
                     {
@@ -1166,8 +1200,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 
   bool two_operators = false;
   unsigned char *swap = XALLOCAVEC (unsigned char, group_size);
-  if (!vect_build_slp_tree_1 (vinfo, swap,
-                              stmts, group_size, nops,
+  if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size,
                               &this_max_nunits, matches, &two_operators))
     return NULL;
 
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST(TYPE, NAME, OP) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##TYPE##_##NAME (TYPE *__restrict x, \
+                        TYPE *__restrict y, \
+                        TYPE z1, TYPE z2, \
+                        TYPE *__restrict pred, int n) \
+  { \
+    for (int i = 0; i < n; i += 2) \
+      { \
+        x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
+        x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
+      } \
+  }
+
+#define TEST_INT_TYPE(TYPE) \
+  TEST (TYPE, div, /)
+
+#define TEST_FP_TYPE(TYPE) \
+  TEST (TYPE, add, +) \
+  TEST (TYPE, sub, -) \
+  TEST (TYPE, mul, *) \
+  TEST (TYPE, div, /)
+
+#define TEST_ALL \
+  TEST_INT_TYPE (int32_t) \
+  TEST_INT_TYPE (uint32_t) \
+  TEST_INT_TYPE (int64_t) \
+  TEST_INT_TYPE (uint64_t) \
+  TEST_FP_TYPE (float) \
+  TEST_FP_TYPE (double)
+
+TEST_ALL
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 12 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 12 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 6 } } */
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,32 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "cond_arith_4.c"
+
+#define N 98
+
+#undef TEST
+#define TEST(TYPE, NAME, OP) \
+  { \
+    TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        y[i] = i * i; \
+        pred[i] = i % 3; \
+      } \
+    test_##TYPE##_##NAME (x, y, z[0], z[1], pred, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
+        if (x[i] != expected) \
+          __builtin_abort (); \
+        asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include <stdint.h>
+
+#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (DATA_TYPE *__restrict x, \
+                                            DATA_TYPE *__restrict y, \
+                                            DATA_TYPE z1, DATA_TYPE z2, \
+                                            DATA_TYPE *__restrict pred, \
+                                            OTHER_TYPE *__restrict foo, \
+                                            int n) \
+  { \
+    for (int i = 0; i < n; i += 2) \
+      { \
+        x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
+        x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
+        foo[i] += 1; \
+        foo[i + 1] += 2; \
+      } \
+  }
+
+#define TEST_INT_TYPE(DATA_TYPE, OTHER_TYPE) \
+  TEST (DATA_TYPE, OTHER_TYPE, div, /)
+
+#define TEST_FP_TYPE(DATA_TYPE, OTHER_TYPE) \
+  TEST (DATA_TYPE, OTHER_TYPE, add, +) \
+  TEST (DATA_TYPE, OTHER_TYPE, sub, -) \
+  TEST (DATA_TYPE, OTHER_TYPE, mul, *) \
+  TEST (DATA_TYPE, OTHER_TYPE, div, /)
+
+#define TEST_ALL \
+  TEST_INT_TYPE (int32_t, int8_t) \
+  TEST_INT_TYPE (int32_t, int16_t) \
+  TEST_INT_TYPE (uint32_t, int8_t) \
+  TEST_INT_TYPE (uint32_t, int16_t) \
+  TEST_INT_TYPE (int64_t, int8_t) \
+  TEST_INT_TYPE (int64_t, int16_t) \
+  TEST_INT_TYPE (int64_t, int32_t) \
+  TEST_INT_TYPE (uint64_t, int8_t) \
+  TEST_INT_TYPE (uint64_t, int16_t) \
+  TEST_INT_TYPE (uint64_t, int32_t) \
+  TEST_FP_TYPE (float, int8_t) \
+  TEST_FP_TYPE (float, int16_t) \
+  TEST_FP_TYPE (double, int8_t) \
+  TEST_FP_TYPE (double, int16_t) \
+  TEST_FP_TYPE (double, int32_t)
+
+TEST_ALL
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* The load XFAILs for fixed-length SVE account for extra loads from the
+   constant pool.  */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7],} 12 } } */
+
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7],} 12 } } */
+
+/* 72 for x operations, 6 for foo operations.  */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 78 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* 36 for x operations, 6 for foo operations.  */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 42 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 168 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 84 } } */
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,35 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "cond_arith_5.c"
+
+#define N 98
+
+#undef TEST
+#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
+  { \
+    DATA_TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
+    OTHER_TYPE foo[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        y[i] = i * i; \
+        pred[i] = i % 3; \
+        foo[i] = i * 5; \
+      } \
+    test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (x, y, z[0], z[1], \
+                                              pred, foo, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        DATA_TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
+        if (x[i] != expected) \
+          __builtin_abort (); \
+        asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_14.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE) \
+void __attribute__ ((weak)) \
+vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \
+{ \
+  for (int i = 0; i < n; ++i) \
+    { \
+      TYPE a1 = a[i * 2]; \
+      TYPE a2 = a[i * 2 + 1]; \
+      TYPE b1 = b[i * 2]; \
+      TYPE b2 = b[i * 2 + 1]; \
+      a[i * 2] = b1 > 1 ? a1 / b1 : a1; \
+      a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2; \
+    } \
+}
+
+#define TEST_ALL(T) \
+  T (int32_t) \
+  T (uint32_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (float) \
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* The loop should be fully-masked.  The load XFAILs for fixed-length
+   SVE account for extra loads from the constant pool.  */
+/* { dg-final { scan-assembler-times {\tld1w\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1w\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1d\t} 3 } } */
+/* { dg-final { scan-assembler-not {\tldr} } } */
+/* { dg-final { scan-assembler-not {\tstr} } } */
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,34 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "slp_14.c"
+
+#define N1 (103 * 2)
+#define N2 (111 * 2)
+
+#define HARNESS(TYPE) \
+  { \
+    TYPE a[N2], b[N2]; \
+    for (unsigned int i = 0; i < N2; ++i) \
+      { \
+        a[i] = i * 2 + i % 5; \
+        b[i] = i % 11; \
+      } \
+    vec_slp_##TYPE (a, b, N1 / 2); \
+    for (unsigned int i = 0; i < N2; ++i) \
+      { \
+        TYPE orig_a = i * 2 + i % 5; \
+        TYPE orig_b = i % 11; \
+        TYPE expected_a = orig_a; \
+        if (i < N1 && orig_b > (i & 1 ? 2 : 1)) \
+          expected_a /= orig_b; \
+        if (a[i] != expected_a || b[i] != orig_b) \
+          __builtin_abort (); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL (HARNESS)
+}