From patchwork Wed Nov 8 16:37:25 2017
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 835901
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Add support for masked load/store_lanes
Date: Wed, 08 Nov 2017 16:37:25 +0000
Message-ID: <87efp8wwu2.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0

This patch adds support for vectorising groups of IFN_MASK_LOADs and
IFN_MASK_STOREs using conditional load/store-lanes instructions.  This
requires new internal functions to represent the result
(IFN_MASK_{LOAD,STORE}_LANES), as well as associated optabs.

The normal IFN_{LOAD,STORE}_LANES functions are const operations that
logically just perform the permute: the load or store is encoded as a
MEM operand to the call statement.  In contrast, the
IFN_MASK_{LOAD,STORE}_LANES functions use the same kind of interface
as IFN_MASK_{LOAD,STORE}, since the memory is only conditionally
accessed.

The AArch64 patterns were added as part of the main LD[234]/ST[234]
patch.

Tested on aarch64-linux-gnu (both with and without SVE),
x86_64-linux-gnu and powerpc64le-linux-gnu.  OK to install?

Thanks,
Richard


2017-11-08  Richard Sandiford
	    Alan Hayward
	    David Sherwood

gcc/
	* optabs.def (vec_mask_load_lanes_optab): New optab.
	(vec_mask_store_lanes_optab): Likewise.
	* internal-fn.def (MASK_LOAD_LANES): New internal function.
	(MASK_STORE_LANES): Likewise.
	* internal-fn.c (mask_load_lanes_direct): New macro.
	(mask_store_lanes_direct): Likewise.
	(expand_mask_load_optab_fn): Handle masked operations.
	(expand_mask_load_lanes_optab_fn): New macro.
	(expand_mask_store_optab_fn): Handle masked operations.
	(expand_mask_store_lanes_optab_fn): New macro.
	(direct_mask_load_lanes_optab_supported_p): Likewise.
	(direct_mask_store_lanes_optab_supported_p): Likewise.
	* tree-vectorizer.h (vect_store_lanes_supported): Take a masked_p
	parameter.
	(vect_load_lanes_supported): Likewise.
	* tree-vect-data-refs.c (strip_conversion): New function.
	(can_group_stmts_p): Likewise.
	(vect_analyze_data_ref_accesses): Use it instead of checking
	for a pair of assignments.
	(vect_store_lanes_supported): Take a masked_p parameter.
	(vect_load_lanes_supported): Likewise.
	* tree-vect-loop.c (vect_analyze_loop_2): Update calls to
	vect_store_lanes_supported and vect_load_lanes_supported.
	* tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
	* tree-vect-stmts.c (replace_mask_load): New function, split out
	from vectorizable_mask_load_store.  Keep the group information
	up-to-date.
	(get_store_op): New function.
	(get_group_load_store_type): Take a masked_p parameter.  Don't
	allow gaps for masked accesses.  Use get_store_op.  Update calls
	to vect_store_lanes_supported and vect_load_lanes_supported.
	(get_load_store_type): Take a masked_p parameter and update call
	to get_group_load_store_type.
	(init_stored_values, advance_stored_values): New functions, split
	out from vectorizable_store.
	(do_load_lanes, do_store_lanes): New functions.
	(get_masked_group_alias_ptr_type): New function.
	(vectorizable_mask_load_store): Update call to
	get_load_store_type.  Handle masked VMAT_LOAD_STORE_LANES.
	Update GROUP_STORE_COUNT when vectorizing a group of stores and
	only vectorize when we reach the last statement in the group.
	Vectorize the first statement in a group of loads.  Use an array
	aggregate type rather than a vector type for load/store_lanes.
	Use init_stored_values, advance_stored_values, do_load_lanes,
	do_store_lanes, get_masked_group_alias_ptr_type and
	replace_mask_load.
	(vectorizable_store): Update call to get_load_store_type.
	Use init_stored_values, advance_stored_values and do_store_lanes.
	(vectorizable_load): Update call to get_load_store_type.
	Use do_load_lanes.
	(vect_transform_stmt): Set grouped_store for grouped
	IFN_MASK_STOREs.  Only set is_store for the last element in the
	group.

gcc/testsuite/
	* gcc.dg/vect/vect-ooo-group-1.c: New test.
	* gcc.target/aarch64/sve_mask_struct_load_1.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_1_run.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_2.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_2_run.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_3.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_3_run.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_4.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_5.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_6.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_7.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_load_8.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_store_1.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_store_1_run.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_store_2.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_store_2_run.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_store_3.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_store_3_run.c: Likewise.
	* gcc.target/aarch64/sve_mask_struct_store_4.c: Likewise.
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-08 15:05:55.697852337 +0000
+++ gcc/optabs.def	2017-11-08 16:35:04.763816035 +0000
@@ -80,6 +80,8 @@ OPTAB_CD(ssmsub_widen_optab, "ssmsub$b$a
 OPTAB_CD(usmsub_widen_optab, "usmsub$a$b4")
 OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
 OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
+OPTAB_CD(vec_mask_load_lanes_optab, "vec_mask_load_lanes$a$b")
+OPTAB_CD(vec_mask_store_lanes_optab, "vec_mask_store_lanes$a$b")
 OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
 OPTAB_CD(vcondeq_optab, "vcondeq$a$b")
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2017-11-01 08:07:13.340797708 +0000
+++ gcc/internal-fn.def	2017-11-08 16:35:04.763816035 +0000
@@ -45,9 +45,11 @@ along with GCC; see the file COPYING3.
 
    - mask_load: currently just maskload
    - load_lanes: currently just vec_load_lanes
+   - mask_load_lanes: currently just vec_mask_load_lanes
 
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
+   - mask_store_lanes: currently just vec_mask_store_lanes
 
    DEF_INTERNAL_FLT_FN is like DEF_INTERNAL_OPTAB_FN, but in addition,
    the function implements the computational part of a built-in math
@@ -92,9 +94,13 @@ along with GCC; see the file COPYING3.
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
+DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
+		       vec_mask_load_lanes, mask_load_lanes)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
+DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
+		       vec_mask_store_lanes, mask_store_lanes)
 
 DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2017-11-08 15:05:55.618852345 +0000
+++ gcc/internal-fn.c	2017-11-08 16:35:04.763816035 +0000
@@ -79,8 +79,10 @@ #define DEF_INTERNAL_FN(CODE, FLAGS, FNS
 #define not_direct { -2, -2, false }
 #define mask_load_direct { -1, 2, false }
 #define load_lanes_direct { -1, -1, false }
+#define mask_load_lanes_direct { -1, -1, false }
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
+#define mask_store_lanes_direct { 0, 0, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
 
@@ -2277,7 +2279,7 @@ expand_LOOP_DIST_ALIAS (internal_fn, gca
   gcc_unreachable ();
 }
 
-/* Expand MASK_LOAD call STMT using optab OPTAB.  */
+/* Expand MASK_LOAD{,_LANES} call STMT using optab OPTAB.  */
 
 static void
 expand_mask_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
@@ -2286,6 +2288,7 @@ expand_mask_load_optab_fn (internal_fn,
   tree type, lhs, rhs, maskt, ptr;
   rtx mem, target, mask;
   unsigned align;
+  insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
   lhs = gimple_call_lhs (stmt);
@@ -2298,6 +2301,12 @@ expand_mask_load_optab_fn (internal_fn,
     type = build_aligned_type (type, align);
   rhs = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0), ptr);
 
+  if (optab == vec_mask_load_lanes_optab)
+    icode = get_multi_vector_move (type, optab);
+  else
+    icode = convert_optab_handler (optab, TYPE_MODE (type),
+				   TYPE_MODE (TREE_TYPE (maskt)));
+
   mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   gcc_assert (MEM_P (mem));
   mask = expand_normal (maskt);
@@ -2305,12 +2314,12 @@ expand_mask_load_optab_fn (internal_fn,
   create_output_operand (&ops[0], target, TYPE_MODE (type));
   create_fixed_operand (&ops[1], mem);
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (convert_optab_handler (optab, TYPE_MODE (type),
-				      TYPE_MODE (TREE_TYPE (maskt))),
-	       3, ops);
+  expand_insn (icode, 3, ops);
 }
 
-/* Expand MASK_STORE call STMT using optab OPTAB.  */
+#define expand_mask_load_lanes_optab_fn expand_mask_load_optab_fn
+
+/* Expand MASK_STORE{,_LANES} call STMT using optab OPTAB.  */
 
 static void
 expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
@@ -2319,6 +2328,7 @@ expand_mask_store_optab_fn (internal_fn,
   tree type, lhs, rhs, maskt, ptr;
   rtx mem, reg, mask;
   unsigned align;
+  insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
   rhs = gimple_call_arg (stmt, 3);
@@ -2329,6 +2339,12 @@ expand_mask_store_optab_fn (internal_fn,
     type = build_aligned_type (type, align);
   lhs = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0), ptr);
 
+  if (optab == vec_mask_store_lanes_optab)
+    icode = get_multi_vector_move (type, optab);
+  else
+    icode = convert_optab_handler (optab, TYPE_MODE (type),
+				   TYPE_MODE (TREE_TYPE (maskt)));
+
   mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   gcc_assert (MEM_P (mem));
   mask = expand_normal (maskt);
@@ -2336,11 +2352,11 @@ expand_mask_store_optab_fn (internal_fn,
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], reg, TYPE_MODE (type));
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (convert_optab_handler (optab, TYPE_MODE (type),
-				      TYPE_MODE (TREE_TYPE (maskt))),
-	       3, ops);
+  expand_insn (icode, 3, ops);
 }
 
+#define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
+
 static void
 expand_ABNORMAL_DISPATCHER (internal_fn, gcall *)
 {
@@ -2732,8 +2748,10 @@ #define direct_unary_optab_supported_p d
 #define direct_binary_optab_supported_p direct_optab_supported_p
 #define direct_mask_load_optab_supported_p direct_optab_supported_p
 #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
+#define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
+#define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
 
 /* Return true if FN is supported for the types in TYPES when the
    optimization type is OPT_TYPE.  The types are those associated with
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-11-08 15:05:33.791822333 +0000
+++ gcc/tree-vectorizer.h	2017-11-08 16:35:04.771159765 +0000
@@ -1284,9 +1284,9 @@ extern tree bump_vector_ptr (tree, gimpl
 				 tree);
 extern tree vect_create_destination_var (tree, tree);
 extern bool vect_grouped_store_supported (tree, unsigned HOST_WIDE_INT);
-extern bool vect_store_lanes_supported (tree, unsigned HOST_WIDE_INT);
+extern bool vect_store_lanes_supported (tree, unsigned HOST_WIDE_INT, bool);
 extern bool vect_grouped_load_supported (tree, bool, unsigned HOST_WIDE_INT);
-extern bool vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT);
+extern bool vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT, bool);
 extern void vect_permute_store_chain (vec<tree> ,unsigned int, gimple *,
                                     gimple_stmt_iterator *, vec<tree> *);
 extern tree vect_setup_realignment (gimple *, gimple_stmt_iterator *, tree *,
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-11-08 15:06:16.087850270 +0000
+++ gcc/tree-vect-data-refs.c	2017-11-08 16:35:04.768405866 +0000
@@ -2791,6 +2791,62 @@ dr_group_sort_cmp (const void *dra_, con
   return cmp;
 }
 
+/* If OP is the result of a conversion, return the unconverted value,
+   otherwise return null.  */
+
+static tree
+strip_conversion (tree op)
+{
+  if (TREE_CODE (op) != SSA_NAME)
+    return NULL_TREE;
+  gimple *stmt = SSA_NAME_DEF_STMT (op);
+  if (!is_gimple_assign (stmt)
+      || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
+    return NULL_TREE;
+  return gimple_assign_rhs1 (stmt);
+}
+
+/* Return true if vectorizable_* routines can handle statements STMT1
+   and STMT2 being in a single group.  */
+
+static bool
+can_group_stmts_p (gimple *stmt1, gimple *stmt2)
+{
+  if (gimple_assign_single_p (stmt1))
+    return gimple_assign_single_p (stmt2);
+
+  if (is_gimple_call (stmt1) && gimple_call_internal_p (stmt1))
+    {
+      /* Check for two masked loads or two masked stores.  */
+      if (!is_gimple_call (stmt2) || !gimple_call_internal_p (stmt2))
+	return false;
+      internal_fn ifn = gimple_call_internal_fn (stmt1);
+      if (ifn != IFN_MASK_LOAD && ifn != IFN_MASK_STORE)
+	return false;
+      if (ifn != gimple_call_internal_fn (stmt2))
+	return false;
+
+      /* Check that the masks are the same.  Cope with casts of masks,
+	 like those created by build_mask_conversion.  */
+      tree mask1 = gimple_call_arg (stmt1, 2);
+      tree mask2 = gimple_call_arg (stmt2, 2);
+      if (!operand_equal_p (mask1, mask2, 0))
+	{
+	  mask1 = strip_conversion (mask1);
+	  if (!mask1)
+	    return false;
+	  mask2 = strip_conversion (mask2);
+	  if (!mask2)
+	    return false;
+	  if (!operand_equal_p (mask1, mask2, 0))
+	    return false;
+	}
+      return true;
+    }
+
+  return false;
+}
+
 /* Function vect_analyze_data_ref_accesses.
 
    Analyze the access pattern of all the data references in the loop.
@@ -2857,8 +2913,7 @@ vect_analyze_data_ref_accesses (vec_info
	      || data_ref_compare_tree (DR_BASE_ADDRESS (dra),
					DR_BASE_ADDRESS (drb)) != 0
	      || data_ref_compare_tree (DR_OFFSET (dra), DR_OFFSET (drb)) != 0
-	      || !gimple_assign_single_p (DR_STMT (dra))
-	      || !gimple_assign_single_p (DR_STMT (drb)))
+	      || !can_group_stmts_p (DR_STMT (dra), DR_STMT (drb)))
	    break;
 
	  /* Check that the data-refs have the same constant size.  */
@@ -4662,15 +4717,21 @@ vect_grouped_store_supported (tree vecty
 }
 
 
-/* Return TRUE if vec_store_lanes is available for COUNT vectors of
-   type VECTYPE.  */
+/* Return TRUE if vec_{mask_}store_lanes is available for COUNT vectors of
+   type VECTYPE.  MASKED_P says whether the masked form is needed.  */
 
 bool
-vect_store_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count)
+vect_store_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count,
+			    bool masked_p)
 {
-  return vect_lanes_optab_supported_p ("vec_store_lanes",
-				       vec_store_lanes_optab,
-				       vectype, count);
+  if (masked_p)
+    return vect_lanes_optab_supported_p ("vec_mask_store_lanes",
+					 vec_mask_store_lanes_optab,
+					 vectype, count);
+  else
+    return vect_lanes_optab_supported_p ("vec_store_lanes",
+					 vec_store_lanes_optab,
+					 vectype, count);
 }
 
 
@@ -5238,15 +5299,21 @@ vect_grouped_load_supported (tree vectyp
   return false;
 }
 
-/* Return TRUE if vec_load_lanes is available for COUNT vectors of
-   type VECTYPE.  */
+/* Return TRUE if vec_{masked_}load_lanes is available for COUNT vectors of
+   type VECTYPE.  MASKED_P says whether the masked form is needed.  */
 
 bool
-vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count)
+vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count,
+			   bool masked_p)
 {
-  return vect_lanes_optab_supported_p ("vec_load_lanes",
-				       vec_load_lanes_optab,
-				       vectype, count);
+  if (masked_p)
+    return vect_lanes_optab_supported_p ("vec_mask_load_lanes",
+					 vec_mask_load_lanes_optab,
+					 vectype, count);
+  else
+    return vect_lanes_optab_supported_p ("vec_load_lanes",
+					 vec_load_lanes_optab,
+					 vectype, count);
 }
 
 /* Function vect_permute_load_chain.
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-11-08 15:05:36.349044117 +0000
+++ gcc/tree-vect-loop.c	2017-11-08 16:35:04.770241799 +0000
@@ -2247,7 +2247,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
	      vinfo = vinfo_for_stmt (STMT_VINFO_GROUP_FIRST_ELEMENT (vinfo));
	      unsigned int size = STMT_VINFO_GROUP_SIZE (vinfo);
	      tree vectype = STMT_VINFO_VECTYPE (vinfo);
-	      if (! vect_store_lanes_supported (vectype, size)
+	      if (! vect_store_lanes_supported (vectype, size, false)
		  && ! vect_grouped_store_supported (vectype, size))
		return false;
	      FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (instance), j, node)
@@ -2257,7 +2257,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
		  bool single_element_p = !STMT_VINFO_GROUP_NEXT_ELEMENT (vinfo);
		  size = STMT_VINFO_GROUP_SIZE (vinfo);
		  vectype = STMT_VINFO_VECTYPE (vinfo);
-		  if (! vect_load_lanes_supported (vectype, size)
+		  if (! vect_load_lanes_supported (vectype, size, false)
		      && ! vect_grouped_load_supported (vectype,
							single_element_p, size))
		    return false;
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-11-08 15:05:34.296308263 +0000
+++ gcc/tree-vect-slp.c	2017-11-08 16:35:04.770241799 +0000
@@ -2175,7 +2175,7 @@ vect_analyze_slp_instance (vec_info *vin
      instructions do not generate this SLP instance.  */
   if (is_a <loop_vec_info> (vinfo)
       && loads_permuted
-      && dr && vect_store_lanes_supported (vectype, group_size))
+      && dr && vect_store_lanes_supported (vectype, group_size, false))
     {
       slp_tree load_node;
       FOR_EACH_VEC_ELT (loads, i, load_node)
@@ -2188,7 +2188,7 @@ vect_analyze_slp_instance (vec_info *vin
	  if (STMT_VINFO_STRIDED_P (stmt_vinfo)
	      || ! vect_load_lanes_supported
		    (STMT_VINFO_VECTYPE (stmt_vinfo),
-		     GROUP_SIZE (stmt_vinfo)))
+		     GROUP_SIZE (stmt_vinfo), false))
	    break;
	}
       if (i == loads.length ())
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-11-08 15:05:36.350875282 +0000
+++ gcc/tree-vect-stmts.c	2017-11-08 16:35:04.771159765 +0000
@@ -1700,6 +1700,69 @@ vectorizable_internal_function (combined
 static tree permute_vec_elements (tree, tree, tree, gimple *,
				  gimple_stmt_iterator *);
 
+/* Replace IFN_MASK_LOAD statement STMT with a dummy assignment, to ensure
+   that it won't be expanded even when there's no following DCE pass.  */
+
+static void
+replace_mask_load (gimple *stmt, gimple_stmt_iterator *gsi)
+{
+  /* If this statement is part of a pattern created by the vectorizer,
+     get the original statement.  */
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  if (STMT_VINFO_RELATED_STMT (stmt_info))
+    {
+      stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+      stmt_info = vinfo_for_stmt (stmt);
+    }
+
+  gcc_assert (gsi_stmt (*gsi) == stmt);
+  tree lhs = gimple_call_lhs (stmt);
+  tree zero = build_zero_cst (TREE_TYPE (lhs));
+  gimple *new_stmt = gimple_build_assign (lhs, zero);
+  set_vinfo_for_stmt (new_stmt, stmt_info);
+  set_vinfo_for_stmt (stmt, NULL);
+  STMT_VINFO_STMT (stmt_info) = new_stmt;
+
+  /* If STMT was the first statement in a group, redirect all
+     GROUP_FIRST_ELEMENT pointers to the new statement (which has the
+     same stmt_info as the old statement).  */
+  if (GROUP_FIRST_ELEMENT (stmt_info) == stmt)
+    {
+      gimple *group_stmt = new_stmt;
+      do
+	{
+	  GROUP_FIRST_ELEMENT (vinfo_for_stmt (group_stmt)) = new_stmt;
+	  group_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (group_stmt));
+	}
+      while (group_stmt);
+    }
+  else if (GROUP_FIRST_ELEMENT (stmt_info))
+    {
+      /* Otherwise redirect the GROUP_NEXT_ELEMENT.  It would be more
+	 efficient if these pointers were to the stmt_vec_info rather
+	 than the gimple statements themselves, but this is by no means
+	 the only quadractic loop for groups.  */
+      gimple *group_stmt = GROUP_FIRST_ELEMENT (stmt_info);
+      while (GROUP_NEXT_ELEMENT (vinfo_for_stmt (group_stmt)) != stmt)
+	group_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (group_stmt));
+      GROUP_NEXT_ELEMENT (vinfo_for_stmt (group_stmt)) = new_stmt;
+    }
+  gsi_replace (gsi, new_stmt, true);
+}
+
+/* STMT is either a masked or unconditional store.  Return the value
+   being stored.  */
+
+static tree
+get_store_op (gimple *stmt)
+{
+  if (gimple_assign_single_p (stmt))
+    return gimple_assign_rhs1 (stmt);
+  if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
+    return gimple_call_arg (stmt, 3);
+  gcc_unreachable ();
+}
+
 /* STMT is a non-strided load or store, meaning that it accesses
    elements with a known constant step.  Return -1 if that step
    is negative, 0 if it is zero, and 1 if it is greater than zero.  */
@@ -1744,7 +1807,7 @@ perm_mask_for_reverse (tree vectype)
 
 static bool
 get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
-			   vec_load_store_type vls_type,
+			   bool masked_p, vec_load_store_type vls_type,
			   vect_memory_access_type *memory_access_type)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
@@ -1765,7 +1828,10 @@ get_group_load_store_type (gimple *stmt,
   /* True if we can cope with such overrun by peeling for gaps, so that
      there is at least one final scalar iteration after the vector loop.  */
-  bool can_overrun_p = (vls_type == VLS_LOAD && loop_vinfo && !loop->inner);
+  bool can_overrun_p = (!masked_p
+			&& vls_type == VLS_LOAD
+			&& loop_vinfo
+			&& !loop->inner);
 
   /* There can only be a gap at the end of the group if the stride is
      known at compile time.  */
@@ -1828,6 +1894,7 @@ get_group_load_store_type (gimple *stmt,
	     and so we are guaranteed to access a non-gap element in the
	     same B-sized block.  */
	  if (would_overrun_p
+	      && !masked_p
	      && gap < (vect_known_alignment_in_bytes (first_dr)
			/ vect_get_scalar_dr_size (first_dr)))
	    would_overrun_p = false;
@@ -1838,8 +1905,8 @@ get_group_load_store_type (gimple *stmt,
	{
	  /* First try using LOAD/STORE_LANES.  */
	  if (vls_type == VLS_LOAD
-	      ? vect_load_lanes_supported (vectype, group_size)
-	      : vect_store_lanes_supported (vectype, group_size))
+	      ? vect_load_lanes_supported (vectype, group_size, masked_p)
+	      : vect_store_lanes_supported (vectype, group_size, masked_p))
	    {
	      *memory_access_type = VMAT_LOAD_STORE_LANES;
	      overrun_p = would_overrun_p;
	    }
@@ -1865,8 +1932,7 @@ get_group_load_store_type (gimple *stmt,
      gimple *next_stmt = GROUP_NEXT_ELEMENT (stmt_info);
      while (next_stmt)
	{
-	  gcc_assert (gimple_assign_single_p (next_stmt));
-	  tree op = gimple_assign_rhs1 (next_stmt);
+	  tree op = get_store_op (next_stmt);
	  gimple *def_stmt;
	  enum vect_def_type dt;
	  if (!vect_is_simple_use (op, vinfo, &def_stmt, &dt))
@@ -1950,11 +2016,12 @@ get_negative_load_store_type (gimple *st
    or scatters, fill in GS_INFO accordingly.
 
    SLP says whether we're performing SLP rather than loop vectorization.
+   MASKED_P is true if the statement is conditional on a vectorized mask.
    VECTYPE is the vector type that the vectorized statements will use.
    NCOPIES is the number of vector statements that will be needed.  */
 
 static bool
-get_load_store_type (gimple *stmt, tree vectype, bool slp,
+get_load_store_type (gimple *stmt, tree vectype, bool slp, bool masked_p,
		     vec_load_store_type vls_type, unsigned int ncopies,
		     vect_memory_access_type *memory_access_type,
		     gather_scatter_info *gs_info)
@@ -1982,7 +2049,7 @@ get_load_store_type (gimple *stmt, tree
     }
   else if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
-      if (!get_group_load_store_type (stmt, vectype, slp, vls_type,
+      if (!get_group_load_store_type (stmt, vectype, slp, masked_p, vls_type,
				      memory_access_type))
	return false;
     }
@@ -2031,6 +2098,174 @@ get_load_store_type (gimple *stmt, tree
   return true;
 }
 
+/* Set up the stored values for the first copy of a vectorized store.
+   GROUP_SIZE is the number of stores in the group (which is 1 for
+   ungrouped stores).  FIRST_STMT is the first statement in the group.
+
+   On return, initialize OPERANDS to a new vector in which element I
+   is the value that the first copy of group member I should store.
+   The caller should free OPERANDS after use.  */
+
+static void
+init_stored_values (unsigned int group_size, gimple *first_stmt,
+		    vec<tree> *operands)
+{
+  operands->create (group_size);
+  gimple *next_stmt = first_stmt;
+  for (unsigned int i = 0; i < group_size; i++)
+    {
+      /* Since gaps are not supported for interleaved stores,
+	 GROUP_SIZE is the exact number of stmts in the chain.
+	 Therefore, NEXT_STMT can't be NULL_TREE.  In case that
+	 there is no interleaving, GROUP_SIZE is 1, and only one
+	 iteration of the loop will be executed.  */
+      gcc_assert (next_stmt);
+      tree op = get_store_op (next_stmt);
+      tree vec_op = vect_get_vec_def_for_operand (op, next_stmt);
+      operands->quick_push (vec_op);
+      next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next_stmt));
+    }
+}
+
+/* OPERANDS is a vector set up by init_stored_values.  Update each element
+   for the next copy of each statement.  GROUP_SIZE and FIRST_STMT are
+   as for init_stored_values.  */
+
+static void
+advance_stored_values (unsigned int group_size, gimple *first_stmt,
+		       vec<tree> operands)
+{
+  vec_info *vinfo = vinfo_for_stmt (first_stmt)->vinfo;
+  for (unsigned int i = 0; i < group_size; i++)
+    {
+      tree op = operands[i];
+      enum vect_def_type dt;
+      gimple *def_stmt;
+      vect_is_simple_use (op, vinfo, &def_stmt, &dt);
+      operands[i] = vect_get_vec_def_for_stmt_copy (dt, op);
+    }
+}
+
+/* Emit one copy of a vectorized LOAD_LANES for STMT.  GROUP_SIZE is
+   the number of vectors being loaded and VECTYPE is the type of each
+   vector.  AGGR_TYPE is the type that should be used to refer to the
+   memory source (which contains the same number of elements as
+   GROUP_SIZE copies of VECTYPE, but in a different order).
+   DATAREF_PTR points to the first element that should be loaded.
+   ALIAS_PTR_TYPE is the type of the accessed elements for aliasing
+   purposes.  MASK, if nonnull, is a mask in which element I is true
+   if element I of each destination vector should be loaded.  */
+
+static void
+do_load_lanes (gimple *stmt, gimple_stmt_iterator *gsi,
+	       unsigned int group_size, tree vectype, tree aggr_type,
+	       tree dataref_ptr, tree alias_ptr_type, tree mask)
+{
+  tree scalar_dest = gimple_get_lhs (stmt);
+  tree vec_array = create_vector_array (vectype, group_size);
+
+  gcall *new_stmt;
+  if (mask)
+    {
+      /* Emit: VEC_ARRAY = MASK_LOAD_LANES (DATAREF_PTR, ALIAS_PTR, MASK).  */
+      tree alias_ptr = build_int_cst (alias_ptr_type,
+				      TYPE_ALIGN_UNIT (TREE_TYPE (vectype)));
+      new_stmt = gimple_build_call_internal (IFN_MASK_LOAD_LANES, 3,
+					     dataref_ptr, alias_ptr, mask);
+    }
+  else
+    {
+      /* Emit: VEC_ARRAY = LOAD_LANES (MEM_REF[...all elements...]).  */
+      tree data_ref = create_array_ref (aggr_type, dataref_ptr,
+					alias_ptr_type);
+      new_stmt = gimple_build_call_internal (IFN_LOAD_LANES, 1, data_ref);
+    }
+  gimple_call_set_lhs (new_stmt, vec_array);
+  gimple_call_set_nothrow (new_stmt, true);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  /* Extract each vector into an SSA_NAME.  */
+  auto_vec<tree> dr_chain;
+  dr_chain.reserve (group_size);
+  for (unsigned int i = 0; i < group_size; i++)
+    {
+      tree new_temp = read_vector_array (stmt, gsi, scalar_dest,
+					 vec_array, i);
+      dr_chain.quick_push (new_temp);
+    }
+
+  /* Record the mapping between SSA_NAMEs and statements.  */
+  vect_record_grouped_load_vectors (stmt, dr_chain);
+}
+
+/* Emit one copy of a vectorized STORE_LANES for STMT.  GROUP_SIZE is
+   the number of vectors being stored and OPERANDS[I] is the value
+   that group member I should store.  AGGR_TYPE is the type that should
+   be used to refer to the memory destination (which contains the same
+   number of elements as the source vectors, but in a different order).
+   DATAREF_PTR points to the first store location.  ALIAS_PTR_TYPE is
+   the type of the accessed elements for aliasing purposes.  MASK,
+   if nonnull, is a mask in which element I is true if element I of
+   each source vector should be stored.  */
+
+static gimple *
+do_store_lanes (gimple *stmt, gimple_stmt_iterator *gsi,
+		unsigned int group_size, tree aggr_type, tree dataref_ptr,
+		tree alias_ptr_type, vec<tree> operands, tree mask)
+{
+  /* Combine all the vectors into an array.  */
+  tree vectype = TREE_TYPE (operands[0]);
+  tree vec_array = create_vector_array (vectype, group_size);
+  for (unsigned int i = 0; i < group_size; i++)
+    write_vector_array (stmt, gsi, operands[i], vec_array, i);
+
+  gcall *new_stmt;
+  if (mask)
+    {
+      /* Emit: MASK_STORE_LANES (DATAREF_PTR, ALIAS_PTR, MASK, VEC_ARRAY).  */
+      tree alias_ptr = build_int_cst (alias_ptr_type,
+				      TYPE_ALIGN_UNIT (TREE_TYPE (vectype)));
+      new_stmt = gimple_build_call_internal (IFN_MASK_STORE_LANES, 4,
+					     dataref_ptr, alias_ptr,
+					     mask, vec_array);
+    }
+  else
+    {
+      /* Emit: MEM_REF[...all elements...] = STORE_LANES (VEC_ARRAY).  */
+      tree data_ref = create_array_ref (aggr_type, dataref_ptr,
+					alias_ptr_type);
+      new_stmt = gimple_build_call_internal (IFN_STORE_LANES, 1, vec_array);
+      gimple_call_set_lhs (new_stmt, data_ref);
+    }
+  gimple_call_set_nothrow (new_stmt, true);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+  return new_stmt;
+}
+
+/* Return the alias pointer type for the group of masked loads or
+   stores starting at FIRST_STMT.  */
+
+static tree
+get_masked_group_alias_ptr_type (gimple *first_stmt)
+{
+  tree type, next_type;
+  gimple *next_stmt;
+
+  type = TREE_TYPE (gimple_call_arg (first_stmt, 1));
+  next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (first_stmt));
+  while (next_stmt)
+    {
+      next_type = TREE_TYPE (gimple_call_arg (next_stmt, 1));
+      if (get_alias_set (type) != get_alias_set (next_type))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "conflicting alias set types.\n");
+	  return ptr_type_node;
+	}
+      next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next_stmt));
+    }
+  return type;
+}
+
 /* Function vectorizable_mask_load_store.
 
    Check if STMT performs a conditional load or store that can be
   vectorized.
@@ -2053,6 +2288,7 @@ vectorizable_mask_load_store (gimple *st
   tree rhs_vectype = NULL_TREE;
   tree mask_vectype;
   tree elem_type;
+  tree aggr_type;
   gimple *new_stmt;
   tree dummy;
   tree dataref_ptr = NULL_TREE;
@@ -2066,6 +2302,8 @@ vectorizable_mask_load_store (gimple *st
   tree mask;
   gimple *def_stmt;
   enum vect_def_type dt;
+  gimple *first_stmt = stmt;
+  unsigned int group_size = 1;
 
   if (slp_node != NULL)
     return false;
@@ -2127,7 +2365,7 @@ vectorizable_mask_load_store (gimple *st
     vls_type = VLS_LOAD;
 
   vect_memory_access_type memory_access_type;
-  if (!get_load_store_type (stmt, vectype, false, vls_type, ncopies,
+  if (!get_load_store_type (stmt, vectype, false, true, vls_type, ncopies,
			    &memory_access_type, &gs_info))
     return false;
 
@@ -2144,7 +2382,18 @@ vectorizable_mask_load_store (gimple *st
	  return false;
	}
     }
-  else if (memory_access_type != VMAT_CONTIGUOUS)
+  else if (rhs_vectype
+	   && !useless_type_conversion_p (vectype, rhs_vectype))
+    return false;
+  else if (memory_access_type == VMAT_CONTIGUOUS)
+    {
+      if (!VECTOR_MODE_P (TYPE_MODE (vectype))
+	  || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
+					 TYPE_MODE (mask_vectype),
+					 vls_type == VLS_LOAD))
+	return false;
+    }
+  else if (memory_access_type != VMAT_LOAD_STORE_LANES)
     {
       if (dump_enabled_p ())
	 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2152,13 +2401,6 @@ vectorizable_mask_load_store (gimple *st
			  vls_type == VLS_LOAD ? "load" : "store");
       return false;
     }
-  else if (!VECTOR_MODE_P (TYPE_MODE (vectype))
-	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
-					  TYPE_MODE (mask_vectype),
-					  vls_type == VLS_LOAD)
-	   || (rhs_vectype
-	       && !useless_type_conversion_p (vectype, rhs_vectype)))
-    return false;
 
   if (!vec_stmt) /* transformation not required.  */
     {
@@ -2176,6 +2418,14 @@ vectorizable_mask_load_store (gimple *st
 
   /* Transform.  */
 
+  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+    {
+      first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
+      group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+      if (vls_type != VLS_LOAD)
+	GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt))++;
+    }
+
   if (memory_access_type == VMAT_GATHER_SCATTER)
     {
       tree vec_oprnd0 = NULL_TREE, op;
@@ -2343,23 +2593,28 @@ vectorizable_mask_load_store (gimple *st
	  prev_stmt_info = vinfo_for_stmt (new_stmt);
	}
 
-      /* Ensure that even with -fno-tree-dce the scalar MASK_LOAD is removed
-	 from the IL.  */
-      if (STMT_VINFO_RELATED_STMT (stmt_info))
-	{
-	  stmt = STMT_VINFO_RELATED_STMT (stmt_info);
-	  stmt_info = vinfo_for_stmt (stmt);
-	}
-      tree lhs = gimple_call_lhs (stmt);
-      new_stmt = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
-      set_vinfo_for_stmt (new_stmt, stmt_info);
-      set_vinfo_for_stmt (stmt, NULL);
-      STMT_VINFO_STMT (stmt_info) = new_stmt;
-      gsi_replace (gsi, new_stmt, true);
+      replace_mask_load (stmt, gsi);
       return true;
     }
-  else if (vls_type != VLS_LOAD)
+
+  if (memory_access_type == VMAT_LOAD_STORE_LANES)
+    aggr_type = build_array_type_nelts (elem_type, group_size * nunits);
+  else
+    aggr_type = vectype;
+
+  if (vls_type != VLS_LOAD)
     {
+      /* Vectorize the whole group when we reach the final statement.
+	 Replace all other statements with an empty sequence.
*/ + if (STMT_VINFO_GROUPED_ACCESS (stmt_info) + && (GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt)) + < GROUP_SIZE (vinfo_for_stmt (first_stmt)))) + { + *vec_stmt = NULL; + return true; + } + + auto_vec operands; tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE; prev_stmt_info = NULL; LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true; @@ -2369,48 +2624,62 @@ vectorizable_mask_load_store (gimple *st if (i == 0) { - tree rhs = gimple_call_arg (stmt, 3); - vec_rhs = vect_get_vec_def_for_operand (rhs, stmt); + init_stored_values (group_size, first_stmt, &operands); + vec_rhs = operands[0]; vec_mask = vect_get_vec_def_for_operand (mask, stmt, mask_vectype); - /* We should have catched mismatched types earlier. */ + /* We should have caught mismatched types earlier. */ gcc_assert (useless_type_conversion_p (vectype, TREE_TYPE (vec_rhs))); - dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL, - NULL_TREE, &dummy, gsi, - &ptr_incr, false, &inv_p); + dataref_ptr = vect_create_data_ref_ptr (first_stmt, aggr_type, + NULL, NULL_TREE, &dummy, + gsi, &ptr_incr, false, + &inv_p); gcc_assert (!inv_p); } else { - vect_is_simple_use (vec_rhs, loop_vinfo, &def_stmt, &dt); - vec_rhs = vect_get_vec_def_for_stmt_copy (dt, vec_rhs); + advance_stored_values (group_size, first_stmt, operands); + vec_rhs = operands[0]; vect_is_simple_use (vec_mask, loop_vinfo, &def_stmt, &dt); vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); - dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, - TYPE_SIZE_UNIT (vectype)); + dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, + gsi, first_stmt, + TYPE_SIZE_UNIT (aggr_type)); } - align = DR_TARGET_ALIGNMENT (dr); - if (aligned_access_p (dr)) - misalign = 0; - else if (DR_MISALIGNMENT (dr) == -1) + if (memory_access_type == VMAT_LOAD_STORE_LANES) { - align = TYPE_ALIGN_UNIT (elem_type); - misalign = 0; + tree ref_type = get_masked_group_alias_ptr_type (first_stmt); + new_stmt = do_store_lanes (stmt, gsi, group_size, aggr_type, + 
dataref_ptr, ref_type, operands, + vec_mask); } else - misalign = DR_MISALIGNMENT (dr); - set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, - misalign); - tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), - misalign ? least_bit_hwi (misalign) : align); - gcall *call - = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr, - ptr, vec_mask, vec_rhs); - gimple_call_set_nothrow (call, true); - new_stmt = call; - vect_finish_stmt_generation (stmt, new_stmt, gsi); + { + align = DR_TARGET_ALIGNMENT (dr); + if (aligned_access_p (dr)) + misalign = 0; + else if (DR_MISALIGNMENT (dr) == -1) + { + align = TYPE_ALIGN_UNIT (elem_type); + misalign = 0; + } + else + misalign = DR_MISALIGNMENT (dr); + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, + misalign); + tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), + misalign + ? least_bit_hwi (misalign) + : align); + gcall *call + = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr, + ptr, vec_mask, vec_rhs); + gimple_call_set_nothrow (call, true); + new_stmt = call; + vect_finish_stmt_generation (stmt, new_stmt, gsi); + } if (i == 0) STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; else @@ -2420,73 +2689,88 @@ vectorizable_mask_load_store (gimple *st } else { + /* Vectorize the whole group when we reach the first statement. + For later statements we just need to return the cached + replacement. 
*/ + if (group_size > 1 + && STMT_VINFO_VEC_STMT (vinfo_for_stmt (first_stmt))) + { + *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info); + replace_mask_load (stmt, gsi); + return true; + } + tree vec_mask = NULL_TREE; prev_stmt_info = NULL; - vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype); + if (memory_access_type == VMAT_LOAD_STORE_LANES) + vec_dest = NULL_TREE; + else + vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), + vectype); for (i = 0; i < ncopies; i++) { unsigned align, misalign; if (i == 0) { + gcc_assert (mask == gimple_call_arg (first_stmt, 2)); vec_mask = vect_get_vec_def_for_operand (mask, stmt, mask_vectype); - dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL, - NULL_TREE, &dummy, gsi, - &ptr_incr, false, &inv_p); + dataref_ptr = vect_create_data_ref_ptr (first_stmt, aggr_type, + NULL, NULL_TREE, &dummy, + gsi, &ptr_incr, false, + &inv_p); gcc_assert (!inv_p); } else { vect_is_simple_use (vec_mask, loop_vinfo, &def_stmt, &dt); vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); - dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, - TYPE_SIZE_UNIT (vectype)); + dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, + gsi, first_stmt, + TYPE_SIZE_UNIT (aggr_type)); } - align = DR_TARGET_ALIGNMENT (dr); - if (aligned_access_p (dr)) - misalign = 0; - else if (DR_MISALIGNMENT (dr) == -1) + if (memory_access_type == VMAT_LOAD_STORE_LANES) { - align = TYPE_ALIGN_UNIT (elem_type); - misalign = 0; + tree ref_type = get_masked_group_alias_ptr_type (first_stmt); + do_load_lanes (stmt, gsi, group_size, vectype, + aggr_type, dataref_ptr, ref_type, vec_mask); + *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info); } else - misalign = DR_MISALIGNMENT (dr); - set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, - misalign); - tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), - misalign ? 
least_bit_hwi (misalign) : align); - gcall *call - = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr, - ptr, vec_mask); - gimple_call_set_lhs (call, make_ssa_name (vec_dest)); - gimple_call_set_nothrow (call, true); - vect_finish_stmt_generation (stmt, call, gsi); - if (i == 0) - STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = call; - else - STMT_VINFO_RELATED_STMT (prev_stmt_info) = call; - prev_stmt_info = vinfo_for_stmt (call); + { + align = DR_TARGET_ALIGNMENT (dr); + if (aligned_access_p (dr)) + misalign = 0; + else if (DR_MISALIGNMENT (dr) == -1) + { + align = TYPE_ALIGN_UNIT (elem_type); + misalign = 0; + } + else + misalign = DR_MISALIGNMENT (dr); + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, + misalign); + tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), + misalign + ? least_bit_hwi (misalign) + : align); + gcall *call + = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr, + ptr, vec_mask); + gimple_call_set_lhs (call, make_ssa_name (vec_dest)); + gimple_call_set_nothrow (call, true); + vect_finish_stmt_generation (stmt, call, gsi); + if (i == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = call; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = call; + prev_stmt_info = vinfo_for_stmt (call); + } } - } - if (vls_type == VLS_LOAD) - { - /* Ensure that even with -fno-tree-dce the scalar MASK_LOAD is removed - from the IL. 
*/ - if (STMT_VINFO_RELATED_STMT (stmt_info)) - { - stmt = STMT_VINFO_RELATED_STMT (stmt_info); - stmt_info = vinfo_for_stmt (stmt); - } - tree lhs = gimple_call_lhs (stmt); - new_stmt = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs))); - set_vinfo_for_stmt (new_stmt, stmt_info); - set_vinfo_for_stmt (stmt, NULL); - STMT_VINFO_STMT (stmt_info) = new_stmt; - gsi_replace (gsi, new_stmt, true); + replace_mask_load (stmt, gsi); } return true; @@ -5818,7 +6102,7 @@ vectorizable_store (gimple *stmt, gimple return false; vect_memory_access_type memory_access_type; - if (!get_load_store_type (stmt, vectype, slp, vls_type, ncopies, + if (!get_load_store_type (stmt, vectype, slp, false, vls_type, ncopies, &memory_access_type, &gs_info)) return false; @@ -6353,34 +6637,21 @@ vectorizable_store (gimple *stmt, gimple vec_oprnd = vec_oprnds[0]; } else - { - /* For interleaved stores we collect vectorized defs for all the - stores in the group in DR_CHAIN and OPRNDS. DR_CHAIN is then - used as an input to vect_permute_store_chain(), and OPRNDS as - an input to vect_get_vec_def_for_stmt_copy() for the next copy. - - If the store is not grouped, GROUP_SIZE is 1, and DR_CHAIN and - OPRNDS are of size 1. */ - next_stmt = first_stmt; - for (i = 0; i < group_size; i++) - { - /* Since gaps are not supported for interleaved stores, - GROUP_SIZE is the exact number of stmts in the chain. - Therefore, NEXT_STMT can't be NULL_TREE. In case that - there is no interleaving, GROUP_SIZE is 1, and only one - iteration of the loop will be executed. */ - gcc_assert (next_stmt - && gimple_assign_single_p (next_stmt)); - op = gimple_assign_rhs1 (next_stmt); - - vec_oprnd = vect_get_vec_def_for_operand (op, next_stmt); - dr_chain.quick_push (vec_oprnd); - oprnds.quick_push (vec_oprnd); - next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next_stmt)); - } + { + /* For interleaved stores we collect vectorized defs + for all the stores in the group in DR_CHAIN and OPRNDS. 
+ DR_CHAIN is then used as an input to + vect_permute_store_chain(), and OPRNDS as an input to + vect_get_vec_def_for_stmt_copy() for the next copy. + + If the store is not grouped, GROUP_SIZE is 1, and DR_CHAIN + and OPRNDS are of size 1. */ + init_stored_values (group_size, first_stmt, &oprnds); + dr_chain.safe_splice (oprnds); + vec_oprnd = oprnds[0]; } - /* We should have catched mismatched types earlier. */ + /* We should have caught mismatched types earlier. */ gcc_assert (useless_type_conversion_p (vectype, TREE_TYPE (vec_oprnd))); bool simd_lane_access_p @@ -6414,14 +6685,10 @@ vectorizable_store (gimple *stmt, gimple next copy. If the store is not grouped, GROUP_SIZE is 1, and DR_CHAIN and OPRNDS are of size 1. */ - for (i = 0; i < group_size; i++) - { - op = oprnds[i]; - vect_is_simple_use (op, vinfo, &def_stmt, &dt); - vec_oprnd = vect_get_vec_def_for_stmt_copy (dt, op); - dr_chain[i] = vec_oprnd; - oprnds[i] = vec_oprnd; - } + advance_stored_values (group_size, first_stmt, oprnds); + dr_chain.truncate (0); + dr_chain.splice (oprnds); + vec_oprnd = oprnds[0]; if (dataref_offset) dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, @@ -6432,27 +6699,8 @@ vectorizable_store (gimple *stmt, gimple } if (memory_access_type == VMAT_LOAD_STORE_LANES) - { - tree vec_array; - - /* Combine all the vectors into an array. */ - vec_array = create_vector_array (vectype, vec_num); - for (i = 0; i < vec_num; i++) - { - vec_oprnd = dr_chain[i]; - write_vector_array (stmt, gsi, vec_oprnd, vec_array, i); - } - - /* Emit: - MEM_REF[...all elements...] = STORE_LANES (VEC_ARRAY). 
*/ - data_ref = create_array_ref (aggr_type, dataref_ptr, ref_type); - gcall *call = gimple_build_call_internal (IFN_STORE_LANES, 1, - vec_array); - gimple_call_set_lhs (call, data_ref); - gimple_call_set_nothrow (call, true); - new_stmt = call; - vect_finish_stmt_generation (stmt, new_stmt, gsi); - } + new_stmt = do_store_lanes (stmt, gsi, vec_num, aggr_type, + dataref_ptr, ref_type, dr_chain, NULL_TREE); else { new_stmt = NULL; @@ -6859,7 +7107,7 @@ vectorizable_load (gimple *stmt, gimple_ } vect_memory_access_type memory_access_type; - if (!get_load_store_type (stmt, vectype, slp, VLS_LOAD, ncopies, + if (!get_load_store_type (stmt, vectype, slp, false, VLS_LOAD, ncopies, &memory_access_type, &gs_info)) return false; @@ -7553,32 +7801,8 @@ vectorizable_load (gimple *stmt, gimple_ dr_chain.create (vec_num); if (memory_access_type == VMAT_LOAD_STORE_LANES) - { - tree vec_array; - - vec_array = create_vector_array (vectype, vec_num); - - /* Emit: - VEC_ARRAY = LOAD_LANES (MEM_REF[...all elements...]). */ - data_ref = create_array_ref (aggr_type, dataref_ptr, ref_type); - gcall *call = gimple_build_call_internal (IFN_LOAD_LANES, 1, - data_ref); - gimple_call_set_lhs (call, vec_array); - gimple_call_set_nothrow (call, true); - new_stmt = call; - vect_finish_stmt_generation (stmt, new_stmt, gsi); - - /* Extract each vector into an SSA_NAME. */ - for (i = 0; i < vec_num; i++) - { - new_temp = read_vector_array (stmt, gsi, scalar_dest, - vec_array, i); - dr_chain.quick_push (new_temp); - } - - /* Record the mapping between SSA_NAMEs and statements. 
*/ - vect_record_grouped_load_vectors (stmt, dr_chain); - } + do_load_lanes (stmt, gsi, group_size, vectype, aggr_type, + dataref_ptr, ref_type, NULL_TREE); else { for (i = 0; i < vec_num; i++) @@ -8907,7 +9131,16 @@ vect_transform_stmt (gimple *stmt, gimpl done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node); stmt = gsi_stmt (*gsi); if (gimple_call_internal_p (stmt, IFN_MASK_STORE)) - is_store = true; + { + gcc_assert (!slp_node); + /* As with normal stores, we vectorize the whole group when + we reach the last call in the group. The other calls + in the group are left with a null VEC_STMT. */ + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) + *grouped_store = true; + if (STMT_VINFO_VEC_STMT (stmt_info)) + is_store = true; + } break; case call_simd_clone_vec_info_type: Index: gcc/testsuite/gcc.dg/vect/vect-ooo-group-1.c =================================================================== --- /dev/null 2017-11-08 11:04:45.353113300 +0000 +++ gcc/testsuite/gcc.dg/vect/vect-ooo-group-1.c 2017-11-08 16:35:04.763816035 +0000 @@ -0,0 +1,12 @@ +/* { dg-do compile } */ + +void +f (int *restrict a, int *restrict b, int *restrict c) +{ + for (int i = 0; i < 100; ++i) + if (c[i]) + { + a[i * 2] = b[i * 5 + 2]; + a[i * 2 + 1] = b[i * 5]; + } +} Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_1.c =================================================================== --- /dev/null 2017-11-08 11:04:45.353113300 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_1.c 2017-11-08 16:35:04.763816035 +0000 @@ -0,0 +1,67 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */ + +#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ + void __attribute__ ((noinline, noclone)) \ + NAME##_2 (OUTTYPE *__restrict dest, INTYPE *__restrict src, \ + MASKTYPE *__restrict cond, int n) \ + { \ + for (int i = 0; i < n; ++i) \ + if (cond[i]) \ + dest[i] = src[i * 2] + src[i * 2 + 1]; \ + } + +#define TEST2(NAME,
OUTTYPE, INTYPE) \ + TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char) \ + TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short) \ + TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float) \ + TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double) + +#define TEST1(NAME, OUTTYPE) \ + TEST2 (NAME##_i8, OUTTYPE, signed char) \ + TEST2 (NAME##_i16, OUTTYPE, unsigned short) \ + TEST2 (NAME##_i32, OUTTYPE, int) \ + TEST2 (NAME##_i64, OUTTYPE, unsigned long) + +#define TEST(NAME) \ + TEST1 (NAME##_i8, signed char) \ + TEST1 (NAME##_i16, unsigned short) \ + TEST1 (NAME##_i32, int) \ + TEST1 (NAME##_i64, unsigned long) \ + TEST2 (NAME##_f16_f16, _Float16, _Float16) \ + TEST2 (NAME##_f32_f32, float, float) \ + TEST2 (NAME##_f64_f64, double, double) + +TEST (test) + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 1 1 1 1 + 16 | 1 1 1 1 + 32 | 1 1 1 1 + 64 | 1 1 1 1. */ +/* { dg-final { scan-assembler-times {\tld2b\t.z[0-9]} 16 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 2 2 2 2 + 16 | 2 1 1 1 x2 (for half float) + 32 | 2 1 1 1 + 64 | 2 1 1 1. */ +/* { dg-final { scan-assembler-times {\tld2h\t.z[0-9]} 28 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 4 4 4 4 + 16 | 4 2 2 2 + 32 | 4 2 1 1 x2 (for float) + 64 | 4 2 1 1. */ +/* { dg-final { scan-assembler-times {\tld2w\t.z[0-9]} 50 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 8 8 8 8 + 16 | 8 4 4 4 + 32 | 8 4 2 2 + 64 | 8 4 2 1 x2 (for double). 
*/ +/* { dg-final { scan-assembler-times {\tld2d\t.z[0-9]} 98 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_1_run.c =================================================================== --- /dev/null 2017-11-08 11:04:45.353113300 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_1_run.c 2017-11-08 16:35:04.763816035 +0000 @@ -0,0 +1,38 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */ + +#include "sve_mask_struct_load_1.c" + +#define N 100 + +#undef TEST_LOOP +#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ + { \ + OUTTYPE out[N]; \ + INTYPE in[N * 2]; \ + MASKTYPE mask[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + out[i] = i * 7 / 2; \ + mask[i] = i % 5 <= i % 3; \ + asm volatile ("" ::: "memory"); \ + } \ + for (int i = 0; i < N * 2; ++i) \ + in[i] = i * 9 / 2; \ + NAME##_2 (out, in, mask, N); \ + for (int i = 0; i < N; ++i) \ + { \ + OUTTYPE if_true = in[i * 2] + in[i * 2 + 1]; \ + OUTTYPE if_false = i * 7 / 2; \ + if (out[i] != (mask[i] ? 
if_true : if_false)) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int __attribute__ ((optimize (1))) +main (void) +{ + TEST (test); + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_2.c =================================================================== --- /dev/null 2017-11-08 11:04:45.353113300 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_2.c 2017-11-08 16:35:04.766569934 +0000 @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */ + +#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ + void __attribute__ ((noinline, noclone)) \ + NAME##_3 (OUTTYPE *__restrict dest, INTYPE *__restrict src, \ + MASKTYPE *__restrict cond, int n) \ + { \ + for (int i = 0; i < n; ++i) \ + if (cond[i]) \ + dest[i] = (src[i * 3] \ + + src[i * 3 + 1] \ + + src[i * 3 + 2]); \ + } + +#define TEST2(NAME, OUTTYPE, INTYPE) \ + TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char) \ + TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short) \ + TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float) \ + TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double) + +#define TEST1(NAME, OUTTYPE) \ + TEST2 (NAME##_i8, OUTTYPE, signed char) \ + TEST2 (NAME##_i16, OUTTYPE, unsigned short) \ + TEST2 (NAME##_i32, OUTTYPE, int) \ + TEST2 (NAME##_i64, OUTTYPE, unsigned long) + +#define TEST(NAME) \ + TEST1 (NAME##_i8, signed char) \ + TEST1 (NAME##_i16, unsigned short) \ + TEST1 (NAME##_i32, int) \ + TEST1 (NAME##_i64, unsigned long) \ + TEST2 (NAME##_f16_f16, _Float16, _Float16) \ + TEST2 (NAME##_f32_f32, float, float) \ + TEST2 (NAME##_f64_f64, double, double) + +TEST (test) + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 1 1 1 1 + 16 | 1 1 1 1 + 32 | 1 1 1 1 + 64 | 1 1 1 1. 
*/ +/* { dg-final { scan-assembler-times {\tld3b\t.z[0-9]} 16 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 2 2 2 2 + 16 | 2 1 1 1 x2 (for _Float16) + 32 | 2 1 1 1 + 64 | 2 1 1 1. */ +/* { dg-final { scan-assembler-times {\tld3h\t.z[0-9]} 28 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 4 4 4 4 + 16 | 4 2 2 2 + 32 | 4 2 1 1 x2 (for float) + 64 | 4 2 1 1. */ +/* { dg-final { scan-assembler-times {\tld3w\t.z[0-9]} 50 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 8 8 8 8 + 16 | 8 4 4 4 + 32 | 8 4 2 2 + 64 | 8 4 2 1 x2 (for double). */ +/* { dg-final { scan-assembler-times {\tld3d\t.z[0-9]} 98 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_2_run.c =================================================================== --- /dev/null 2017-11-08 11:04:45.353113300 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_2_run.c 2017-11-08 16:35:04.766569934 +0000 @@ -0,0 +1,40 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */ + +#include "sve_mask_struct_load_2.c" + +#define N 100 + +#undef TEST_LOOP +#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ + { \ + OUTTYPE out[N]; \ + INTYPE in[N * 3]; \ + MASKTYPE mask[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + out[i] = i * 7 / 2; \ + mask[i] = i % 5 <= i % 3; \ + asm volatile ("" ::: "memory"); \ + } \ + for (int i = 0; i < N * 3; ++i) \ + in[i] = i * 9 / 2; \ + NAME##_3 (out, in, mask, N); \ + for (int i = 0; i < N; ++i) \ + { \ + OUTTYPE if_true = (in[i * 3] \ + + in[i * 3 + 1] \ + + in[i * 3 + 2]); \ + OUTTYPE if_false = i * 7 / 2; \ + if (out[i] != (mask[i] ? 
if_true : if_false)) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int __attribute__ ((optimize (1))) +main (void) +{ + TEST (test); + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_3.c =================================================================== --- /dev/null 2017-11-08 11:04:45.353113300 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_3.c 2017-11-08 16:35:04.766569934 +0000 @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */ + +#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ + void __attribute__ ((noinline, noclone)) \ + NAME##_4 (OUTTYPE *__restrict dest, INTYPE *__restrict src, \ + MASKTYPE *__restrict cond, int n) \ + { \ + for (int i = 0; i < n; ++i) \ + if (cond[i]) \ + dest[i] = (src[i * 4] \ + + src[i * 4 + 1] \ + + src[i * 4 + 2] \ + + src[i * 4 + 3]); \ + } + +#define TEST2(NAME, OUTTYPE, INTYPE) \ + TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char) \ + TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short) \ + TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float) \ + TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double) + +#define TEST1(NAME, OUTTYPE) \ + TEST2 (NAME##_i8, OUTTYPE, signed char) \ + TEST2 (NAME##_i16, OUTTYPE, unsigned short) \ + TEST2 (NAME##_i32, OUTTYPE, int) \ + TEST2 (NAME##_i64, OUTTYPE, unsigned long) + +#define TEST(NAME) \ + TEST1 (NAME##_i8, signed char) \ + TEST1 (NAME##_i16, unsigned short) \ + TEST1 (NAME##_i32, int) \ + TEST1 (NAME##_i64, unsigned long) \ + TEST2 (NAME##_f16_f16, _Float16, _Float16) \ + TEST2 (NAME##_f32_f32, float, float) \ + TEST2 (NAME##_f64_f64, double, double) + +TEST (test) + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 1 1 1 1 + 16 | 1 1 1 1 + 32 | 1 1 1 1 + 64 | 1 1 1 1. 
*/ +/* { dg-final { scan-assembler-times {\tld4b\t.z[0-9]} 16 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 2 2 2 2 + 16 | 2 1 1 1 x2 (for half float) + 32 | 2 1 1 1 + 64 | 2 1 1 1. */ +/* { dg-final { scan-assembler-times {\tld4h\t.z[0-9]} 28 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 4 4 4 4 + 16 | 4 2 2 2 + 32 | 4 2 1 1 x2 (for float) + 64 | 4 2 1 1. */ +/* { dg-final { scan-assembler-times {\tld4w\t.z[0-9]} 50 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 8 8 8 8 + 16 | 8 4 4 4 + 32 | 8 4 2 2 + 64 | 8 4 2 1 x2 (for double). */ +/* { dg-final { scan-assembler-times {\tld4d\t.z[0-9]} 98 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_3_run.c =================================================================== --- /dev/null 2017-11-08 11:04:45.353113300 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_3_run.c 2017-11-08 16:35:04.766569934 +0000 @@ -0,0 +1,41 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */ + +#include "sve_mask_struct_load_3.c" + +#define N 100 + +#undef TEST_LOOP +#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ + { \ + OUTTYPE out[N]; \ + INTYPE in[N * 4]; \ + MASKTYPE mask[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + out[i] = i * 7 / 2; \ + mask[i] = i % 5 <= i % 3; \ + asm volatile ("" ::: "memory"); \ + } \ + for (int i = 0; i < N * 4; ++i) \ + in[i] = i * 9 / 2; \ + NAME##_4 (out, in, mask, N); \ + for (int i = 0; i < N; ++i) \ + { \ + OUTTYPE if_true = (in[i * 4] \ + + in[i * 4 + 1] \ + + in[i * 4 + 2] \ + + in[i * 4 + 3]); \ + OUTTYPE if_false = i * 7 / 2; \ + if (out[i] != (mask[i] ? 
if_true : if_false)) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int __attribute__ ((optimize (1))) +main (void) +{ + TEST (test); + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_4.c =================================================================== --- /dev/null 2017-11-08 11:04:45.353113300 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_4.c 2017-11-08 16:35:04.766569934 +0000 @@ -0,0 +1,67 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */ + +#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE) \ + void __attribute__ ((noinline, noclone)) \ + NAME##_3 (OUTTYPE *__restrict dest, INTYPE *__restrict src, \ + MASKTYPE *__restrict cond, int n) \ + { \ + for (int i = 0; i < n; ++i) \ + if (cond[i]) \ + dest[i] = src[i * 3] + src[i * 3 + 2]; \ + } + +#define TEST2(NAME, OUTTYPE, INTYPE) \ + TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char) \ + TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short) \ + TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float) \ + TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double) + +#define TEST1(NAME, OUTTYPE) \ + TEST2 (NAME##_i8, OUTTYPE, signed char) \ + TEST2 (NAME##_i16, OUTTYPE, unsigned short) \ + TEST2 (NAME##_i32, OUTTYPE, int) \ + TEST2 (NAME##_i64, OUTTYPE, unsigned long) + +#define TEST(NAME) \ + TEST1 (NAME##_i8, signed char) \ + TEST1 (NAME##_i16, unsigned short) \ + TEST1 (NAME##_i32, int) \ + TEST1 (NAME##_i64, unsigned long) \ + TEST2 (NAME##_f16_f16, _Float16, _Float16) \ + TEST2 (NAME##_f32_f32, float, float) \ + TEST2 (NAME##_f64_f64, double, double) + +TEST (test) + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 1 1 1 1 + 16 | 1 1 1 1 + 32 | 1 1 1 1 + 64 | 1 1 1 1. */ +/* { dg-final { scan-assembler-times {\tld3b\t.z[0-9]} 16 } } */ + +/* Mask | 8 16 32 64 + -------+------------ + Out 8 | 2 2 2 2 + 16 | 2 1 1 1 x2 (for half float) + 32 | 2 1 1 1 + 64 | 2 1 1 1. 
   */
+/* { dg-final { scan-assembler-times {\tld3h\t.z[0-9]} 28 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+     Out 8 |  4  4  4  4
+        16 |  4  2  2  2
+        32 |  4  2  1  1 x2 (for float)
+        64 |  4  2  1  1.  */
+/* { dg-final { scan-assembler-times {\tld3w\t.z[0-9]} 50 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+     Out 8 |  8  8  8  8
+        16 |  8  4  4  4
+        32 |  8  4  2  2
+        64 |  8  4  2  1 x2 (for double).  */
+/* { dg-final { scan-assembler-times {\tld3d\t.z[0-9]} 98 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_5.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_5.c	2017-11-08 16:35:04.766569934 +0000
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  void __attribute__ ((noinline, noclone))			\
+  NAME##_4 (OUTTYPE *__restrict dest, INTYPE *__restrict src,	\
+	    MASKTYPE *__restrict cond, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      if (cond[i])						\
+	dest[i] = src[i * 4] + src[i * 4 + 3];			\
+  }
+
+#define TEST2(NAME, OUTTYPE, INTYPE)				\
+  TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char)		\
+  TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short)	\
+  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float)		\
+  TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double)
+
+#define TEST1(NAME, OUTTYPE)				\
+  TEST2 (NAME##_i8, OUTTYPE, signed char)		\
+  TEST2 (NAME##_i16, OUTTYPE, unsigned short)		\
+  TEST2 (NAME##_i32, OUTTYPE, int)			\
+  TEST2 (NAME##_i64, OUTTYPE, unsigned long)
+
+#define TEST(NAME)					\
+  TEST1 (NAME##_i8, signed char)			\
+  TEST1 (NAME##_i16, unsigned short)			\
+  TEST1 (NAME##_i32, int)				\
+  TEST1 (NAME##_i64, unsigned long)			\
+  TEST2 (NAME##_f16_f16, _Float16, _Float16)		\
+  TEST2 (NAME##_f32_f32, float, float)			\
+  TEST2 (NAME##_f64_f64, double, double)
+
+TEST (test)
+
+/*    Mask |  8 16 32 64
+    -------+------------
+     Out 8 |  1  1  1  1
+        16 |  1  1  1  1
+        32 |  1  1  1  1
+        64 |  1  1  1  1.  */
+/* { dg-final { scan-assembler-times {\tld4b\t.z[0-9]} 16 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+     Out 8 |  2  2  2  2
+        16 |  2  1  1  1 x2 (for half float)
+        32 |  2  1  1  1
+        64 |  2  1  1  1.  */
+/* { dg-final { scan-assembler-times {\tld4h\t.z[0-9]} 28 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+     Out 8 |  4  4  4  4
+        16 |  4  2  2  2
+        32 |  4  2  1  1 x2 (for float)
+        64 |  4  2  1  1.  */
+/* { dg-final { scan-assembler-times {\tld4w\t.z[0-9]} 50 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+     Out 8 |  8  8  8  8
+        16 |  8  4  4  4
+        32 |  8  4  2  2
+        64 |  8  4  2  1 x2 (for double).  */
+/* { dg-final { scan-assembler-times {\tld4d\t.z[0-9]} 98 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_6.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_6.c	2017-11-08 16:35:04.766569934 +0000
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  void __attribute__ ((noinline, noclone))			\
+  NAME##_2 (OUTTYPE *__restrict dest, INTYPE *__restrict src,	\
+	    MASKTYPE *__restrict cond, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      if (cond[i])						\
+	dest[i] = src[i * 2];					\
+  }
+
+#define TEST2(NAME, OUTTYPE, INTYPE)				\
+  TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char)		\
+  TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short)	\
+  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float)		\
+  TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double)
+
+#define TEST1(NAME, OUTTYPE)				\
+  TEST2 (NAME##_i8, OUTTYPE, signed char)		\
+  TEST2 (NAME##_i16, OUTTYPE, unsigned short)		\
+  TEST2 (NAME##_i32, OUTTYPE, int)			\
+  TEST2 (NAME##_i64, OUTTYPE, unsigned long)
+
+#define TEST(NAME)					\
+  TEST1 (NAME##_i8, signed char)			\
+  TEST1 (NAME##_i16, unsigned short)			\
+  TEST1 (NAME##_i32, int)				\
+  TEST1 (NAME##_i64, unsigned long)			\
+  TEST2 (NAME##_f16_f16, _Float16, _Float16)		\
+  TEST2 (NAME##_f32_f32, float, float)			\
+  TEST2 (NAME##_f64_f64, double, double)
+
+TEST (test)
+
+/* { dg-final { scan-assembler-not {\tld2b\t} } } */
+/* { dg-final { scan-assembler-not {\tld2h\t} } } */
+/* { dg-final { scan-assembler-not {\tld2w\t} } } */
+/* { dg-final { scan-assembler-not {\tld2d\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_7.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_7.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  void __attribute__ ((noinline, noclone))			\
+  NAME##_3 (OUTTYPE *__restrict dest, INTYPE *__restrict src,	\
+	    MASKTYPE *__restrict cond, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      if (cond[i])						\
+	dest[i] = src[i * 3] + src[i * 3 + 1];			\
+  }
+
+#define TEST2(NAME, OUTTYPE, INTYPE)				\
+  TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char)		\
+  TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short)	\
+  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float)		\
+  TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double)
+
+#define TEST1(NAME, OUTTYPE)				\
+  TEST2 (NAME##_i8, OUTTYPE, signed char)		\
+  TEST2 (NAME##_i16, OUTTYPE, unsigned short)		\
+  TEST2 (NAME##_i32, OUTTYPE, int)			\
+  TEST2 (NAME##_i64, OUTTYPE, unsigned long)
+
+#define TEST(NAME)					\
+  TEST1 (NAME##_i8, signed char)			\
+  TEST1 (NAME##_i16, unsigned short)			\
+  TEST1 (NAME##_i32, int)				\
+  TEST1 (NAME##_i64, unsigned long)			\
+  TEST2 (NAME##_f16_f16, _Float16, _Float16)		\
+  TEST2 (NAME##_f32_f32, float, float)			\
+  TEST2 (NAME##_f64_f64, double, double)
+
+TEST (test)
+
+/* { dg-final { scan-assembler-not {\tld3b\t} } } */
+/* { dg-final { scan-assembler-not {\tld3h\t} } } */
+/* { dg-final { scan-assembler-not {\tld3w\t} } } */
+/* { dg-final { scan-assembler-not {\tld3d\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_8.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_load_8.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  void __attribute__ ((noinline, noclone))			\
+  NAME##_4 (OUTTYPE *__restrict dest, INTYPE *__restrict src,	\
+	    MASKTYPE *__restrict cond, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      if (cond[i])						\
+	dest[i] = src[i * 4] + src[i * 4 + 2];			\
+  }
+
+#define TEST2(NAME, OUTTYPE, INTYPE)				\
+  TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char)		\
+  TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short)	\
+  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float)		\
+  TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double)
+
+#define TEST1(NAME, OUTTYPE)				\
+  TEST2 (NAME##_i8, OUTTYPE, signed char)		\
+  TEST2 (NAME##_i16, OUTTYPE, unsigned short)		\
+  TEST2 (NAME##_i32, OUTTYPE, int)			\
+  TEST2 (NAME##_i64, OUTTYPE, unsigned long)
+
+#define TEST(NAME)					\
+  TEST1 (NAME##_i8, signed char)			\
+  TEST1 (NAME##_i16, unsigned short)			\
+  TEST1 (NAME##_i32, int)				\
+  TEST1 (NAME##_i64, unsigned long)			\
+  TEST2 (NAME##_f16_f16, _Float16, _Float16)		\
+  TEST2 (NAME##_f32_f32, float, float)			\
+  TEST2 (NAME##_f64_f64, double, double)
+
+TEST (test)
+
+/* { dg-final { scan-assembler-not {\tld4b\t} } } */
+/* { dg-final { scan-assembler-not {\tld4h\t} } } */
+/* { dg-final { scan-assembler-not {\tld4w\t} } } */
+/* { dg-final { scan-assembler-not {\tld4d\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_1.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_1.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  void __attribute__ ((noinline, noclone))			\
+  NAME##_2 (OUTTYPE *__restrict dest, INTYPE *__restrict src,	\
+	    MASKTYPE *__restrict cond, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      if (cond[i])						\
+	{							\
+	  dest[i * 2] = src[i];					\
+	  dest[i * 2 + 1] = src[i];				\
+	}							\
+  }
+
+#define TEST2(NAME, OUTTYPE, INTYPE)				\
+  TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char)		\
+  TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short)	\
+  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float)		\
+  TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double)
+
+#define TEST1(NAME, OUTTYPE)				\
+  TEST2 (NAME##_i8, OUTTYPE, signed char)		\
+  TEST2 (NAME##_i16, OUTTYPE, unsigned short)		\
+  TEST2 (NAME##_i32, OUTTYPE, int)			\
+  TEST2 (NAME##_i64, OUTTYPE, unsigned long)
+
+#define TEST(NAME)					\
+  TEST1 (NAME##_i8, signed char)			\
+  TEST1 (NAME##_i16, unsigned short)			\
+  TEST1 (NAME##_i32, int)				\
+  TEST1 (NAME##_i64, unsigned long)			\
+  TEST2 (NAME##_f16_f16, _Float16, _Float16)		\
+  TEST2 (NAME##_f32_f32, float, float)			\
+  TEST2 (NAME##_f64_f64, double, double)
+
+TEST (test)
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  1  1  1  1
+        16 |  1  1  1  1
+        32 |  1  1  1  1
+        64 |  1  1  1  1.  */
+/* { dg-final { scan-assembler-times {\tst2b\t.z[0-9]} 16 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  2  2  2  2
+        16 |  2  1  1  1 x2 (for _Float16)
+        32 |  2  1  1  1
+        64 |  2  1  1  1.  */
+/* { dg-final { scan-assembler-times {\tst2h\t.z[0-9]} 28 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  4  4  4  4
+        16 |  4  2  2  2
+        32 |  4  2  1  1 x2 (for float)
+        64 |  4  2  1  1.  */
+/* { dg-final { scan-assembler-times {\tst2w\t.z[0-9]} 50 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  8  8  8  8
+        16 |  8  4  4  4
+        32 |  8  4  2  2
+        64 |  8  4  2  1 x2 (for double).  */
+/* { dg-final { scan-assembler-times {\tst2d\t.z[0-9]} 98 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_1_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_1_run.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,38 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */
+
+#include "sve_mask_struct_store_1.c"
+
+#define N 100
+
+#undef TEST_LOOP
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  {								\
+    OUTTYPE out[N * 2];						\
+    INTYPE in[N];						\
+    MASKTYPE mask[N];						\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	in[i] = i * 7 / 2;					\
+	mask[i] = i % 5 <= i % 3;				\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    for (int i = 0; i < N * 2; ++i)				\
+      out[i] = i * 9 / 2;					\
+    NAME##_2 (out, in, mask, N);				\
+    for (int i = 0; i < N * 2; ++i)				\
+      {								\
+	OUTTYPE if_true = in[i / 2];				\
+	OUTTYPE if_false = i * 9 / 2;				\
+	if (out[i] != (mask[i / 2] ? if_true : if_false))	\
+	  __builtin_abort ();					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST (test);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_2.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_2.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,71 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  void __attribute__ ((noinline, noclone))			\
+  NAME##_3 (OUTTYPE *__restrict dest, INTYPE *__restrict src,	\
+	    MASKTYPE *__restrict cond, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      if (cond[i])						\
+	{							\
+	  dest[i * 3] = src[i];					\
+	  dest[i * 3 + 1] = src[i];				\
+	  dest[i * 3 + 2] = src[i];				\
+	}							\
+  }
+
+#define TEST2(NAME, OUTTYPE, INTYPE)				\
+  TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char)		\
+  TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short)	\
+  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float)		\
+  TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double)
+
+#define TEST1(NAME, OUTTYPE)				\
+  TEST2 (NAME##_i8, OUTTYPE, signed char)		\
+  TEST2 (NAME##_i16, OUTTYPE, unsigned short)		\
+  TEST2 (NAME##_i32, OUTTYPE, int)			\
+  TEST2 (NAME##_i64, OUTTYPE, unsigned long)
+
+#define TEST(NAME)					\
+  TEST1 (NAME##_i8, signed char)			\
+  TEST1 (NAME##_i16, unsigned short)			\
+  TEST1 (NAME##_i32, int)				\
+  TEST1 (NAME##_i64, unsigned long)			\
+  TEST2 (NAME##_f16_f16, _Float16, _Float16)		\
+  TEST2 (NAME##_f32_f32, float, float)			\
+  TEST2 (NAME##_f64_f64, double, double)
+
+TEST (test)
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  1  1  1  1
+        16 |  1  1  1  1
+        32 |  1  1  1  1
+        64 |  1  1  1  1.  */
+/* { dg-final { scan-assembler-times {\tst3b\t.z[0-9]} 16 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  2  2  2  2
+        16 |  2  1  1  1 x2 (for _Float16)
+        32 |  2  1  1  1
+        64 |  2  1  1  1.  */
+/* { dg-final { scan-assembler-times {\tst3h\t.z[0-9]} 28 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  4  4  4  4
+        16 |  4  2  2  2
+        32 |  4  2  1  1 x2 (for float)
+        64 |  4  2  1  1.  */
+/* { dg-final { scan-assembler-times {\tst3w\t.z[0-9]} 50 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  8  8  8  8
+        16 |  8  4  4  4
+        32 |  8  4  2  2
+        64 |  8  4  2  1 x2 (for double).  */
+/* { dg-final { scan-assembler-times {\tst3d\t.z[0-9]} 98 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_2_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_2_run.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,38 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */
+
+#include "sve_mask_struct_store_2.c"
+
+#define N 100
+
+#undef TEST_LOOP
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  {								\
+    OUTTYPE out[N * 3];						\
+    INTYPE in[N];						\
+    MASKTYPE mask[N];						\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	in[i] = i * 7 / 2;					\
+	mask[i] = i % 5 <= i % 3;				\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    for (int i = 0; i < N * 3; ++i)				\
+      out[i] = i * 9 / 2;					\
+    NAME##_3 (out, in, mask, N);				\
+    for (int i = 0; i < N * 3; ++i)				\
+      {								\
+	OUTTYPE if_true = in[i / 3];				\
+	OUTTYPE if_false = i * 9 / 2;				\
+	if (out[i] != (mask[i / 3] ? if_true : if_false))	\
+	  __builtin_abort ();					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST (test);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_3.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_3.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  void __attribute__ ((noinline, noclone))			\
+  NAME##_4 (OUTTYPE *__restrict dest, INTYPE *__restrict src,	\
+	    MASKTYPE *__restrict cond, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      if (cond[i])						\
+	{							\
+	  dest[i * 4] = src[i];					\
+	  dest[i * 4 + 1] = src[i];				\
+	  dest[i * 4 + 2] = src[i];				\
+	  dest[i * 4 + 3] = src[i];				\
+	}							\
+  }
+
+#define TEST2(NAME, OUTTYPE, INTYPE)				\
+  TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char)		\
+  TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short)	\
+  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float)		\
+  TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double)
+
+#define TEST1(NAME, OUTTYPE)				\
+  TEST2 (NAME##_i8, OUTTYPE, signed char)		\
+  TEST2 (NAME##_i16, OUTTYPE, unsigned short)		\
+  TEST2 (NAME##_i32, OUTTYPE, int)			\
+  TEST2 (NAME##_i64, OUTTYPE, unsigned long)
+
+#define TEST(NAME)					\
+  TEST1 (NAME##_i8, signed char)			\
+  TEST1 (NAME##_i16, unsigned short)			\
+  TEST1 (NAME##_i32, int)				\
+  TEST1 (NAME##_i64, unsigned long)			\
+  TEST2 (NAME##_f16_f16, _Float16, _Float16)		\
+  TEST2 (NAME##_f32_f32, float, float)			\
+  TEST2 (NAME##_f64_f64, double, double)
+
+TEST (test)
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  1  1  1  1
+        16 |  1  1  1  1
+        32 |  1  1  1  1
+        64 |  1  1  1  1.  */
+/* { dg-final { scan-assembler-times {\tst4b\t.z[0-9]} 16 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  2  2  2  2
+        16 |  2  1  1  1 x2 (for half float)
+        32 |  2  1  1  1
+        64 |  2  1  1  1.  */
+/* { dg-final { scan-assembler-times {\tst4h\t.z[0-9]} 28 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  4  4  4  4
+        16 |  4  2  2  2
+        32 |  4  2  1  1 x2 (for float)
+        64 |  4  2  1  1.  */
+/* { dg-final { scan-assembler-times {\tst4w\t.z[0-9]} 50 } } */
+
+/*    Mask |  8 16 32 64
+    -------+------------
+      In 8 |  8  8  8  8
+        16 |  8  4  4  4
+        32 |  8  4  2  2
+        64 |  8  4  2  1 x2 (for double).  */
+/* { dg-final { scan-assembler-times {\tst4d\t.z[0-9]} 98 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_3_run.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_3_run.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,38 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-dce -ffast-math -march=armv8-a+sve" } */
+
+#include "sve_mask_struct_store_3.c"
+
+#define N 100
+
+#undef TEST_LOOP
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  {								\
+    OUTTYPE out[N * 4];						\
+    INTYPE in[N];						\
+    MASKTYPE mask[N];						\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	in[i] = i * 7 / 2;					\
+	mask[i] = i % 5 <= i % 3;				\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    for (int i = 0; i < N * 4; ++i)				\
+      out[i] = i * 9 / 2;					\
+    NAME##_4 (out, in, mask, N);				\
+    for (int i = 0; i < N * 4; ++i)				\
+      {								\
+	OUTTYPE if_true = in[i / 4];				\
+	OUTTYPE if_false = i * 9 / 2;				\
+	if (out[i] != (mask[i / 4] ? if_true : if_false))	\
+	  __builtin_abort ();					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST (test);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_4.c
===================================================================
--- /dev/null	2017-11-08 11:04:45.353113300 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_struct_store_4.c	2017-11-08 16:35:04.767487900 +0000
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve" } */
+
+#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)		\
+  void __attribute__ ((noinline, noclone))			\
+  NAME##_2 (OUTTYPE *__restrict dest, INTYPE *__restrict src,	\
+	    MASKTYPE *__restrict cond, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      {								\
+	if (cond[i] < 8)					\
+	  dest[i * 2] = src[i];					\
+	if (cond[i] > 2)					\
+	  dest[i * 2 + 1] = src[i];				\
+      }								\
+  }
+
+#define TEST2(NAME, OUTTYPE, INTYPE)				\
+  TEST_LOOP (NAME##_i8, OUTTYPE, INTYPE, signed char)		\
+  TEST_LOOP (NAME##_i16, OUTTYPE, INTYPE, unsigned short)	\
+  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, float)		\
+  TEST_LOOP (NAME##_f64, OUTTYPE, INTYPE, double)
+
+#define TEST1(NAME, OUTTYPE)				\
+  TEST2 (NAME##_i8, OUTTYPE, signed char)		\
+  TEST2 (NAME##_i16, OUTTYPE, unsigned short)		\
+  TEST2 (NAME##_i32, OUTTYPE, int)			\
+  TEST2 (NAME##_i64, OUTTYPE, unsigned long)
+
+#define TEST(NAME)					\
+  TEST1 (NAME##_i8, signed char)			\
+  TEST1 (NAME##_i16, unsigned short)			\
+  TEST1 (NAME##_i32, int)				\
+  TEST1 (NAME##_i64, unsigned long)			\
+  TEST2 (NAME##_f16_f16, _Float16, _Float16)		\
+  TEST2 (NAME##_f32_f32, float, float)			\
+  TEST2 (NAME##_f64_f64, double, double)
+
+TEST (test)
+
+/* { dg-final { scan-assembler-not {\tst2b\t.z[0-9]} } } */
+/* { dg-final { scan-assembler-not {\tst2h\t.z[0-9]} } } */
+/* { dg-final { scan-assembler-not {\tst2w\t.z[0-9]} } } */
+/* { dg-final { scan-assembler-not {\tst2d\t.z[0-9]} } } */