From patchwork Thu Nov 28 23:09:06 2013
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 295031
Date: Fri, 29 Nov 2013 00:09:06 +0100
From: Jakub Jelinek
To: Richard Biener
Cc: Sergey Ostanevich, Richard Henderson, gcc-patches@gcc.gnu.org
Subject: [PATCH] Masked load/store vectorization (take 6)
Message-ID: <20131128230906.GX892@tucnak.redhat.com>
References: <20131022105614.GK30970@tucnak.zalov.cz>
 <20131022132658.GM30970@tucnak.zalov.cz>
 <20131023172220.GW30970@tucnak.zalov.cz>
 <20131024111439.GZ30970@tucnak.zalov.cz>
 <20131112142930.GT27813@tucnak.zalov.cz>
On Wed, Nov 27, 2013 at 04:10:16PM +0100, Richard Biener wrote:
> As you pinged this ... can you re-post a patch with changelog that
> includes the followups as we decided?

Ok, here is the updated patch against latest trunk with the follow-ups
incorporated.  Bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2013-11-28  Jakub Jelinek

	* tree-vectorizer.h (struct _loop_vec_info): Add scalar_loop field.
	(LOOP_VINFO_SCALAR_LOOP): Define.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
	* config/i386/sse.md (maskload, maskstore): New expanders.
	* tree-data-ref.c (struct data_ref_loc_d): Replace pos field with ref.
	(get_references_in_stmt): Don't record operand addresses, but
	operands themselves.  Handle MASK_LOAD and MASK_STORE.
	(find_data_references_in_stmt, graphite_find_data_references_in_stmt):
	Adjust for the pos -> ref change.
	* internal-fn.def (LOOP_VECTORIZED, MASK_LOAD, MASK_STORE): New
	internal fns.
	* tree-if-conv.c: Include target.h, expr.h, optabs.h and
	tree-ssa-address.h.
	(release_bb_predicate): New function.
	(free_bb_predicate): Use it.
	(reset_bb_predicate): Likewise.  Don't unallocate bb->aux just to
	immediately allocate it again.
	(if_convertible_phi_p): Add any_mask_load_store argument, if true,
	handle it like flag_tree_loop_if_convert_stores.
	(insert_gimplified_predicates): Likewise.  If bb dominates
	loop->latch, call reset_bb_predicate.
	(ifcvt_can_use_mask_load_store): New function.
	(if_convertible_gimple_assign_stmt_p): Add any_mask_load_store
	argument, check if some conditional loads or stores can't be
	converted into MASK_LOAD or MASK_STORE.
	(if_convertible_stmt_p): Add any_mask_load_store argument, pass it
	down to if_convertible_gimple_assign_stmt_p.
	(predicate_bbs): Don't return bool, only check if the last stmt of
	a basic block is GIMPLE_COND and handle that.  For basic blocks
	that dominate loop->latch assume they don't need to be predicated.
	(if_convertible_loop_p_1): Only call predicate_bbs if
	flag_tree_loop_if_convert_stores and free_bb_predicate in that case
	afterwards, check gimple_code of stmts here.  Replace is_predicated
	check with dominance check.  Add any_mask_load_store argument, pass
	it down to if_convertible_stmt_p and if_convertible_phi_p, call
	if_convertible_phi_p only after all if_convertible_stmt_p calls.
	(if_convertible_loop_p): Add any_mask_load_store argument, pass it
	down to if_convertible_loop_p_1.
	(predicate_mem_writes): Emit MASK_LOAD and/or MASK_STORE calls.
	(combine_blocks): Add any_mask_load_store argument, pass it down to
	insert_gimplified_predicates and call predicate_mem_writes if it is
	set.  Call predicate_bbs.
	(version_loop_for_if_conversion): New function.
	(tree_if_conversion): Adjust if_convertible_loop_p and
	combine_blocks calls.  Return todo flags instead of bool, call
	version_loop_for_if_conversion if if-conversion should be just for
	the vectorized loops and nothing else.
	(main_tree_if_conversion): Adjust caller.  Don't call
	tree_if_conversion for dont_vectorize loops if if-conversion isn't
	explicitly enabled.
	* tree-vect-data-refs.c (vect_check_gather): Handle
	MASK_LOAD/MASK_STORE.
	(vect_analyze_data_refs, vect_supportable_dr_alignment): Likewise.
	* gimple.h (gimple_expr_type): Handle MASK_STORE.
	* internal-fn.c (expand_LOOP_VECTORIZED, expand_MASK_LOAD,
	expand_MASK_STORE): New functions.
	* tree-vectorizer.c: Include tree-cfg.h and gimple-fold.h.
	(vect_loop_vectorized_call, vect_loop_select): New functions.
	(vectorize_loops): Don't try to vectorize loops with
	loop->dont_vectorize set.  Set LOOP_VINFO_SCALAR_LOOP for
	if-converted loops, fold LOOP_VECTORIZED internal call depending on
	if loop has been vectorized or not.  Use vect_loop_select to attempt
	to vectorize an if-converted loop before its non-if-converted
	counterpart.  If outer loop vectorization is successful in that
	case, ensure the loop in the soon to be dead non-if-converted loop
	is not vectorized.
	* tree-vect-loop-manip.c (slpeel_duplicate_current_defs_from_edges):
	New function.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
	If non-NULL, copy basic blocks from scalar_loop instead of loop,
	but still to loop's entry or exit edge.
	(slpeel_tree_peel_loop_to_edge): Add scalar_loop argument, pass it
	down to slpeel_tree_duplicate_loop_to_edge_cfg.
	(vect_do_peeling_for_loop_bound, vect_do_peeling_for_loop_alignment):
	Adjust callers.
	(vect_loop_versioning): If LOOP_VINFO_SCALAR_LOOP, perform loop
	versioning from that loop instead of LOOP_VINFO_LOOP, move it to
	the right place in the CFG afterwards.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Handle
	MASK_STORE.
	* cfgloop.h (struct loop): Add dont_vectorize field.
	* tree-loop-distribution.c (copy_loop_before): Adjust
	slpeel_tree_duplicate_loop_to_edge_cfg caller.
	* optabs.def (maskload_optab, maskstore_optab): New optabs.
	* passes.def: Add a note that pass_vectorize must immediately
	follow pass_if_conversion.
	* tree-predcom.c (split_data_refs_to_components): Give up if
	DR_STMT is a call.
	* tree-vect-stmts.c (vect_mark_relevant): Don't crash if lhs is
	NULL.
	(exist_non_indexing_operands_for_use_p): Handle MASK_LOAD and
	MASK_STORE.
	(vectorizable_mask_load_store): New function.
	(vectorizable_call): Call it for MASK_LOAD or MASK_STORE.
	(vect_transform_stmt): Handle MASK_STORE.
	* tree-ssa-phiopt.c (cond_if_else_store_replacement): Ignore
	DR_STMT where lhs is NULL.
	* gcc.dg/vect/vect-cond-11.c: New test.
	* gcc.target/i386/vect-cond-1.c: New test.
	* gcc.target/i386/avx2-gather-5.c: New test.
	* gcc.target/i386/avx2-gather-6.c: New test.
	* gcc.dg/vect/vect-mask-loadstore-1.c: New test.
	* gcc.dg/vect/vect-mask-load-1.c: New test.

	Jakub

--- gcc/tree-vectorizer.h.jj 2013-11-28 09:18:11.771774932 +0100 +++ gcc/tree-vectorizer.h 2013-11-28 14:14:35.827362293 +0100 @@ -344,6 +344,10 @@ typedef struct _loop_vec_info { fix it up. */ bool operands_swapped; + /* If if-conversion versioned this loop before conversion, this is the + loop version without if-conversion. */ + struct loop *scalar_loop; + } *loop_vec_info; /* Access Functions. */ @@ -376,6 +380,7 @@ typedef struct _loop_vec_info { #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps #define LOOP_VINFO_OPERANDS_SWAPPED(L) (L)->operands_swapped #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter +#define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ (L)->may_misalign_stmts.length () > 0 @@ -934,7 +939,8 @@ extern source_location vect_location; in tree-vect-loop-manip.c.
*/ extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree); extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge); -struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *, edge); +struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *, + struct loop *, edge); extern void vect_loop_versioning (loop_vec_info, unsigned int, bool); extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree, tree, unsigned int, bool); --- gcc/config/i386/sse.md.jj 2013-11-23 15:20:47.452606456 +0100 +++ gcc/config/i386/sse.md 2013-11-28 14:13:57.562572366 +0100 @@ -14218,6 +14218,23 @@ (define_insn "_maskstore")]) +(define_expand "maskload" + [(set (match_operand:V48_AVX2 0 "register_operand") + (unspec:V48_AVX2 + [(match_operand: 2 "register_operand") + (match_operand:V48_AVX2 1 "memory_operand")] + UNSPEC_MASKMOV))] + "TARGET_AVX") + +(define_expand "maskstore" + [(set (match_operand:V48_AVX2 0 "memory_operand") + (unspec:V48_AVX2 + [(match_operand: 2 "register_operand") + (match_operand:V48_AVX2 1 "register_operand") + (match_dup 0)] + UNSPEC_MASKMOV))] + "TARGET_AVX") + (define_insn_and_split "avx__" [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") (unspec:AVX256MODE2P --- gcc/tree-data-ref.c.jj 2013-11-27 18:02:48.050814182 +0100 +++ gcc/tree-data-ref.c 2013-11-28 14:13:57.592572476 +0100 @@ -4320,8 +4320,8 @@ compute_all_dependences (vecloop_father; - tree uid = gimple_call_arg (stmt, 0); - gcc_assert (TREE_CODE (uid) == SSA_NAME); - if (loop == NULL - || loop->simduid != SSA_NAME_VAR (uid)) + if (gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) + { + case IFN_GOMP_SIMD_LANE: + { + struct loop *loop = gimple_bb (stmt)->loop_father; + tree uid = gimple_call_arg (stmt, 0); + gcc_assert (TREE_CODE (uid) == SSA_NAME); + if (loop == NULL + || loop->simduid != SSA_NAME_VAR (uid)) + clobbers_memory = true; + break; + } + case IFN_MASK_LOAD: + case IFN_MASK_STORE: + break; + default: clobbers_memory = true; - } + break; + } else clobbers_memory = true; } @@ -4369,15 +4379,15 @@ get_references_in_stmt (gimple stmt, vec if (stmt_code == GIMPLE_ASSIGN) { tree base; - op0 = gimple_assign_lhs_ptr (stmt); - op1 = gimple_assign_rhs1_ptr (stmt); + op0 = gimple_assign_lhs (stmt); + op1 = gimple_assign_rhs1 (stmt); - if (DECL_P (*op1) - || (REFERENCE_CLASS_P (*op1) - && (base = get_base_address (*op1)) + if (DECL_P (op1) + || (REFERENCE_CLASS_P (op1) + && (base = get_base_address (op1)) && TREE_CODE (base) != SSA_NAME)) { - ref.pos = op1; + ref.ref = op1; ref.is_read = true; references->safe_push (ref); } @@ -4386,16 +4396,35 @@ get_references_in_stmt (gimple stmt, vec { unsigned i, n; - op0 = gimple_call_lhs_ptr (stmt); + ref.is_read = false; + if (gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) + { + case IFN_MASK_LOAD: + ref.is_read = true; + case IFN_MASK_STORE: + ref.ref = build2 (MEM_REF, + ref.is_read + ? 
TREE_TYPE (gimple_call_lhs (stmt)) + : TREE_TYPE (gimple_call_arg (stmt, 3)), + gimple_call_arg (stmt, 0), + gimple_call_arg (stmt, 1)); + references->safe_push (ref); + return false; + default: + break; + } + + op0 = gimple_call_lhs (stmt); n = gimple_call_num_args (stmt); for (i = 0; i < n; i++) { - op1 = gimple_call_arg_ptr (stmt, i); + op1 = gimple_call_arg (stmt, i); - if (DECL_P (*op1) - || (REFERENCE_CLASS_P (*op1) && get_base_address (*op1))) + if (DECL_P (op1) + || (REFERENCE_CLASS_P (op1) && get_base_address (op1))) { - ref.pos = op1; + ref.ref = op1; ref.is_read = true; references->safe_push (ref); } @@ -4404,11 +4433,11 @@ get_references_in_stmt (gimple stmt, vec else return clobbers_memory; - if (*op0 - && (DECL_P (*op0) - || (REFERENCE_CLASS_P (*op0) && get_base_address (*op0)))) + if (op0 + && (DECL_P (op0) + || (REFERENCE_CLASS_P (op0) && get_base_address (op0)))) { - ref.pos = op0; + ref.ref = op0; ref.is_read = false; references->safe_push (ref); } @@ -4435,7 +4464,7 @@ find_data_references_in_stmt (struct loo FOR_EACH_VEC_ELT (references, i, ref) { dr = create_data_ref (nest, loop_containing_stmt (stmt), - *ref->pos, stmt, ref->is_read); + ref->ref, stmt, ref->is_read); gcc_assert (dr != NULL); datarefs->safe_push (dr); } @@ -4464,7 +4493,7 @@ graphite_find_data_references_in_stmt (l FOR_EACH_VEC_ELT (references, i, ref) { - dr = create_data_ref (nest, loop, *ref->pos, stmt, ref->is_read); + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); gcc_assert (dr != NULL); datarefs->safe_push (dr); } --- gcc/internal-fn.def.jj 2013-11-26 21:36:14.018329932 +0100 +++ gcc/internal-fn.def 2013-11-28 14:13:57.517569949 +0100 @@ -43,5 +43,8 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (MASK_LOAD, ECF_PURE | ECF_LEAF) +DEF_INTERNAL_FN (MASK_STORE, ECF_LEAF) DEF_INTERNAL_FN (ANNOTATE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (UBSAN_NULL, ECF_LEAF | ECF_NOTHROW) --- gcc/tree-if-conv.c.jj 2013-11-22 21:03:14.527852266 +0100 +++ gcc/tree-if-conv.c 2013-11-28 14:13:57.668572084 +0100 @@ -110,8 +110,12 @@ along with GCC; see the file COPYING3. #include "tree-chrec.h" #include "tree-data-ref.h" #include "tree-scalar-evolution.h" +#include "tree-ssa-address.h" #include "tree-pass.h" #include "dbgcnt.h" +#include "target.h" +#include "expr.h" +#include "optabs.h" /* List of basic blocks in if-conversion-suitable order. */ static basic_block *ifc_bbs; @@ -194,39 +198,48 @@ init_bb_predicate (basic_block bb) set_bb_predicate (bb, boolean_true_node); } -/* Free the predicate of basic block BB. */ +/* Release the SSA_NAMEs associated with the predicate of basic block BB, + but don't actually free it. */ static inline void -free_bb_predicate (basic_block bb) +release_bb_predicate (basic_block bb) { - gimple_seq stmts; - - if (!bb_has_predicate (bb)) - return; - - /* Release the SSA_NAMEs created for the gimplification of the - predicate. */ - stmts = bb_predicate_gimplified_stmts (bb); + gimple_seq stmts = bb_predicate_gimplified_stmts (bb); if (stmts) { gimple_stmt_iterator i; for (i = gsi_start (stmts); !gsi_end_p (i); gsi_next (&i)) free_stmt_operands (gsi_stmt (i)); + set_bb_predicate_gimplified_stmts (bb, NULL); } +} +/* Free the predicate of basic block BB. 
*/ + +static inline void +free_bb_predicate (basic_block bb) +{ + if (!bb_has_predicate (bb)) + return; + + release_bb_predicate (bb); free (bb->aux); bb->aux = NULL; } -/* Free the predicate of BB and reinitialize it with the true - predicate. */ +/* Reinitialize predicate of BB with the true predicate. */ static inline void reset_bb_predicate (basic_block bb) { - free_bb_predicate (bb); - init_bb_predicate (bb); + if (!bb_has_predicate (bb)) + init_bb_predicate (bb); + else + { + release_bb_predicate (bb); + set_bb_predicate (bb, boolean_true_node); + } } /* Returns a new SSA_NAME of type TYPE that is assigned the value of @@ -464,7 +477,8 @@ bb_with_exit_edge_p (struct loop *loop, - there is a virtual PHI in a BB other than the loop->header. */ static bool -if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi) +if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi, + bool any_mask_load_store) { if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -479,7 +493,7 @@ if_convertible_phi_p (struct loop *loop, return false; } - if (flag_tree_loop_if_convert_stores) + if (flag_tree_loop_if_convert_stores || any_mask_load_store) return true; /* When the flag_tree_loop_if_convert_stores is not set, check @@ -695,6 +709,78 @@ ifcvt_could_trap_p (gimple stmt, vecloop_father->force_vect) + || bb->loop_father->dont_vectorize + || !gimple_assign_single_p (stmt) + || gimple_has_volatile_ops (stmt)) + return false; + + /* Check whether this is a load or store. */ + lhs = gimple_assign_lhs (stmt); + if (TREE_CODE (lhs) != SSA_NAME) + { + if (!is_gimple_val (gimple_assign_rhs1 (stmt))) + return false; + op = maskstore_optab; + ref = lhs; + } + else if (gimple_assign_load_p (stmt)) + { + op = maskload_optab; + ref = gimple_assign_rhs1 (stmt); + } + else + return false; + + /* And whether REF isn't a MEM_REF with non-addressable decl. */ + if (TREE_CODE (ref) == MEM_REF + && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR + && DECL_P (TREE_OPERAND (TREE_OPERAND (ref, 0), 0)) + && !TREE_ADDRESSABLE (TREE_OPERAND (TREE_OPERAND (ref, 0), 0))) + return false; + + /* Mask should be integer mode of the same size as the load/store + mode. */ + mode = TYPE_MODE (TREE_TYPE (lhs)); + if (int_mode_for_mode (mode) == BLKmode) + return false; + + /* See if there is any chance the mask load or store might be + vectorized. If not, punt. */ + vmode = targetm.vectorize.preferred_simd_mode (mode); + if (!VECTOR_MODE_P (vmode)) + return false; + + if (optab_handler (op, vmode) != CODE_FOR_nothing) + return true; + + vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); + while (vector_sizes != 0) + { + unsigned int cur = 1 << floor_log2 (vector_sizes); + vector_sizes &= ~cur; + if (cur <= GET_MODE_SIZE (mode)) + continue; + vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode)); + if (VECTOR_MODE_P (vmode) + && optab_handler (op, vmode) != CODE_FOR_nothing) + return true; + } + return false; +} + /* Return true when STMT is if-convertible. GIMPLE_ASSIGN statement is not if-convertible if, @@ -704,7 +790,8 @@ ifcvt_could_trap_p (gimple stmt, vec refs) + vec refs, + bool *any_mask_load_store) { tree lhs = gimple_assign_lhs (stmt); basic_block bb; @@ -730,10 +817,21 @@ if_convertible_gimple_assign_stmt_p (gim return false; } + /* tree-into-ssa.c uses GF_PLF_1, so avoid it, because + in between if_convertible_loop_p and combine_blocks + we can perform loop versioning. 
*/ + gimple_set_plf (stmt, GF_PLF_2, false); + if (flag_tree_loop_if_convert_stores) { if (ifcvt_could_trap_p (stmt, refs)) { + if (ifcvt_can_use_mask_load_store (stmt)) + { + gimple_set_plf (stmt, GF_PLF_2, true); + *any_mask_load_store = true; + return true; + } if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "tree could trap...\n"); return false; @@ -743,6 +841,12 @@ if_convertible_gimple_assign_stmt_p (gim if (gimple_assign_rhs_could_trap_p (stmt)) { + if (ifcvt_can_use_mask_load_store (stmt)) + { + gimple_set_plf (stmt, GF_PLF_2, true); + *any_mask_load_store = true; + return true; + } if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "tree could trap...\n"); return false; @@ -754,6 +858,12 @@ if_convertible_gimple_assign_stmt_p (gim && bb != bb->loop_father->header && !bb_with_exit_edge_p (bb->loop_father, bb)) { + if (ifcvt_can_use_mask_load_store (stmt)) + { + gimple_set_plf (stmt, GF_PLF_2, true); + *any_mask_load_store = true; + return true; + } if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "LHS is not var\n"); @@ -772,7 +882,8 @@ if_convertible_gimple_assign_stmt_p (gim - it is a GIMPLE_LABEL or a GIMPLE_COND. */ static bool -if_convertible_stmt_p (gimple stmt, vec refs) +if_convertible_stmt_p (gimple stmt, vec refs, + bool *any_mask_load_store) { switch (gimple_code (stmt)) { @@ -782,7 +893,8 @@ if_convertible_stmt_p (gimple stmt, vec< return true; case GIMPLE_ASSIGN: - return if_convertible_gimple_assign_stmt_p (stmt, refs); + return if_convertible_gimple_assign_stmt_p (stmt, refs, + any_mask_load_store); case GIMPLE_CALL: { @@ -984,7 +1096,7 @@ get_loop_body_in_if_conv_order (const st S1 will be predicated with "x", and S2 will be predicated with "!x". */ -static bool +static void predicate_bbs (loop_p loop) { unsigned int i; @@ -996,7 +1108,7 @@ predicate_bbs (loop_p loop) { basic_block bb = ifc_bbs[i]; tree cond; - gimple_stmt_iterator itr; + gimple stmt; /* The loop latch is always executed and has no extra conditions to be processed: skip it. */ @@ -1006,53 +1118,38 @@ predicate_bbs (loop_p loop) continue; } + /* If dominance tells us this basic block is always executed, force + the condition to be true, this might help simplify other + conditions. */ + if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) + reset_bb_predicate (bb); cond = bb_predicate (bb); - - for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr)) + stmt = last_stmt (bb); + if (stmt && gimple_code (stmt) == GIMPLE_COND) { - gimple stmt = gsi_stmt (itr); - - switch (gimple_code (stmt)) - { - case GIMPLE_LABEL: - case GIMPLE_ASSIGN: - case GIMPLE_CALL: - case GIMPLE_DEBUG: - break; - - case GIMPLE_COND: - { - tree c2; - edge true_edge, false_edge; - location_t loc = gimple_location (stmt); - tree c = fold_build2_loc (loc, gimple_cond_code (stmt), - boolean_type_node, - gimple_cond_lhs (stmt), - gimple_cond_rhs (stmt)); - - /* Add new condition into destination's predicate list. */ - extract_true_false_edges_from_block (gimple_bb (stmt), - &true_edge, &false_edge); - - /* If C is true, then TRUE_EDGE is taken. */ - add_to_dst_predicate_list (loop, true_edge, - unshare_expr (cond), - unshare_expr (c)); - - /* If C is false, then FALSE_EDGE is taken. 
*/ - c2 = build1_loc (loc, TRUTH_NOT_EXPR, - boolean_type_node, unshare_expr (c)); - add_to_dst_predicate_list (loop, false_edge, - unshare_expr (cond), c2); - - cond = NULL_TREE; - break; - } + tree c2; + edge true_edge, false_edge; + location_t loc = gimple_location (stmt); + tree c = fold_build2_loc (loc, gimple_cond_code (stmt), + boolean_type_node, + gimple_cond_lhs (stmt), + gimple_cond_rhs (stmt)); + + /* Add new condition into destination's predicate list. */ + extract_true_false_edges_from_block (gimple_bb (stmt), + &true_edge, &false_edge); + + /* If C is true, then TRUE_EDGE is taken. */ + add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond), + unshare_expr (c)); + + /* If C is false, then FALSE_EDGE is taken. */ + c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, + unshare_expr (c)); + add_to_dst_predicate_list (loop, false_edge, + unshare_expr (cond), c2); - default: - /* Not handled yet in if-conversion. */ - return false; - } + cond = NULL_TREE; } /* If current bb has only one successor, then consider it as an @@ -1075,8 +1172,6 @@ predicate_bbs (loop_p loop) reset_bb_predicate (loop->header); gcc_assert (bb_predicate_gimplified_stmts (loop->header) == NULL && bb_predicate_gimplified_stmts (loop->latch) == NULL); - - return true; } /* Return true when LOOP is if-convertible. This is a helper function @@ -1087,7 +1182,7 @@ static bool if_convertible_loop_p_1 (struct loop *loop, vec *loop_nest, vec *refs, - vec *ddrs) + vec *ddrs, bool *any_mask_load_store) { bool res; unsigned int i; @@ -1121,9 +1216,24 @@ if_convertible_loop_p_1 (struct loop *lo exit_bb = bb; } - res = predicate_bbs (loop); - if (!res) - return false; + for (i = 0; i < loop->num_nodes; i++) + { + basic_block bb = ifc_bbs[i]; + gimple_stmt_iterator gsi; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + switch (gimple_code (gsi_stmt (gsi))) + { + case GIMPLE_LABEL: + case GIMPLE_ASSIGN: + case GIMPLE_CALL: + case GIMPLE_DEBUG: + case GIMPLE_COND: + break; + default: + return false; + } + } if (flag_tree_loop_if_convert_stores) { @@ -1135,6 +1245,7 @@ if_convertible_loop_p_1 (struct loop *lo DR_WRITTEN_AT_LEAST_ONCE (dr) = -1; DR_RW_UNCONDITIONALLY (dr) = -1; } + predicate_bbs (loop); } for (i = 0; i < loop->num_nodes; i++) @@ -1142,17 +1253,31 @@ if_convertible_loop_p_1 (struct loop *lo basic_block bb = ifc_bbs[i]; gimple_stmt_iterator itr; - for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr)) - if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr))) - return false; - /* Check the if-convertibility of statements in predicated BBs. */ - if (is_predicated (bb)) + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr)) - if (!if_convertible_stmt_p (gsi_stmt (itr), *refs)) + if (!if_convertible_stmt_p (gsi_stmt (itr), *refs, + any_mask_load_store)) return false; } + if (flag_tree_loop_if_convert_stores) + for (i = 0; i < loop->num_nodes; i++) + free_bb_predicate (ifc_bbs[i]); + + /* Checking PHIs needs to be done after stmts, as the fact whether there + are any masked loads or stores affects the tests. 
*/ + for (i = 0; i < loop->num_nodes; i++) + { + basic_block bb = ifc_bbs[i]; + gimple_stmt_iterator itr; + + for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr)) + if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr), + *any_mask_load_store)) + return false; + } + if (dump_file) fprintf (dump_file, "Applying if-conversion\n"); @@ -1168,7 +1293,7 @@ if_convertible_loop_p_1 (struct loop *lo - if its basic blocks and phi nodes are if convertible. */ static bool -if_convertible_loop_p (struct loop *loop) +if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store) { edge e; edge_iterator ei; @@ -1209,7 +1334,8 @@ if_convertible_loop_p (struct loop *loop refs.create (5); ddrs.create (25); stack_vec loop_nest; - res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs); + res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs, + any_mask_load_store); if (flag_tree_loop_if_convert_stores) { @@ -1395,7 +1521,7 @@ predicate_all_scalar_phis (struct loop * gimplification of the predicates. */ static void -insert_gimplified_predicates (loop_p loop) +insert_gimplified_predicates (loop_p loop, bool any_mask_load_store) { unsigned int i; @@ -1404,7 +1530,8 @@ insert_gimplified_predicates (loop_p loo basic_block bb = ifc_bbs[i]; gimple_seq stmts; - if (!is_predicated (bb)) + if (!is_predicated (bb) + || dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) { /* Do not insert statements for a basic block that is not predicated. Also make sure that the predicate of the @@ -1416,7 +1543,8 @@ insert_gimplified_predicates (loop_p loo stmts = bb_predicate_gimplified_stmts (bb); if (stmts) { - if (flag_tree_loop_if_convert_stores) + if (flag_tree_loop_if_convert_stores + || any_mask_load_store) { /* Insert the predicate of the BB just after the label, as the if-conversion of memory writes will use this @@ -1575,9 +1703,49 @@ predicate_mem_writes (loop_p loop) } for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) - if ((stmt = gsi_stmt (gsi)) - && gimple_assign_single_p (stmt) - && gimple_vdef (stmt)) + if ((stmt = gsi_stmt (gsi)) == NULL + || !gimple_assign_single_p (stmt)) + continue; + else if (gimple_plf (stmt, GF_PLF_2)) + { + tree lhs = gimple_assign_lhs (stmt); + tree rhs = gimple_assign_rhs1 (stmt); + tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask; + gimple new_stmt; + int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs))); + + masktype = build_nonstandard_integer_type (bitsize, 1); + mask_op0 = build_int_cst (masktype, swap ? 0 : -1); + mask_op1 = build_int_cst (masktype, swap ? -1 : 0); + ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; + addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref), + true, NULL_TREE, true, + GSI_SAME_STMT); + cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond), + is_gimple_condexpr, NULL_TREE, + true, GSI_SAME_STMT); + mask = fold_build_cond_expr (masktype, unshare_expr (cond), + mask_op0, mask_op1); + mask = ifc_temp_var (masktype, mask, &gsi); + ptr = build_int_cst (reference_alias_ptr_type (ref), 0); + /* Copy points-to info if possible. 
*/ + if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr)) + copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr), + ref); + if (TREE_CODE (lhs) == SSA_NAME) + { + new_stmt + = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, + ptr, mask); + gimple_call_set_lhs (new_stmt, lhs); + } + else + new_stmt + = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr, + mask, rhs); + gsi_replace (&gsi, new_stmt, false); + } + else if (gimple_vdef (stmt)) { tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); @@ -1647,7 +1815,7 @@ remove_conditions_and_labels (loop_p loo blocks. Replace PHI nodes with conditional modify expressions. */ static void -combine_blocks (struct loop *loop) +combine_blocks (struct loop *loop, bool any_mask_load_store) { basic_block bb, exit_bb, merge_target_bb; unsigned int orig_loop_num_nodes = loop->num_nodes; @@ -1655,11 +1823,12 @@ combine_blocks (struct loop *loop) edge e; edge_iterator ei; + predicate_bbs (loop); remove_conditions_and_labels (loop); - insert_gimplified_predicates (loop); + insert_gimplified_predicates (loop, any_mask_load_store); predicate_all_scalar_phis (loop); - if (flag_tree_loop_if_convert_stores) + if (flag_tree_loop_if_convert_stores || any_mask_load_store) predicate_mem_writes (loop); /* Merge basic blocks: first remove all the edges in the loop, @@ -1749,28 +1918,146 @@ combine_blocks (struct loop *loop) ifc_bbs = NULL; } -/* If-convert LOOP when it is legal. For the moment this pass has no - profitability analysis. Returns true when something changed. */ +/* Version LOOP before if-converting it, the original loop + will be then if-converted, the new copy of the loop will not, + and the LOOP_VECTORIZED internal call will be guarding which + loop to execute. The vectorizer pass will fold this + internal call into either true or false. 
*/ static bool +version_loop_for_if_conversion (struct loop *loop, bool *do_outer) +{ + struct loop *outer = loop_outer (loop); + basic_block cond_bb; + tree cond = make_ssa_name (boolean_type_node, NULL); + struct loop *new_loop; + gimple g; + gimple_stmt_iterator gsi; + + if (do_outer) + { + *do_outer = false; + if (loop->inner == NULL + && outer->inner == loop + && loop->next == NULL + && loop_outer (outer) + && outer->num_nodes == 3 + loop->num_nodes + && loop_preheader_edge (loop)->src == outer->header + && single_exit (loop) + && outer->latch + && single_exit (loop)->dest == EDGE_PRED (outer->latch, 0)->src) + *do_outer = true; + } + + g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, + build_int_cst (integer_type_node, loop->num), + integer_zero_node); + gimple_call_set_lhs (g, cond); + + initialize_original_copy_tables (); + new_loop = loop_version (loop, cond, &cond_bb, + REG_BR_PROB_BASE, REG_BR_PROB_BASE, + REG_BR_PROB_BASE, true); + free_original_copy_tables (); + if (new_loop == NULL) + return false; + new_loop->dont_vectorize = true; + new_loop->force_vect = false; + gsi = gsi_last_bb (cond_bb); + gimple_call_set_arg (g, 1, build_int_cst (integer_type_node, new_loop->num)); + gsi_insert_before (&gsi, g, GSI_SAME_STMT); + update_ssa (TODO_update_ssa); + if (do_outer == NULL) + { + gcc_assert (single_succ_p (loop->header)); + gsi = gsi_last_bb (single_succ (loop->header)); + gimple cond_stmt = gsi_stmt (gsi); + gsi_prev (&gsi); + g = gsi_stmt (gsi); + gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND + && is_gimple_call (g) + && gimple_call_internal_p (g) + && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED + && gimple_cond_lhs (cond_stmt) == gimple_call_lhs (g)); + gimple_cond_set_lhs (cond_stmt, boolean_true_node); + update_stmt (cond_stmt); + gcc_assert (has_zero_uses (gimple_call_lhs (g))); + gsi_remove (&gsi, false); + gcc_assert (single_succ_p (new_loop->header)); + gsi = gsi_last_bb (single_succ (new_loop->header)); + cond_stmt = gsi_stmt (gsi); + gsi_prev (&gsi); + g = gsi_stmt (gsi); + gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND + && is_gimple_call (g) + && gimple_call_internal_p (g) + && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED + && gimple_cond_lhs (cond_stmt) == gimple_call_lhs (g) + && new_loop->inner + && new_loop->inner->next + && new_loop->inner->next->next == NULL); + struct loop *inner = new_loop->inner; + basic_block empty_bb = loop_preheader_edge (inner)->src; + gcc_assert (empty_block_p (empty_bb) + && single_pred_p (empty_bb) + && single_succ_p (empty_bb) + && single_pred (empty_bb) == single_succ (new_loop->header)); + if (single_pred_edge (empty_bb)->flags & EDGE_TRUE_VALUE) + { + gimple_call_set_arg (g, 0, build_int_cst (integer_type_node, + inner->num)); + gimple_call_set_arg (g, 0, build_int_cst (integer_type_node, + inner->next->num)); + inner->next->dont_vectorize = true; + } + else + { + gimple_call_set_arg (g, 0, build_int_cst (integer_type_node, + inner->next->num)); + gimple_call_set_arg (g, 0, build_int_cst (integer_type_node, + inner->num)); + inner->dont_vectorize = true; + } + } + return true; +} + +/* If-convert LOOP when it is legal. For the moment this pass has no + profitability analysis. Returns non-zero todo flags when something + changed. 
*/ + +static unsigned int tree_if_conversion (struct loop *loop) { - bool changed = false; + unsigned int todo = 0; + bool version_outer_loop = false; ifc_bbs = NULL; + bool any_mask_load_store = false; - if (!if_convertible_loop_p (loop) + if (!if_convertible_loop_p (loop, &any_mask_load_store) || !dbg_cnt (if_conversion_tree)) goto cleanup; + if (any_mask_load_store + && ((!flag_tree_loop_vectorize && !loop->force_vect) + || loop->dont_vectorize)) + goto cleanup; + + if (any_mask_load_store + && !version_loop_for_if_conversion (loop, &version_outer_loop)) + goto cleanup; + /* Now all statements are if-convertible. Combine all the basic blocks into one huge basic block doing the if-conversion on-the-fly. */ - combine_blocks (loop); - - if (flag_tree_loop_if_convert_stores) - mark_virtual_operands_for_renaming (cfun); + combine_blocks (loop, any_mask_load_store); - changed = true; + todo |= TODO_cleanup_cfg; + if (flag_tree_loop_if_convert_stores || any_mask_load_store) + { + mark_virtual_operands_for_renaming (cfun); + todo |= TODO_update_ssa_only_virtuals; + } cleanup: if (ifc_bbs) @@ -1784,7 +2071,16 @@ tree_if_conversion (struct loop *loop) ifc_bbs = NULL; } - return changed; + if (todo && version_outer_loop) + { + if (todo & TODO_update_ssa_only_virtuals) + { + update_ssa (TODO_update_ssa_only_virtuals); + todo &= ~TODO_update_ssa_only_virtuals; + } + version_loop_for_if_conversion (loop_outer (loop), NULL); + } + return todo; } /* Tree if-conversion pass management. */ @@ -1793,7 +2089,6 @@ static unsigned int main_tree_if_conversion (void) { struct loop *loop; - bool changed = false; unsigned todo = 0; if (number_of_loops (cfun) <= 1) @@ -1802,15 +2097,9 @@ main_tree_if_conversion (void) FOR_EACH_LOOP (loop, 0) if (flag_tree_loop_if_convert == 1 || flag_tree_loop_if_convert_stores == 1 - || flag_tree_loop_vectorize - || loop->force_vect) - changed |= tree_if_conversion (loop); - - if (changed) - todo |= TODO_cleanup_cfg; - - if (changed && flag_tree_loop_if_convert_stores) - todo |= TODO_update_ssa_only_virtuals; + || ((flag_tree_loop_vectorize || loop->force_vect) + && !loop->dont_vectorize)) + todo |= tree_if_conversion (loop); #ifdef ENABLE_CHECKING { --- gcc/tree-vect-data-refs.c.jj 2013-11-28 09:18:11.784774865 +0100 +++ gcc/tree-vect-data-refs.c 2013-11-28 14:13:57.617572349 +0100 @@ -2959,6 +2959,24 @@ vect_check_gather (gimple stmt, loop_vec enum machine_mode pmode; int punsignedp, pvolatilep; + base = DR_REF (dr); + /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF, + see if we can use the def stmt of the address. */ + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt) + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + && TREE_CODE (base) == MEM_REF + && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME + && integer_zerop (TREE_OPERAND (base, 1)) + && !expr_invariant_in_loop_p (loop, TREE_OPERAND (base, 0))) + { + gimple def_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (base, 0)); + if (is_gimple_assign (def_stmt) + && gimple_assign_rhs_code (def_stmt) == ADDR_EXPR) + base = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0); + } + /* The gather builtins need address of the form loop_invariant + vector * {1, 2, 4, 8} or @@ -2971,7 +2989,7 @@ vect_check_gather (gimple stmt, loop_vec vectorized. The following code attempts to find such a preexistng SSA_NAME OFF and put the loop invariants into a tree BASE that can be gimplified before the loop. 
*/ - base = get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &off, + base = get_inner_reference (base, &pbitsize, &pbitpos, &off, &pmode, &punsignedp, &pvolatilep, false); gcc_assert (base != NULL_TREE && (pbitpos % BITS_PER_UNIT) == 0); @@ -3468,7 +3486,10 @@ again: offset = unshare_expr (DR_OFFSET (dr)); init = unshare_expr (DR_INIT (dr)); - if (is_gimple_call (stmt)) + if (is_gimple_call (stmt) + && (!gimple_call_internal_p (stmt) + || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD + && gimple_call_internal_fn (stmt) != IFN_MASK_STORE))) { if (dump_enabled_p ()) { @@ -5119,6 +5140,14 @@ vect_supportable_dr_alignment (struct da if (aligned_access_p (dr) && !check_aligned_accesses) return dr_aligned; + /* For now assume all conditional loads/stores support unaligned + access without any special code. */ + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt) + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) + return dr_unaligned_supported; + if (loop_vinfo) { vect_loop = LOOP_VINFO_LOOP (loop_vinfo); --- gcc/gimple.h.jj 2013-11-27 12:10:46.932896086 +0100 +++ gcc/gimple.h 2013-11-28 14:13:57.603572422 +0100 @@ -5670,7 +5670,13 @@ gimple_expr_type (const_gimple stmt) useless conversion involved. That means returning the original RHS type as far as we can reconstruct it. */ if (code == GIMPLE_CALL) - type = gimple_call_return_type (stmt); + { + if (gimple_call_internal_p (stmt) + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + type = TREE_TYPE (gimple_call_arg (stmt, 3)); + else + type = gimple_call_return_type (stmt); + } else switch (gimple_assign_rhs_code (stmt)) { --- gcc/internal-fn.c.jj 2013-11-26 21:36:14.218328913 +0100 +++ gcc/internal-fn.c 2013-11-28 14:13:57.661572121 +0100 @@ -153,6 +153,60 @@ expand_UBSAN_NULL (gimple stmt ATTRIBUTE gcc_unreachable (); } +/* This should get folded in tree-vectorizer.c. 
*/ + +static void +expand_LOOP_VECTORIZED (gimple stmt ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + +static void +expand_MASK_LOAD (gimple stmt) +{ + struct expand_operand ops[3]; + tree type, lhs, rhs, maskt; + rtx mem, target, mask; + + maskt = gimple_call_arg (stmt, 2); + lhs = gimple_call_lhs (stmt); + type = TREE_TYPE (lhs); + rhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0), + gimple_call_arg (stmt, 1)); + + mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + gcc_assert (MEM_P (mem)); + mask = expand_normal (maskt); + target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + create_output_operand (&ops[0], target, TYPE_MODE (type)); + create_fixed_operand (&ops[1], mem); + create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); + expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); +} + +static void +expand_MASK_STORE (gimple stmt) +{ + struct expand_operand ops[3]; + tree type, lhs, rhs, maskt; + rtx mem, reg, mask; + + maskt = gimple_call_arg (stmt, 2); + rhs = gimple_call_arg (stmt, 3); + type = TREE_TYPE (rhs); + lhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0), + gimple_call_arg (stmt, 1)); + + mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + gcc_assert (MEM_P (mem)); + mask = expand_normal (maskt); + reg = expand_normal (rhs); + create_fixed_operand (&ops[0], mem); + create_input_operand (&ops[1], reg, TYPE_MODE (type)); + create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); + expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops); +} + /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: --- gcc/tree-vectorizer.c.jj 2013-11-22 21:03:14.525852274 +0100 +++ gcc/tree-vectorizer.c 2013-11-28 15:10:33.364872892 +0100 @@ -75,11 +75,13 @@ along with GCC; see the file COPYING3. #include "tree-phinodes.h" #include "ssa-iterators.h" #include "tree-ssa-loop-manip.h" +#include "tree-cfg.h" #include "cfgloop.h" #include "tree-vectorizer.h" #include "tree-pass.h" #include "tree-ssa-propagate.h" #include "dbgcnt.h" +#include "gimple-fold.h" /* Loop or bb location. */ source_location vect_location; @@ -317,6 +319,68 @@ vect_destroy_datarefs (loop_vec_info loo } +/* If LOOP has been versioned during ifcvt, return the internal call + guarding it. */ + +static gimple +vect_loop_vectorized_call (struct loop *loop) +{ + basic_block bb = loop_preheader_edge (loop)->src; + gimple g; + do + { + g = last_stmt (bb); + if (g) + break; + if (!single_pred_p (bb)) + break; + bb = single_pred (bb); + } + while (1); + if (g && gimple_code (g) == GIMPLE_COND) + { + gimple_stmt_iterator gsi = gsi_for_stmt (g); + gsi_prev (&gsi); + if (!gsi_end_p (gsi)) + { + g = gsi_stmt (gsi); + if (is_gimple_call (g) + && gimple_call_internal_p (g) + && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED + && (tree_to_shwi (gimple_call_arg (g, 0)) == loop->num + || tree_to_shwi (gimple_call_arg (g, 1)) == loop->num)) + return g; + } + } + return NULL; +} + +/* Helper function of vectorize_loops. If LOOP is non-if-converted + loop that has if-converted counterpart, return the if-converted + counterpart, so that we try vectorizing if-converted loops before + inner loops of non-if-converted loops. 
*/ + +static struct loop * +vect_loop_select (struct loop *loop) +{ + if (!loop->dont_vectorize) + return loop; + + gimple g = vect_loop_vectorized_call (loop); + if (g == NULL) + return loop; + + if (tree_to_shwi (gimple_call_arg (g, 1)) != loop->num) + return loop; + + struct loop *ifcvt_loop + = get_loop (cfun, tree_to_shwi (gimple_call_arg (g, 0))); + if (ifcvt_loop && !ifcvt_loop->dont_vectorize) + return ifcvt_loop; + return loop; +} + + /* Function vectorize_loops. Entry point to loop vectorization phase. */ @@ -327,9 +391,11 @@ vectorize_loops (void) unsigned int i; unsigned int num_vectorized_loops = 0; unsigned int vect_loops_num; - struct loop *loop; + struct loop *loop, *iloop; hash_table simduid_to_vf_htab; hash_table simd_array_to_simduid_htab; + bool any_ifcvt_loops = false; + unsigned ret = 0; vect_loops_num = number_of_loops (cfun); @@ -351,9 +417,12 @@ vectorize_loops (void) /* If some loop was duplicated, it gets bigger number than all previously defined loops. This fact allows us to run only over initial loops skipping newly generated ones. */ - FOR_EACH_LOOP (loop, 0) - if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop)) - || loop->force_vect) + FOR_EACH_LOOP (iloop, 0) + if ((loop = vect_loop_select (iloop))->dont_vectorize) + any_ifcvt_loops = true; + else if ((flag_tree_loop_vectorize + && optimize_loop_nest_for_speed_p (loop)) + || loop->force_vect) { loop_vec_info loop_vinfo; vect_location = find_loop_location (loop); @@ -363,6 +432,10 @@ vectorize_loops (void) LOCATION_FILE (vect_location), LOCATION_LINE (vect_location)); + /* Make sure we don't try to vectorize this loop + more than once. */ + loop->dont_vectorize = true; + loop_vinfo = vect_analyze_loop (loop); loop->aux = loop_vinfo; @@ -372,6 +445,45 @@ vectorize_loops (void) if (!dbg_cnt (vect_loop)) break; + gimple loop_vectorized_call = vect_loop_vectorized_call (loop); + if (loop_vectorized_call) + { + tree arg = gimple_call_arg (loop_vectorized_call, 1); + basic_block *bbs; + unsigned int i; + struct loop *scalar_loop = get_loop (cfun, tree_to_shwi (arg)); + struct loop *inner; + + LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop; + gcc_checking_assert (vect_loop_vectorized_call + (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)) + == loop_vectorized_call); + bbs = get_loop_body (scalar_loop); + for (i = 0; i < scalar_loop->num_nodes; i++) + { + basic_block bb = bbs[i]; + gimple_stmt_iterator gsi; + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple phi = gsi_stmt (gsi); + gimple_set_uid (phi, 0); + } + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, 0); + } + } + free (bbs); + /* If we have successfully vectorized an if-converted outer + loop, don't attempt to vectorize the if-converted inner + loop of the alternate loop. 
*/ + for (inner = scalar_loop->inner; inner; inner = inner->next) + inner->dont_vectorize = true; + } + if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION && dump_enabled_p ()) dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, @@ -392,7 +504,29 @@ vectorize_loops (void) *simduid_to_vf_htab.find_slot (simduid_to_vf_data, INSERT) = simduid_to_vf_data; } + + if (loop_vectorized_call) + { + gimple g = loop_vectorized_call; + tree lhs = gimple_call_lhs (g); + gimple_stmt_iterator gsi = gsi_for_stmt (g); + gimplify_and_update_call_from_tree (&gsi, boolean_true_node); + gsi_next (&gsi); + if (!gsi_end_p (gsi)) + { + g = gsi_stmt (gsi); + if (gimple_code (g) == GIMPLE_COND + && gimple_cond_lhs (g) == lhs) + { + gimple_cond_set_lhs (g, boolean_true_node); + update_stmt (g); + ret |= TODO_cleanup_cfg; + } + } + } } + else + loop->dont_vectorize = true; vect_location = UNKNOWN_LOCATION; @@ -405,6 +539,34 @@ vectorize_loops (void) /* ----------- Finalize. ----------- */ + if (any_ifcvt_loops) + for (i = 1; i < vect_loops_num; i++) + { + loop = get_loop (cfun, i); + if (loop && loop->dont_vectorize) + { + gimple g = vect_loop_vectorized_call (loop); + if (g) + { + tree lhs = gimple_call_lhs (g); + gimple_stmt_iterator gsi = gsi_for_stmt (g); + gimplify_and_update_call_from_tree (&gsi, boolean_false_node); + gsi_next (&gsi); + if (!gsi_end_p (gsi)) + { + g = gsi_stmt (gsi); + if (gimple_code (g) == GIMPLE_COND + && gimple_cond_lhs (g) == lhs) + { + gimple_cond_set_lhs (g, boolean_false_node); + update_stmt (g); + ret |= TODO_cleanup_cfg; + } + } + } + } + } + for (i = 1; i < vect_loops_num; i++) { loop_vec_info loop_vinfo; @@ -462,7 +624,7 @@ vectorize_loops (void) return TODO_cleanup_cfg; } - return 0; + return ret; } --- gcc/tree-vect-loop-manip.c.jj 2013-11-22 21:03:08.418882641 +0100 +++ gcc/tree-vect-loop-manip.c 2013-11-28 14:54:01.621096704 +0100 @@ -703,12 +703,42 @@ slpeel_make_loop_iterate_ntimes (struct loop->nb_iterations = niters; } +/* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg. + For all PHI arguments in FROM->dest and TO->dest from those + edges ensure that TO->dest PHI arguments have current_def + to that in from. */ + +static void +slpeel_duplicate_current_defs_from_edges (edge from, edge to) +{ + gimple_stmt_iterator gsi_from, gsi_to; + + for (gsi_from = gsi_start_phis (from->dest), + gsi_to = gsi_start_phis (to->dest); + !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to); + gsi_next (&gsi_from), gsi_next (&gsi_to)) + { + gimple from_phi = gsi_stmt (gsi_from); + gimple to_phi = gsi_stmt (gsi_to); + tree from_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, from); + tree to_arg = PHI_ARG_DEF_FROM_EDGE (to_phi, to); + if (TREE_CODE (from_arg) == SSA_NAME + && TREE_CODE (to_arg) == SSA_NAME + && get_current_def (to_arg) == NULL_TREE) + set_current_def (to_arg, get_current_def (from_arg)); + } +} + /* Given LOOP this function generates a new copy of it and puts it - on E which is either the entry or exit of LOOP. */ + on E which is either the entry or exit of LOOP. If SCALAR_LOOP is + non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the + basic blocks from SCALAR_LOOP instead of LOOP, but to either the + entry or exit of LOOP. 
*/ struct loop * -slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop, edge e) +slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop, + struct loop *scalar_loop, edge e) { struct loop *new_loop; basic_block *new_bbs, *bbs; @@ -722,19 +752,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg ( if (!at_exit && e != loop_preheader_edge (loop)) return NULL; - bbs = XNEWVEC (basic_block, loop->num_nodes + 1); - get_loop_body_with_size (loop, bbs, loop->num_nodes); + if (scalar_loop == NULL) + scalar_loop = loop; + + bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1); + get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes); /* Check whether duplication is possible. */ - if (!can_copy_bbs_p (bbs, loop->num_nodes)) + if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes)) { free (bbs); return NULL; } /* Generate new loop structure. */ - new_loop = duplicate_loop (loop, loop_outer (loop)); - duplicate_subloops (loop, new_loop); + new_loop = duplicate_loop (scalar_loop, loop_outer (scalar_loop)); + duplicate_subloops (scalar_loop, new_loop); exit_dest = exit->dest; was_imm_dom = (get_immediate_dominator (CDI_DOMINATORS, @@ -744,35 +777,80 @@ slpeel_tree_duplicate_loop_to_edge_cfg ( /* Also copy the pre-header, this avoids jumping through hoops to duplicate the loop entry PHI arguments. Create an empty pre-header unconditionally for this. */ - basic_block preheader = split_edge (loop_preheader_edge (loop)); + basic_block preheader = split_edge (loop_preheader_edge (scalar_loop)); edge entry_e = single_pred_edge (preheader); - bbs[loop->num_nodes] = preheader; - new_bbs = XNEWVEC (basic_block, loop->num_nodes + 1); + bbs[scalar_loop->num_nodes] = preheader; + new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1); - copy_bbs (bbs, loop->num_nodes + 1, new_bbs, + exit = single_exit (scalar_loop); + copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs, &exit, 1, &new_exit, NULL, e->src, true); - basic_block new_preheader = new_bbs[loop->num_nodes]; + exit = single_exit (loop); + basic_block new_preheader = new_bbs[scalar_loop->num_nodes]; - add_phi_args_after_copy (new_bbs, loop->num_nodes + 1, NULL); + add_phi_args_after_copy (new_bbs, scalar_loop->num_nodes + 1, NULL); + + if (scalar_loop != loop) + { + /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from + SCALAR_LOOP will have current_def set to SSA_NAMEs in the new_loop, + but LOOP will not. slpeel_update_phi_nodes_for_guard{1,2} expects + the LOOP SSA_NAMEs (on the exit edge and edge from latch to + header) to have current_def set, so copy them over. */ + slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop), + exit); + slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch, + 0), + EDGE_SUCC (loop->latch, 0)); + } if (at_exit) /* Add the loop copy at exit. 
*/ { + if (scalar_loop != loop) + { + gimple_stmt_iterator gsi; + new_exit = redirect_edge_and_branch (new_exit, exit_dest); + + for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple phi = gsi_stmt (gsi); + tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e); + location_t orig_locus + = gimple_phi_arg_location_from_edge (phi, e); + + add_phi_arg (phi, orig_arg, new_exit, orig_locus); + } + } redirect_edge_and_branch_force (e, new_preheader); flush_pending_stmts (e); set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src); if (was_imm_dom) - set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_loop->header); + set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src); /* And remove the non-necessary forwarder again. Keep the other one so we have a proper pre-header for the loop at the exit edge. */ - redirect_edge_pred (single_succ_edge (preheader), single_pred (preheader)); + redirect_edge_pred (single_succ_edge (preheader), + single_pred (preheader)); delete_basic_block (preheader); - set_immediate_dominator (CDI_DOMINATORS, loop->header, - loop_preheader_edge (loop)->src); + set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header, + loop_preheader_edge (scalar_loop)->src); } else /* Add the copy at entry. */ { + if (scalar_loop != loop) + { + /* Remove the non-necessary forwarder of scalar_loop again. */ + redirect_edge_pred (single_succ_edge (preheader), + single_pred (preheader)); + delete_basic_block (preheader); + set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header, + loop_preheader_edge (scalar_loop)->src); + preheader = split_edge (loop_preheader_edge (loop)); + entry_e = single_pred_edge (preheader); + } + redirect_edge_and_branch_force (entry_e, new_preheader); flush_pending_stmts (entry_e); set_immediate_dominator (CDI_DOMINATORS, new_preheader, entry_e->src); @@ -783,15 +861,39 @@ slpeel_tree_duplicate_loop_to_edge_cfg ( /* And remove the non-necessary forwarder again. Keep the other one so we have a proper pre-header for the loop at the exit edge. */ - redirect_edge_pred (single_succ_edge (new_preheader), single_pred (new_preheader)); + redirect_edge_pred (single_succ_edge (new_preheader), + single_pred (new_preheader)); delete_basic_block (new_preheader); set_immediate_dominator (CDI_DOMINATORS, new_loop->header, loop_preheader_edge (new_loop)->src); } - for (unsigned i = 0; i < loop->num_nodes+1; i++) + for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++) rename_variables_in_bb (new_bbs[i]); + if (scalar_loop != loop) + { + /* Update new_loop->header PHIs, so that on the preheader + edge they are the ones from loop rather than scalar_loop. */ + gimple_stmt_iterator gsi_orig, gsi_new; + edge orig_e = loop_preheader_edge (loop); + edge new_e = loop_preheader_edge (new_loop); + + for (gsi_orig = gsi_start_phis (loop->header), + gsi_new = gsi_start_phis (new_loop->header); + !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new); + gsi_next (&gsi_orig), gsi_next (&gsi_new)) + { + gimple orig_phi = gsi_stmt (gsi_orig); + gimple new_phi = gsi_stmt (gsi_new); + tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e); + location_t orig_locus + = gimple_phi_arg_location_from_edge (orig_phi, orig_e); + + add_phi_arg (new_phi, orig_arg, new_e, orig_locus); + } + } + free (new_bbs); free (bbs); @@ -1002,6 +1104,8 @@ set_prologue_iterations (basic_block bb_ Input: - LOOP: the loop to be peeled. + - SCALAR_LOOP: if non-NULL, the alternate loop from which basic blocks + should be copied. - E: the exit or entry edge of LOOP. 
If it is the entry edge, we peel the first iterations of LOOP. In this case first-loop is LOOP, and second-loop is the newly created loop. @@ -1043,8 +1147,8 @@ set_prologue_iterations (basic_block bb_ FORNOW the resulting code will not be in loop-closed-ssa form. */ -static struct loop* -slpeel_tree_peel_loop_to_edge (struct loop *loop, +static struct loop * +slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop, edge e, tree *first_niters, tree niters, bool update_first_loop_count, unsigned int th, bool check_profitability, @@ -1129,7 +1233,8 @@ slpeel_tree_peel_loop_to_edge (struct lo orig_exit_bb: */ - if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e))) + if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, + e))) { loop_loc = find_loop_location (loop); dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc, @@ -1625,6 +1730,7 @@ vect_do_peeling_for_loop_bound (loop_vec unsigned int th, bool check_profitability) { struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); struct loop *new_loop; edge update_e; basic_block preheader; @@ -1641,11 +1747,12 @@ vect_do_peeling_for_loop_bound (loop_vec loop_num = loop->num; - new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop), - &ratio_mult_vf_name, ni_name, false, - th, check_profitability, - cond_expr, cond_expr_stmt_list, - 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo)); + new_loop + = slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop), + &ratio_mult_vf_name, ni_name, false, + th, check_profitability, + cond_expr, cond_expr_stmt_list, + 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo)); gcc_assert (new_loop); gcc_assert (loop_num == loop->num); #ifdef ENABLE_CHECKING @@ -1878,6 +1985,7 @@ vect_do_peeling_for_alignment (loop_vec_ unsigned int th, bool check_profitability) { struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); tree niters_of_prolog_loop; tree wide_prolog_niters; struct loop *new_loop; @@ -1899,11 +2007,11 @@ vect_do_peeling_for_alignment (loop_vec_ /* Peel the prolog loop and iterate it niters_of_prolog_loop. */ new_loop = - slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop), + slpeel_tree_peel_loop_to_edge (loop, scalar_loop, + loop_preheader_edge (loop), &niters_of_prolog_loop, ni_name, true, th, check_profitability, NULL_TREE, NULL, - bound, - 0); + bound, 0); gcc_assert (new_loop); #ifdef ENABLE_CHECKING @@ -2187,6 +2295,7 @@ vect_loop_versioning (loop_vec_info loop unsigned int th, bool check_profitability) { struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); basic_block condition_bb; gimple_stmt_iterator gsi, cond_exp_gsi; basic_block merge_bb; @@ -2222,8 +2331,43 @@ vect_loop_versioning (loop_vec_info loop gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list); initialize_original_copy_tables (); - loop_version (loop, cond_expr, &condition_bb, - prob, prob, REG_BR_PROB_BASE - prob, true); + if (scalar_loop) + { + edge scalar_e; + basic_block preheader, scalar_preheader; + + /* We don't want to scale SCALAR_LOOP's frequencies, we need to + scale LOOP's frequencies instead. 
*/ + loop_version (scalar_loop, cond_expr, &condition_bb, + prob, REG_BR_PROB_BASE, REG_BR_PROB_BASE - prob, true); + scale_loop_frequencies (loop, prob, REG_BR_PROB_BASE); + /* CONDITION_BB was created above SCALAR_LOOP's preheader, + while we need to move it above LOOP's preheader. */ + e = loop_preheader_edge (loop); + scalar_e = loop_preheader_edge (scalar_loop); + gcc_assert (empty_block_p (e->src) + && single_pred_p (e->src)); + gcc_assert (empty_block_p (scalar_e->src) + && single_pred_p (scalar_e->src)); + gcc_assert (single_pred_p (condition_bb)); + preheader = e->src; + scalar_preheader = scalar_e->src; + scalar_e = find_edge (condition_bb, scalar_preheader); + e = single_pred_edge (preheader); + redirect_edge_and_branch_force (single_pred_edge (condition_bb), + scalar_preheader); + redirect_edge_and_branch_force (scalar_e, preheader); + redirect_edge_and_branch_force (e, condition_bb); + set_immediate_dominator (CDI_DOMINATORS, condition_bb, + single_pred (condition_bb)); + set_immediate_dominator (CDI_DOMINATORS, scalar_preheader, + single_pred (scalar_preheader)); + set_immediate_dominator (CDI_DOMINATORS, preheader, + condition_bb); + } + else + loop_version (loop, cond_expr, &condition_bb, + prob, prob, REG_BR_PROB_BASE - prob, true); if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION && dump_enabled_p ()) @@ -2246,24 +2390,29 @@ vect_loop_versioning (loop_vec_info loop basic block (i.e. it has two predecessors). Just in order to simplify following transformations in the vectorizer, we fix this situation here by adding a new (empty) block on the exit-edge of the loop, - with the proper loop-exit phis to maintain loop-closed-form. */ + with the proper loop-exit phis to maintain loop-closed-form. + If loop versioning wasn't done from loop, but scalar_loop instead, + merge_bb will have already just a single successor. */ merge_bb = single_exit (loop)->dest; - gcc_assert (EDGE_COUNT (merge_bb->preds) == 2); - new_exit_bb = split_edge (single_exit (loop)); - new_exit_e = single_exit (loop); - e = EDGE_SUCC (new_exit_bb, 0); - - for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi)) - { - tree new_res; - orig_phi = gsi_stmt (gsi); - new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL); - new_phi = create_phi_node (new_res, new_exit_bb); - arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e); - add_phi_arg (new_phi, arg, new_exit_e, - gimple_phi_arg_location_from_edge (orig_phi, e)); - adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); + if (scalar_loop == NULL || EDGE_COUNT (merge_bb->preds) >= 2) + { + gcc_assert (EDGE_COUNT (merge_bb->preds) >= 2); + new_exit_bb = split_edge (single_exit (loop)); + new_exit_e = single_exit (loop); + e = EDGE_SUCC (new_exit_bb, 0); + + for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + tree new_res; + orig_phi = gsi_stmt (gsi); + new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL); + new_phi = create_phi_node (new_res, new_exit_bb); + arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e); + add_phi_arg (new_phi, arg, new_exit_e, + gimple_phi_arg_location_from_edge (orig_phi, e)); + adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); + } } --- gcc/tree-vect-loop.c.jj 2013-11-28 09:18:11.772774927 +0100 +++ gcc/tree-vect-loop.c 2013-11-28 14:13:57.643572214 +0100 @@ -374,7 +374,11 @@ vect_determine_vectorization_factor (loo analyze_pattern_stmt = false; } - if (gimple_get_lhs (stmt) == NULL_TREE) + if (gimple_get_lhs (stmt) == NULL_TREE + /* MASK_STORE has no lhs, but is ok. 
*/ + && (!is_gimple_call (stmt) + || !gimple_call_internal_p (stmt) + || gimple_call_internal_fn (stmt) != IFN_MASK_STORE)) { if (is_gimple_call (stmt)) { @@ -426,7 +430,12 @@ vect_determine_vectorization_factor (loo else { gcc_assert (!STMT_VINFO_DATA_REF (stmt_info)); - scalar_type = TREE_TYPE (gimple_get_lhs (stmt)); + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt) + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3)); + else + scalar_type = TREE_TYPE (gimple_get_lhs (stmt)); if (dump_enabled_p ()) { dump_printf_loc (MSG_NOTE, vect_location, --- gcc/cfgloop.h.jj 2013-11-19 21:56:40.389335752 +0100 +++ gcc/cfgloop.h 2013-11-28 14:13:57.602572427 +0100 @@ -176,6 +176,9 @@ struct GTY ((chain_next ("%h.next"))) lo /* True if we should try harder to vectorize this loop. */ bool force_vect; + /* True if this loop should never be vectorized. */ + bool dont_vectorize; + /* For SIMD loops, this is a unique identifier of the loop, referenced by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE builtins. */ --- gcc/tree-loop-distribution.c.jj 2013-11-22 21:03:05.696896177 +0100 +++ gcc/tree-loop-distribution.c 2013-11-28 14:13:57.632572271 +0100 @@ -588,7 +588,7 @@ copy_loop_before (struct loop *loop) edge preheader = loop_preheader_edge (loop); initialize_original_copy_tables (); - res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, preheader); + res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader); gcc_assert (res != NULL); free_original_copy_tables (); delete_update_ssa (); --- gcc/optabs.def.jj 2013-11-26 21:36:14.066329682 +0100 +++ gcc/optabs.def 2013-11-28 14:13:57.624572312 +0100 @@ -248,6 +248,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") OPTAB_D (udot_prod_optab, "udot_prod$I$a") OPTAB_D (usum_widen_optab, "widen_usum$I$a3") +OPTAB_D (maskload_optab, "maskload$a") +OPTAB_D (maskstore_optab, "maskstore$a") OPTAB_D (vec_extract_optab, "vec_extract$a") OPTAB_D (vec_init_optab, "vec_init$a") OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") --- gcc/testsuite/gcc.target/i386/avx2-gather-6.c.jj 2013-11-28 14:13:57.633572267 +0100 +++ gcc/testsuite/gcc.target/i386/avx2-gather-6.c 2013-11-28 14:13:57.633572267 +0100 @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details" } */ + +#include "avx2-gather-5.c" + +/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops in function" 1 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.target/i386/vect-cond-1.c.jj 2013-11-28 14:57:58.182864189 +0100 +++ gcc/testsuite/gcc.target/i386/vect-cond-1.c 2013-11-28 14:57:58.182864189 +0100 @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -mavx2" { target avx2 } } */ + +int a[1024]; + +int +foo (int *p) +{ + int i; + for (i = 0; i < 1024; i++) + { + int t; + if (a[i] < 30) + t = *p; + else + t = a[i] + 12; + a[i] = t; + } +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.target/i386/avx2-gather-5.c.jj 2013-11-28 14:13:57.633572267 +0100 +++ gcc/testsuite/gcc.target/i386/avx2-gather-5.c 2013-11-28 14:13:57.633572267 +0100 @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx2 } */ +/* { dg-options "-O3 -mavx2 -fno-common" } */ + +#include "avx2-check.h" + +#define N 1024 +float vf1[N+16], vf2[N], vf3[N]; +int k[N]; + +__attribute__((noinline, noclone)) void +foo (void) +{ + int i; + for 
(i = 0; i < N; i++) + { + float f; + if (vf3[i] < 0.0f) + f = vf1[k[i]]; + else + f = 7.0f; + vf2[i] = f; + } +} + +static void +avx2_test (void) +{ + int i; + for (i = 0; i < N + 16; i++) + { + vf1[i] = 5.5f * i; + if (i >= N) + continue; + vf2[i] = 2.0f; + vf3[i] = (i & 1) ? i : -i - 1; + k[i] = (i & 1) ? ((i & 2) ? -i : N / 2 + i) : (i * 7) % N; + asm (""); + } + foo (); + for (i = 0; i < N; i++) + if (vf1[i] != 5.5 * i + || vf2[i] != ((i & 1) ? 7.0f : 5.5f * ((i * 7) % N)) + || vf3[i] != ((i & 1) ? i : -i - 1) + || k[i] != ((i & 1) ? ((i & 2) ? -i : N / 2 + i) : ((i * 7) % N))) + abort (); +} --- gcc/testsuite/gcc.dg/vect/vect-cond-11.c.jj 2013-11-28 14:13:57.634572262 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-cond-11.c 2013-11-28 14:13:57.634572262 +0100 @@ -0,0 +1,116 @@ +#include "tree-vect.h" + +#define N 1024 +typedef int V __attribute__((vector_size (4))); +unsigned int a[N * 2] __attribute__((aligned)); +unsigned int b[N * 2] __attribute__((aligned)); +V c[N]; + +__attribute__((noinline, noclone)) unsigned int +foo (unsigned int *a, unsigned int *b) +{ + int i; + unsigned int r = 0; + for (i = 0; i < N; i++) + { + unsigned int x = a[i], y = b[i]; + if (x < 32) + { + x = x + 127; + y = y * 2; + } + else + { + x = x - 16; + y = y + 1; + } + a[i] = x; + b[i] = y; + r += x; + } + return r; +} + +__attribute__((noinline, noclone)) unsigned int +bar (unsigned int *a, unsigned int *b) +{ + int i; + unsigned int r = 0; + for (i = 0; i < N; i++) + { + unsigned int x = a[i], y = b[i]; + if (x < 32) + { + x = x + 127; + y = y * 2; + } + else + { + x = x - 16; + y = y + 1; + } + a[i] = x; + b[i] = y; + c[i] = c[i] + 1; + r += x; + } + return r; +} + +void +baz (unsigned int *a, unsigned int *b, + unsigned int (*fn) (unsigned int *, unsigned int *)) +{ + int i; + for (i = -64; i < 0; i++) + { + a[i] = 19; + b[i] = 17; + } + for (; i < N; i++) + { + a[i] = i - 512; + b[i] = i; + } + for (; i < N + 64; i++) + { + a[i] = 27; + b[i] = 19; + } + if (fn (a, b) != -512U - (N - 32) * 16U + 32 * 127U) + __builtin_abort (); + for (i = -64; i < 0; i++) + if (a[i] != 19 || b[i] != 17) + __builtin_abort (); + for (; i < N; i++) + if (a[i] != (i - 512U < 32U ? i - 512U + 127 : i - 512U - 16) + || b[i] != (i - 512U < 32U ? 
i * 2U : i + 1U)) + __builtin_abort (); + for (; i < N + 64; i++) + if (a[i] != 27 || b[i] != 19) + __builtin_abort (); +} + +int +main () +{ + int i; + check_vect (); + baz (a + 512, b + 512, foo); + baz (a + 512, b + 512, bar); + baz (a + 512 + 1, b + 512 + 1, foo); + baz (a + 512 + 1, b + 512 + 1, bar); + baz (a + 512 + 31, b + 512 + 31, foo); + baz (a + 512 + 31, b + 512 + 31, bar); + baz (a + 512 + 1, b + 512, foo); + baz (a + 512 + 1, b + 512, bar); + baz (a + 512 + 31, b + 512, foo); + baz (a + 512 + 31, b + 512, bar); + baz (a + 512, b + 512 + 1, foo); + baz (a + 512, b + 512 + 1, bar); + baz (a + 512, b + 512 + 31, foo); + baz (a + 512, b + 512 + 31, bar); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c.jj 2013-11-28 14:13:57.633572267 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c 2013-11-28 14:13:57.633572267 +0100 @@ -0,0 +1,52 @@ +/* { dg-do run } */ +/* { dg-additional-options "-Ofast -fno-common" } */ +/* { dg-additional-options "-Ofast -fno-common -mavx" { target avx_runtime } } */ + +#include +#include "tree-vect.h" + +__attribute__((noinline, noclone)) void +foo (double *x, double *y) +{ + double *p = __builtin_assume_aligned (x, 16); + double *q = __builtin_assume_aligned (y, 16); + double z, h; + int i; + for (i = 0; i < 1024; i++) + { + if (p[i] < 0.0) + z = q[i], h = q[i] * 7.0 + 3.0; + else + z = p[i] + 6.0, h = p[1024 + i]; + p[i] = z + 2.0 * h; + } +} + +double a[2048] __attribute__((aligned (16))); +double b[1024] __attribute__((aligned (16))); + +int +main () +{ + int i; + check_vect (); + for (i = 0; i < 1024; i++) + { + a[i] = (i & 1) ? -i : 2 * i; + a[i + 1024] = i; + b[i] = 7 * i; + asm (""); + } + foo (a, b); + for (i = 0; i < 1024; i++) + if (a[i] != ((i & 1) + ? 7 * i + 2.0 * (7 * i * 7.0 + 3.0) + : 2 * i + 6.0 + 2.0 * i) + || b[i] != 7 * i + || a[i + 1024] != i) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops" 1 "vect" { target avx_runtime } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c.jj 2013-11-28 14:13:57.634572262 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c 2013-11-28 14:13:57.634572262 +0100 @@ -0,0 +1,50 @@ +/* { dg-do run } */ +/* { dg-additional-options "-Ofast -fno-common" } */ +/* { dg-additional-options "-Ofast -fno-common -mavx" { target avx_runtime } } */ + +#include +#include "tree-vect.h" + +__attribute__((noinline, noclone)) void +foo (float *__restrict x, float *__restrict y, float *__restrict z) +{ + float *__restrict p = __builtin_assume_aligned (x, 32); + float *__restrict q = __builtin_assume_aligned (y, 32); + float *__restrict r = __builtin_assume_aligned (z, 32); + int i; + for (i = 0; i < 1024; i++) + { + if (p[i] < 0.0f) + q[i] = p[i] + 2.0f; + else + p[i] = r[i] + 3.0f; + } +} + +float a[1024] __attribute__((aligned (32))); +float b[1024] __attribute__((aligned (32))); +float c[1024] __attribute__((aligned (32))); + +int +main () +{ + int i; + check_vect (); + for (i = 0; i < 1024; i++) + { + a[i] = (i & 1) ? -i : i; + b[i] = 7 * i; + c[i] = a[i] - 3.0f; + asm (""); + } + foo (a, b, c); + for (i = 0; i < 1024; i++) + if (a[i] != ((i & 1) ? -i : i) + || b[i] != ((i & 1) ? 
a[i] + 2.0f : 7 * i) + || c[i] != a[i] - 3.0f) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops" 1 "vect" { target avx_runtime } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/passes.def.jj 2013-11-27 12:15:13.999517045 +0100 +++ gcc/passes.def 2013-11-28 14:13:57.602572427 +0100 @@ -217,6 +217,8 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); NEXT_PASS (pass_if_conversion); + /* pass_vectorize must immediately follow pass_if_conversion. + Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) NEXT_PASS (pass_dce_loop); --- gcc/tree-predcom.c.jj 2013-11-22 21:03:14.589851957 +0100 +++ gcc/tree-predcom.c 2013-11-28 14:59:15.529464377 +0100 @@ -732,6 +732,9 @@ split_data_refs_to_components (struct lo just fail. */ goto end; } + /* predcom pass isn't prepared to handle calls with data references. */ + if (is_gimple_call (DR_STMT (dr))) + goto end; dr->aux = (void *) (size_t) i; comp_father[i] = i; comp_size[i] = 1; --- gcc/tree-vect-stmts.c.jj 2013-11-27 12:15:14.038516844 +0100 +++ gcc/tree-vect-stmts.c 2013-11-28 14:57:58.182864189 +0100 @@ -235,7 +235,7 @@ vect_mark_relevant (vec *worklis /* This use is out of pattern use, if LHS has other uses that are pattern uses, we should mark the stmt itself, and not the pattern stmt. */ - if (TREE_CODE (lhs) == SSA_NAME) + if (lhs && TREE_CODE (lhs) == SSA_NAME) FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs) { if (is_gimple_debug (USE_STMT (use_p))) @@ -393,7 +393,27 @@ exist_non_indexing_operands_for_use_p (t first case, and whether var corresponds to USE. */ if (!gimple_assign_copy_p (stmt)) - return false; + { + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) + { + case IFN_MASK_STORE: + operand = gimple_call_arg (stmt, 3); + if (operand == use) + return true; + /* FALLTHRU */ + case IFN_MASK_LOAD: + operand = gimple_call_arg (stmt, 2); + if (operand == use) + return true; + break; + default: + break; + } + return false; + } + if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME) return false; operand = gimple_assign_rhs1 (stmt); @@ -1696,6 +1716,413 @@ vectorizable_function (gimple call, tree vectype_in); } + +static tree permute_vec_elements (tree, tree, tree, gimple, + gimple_stmt_iterator *); + + +/* Function vectorizable_mask_load_store. + + Check if STMT performs a conditional load or store that can be vectorized. + If VEC_STMT is also passed, vectorize the STMT: create a vectorized + stmt to replace it, put it in VEC_STMT, and insert it at GSI. + Return FALSE if not a vectorizable STMT, TRUE otherwise. 
*/ + +static bool +vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi, + gimple *vec_stmt, slp_tree slp_node) +{ + tree vec_dest = NULL; + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + stmt_vec_info prev_stmt_info; + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + tree elem_type; + gimple new_stmt; + tree dummy; + tree dataref_ptr = NULL_TREE; + gimple ptr_incr; + int nunits = TYPE_VECTOR_SUBPARTS (vectype); + int ncopies; + int i, j; + bool inv_p; + tree gather_base = NULL_TREE, gather_off = NULL_TREE; + tree gather_off_vectype = NULL_TREE, gather_decl = NULL_TREE; + int gather_scale = 1; + enum vect_def_type gather_dt = vect_unknown_def_type; + bool is_store; + tree mask; + gimple def_stmt; + tree def; + enum vect_def_type dt; + + if (slp_node != NULL) + return false; + + ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; + gcc_assert (ncopies >= 1); + + is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE; + mask = gimple_call_arg (stmt, 2); + if (TYPE_PRECISION (TREE_TYPE (mask)) + != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype)))) + return false; + + /* FORNOW. This restriction should be relaxed. */ + if (nested_in_vect_loop && ncopies > 1) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "multiple types in nested loop."); + return false; + } + + if (!STMT_VINFO_RELEVANT_P (stmt_info)) + return false; + + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def) + return false; + + if (!STMT_VINFO_DATA_REF (stmt_info)) + return false; + + elem_type = TREE_TYPE (vectype); + + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) + return false; + + if (STMT_VINFO_STRIDE_LOAD_P (stmt_info)) + return false; + + if (STMT_VINFO_GATHER_P (stmt_info)) + { + gimple def_stmt; + tree def; + gather_decl = vect_check_gather (stmt, loop_vinfo, &gather_base, + &gather_off, &gather_scale); + gcc_assert (gather_decl); + if (!vect_is_simple_use_1 (gather_off, NULL, loop_vinfo, NULL, + &def_stmt, &def, &gather_dt, + &gather_off_vectype)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "gather index use not simple."); + return false; + } + } + else if (tree_int_cst_compare (nested_in_vect_loop + ? STMT_VINFO_DR_STEP (stmt_info) + : DR_STEP (dr), size_zero_node) <= 0) + return false; + else if (optab_handler (is_store ? maskstore_optab : maskload_optab, + TYPE_MODE (vectype)) == CODE_FOR_nothing) + return false; + + if (TREE_CODE (mask) != SSA_NAME) + return false; + + if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL, + &def_stmt, &def, &dt)) + return false; + + if (is_store) + { + tree rhs = gimple_call_arg (stmt, 3); + if (!vect_is_simple_use (rhs, stmt, loop_vinfo, NULL, + &def_stmt, &def, &dt)) + return false; + } + + if (!vec_stmt) /* transformation not required. */ + { + STMT_VINFO_TYPE (stmt_info) = call_vec_info_type; + if (is_store) + vect_model_store_cost (stmt_info, ncopies, false, dt, + NULL, NULL, NULL); + else + vect_model_load_cost (stmt_info, ncopies, false, NULL, NULL, NULL); + return true; + } + + /** Transform. 
**/ + + if (STMT_VINFO_GATHER_P (stmt_info)) + { + tree vec_oprnd0 = NULL_TREE, op; + tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl)); + tree rettype, srctype, ptrtype, idxtype, masktype, scaletype; + tree ptr, vec_mask = NULL_TREE, mask_op, var, scale; + tree perm_mask = NULL_TREE, prev_res = NULL_TREE; + edge pe = loop_preheader_edge (loop); + gimple_seq seq; + basic_block new_bb; + enum { NARROW, NONE, WIDEN } modifier; + int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gather_off_vectype); + + if (nunits == gather_off_nunits) + modifier = NONE; + else if (nunits == gather_off_nunits / 2) + { + unsigned char *sel = XALLOCAVEC (unsigned char, gather_off_nunits); + modifier = WIDEN; + + for (i = 0; i < gather_off_nunits; ++i) + sel[i] = i | nunits; + + perm_mask = vect_gen_perm_mask (gather_off_vectype, sel); + gcc_assert (perm_mask != NULL_TREE); + } + else if (nunits == gather_off_nunits * 2) + { + unsigned char *sel = XALLOCAVEC (unsigned char, nunits); + modifier = NARROW; + + for (i = 0; i < nunits; ++i) + sel[i] = i < gather_off_nunits + ? i : i + nunits - gather_off_nunits; + + perm_mask = vect_gen_perm_mask (vectype, sel); + gcc_assert (perm_mask != NULL_TREE); + ncopies *= 2; + } + else + gcc_unreachable (); + + rettype = TREE_TYPE (TREE_TYPE (gather_decl)); + srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + scaletype = TREE_VALUE (arglist); + gcc_checking_assert (types_compatible_p (srctype, rettype) + && types_compatible_p (srctype, masktype)); + + vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype); + + ptr = fold_convert (ptrtype, gather_base); + if (!is_gimple_min_invariant (ptr)) + { + ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE); + new_bb = gsi_insert_seq_on_edge_immediate (pe, seq); + gcc_assert (!new_bb); + } + + scale = build_int_cst (scaletype, gather_scale); + + prev_stmt_info = NULL; + for (j = 0; j < ncopies; ++j) + { + if (modifier == WIDEN && (j & 1)) + op = permute_vec_elements (vec_oprnd0, vec_oprnd0, + perm_mask, stmt, gsi); + else if (j == 0) + op = vec_oprnd0 + = vect_get_vec_def_for_operand (gather_off, stmt, NULL); + else + op = vec_oprnd0 + = vect_get_vec_def_for_stmt_copy (gather_dt, vec_oprnd0); + + if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) + { + gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)) + == TYPE_VECTOR_SUBPARTS (idxtype)); + var = vect_get_new_vect_var (idxtype, vect_simple_var, NULL); + var = make_ssa_name (var, NULL); + op = build1 (VIEW_CONVERT_EXPR, idxtype, op); + new_stmt + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, + op, NULL_TREE); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + op = var; + } + + if (j == 0) + vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL); + else + { + vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt, + &def, &dt); + vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); + } + + mask_op = vec_mask; + if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask))) + { + gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op)) + == TYPE_VECTOR_SUBPARTS (masktype)); + var = vect_get_new_vect_var (masktype, vect_simple_var, NULL); + var = make_ssa_name (var, NULL); + mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op); + new_stmt + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, + mask_op, 
NULL_TREE); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + mask_op = var; + } + + new_stmt + = gimple_build_call (gather_decl, 5, mask_op, ptr, op, mask_op, + scale); + + if (!useless_type_conversion_p (vectype, rettype)) + { + gcc_assert (TYPE_VECTOR_SUBPARTS (vectype) + == TYPE_VECTOR_SUBPARTS (rettype)); + var = vect_get_new_vect_var (rettype, vect_simple_var, NULL); + op = make_ssa_name (var, new_stmt); + gimple_call_set_lhs (new_stmt, op); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + var = make_ssa_name (vec_dest, NULL); + op = build1 (VIEW_CONVERT_EXPR, vectype, op); + new_stmt + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, op, + NULL_TREE); + } + else + { + var = make_ssa_name (vec_dest, new_stmt); + gimple_call_set_lhs (new_stmt, var); + } + + vect_finish_stmt_generation (stmt, new_stmt, gsi); + + if (modifier == NARROW) + { + if ((j & 1) == 0) + { + prev_res = var; + continue; + } + var = permute_vec_elements (prev_res, var, + perm_mask, stmt, gsi); + new_stmt = SSA_NAME_DEF_STMT (var); + } + + if (prev_stmt_info == NULL) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + return true; + } + else if (is_store) + { + tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE; + prev_stmt_info = NULL; + for (i = 0; i < ncopies; i++) + { + unsigned align, misalign; + + if (i == 0) + { + tree rhs = gimple_call_arg (stmt, 3); + vec_rhs = vect_get_vec_def_for_operand (rhs, stmt, NULL); + vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL); + /* We should have catched mismatched types earlier. */ + gcc_assert (useless_type_conversion_p (vectype, + TREE_TYPE (vec_rhs))); + dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL, + NULL_TREE, &dummy, gsi, + &ptr_incr, false, &inv_p); + gcc_assert (!inv_p); + } + else + { + vect_is_simple_use (vec_rhs, NULL, loop_vinfo, NULL, &def_stmt, + &def, &dt); + vec_rhs = vect_get_vec_def_for_stmt_copy (dt, vec_rhs); + vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt, + &def, &dt); + vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); + dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, + TYPE_SIZE_UNIT (vectype)); + } + + align = TYPE_ALIGN_UNIT (vectype); + if (aligned_access_p (dr)) + misalign = 0; + else if (DR_MISALIGNMENT (dr) == -1) + { + align = TYPE_ALIGN_UNIT (elem_type); + misalign = 0; + } + else + misalign = DR_MISALIGNMENT (dr); + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, + misalign); + new_stmt + = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr, + gimple_call_arg (stmt, 1), + vec_mask, vec_rhs); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + if (i == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + } + else + { + tree vec_mask = NULL_TREE; + prev_stmt_info = NULL; + vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype); + for (i = 0; i < ncopies; i++) + { + unsigned align, misalign; + + if (i == 0) + { + vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL); + dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL, + NULL_TREE, &dummy, gsi, + &ptr_incr, false, &inv_p); + gcc_assert (!inv_p); + } + else + { + vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt, + &def, &dt); + vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); + dataref_ptr = 
bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, + TYPE_SIZE_UNIT (vectype)); + } + + align = TYPE_ALIGN_UNIT (vectype); + if (aligned_access_p (dr)) + misalign = 0; + else if (DR_MISALIGNMENT (dr) == -1) + { + align = TYPE_ALIGN_UNIT (elem_type); + misalign = 0; + } + else + misalign = DR_MISALIGNMENT (dr); + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, + misalign); + new_stmt + = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr, + gimple_call_arg (stmt, 1), + vec_mask); + gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest, NULL)); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + if (i == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + } + + return true; +} + + /* Function vectorizable_call. Check if STMT performs a function call that can be vectorized. @@ -1738,6 +2165,12 @@ vectorizable_call (gimple stmt, gimple_s if (!is_gimple_call (stmt)) return false; + if (gimple_call_internal_p (stmt) + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) + return vectorizable_mask_load_store (stmt, gsi, vec_stmt, + slp_node); + if (gimple_call_lhs (stmt) == NULL_TREE || TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME) return false; @@ -4051,10 +4484,6 @@ vectorizable_shift (gimple stmt, gimple_ } -static tree permute_vec_elements (tree, tree, tree, gimple, - gimple_stmt_iterator *); - - /* Function vectorizable_operation. Check if STMT performs a binary, unary or ternary operation that can @@ -6567,6 +6996,10 @@ vect_transform_stmt (gimple stmt, gimple case call_vec_info_type: done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node); stmt = gsi_stmt (*gsi); + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt) + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + is_store = true; break; case call_simd_clone_vec_info_type: --- gcc/tree-ssa-phiopt.c.jj 2013-11-22 21:03:14.569852057 +0100 +++ gcc/tree-ssa-phiopt.c 2013-11-28 15:01:39.825688128 +0100 @@ -1706,7 +1706,7 @@ cond_if_else_store_replacement (basic_bl == chrec_dont_know) || !then_datarefs.length () || (find_data_references_in_bb (NULL, else_bb, &else_datarefs) - == chrec_dont_know) + == chrec_dont_know) || !else_datarefs.length ()) { free_data_refs (then_datarefs); @@ -1723,6 +1723,8 @@ cond_if_else_store_replacement (basic_bl then_store = DR_STMT (then_dr); then_lhs = gimple_get_lhs (then_store); + if (then_lhs == NULL_TREE) + continue; found = false; FOR_EACH_VEC_ELT (else_datarefs, j, else_dr) @@ -1732,6 +1734,8 @@ cond_if_else_store_replacement (basic_bl else_store = DR_STMT (else_dr); else_lhs = gimple_get_lhs (else_store); + if (else_lhs == NULL_TREE) + continue; if (operand_equal_p (then_lhs, else_lhs, 0)) {