From patchwork Wed Oct 23 17:22:20 2013
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 285712
Date: Wed, 23 Oct 2013 19:22:20 +0200
From: Jakub Jelinek
To: Sergey Ostanevich
Cc: Richard Biener, "gcc-patches@gcc.gnu.org"
Subject: [RFC] By default if-convert only basic blocks that will be vectorized (take 4)
Message-ID: <20131023172220.GW30970@tucnak.zalov.cz>
Reply-To: Jakub Jelinek
References: <20131015123225.GO30970@tucnak.zalov.cz> <20131017165556.GI30970@tucnak.zalov.cz> <20131022105614.GK30970@tucnak.zalov.cz> <20131022132658.GM30970@tucnak.zalov.cz>
In-Reply-To:
On Tue, Oct 22, 2013 at 08:27:54PM +0400, Sergey Ostanevich wrote:
> still fails on 403 et al.

Ok, reproduced.  Unfortunately the pending stmt sequences already pretty
much assume that they will end up in a single combined basic block.
I went through various alternatives: deferring the update_ssa
(TODO_update_ssa) call until after combine_blocks doesn't work, because
update_ssa is unhappy about basic blocks being removed, and temporarily
putting all the stmts into the latch doesn't work either, because there
are no PHIs for it in the loop.  So the final fix, as discussed with
Richard on IRC, is not to call predicate_bbs early before versioning;
without -ftree-loop-if-convert-stores that is easily achievable by just
using a better dominance check instead, and for the stores case we
predicate and free the predicates again afterwards (at least for now).
The predicate_bbs stuff would certainly appreciate more TLC in the
future.

Attaching a whole new patchset; the above-mentioned fix is mostly in the
first patch (which also gains a tree-cfg.h include that is needed after
today's header reshuffling), the other two patches are just tweaked to
apply on top of that.  All three patches together have been
bootstrapped/regtested on x86_64-linux and i686-linux; the first one
alone and first+second have just been compile-time tested.

        Jakub

2013-10-23  Jakub Jelinek

        * tree-vectorizer.h (struct _loop_vec_info): Add scalar_loop field.
        (LOOP_VINFO_SCALAR_LOOP): Define.
        (slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
        * internal-fn.def (LOOP_VECTORIZED): New internal fn code.
        * tree-if-conv.c (release_bb_predicate): New function.
        (free_bb_predicate): Use it.
        (reset_bb_predicate): Likewise.  Don't unallocate bb->aux just to
        immediately allocate it again.
        (predicate_bbs): Don't return bool, only check if the last stmt of
        a basic block is GIMPLE_COND and handle that.  For basic blocks
        that dominate loop->latch assume they don't need to be predicated.
        (if_convertible_loop_p_1): Only call predicate_bbs if
        flag_tree_loop_if_convert_stores and free_bb_predicate in that
        case afterwards, check gimple_code of stmts here.  Replace
        is_predicated check with dominance check.
        (insert_gimplified_predicates): If bb dominates loop->latch, call
        reset_bb_predicate.
        (combine_blocks): Call predicate_bbs.
        (version_loop_for_if_conversion): New function.
        (tree_if_conversion): Return todo flags instead of bool, call
        version_loop_for_if_conversion if if-conversion should be just
        for the vectorized loops and nothing else.
        (main_tree_if_conversion): Adjust caller.  Don't call
        tree_if_conversion if flag_tree_loop_if_convert isn't 1 and the
        loop isn't going to be vectorized.
        (gate_tree_if_conversion): Don't turn on if-conversion just
        because of flag_tree_loop_if_convert_stores == 1.
        * internal-fn.c (expand_LOOP_VECTORIZED): New function.
        * tree-vectorizer.c (vect_loop_vectorized_call): New function.
        (vectorize_loops): Don't try to vectorize loops with
        loop->dont_vectorize set.  Set LOOP_VINFO_SCALAR_LOOP for
        if-converted loops, fold LOOP_VECTORIZED internal call depending
        on whether the loop has been vectorized or not.
        * tree-vect-loop-manip.c (slpeel_duplicate_current_defs_from_edges):
        New function.
        (slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
        If non-NULL, copy basic blocks from scalar_loop instead of loop,
        but still to loop's entry or exit edge.
        (slpeel_tree_peel_loop_to_edge): Add scalar_loop argument, pass it
        down to slpeel_tree_duplicate_loop_to_edge_cfg.
        (vect_do_peeling_for_loop_bound, vect_do_peeling_for_loop_alignment):
        Adjust callers.
        (vect_loop_versioning): If LOOP_VINFO_SCALAR_LOOP, perform loop
        versioning from that loop instead of LOOP_VINFO_LOOP, move it to
        the right place in the CFG afterwards.
        * cfgloop.h (struct loop): Add dont_vectorize field.
        * tree-loop-distribution.c (copy_loop_before): Adjust
        slpeel_tree_duplicate_loop_to_edge_cfg caller.
        * passes.def: Add a note that pass_vectorize must immediately
        follow pass_if_conversion.

        * gcc.dg/vect/bb-slp-cond-1.c: Add dg-additional-options
        -ftree-loop-if-convert.
        * gcc.dg/vect/bb-slp-pattern-2.c: Likewise.
        * gcc.dg/vect/vect-cond-11.c: New testcase.

2013-10-23  Jakub Jelinek

        * tree-if-conv.c (version_loop_for_if_conversion): Add DO_OUTER
        argument.  Store to what it points whether the outer loop should
        be versioned.  If it is NULL, optimize the LOOP_VECTORIZED
        internal call in loop into boolean_true_node and in new_loop
        update its arguments to the inner loop copies and set the
        dont_vectorize flag.
        (tree_if_conversion): Adjust caller.  If the flag was set, call
        version_loop_for_if_conversion once again on the outer loop at
        the end.
        * tree-vectorizer.c (vect_loop_select): New function.
        (vectorize_loops): Use it to attempt to vectorize an if-converted
        loop before its non-if-converted counterpart.  If outer loop
        vectorization is successful in that case, ensure the loop in the
        soon-to-be-dead non-if-converted loop is not vectorized.
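
As a quick illustration (this sketch is not taken from the patch or its
testcases; the file layout, names and options are arbitrary), the series
below - in particular the masked load/store patch at the end - is aimed
at loops with conditional memory accesses like the following, where the
conditional store side either needs -ftree-loop-if-convert-stores
(an unconditional load/modify/store, which can introduce data races) or
is not if-converted at all; with IFN_MASK_LOAD/IFN_MASK_STORE only the
loop copy that is actually vectorized is if-converted, e.g. with
-O3 -mavx2 on x86_64:

/* Illustrative sketch only, not part of the patch.  */
#define N 1024

float a[N], b[N], c[N];

void
foo (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      if (c[i] > 0.0f)
        a[i] = b[i] + 2.0f;     /* conditional store */
      else
        b[i] = a[i] - 1.0f;     /* conditional load and store */
    }
}

int
main (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      b[i] = i;
      c[i] = (i & 1) ? 1.0f : -1.0f;
    }
  foo ();
  /* c[1] > 0.0f, so a[1] should have been stored as b[1] + 2.0f.  */
  return a[1] == b[1] + 2.0f ? 0 : 1;
}
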
--- gcc/tree-if-conv.c.jj 2013-10-23 18:38:00.739772777 +0200 +++ gcc/tree-if-conv.c 2013-10-23 18:46:09.334284914 +0200 @@ -1763,14 +1763,30 @@ combine_blocks (struct loop *loop) internal call into either true or false. */ static bool -version_loop_for_if_conversion (struct loop *loop) +version_loop_for_if_conversion (struct loop *loop, bool *do_outer) { + struct loop *outer = loop_outer (loop); basic_block cond_bb; tree cond = make_ssa_name (boolean_type_node, NULL); struct loop *new_loop; gimple g; gimple_stmt_iterator gsi; + if (do_outer) + { + *do_outer = false; + if (loop->inner == NULL + && outer->inner == loop + && loop->next == NULL + && loop_outer (outer) + && outer->num_nodes == 3 + loop->num_nodes + && loop_preheader_edge (loop)->src == outer->header + && single_exit (loop) + && outer->latch + && single_exit (loop)->dest == EDGE_PRED (outer->latch, 0)->src) + *do_outer = true; + } + g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, build_int_cst (integer_type_node, loop->num), integer_zero_node); @@ -1789,6 +1805,58 @@ version_loop_for_if_conversion (struct l gimple_call_set_arg (g, 1, build_int_cst (integer_type_node, new_loop->num)); gsi_insert_before (&gsi, g, GSI_SAME_STMT); update_ssa (TODO_update_ssa); + if (do_outer == NULL) + { + gcc_assert (single_succ_p (loop->header)); + gsi = gsi_last_bb (single_succ (loop->header)); + gimple cond_stmt = gsi_stmt (gsi); + gsi_prev (&gsi); + g = gsi_stmt (gsi); + gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND + && is_gimple_call (g) + && gimple_call_internal_p (g) + && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED + && gimple_cond_lhs (cond_stmt) == gimple_call_lhs (g)); + gimple_cond_set_lhs (cond_stmt, boolean_true_node); + update_stmt (cond_stmt); + gcc_assert (has_zero_uses (gimple_call_lhs (g))); + gsi_remove (&gsi, false); + gcc_assert (single_succ_p (new_loop->header)); + gsi = gsi_last_bb (single_succ (new_loop->header)); + cond_stmt = gsi_stmt (gsi); + gsi_prev (&gsi); + g = gsi_stmt (gsi); + gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND + && is_gimple_call (g) + && gimple_call_internal_p (g) + && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED + && gimple_cond_lhs
(cond_stmt) == gimple_call_lhs (g) + && new_loop->inner + && new_loop->inner->next + && new_loop->inner->next->next == NULL); + struct loop *inner = new_loop->inner; + basic_block empty_bb = loop_preheader_edge (inner)->src; + gcc_assert (empty_block_p (empty_bb) + && single_pred_p (empty_bb) + && single_succ_p (empty_bb) + && single_pred (empty_bb) == single_succ (new_loop->header)); + if (single_pred_edge (empty_bb)->flags & EDGE_TRUE_VALUE) + { + gimple_call_set_arg (g, 0, build_int_cst (integer_type_node, + inner->num)); + gimple_call_set_arg (g, 0, build_int_cst (integer_type_node, + inner->next->num)); + inner->next->dont_vectorize = true; + } + else + { + gimple_call_set_arg (g, 0, build_int_cst (integer_type_node, + inner->next->num)); + gimple_call_set_arg (g, 0, build_int_cst (integer_type_node, + inner->num)); + inner->dont_vectorize = true; + } + } return true; } @@ -1800,6 +1868,7 @@ static unsigned int tree_if_conversion (struct loop *loop) { unsigned int todo = 0; + bool version_outer_loop = false; ifc_bbs = NULL; if (!if_convertible_loop_p (loop) @@ -1808,7 +1877,7 @@ tree_if_conversion (struct loop *loop) if ((flag_tree_loop_vectorize || loop->force_vect) && flag_tree_loop_if_convert == -1 - && !version_loop_for_if_conversion (loop)) + && !version_loop_for_if_conversion (loop, &version_outer_loop)) goto cleanup; /* Now all statements are if-convertible. Combine all the basic @@ -1835,6 +1904,15 @@ tree_if_conversion (struct loop *loop) ifc_bbs = NULL; } + if (todo && version_outer_loop) + { + if (todo & TODO_update_ssa_only_virtuals) + { + update_ssa (TODO_update_ssa_only_virtuals); + todo &= ~TODO_update_ssa_only_virtuals; + } + version_loop_for_if_conversion (loop_outer (loop), NULL); + } return todo; } --- gcc/tree-vectorizer.c.jj 2013-10-23 18:30:29.914021368 +0200 +++ gcc/tree-vectorizer.c 2013-10-23 18:41:49.143612005 +0200 @@ -348,6 +348,31 @@ vect_loop_vectorized_call (struct loop * return NULL; } +/* Helper function of vectorize_loops. If LOOP is non-if-converted + loop that has if-converted counterpart, return the if-converted + counterpart, so that we try vectorizing if-converted loops before + inner loops of non-if-converted loops. */ + +static struct loop * +vect_loop_select (struct loop *loop) +{ + if (!loop->dont_vectorize) + return loop; + + gimple g = vect_loop_vectorized_call (loop); + if (g == NULL) + return loop; + + if (tree_low_cst (gimple_call_arg (g, 1), 0) != loop->num) + return loop; + + struct loop *ifcvt_loop + = get_loop (cfun, tree_low_cst (gimple_call_arg (g, 0), 0)); + if (ifcvt_loop && !ifcvt_loop->dont_vectorize) + return ifcvt_loop; + return loop; +} + /* Function vectorize_loops. @@ -360,7 +385,7 @@ vectorize_loops (void) unsigned int num_vectorized_loops = 0; unsigned int vect_loops_num; loop_iterator li; - struct loop *loop; + struct loop *loop, *iloop; hash_table simduid_to_vf_htab; hash_table simd_array_to_simduid_htab; bool any_ifcvt_loops = false; @@ -386,8 +411,8 @@ vectorize_loops (void) /* If some loop was duplicated, it gets bigger number than all previously defined loops. This fact allows us to run only over initial loops skipping newly generated ones. 
*/ - FOR_EACH_LOOP (li, loop, 0) - if (loop->dont_vectorize) + FOR_EACH_LOOP (li, iloop, 0) + if ((loop = vect_loop_select (iloop))->dont_vectorize) any_ifcvt_loops = true; else if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop)) @@ -400,6 +425,10 @@ vectorize_loops (void) dump_printf (MSG_NOTE, "\nAnalyzing loop at %s:%d\n", LOC_FILE (vect_location), LOC_LINE (vect_location)); + /* Make sure we don't try to vectorize this loop + more than once. */ + loop->dont_vectorize = true; + loop_vinfo = vect_analyze_loop (loop); loop->aux = loop_vinfo; @@ -416,6 +445,7 @@ vectorize_loops (void) basic_block *bbs; unsigned int i; struct loop *scalar_loop = get_loop (cfun, tree_low_cst (arg, 0)); + struct loop *inner; LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop; gcc_checking_assert (vect_loop_vectorized_call @@ -440,6 +470,11 @@ vectorize_loops (void) } } free (bbs); + /* If we have successfully vectorized an if-converted outer + loop, don't attempt to vectorize the if-converted inner + loop of the alternate loop. */ + for (inner = scalar_loop->inner; inner; inner = inner->next) + inner->dont_vectorize = true; } if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOC && dump_enabled_p ()) @@ -482,6 +517,8 @@ vectorize_loops (void) } } } + else + loop->dont_vectorize = true; vect_location = UNKNOWN_LOC; 2013-10-23 Jakub Jelinek * config/i386/sse.md (maskload, maskstore): New expanders. * tree-data-ref.c (struct data_ref_loc_d): Replace pos field with ref. (get_references_in_stmt): Don't record operand addresses, but operands themselves. Handle MASK_LOAD and MASK_STORE. (find_data_references_in_stmt, graphite_find_data_references_in_stmt, * internal-fn.def (MASK_LOAD, MASK_STORE): New internal fns. * tree-if-conv.c: Include target.h, expr.h, optabs.h and tree-ssa-address.h. (if_convertible_phi_p, insert_gimplified_predicates): Add any_mask_load_store argument, if true, handle it like flag_tree_loop_if_convert_stores. (ifcvt_can_use_mask_load_store): New function. (if_convertible_gimple_assign_stmt_p): Add any_mask_load_store argument, check if some conditional loads or stores can't be converted into MASK_LOAD or MASK_STORE. (if_convertible_stmt_p): Add any_mask_load_store argument, pass it down to if_convertible_gimple_assign_stmt_p. (if_convertible_loop_p_1): Add any_mask_load_store argument, pass it down to if_convertible_stmt_p and if_convertible_phi_p, call if_convertible_phi_p only after all if_convertible_stmt_p calls. (if_convertible_loop_p): Add any_mask_load_store argument, pass it down to if_convertible_loop_p_1. (predicate_mem_writes): Emit MASK_LOAD and/or MASK_STORE calls. (combine_blocks): Add any_mask_load_store argument, pass it down to insert_gimplified_predicates and call predicate_mem_writes if it is set. (tree_if_conversion): Adjust if_convertible_loop_p and combine_blocks calls, set TODO_update_ssa_only_virtuals in todo also if any_mask_load_store has been set for the loop. * tree-vect-data-refs.c (vect_check_gather): Handle MASK_LOAD/MASK_STORE. (vect_analyze_data_refs, vect_supportable_dr_alignment): Likewise. * gimple.h (gimple_expr_type): Handle MASK_STORE. * internal-fn.c (expand_MASK_LOAD, expand_MASK_STORE): New functions. * tree-vect-loop.c (vect_determine_vectorization_factor): Handle MASK_STORE. * optabs.def (maskload_optab, maskstore_optab): New optabs. * tree-vect-stmts.c (vect_mark_relevant): Don't crash if lhs is NULL. (exist_non_indexing_operands_for_use_p): Handle MASK_LOAD and MASK_STORE. (vectorizable_mask_load_store): New function. 
(vectorizable_call): Call it for MASK_LOAD or MASK_STORE. (vect_transform_stmt): Handle MASK_STORE. * gcc.target/i386/avx2-gather-5.c: New test. * gcc.target/i386/avx2-gather-6.c: New test. * gcc.dg/vect/vect-mask-loadstore-1.c: New test. * gcc.dg/vect/vect-mask-load-1.c: New test. --- gcc/config/i386/sse.md.jj 2013-10-23 14:43:09.660920594 +0200 +++ gcc/config/i386/sse.md 2013-10-23 18:50:33.292952867 +0200 @@ -12391,6 +12391,23 @@ (define_insn "_maskstore")]) +(define_expand "maskload" + [(set (match_operand:V48_AVX2 0 "register_operand") + (unspec:V48_AVX2 + [(match_operand: 2 "register_operand") + (match_operand:V48_AVX2 1 "memory_operand")] + UNSPEC_MASKMOV))] + "TARGET_AVX") + +(define_expand "maskstore" + [(set (match_operand:V48_AVX2 0 "memory_operand") + (unspec:V48_AVX2 + [(match_operand: 2 "register_operand") + (match_operand:V48_AVX2 1 "register_operand") + (match_dup 0)] + UNSPEC_MASKMOV))] + "TARGET_AVX") + (define_insn_and_split "avx__" [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") (unspec:AVX256MODE2P --- gcc/internal-fn.c.jj 2013-10-23 18:29:24.189348915 +0200 +++ gcc/internal-fn.c 2013-10-23 18:50:33.293952865 +0200 @@ -141,6 +141,52 @@ expand_LOOP_VECTORIZED (gimple stmt ATTR gcc_unreachable (); } +static void +expand_MASK_LOAD (gimple stmt) +{ + struct expand_operand ops[3]; + tree type, lhs, rhs, maskt; + rtx mem, target, mask; + + maskt = gimple_call_arg (stmt, 2); + lhs = gimple_call_lhs (stmt); + type = TREE_TYPE (lhs); + rhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0), + gimple_call_arg (stmt, 1)); + + mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + gcc_assert (MEM_P (mem)); + mask = expand_normal (maskt); + target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + create_output_operand (&ops[0], target, TYPE_MODE (type)); + create_fixed_operand (&ops[1], mem); + create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); + expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); +} + +static void +expand_MASK_STORE (gimple stmt) +{ + struct expand_operand ops[3]; + tree type, lhs, rhs, maskt; + rtx mem, reg, mask; + + maskt = gimple_call_arg (stmt, 2); + rhs = gimple_call_arg (stmt, 3); + type = TREE_TYPE (rhs); + lhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0), + gimple_call_arg (stmt, 1)); + + mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + gcc_assert (MEM_P (mem)); + mask = expand_normal (maskt); + reg = expand_normal (rhs); + create_fixed_operand (&ops[0], mem); + create_input_operand (&ops[1], reg, TYPE_MODE (type)); + create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); + expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops); +} + /* Routines to expand each internal function, indexed by function number. 
Each routine has the prototype: --- gcc/tree-data-ref.c.jj 2013-10-23 14:43:15.985887980 +0200 +++ gcc/tree-data-ref.c 2013-10-23 18:50:33.294952863 +0200 @@ -4312,8 +4312,8 @@ compute_all_dependences (vecloop_father; - tree uid = gimple_call_arg (stmt, 0); - gcc_assert (TREE_CODE (uid) == SSA_NAME); - if (loop == NULL - || loop->simduid != SSA_NAME_VAR (uid)) + if (gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) + { + case IFN_GOMP_SIMD_LANE: + { + struct loop *loop = gimple_bb (stmt)->loop_father; + tree uid = gimple_call_arg (stmt, 0); + gcc_assert (TREE_CODE (uid) == SSA_NAME); + if (loop == NULL + || loop->simduid != SSA_NAME_VAR (uid)) + clobbers_memory = true; + break; + } + case IFN_MASK_LOAD: + case IFN_MASK_STORE: + break; + default: clobbers_memory = true; - } + break; + } else clobbers_memory = true; } @@ -4361,15 +4371,15 @@ get_references_in_stmt (gimple stmt, vec if (stmt_code == GIMPLE_ASSIGN) { tree base; - op0 = gimple_assign_lhs_ptr (stmt); - op1 = gimple_assign_rhs1_ptr (stmt); + op0 = gimple_assign_lhs (stmt); + op1 = gimple_assign_rhs1 (stmt); - if (DECL_P (*op1) - || (REFERENCE_CLASS_P (*op1) - && (base = get_base_address (*op1)) + if (DECL_P (op1) + || (REFERENCE_CLASS_P (op1) + && (base = get_base_address (op1)) && TREE_CODE (base) != SSA_NAME)) { - ref.pos = op1; + ref.ref = op1; ref.is_read = true; references->safe_push (ref); } @@ -4378,16 +4388,35 @@ get_references_in_stmt (gimple stmt, vec { unsigned i, n; - op0 = gimple_call_lhs_ptr (stmt); + ref.is_read = false; + if (gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) + { + case IFN_MASK_LOAD: + ref.is_read = true; + case IFN_MASK_STORE: + ref.ref = build2 (MEM_REF, + ref.is_read + ? TREE_TYPE (gimple_call_lhs (stmt)) + : TREE_TYPE (gimple_call_arg (stmt, 3)), + gimple_call_arg (stmt, 0), + gimple_call_arg (stmt, 1)); + references->safe_push (ref); + return false; + default: + break; + } + + op0 = gimple_call_lhs (stmt); n = gimple_call_num_args (stmt); for (i = 0; i < n; i++) { - op1 = gimple_call_arg_ptr (stmt, i); + op1 = gimple_call_arg (stmt, i); - if (DECL_P (*op1) - || (REFERENCE_CLASS_P (*op1) && get_base_address (*op1))) + if (DECL_P (op1) + || (REFERENCE_CLASS_P (op1) && get_base_address (op1))) { - ref.pos = op1; + ref.ref = op1; ref.is_read = true; references->safe_push (ref); } @@ -4396,11 +4425,11 @@ get_references_in_stmt (gimple stmt, vec else return clobbers_memory; - if (*op0 - && (DECL_P (*op0) - || (REFERENCE_CLASS_P (*op0) && get_base_address (*op0)))) + if (op0 + && (DECL_P (op0) + || (REFERENCE_CLASS_P (op0) && get_base_address (op0)))) { - ref.pos = op0; + ref.ref = op0; ref.is_read = false; references->safe_push (ref); } @@ -4431,7 +4460,7 @@ find_data_references_in_stmt (struct loo FOR_EACH_VEC_ELT (references, i, ref) { dr = create_data_ref (nest, loop_containing_stmt (stmt), - *ref->pos, stmt, ref->is_read); + ref->ref, stmt, ref->is_read); gcc_assert (dr != NULL); datarefs->safe_push (dr); } @@ -4464,7 +4493,7 @@ graphite_find_data_references_in_stmt (l FOR_EACH_VEC_ELT (references, i, ref) { - dr = create_data_ref (nest, loop, *ref->pos, stmt, ref->is_read); + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); gcc_assert (dr != NULL); datarefs->safe_push (dr); } --- gcc/tree-if-conv.c.jj 2013-10-23 18:46:09.334284914 +0200 +++ gcc/tree-if-conv.c 2013-10-23 19:02:30.235317171 +0200 @@ -100,8 +100,12 @@ along with GCC; see the file COPYING3. 
#include "tree-chrec.h" #include "tree-data-ref.h" #include "tree-scalar-evolution.h" +#include "tree-ssa-address.h" #include "tree-pass.h" #include "dbgcnt.h" +#include "target.h" +#include "expr.h" +#include "optabs.h" /* List of basic blocks in if-conversion-suitable order. */ static basic_block *ifc_bbs; @@ -463,7 +467,8 @@ bb_with_exit_edge_p (struct loop *loop, - there is a virtual PHI in a BB other than the loop->header. */ static bool -if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi) +if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi, + bool any_mask_load_store) { if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -478,7 +483,7 @@ if_convertible_phi_p (struct loop *loop, return false; } - if (flag_tree_loop_if_convert_stores) + if (flag_tree_loop_if_convert_stores || any_mask_load_store) return true; /* When the flag_tree_loop_if_convert_stores is not set, check @@ -694,6 +699,78 @@ ifcvt_could_trap_p (gimple stmt, vecloop_father->force_vect) + || bb->loop_father->dont_vectorize + || !gimple_assign_single_p (stmt) + || gimple_has_volatile_ops (stmt)) + return false; + + /* Check whether this is a load or store. */ + lhs = gimple_assign_lhs (stmt); + if (TREE_CODE (lhs) != SSA_NAME) + { + if (!is_gimple_val (gimple_assign_rhs1 (stmt))) + return false; + op = maskstore_optab; + ref = lhs; + } + else if (gimple_assign_load_p (stmt)) + { + op = maskload_optab; + ref = gimple_assign_rhs1 (stmt); + } + else + return false; + + /* And whether REF isn't a MEM_REF with non-addressable decl. */ + if (TREE_CODE (ref) == MEM_REF + && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR + && DECL_P (TREE_OPERAND (TREE_OPERAND (ref, 0), 0)) + && !TREE_ADDRESSABLE (TREE_OPERAND (TREE_OPERAND (ref, 0), 0))) + return false; + + /* Mask should be integer mode of the same size as the load/store + mode. */ + mode = TYPE_MODE (TREE_TYPE (lhs)); + if (int_mode_for_mode (mode) == BLKmode) + return false; + + /* See if there is any chance the mask load or store might be + vectorized. If not, punt. */ + vmode = targetm.vectorize.preferred_simd_mode (mode); + if (!VECTOR_MODE_P (vmode)) + return false; + + if (optab_handler (op, vmode) != CODE_FOR_nothing) + return true; + + vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); + while (vector_sizes != 0) + { + unsigned int cur = 1 << floor_log2 (vector_sizes); + vector_sizes &= ~cur; + if (cur <= GET_MODE_SIZE (mode)) + continue; + vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode)); + if (VECTOR_MODE_P (vmode) + && optab_handler (op, vmode) != CODE_FOR_nothing) + return true; + } + return false; +} + /* Return true when STMT is if-convertible. GIMPLE_ASSIGN statement is not if-convertible if, @@ -703,7 +780,8 @@ ifcvt_could_trap_p (gimple stmt, vec refs) + vec refs, + bool *any_mask_load_store) { tree lhs = gimple_assign_lhs (stmt); basic_block bb; @@ -729,10 +807,21 @@ if_convertible_gimple_assign_stmt_p (gim return false; } + /* tree-into-ssa.c uses GF_PLF_1, so avoid it, because + in between if_convertible_loop_p and combine_blocks + we can perform loop versioning. 
*/ + gimple_set_plf (stmt, GF_PLF_2, false); + if (flag_tree_loop_if_convert_stores) { if (ifcvt_could_trap_p (stmt, refs)) { + if (ifcvt_can_use_mask_load_store (stmt)) + { + gimple_set_plf (stmt, GF_PLF_2, true); + *any_mask_load_store = true; + return true; + } if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "tree could trap...\n"); return false; @@ -742,6 +831,12 @@ if_convertible_gimple_assign_stmt_p (gim if (gimple_assign_rhs_could_trap_p (stmt)) { + if (ifcvt_can_use_mask_load_store (stmt)) + { + gimple_set_plf (stmt, GF_PLF_2, true); + *any_mask_load_store = true; + return true; + } if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "tree could trap...\n"); return false; @@ -753,6 +848,12 @@ if_convertible_gimple_assign_stmt_p (gim && bb != bb->loop_father->header && !bb_with_exit_edge_p (bb->loop_father, bb)) { + if (ifcvt_can_use_mask_load_store (stmt)) + { + gimple_set_plf (stmt, GF_PLF_2, true); + *any_mask_load_store = true; + return true; + } if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "LHS is not var\n"); @@ -771,7 +872,8 @@ if_convertible_gimple_assign_stmt_p (gim - it is a GIMPLE_LABEL or a GIMPLE_COND. */ static bool -if_convertible_stmt_p (gimple stmt, vec refs) +if_convertible_stmt_p (gimple stmt, vec refs, + bool *any_mask_load_store) { switch (gimple_code (stmt)) { @@ -781,7 +883,8 @@ if_convertible_stmt_p (gimple stmt, vec< return true; case GIMPLE_ASSIGN: - return if_convertible_gimple_assign_stmt_p (stmt, refs); + return if_convertible_gimple_assign_stmt_p (stmt, refs, + any_mask_load_store); case GIMPLE_CALL: { @@ -1069,7 +1172,7 @@ static bool if_convertible_loop_p_1 (struct loop *loop, vec *loop_nest, vec *refs, - vec *ddrs) + vec *ddrs, bool *any_mask_load_store) { bool res; unsigned int i; @@ -1140,14 +1243,11 @@ if_convertible_loop_p_1 (struct loop *lo basic_block bb = ifc_bbs[i]; gimple_stmt_iterator itr; - for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr)) - if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr))) - return false; - /* Check the if-convertibility of statements in predicated BBs. */ if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr)) - if (!if_convertible_stmt_p (gsi_stmt (itr), *refs)) + if (!if_convertible_stmt_p (gsi_stmt (itr), *refs, + any_mask_load_store)) return false; } @@ -1155,6 +1255,19 @@ if_convertible_loop_p_1 (struct loop *lo for (i = 0; i < loop->num_nodes; i++) free_bb_predicate (ifc_bbs[i]); + /* Checking PHIs needs to be done after stmts, as the fact whether there + are any masked loads or stores affects the tests. */ + for (i = 0; i < loop->num_nodes; i++) + { + basic_block bb = ifc_bbs[i]; + gimple_stmt_iterator itr; + + for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr)) + if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr), + *any_mask_load_store)) + return false; + } + if (dump_file) fprintf (dump_file, "Applying if-conversion\n"); @@ -1170,7 +1283,7 @@ if_convertible_loop_p_1 (struct loop *lo - if its basic blocks and phi nodes are if convertible. 
*/ static bool -if_convertible_loop_p (struct loop *loop) +if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store) { edge e; edge_iterator ei; @@ -1212,7 +1325,8 @@ if_convertible_loop_p (struct loop *loop refs.create (5); ddrs.create (25); loop_nest.create (3); - res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs); + res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs, + any_mask_load_store); if (flag_tree_loop_if_convert_stores) { @@ -1400,7 +1514,7 @@ predicate_all_scalar_phis (struct loop * gimplification of the predicates. */ static void -insert_gimplified_predicates (loop_p loop) +insert_gimplified_predicates (loop_p loop, bool any_mask_load_store) { unsigned int i; @@ -1422,7 +1536,8 @@ insert_gimplified_predicates (loop_p loo stmts = bb_predicate_gimplified_stmts (bb); if (stmts) { - if (flag_tree_loop_if_convert_stores) + if (flag_tree_loop_if_convert_stores + || any_mask_load_store) { /* Insert the predicate of the BB just after the label, as the if-conversion of memory writes will use this @@ -1581,9 +1696,49 @@ predicate_mem_writes (loop_p loop) } for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) - if ((stmt = gsi_stmt (gsi)) - && gimple_assign_single_p (stmt) - && gimple_vdef (stmt)) + if ((stmt = gsi_stmt (gsi)) == NULL + || !gimple_assign_single_p (stmt)) + continue; + else if (gimple_plf (stmt, GF_PLF_2)) + { + tree lhs = gimple_assign_lhs (stmt); + tree rhs = gimple_assign_rhs1 (stmt); + tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask; + gimple new_stmt; + int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs))); + + masktype = build_nonstandard_integer_type (bitsize, 1); + mask_op0 = build_int_cst (masktype, swap ? 0 : -1); + mask_op1 = build_int_cst (masktype, swap ? -1 : 0); + ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; + addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref), + true, NULL_TREE, true, + GSI_SAME_STMT); + cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond), + is_gimple_condexpr, NULL_TREE, + true, GSI_SAME_STMT); + mask = fold_build_cond_expr (masktype, unshare_expr (cond), + mask_op0, mask_op1); + mask = ifc_temp_var (masktype, mask, &gsi); + ptr = build_int_cst (reference_alias_ptr_type (ref), 0); + /* Copy points-to info if possible. */ + if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr)) + copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr), + ref); + if (TREE_CODE (lhs) == SSA_NAME) + { + new_stmt + = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, + ptr, mask); + gimple_call_set_lhs (new_stmt, lhs); + } + else + new_stmt + = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr, + mask, rhs); + gsi_replace (&gsi, new_stmt, false); + } + else if (gimple_vdef (stmt)) { tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); @@ -1653,7 +1808,7 @@ remove_conditions_and_labels (loop_p loo blocks. Replace PHI nodes with conditional modify expressions. 
*/ static void -combine_blocks (struct loop *loop) +combine_blocks (struct loop *loop, bool any_mask_load_store) { basic_block bb, exit_bb, merge_target_bb; unsigned int orig_loop_num_nodes = loop->num_nodes; @@ -1663,10 +1818,10 @@ combine_blocks (struct loop *loop) predicate_bbs (loop); remove_conditions_and_labels (loop); - insert_gimplified_predicates (loop); + insert_gimplified_predicates (loop, any_mask_load_store); predicate_all_scalar_phis (loop); - if (flag_tree_loop_if_convert_stores) + if (flag_tree_loop_if_convert_stores || any_mask_load_store) predicate_mem_writes (loop); /* Merge basic blocks: first remove all the edges in the loop, @@ -1870,23 +2025,29 @@ tree_if_conversion (struct loop *loop) unsigned int todo = 0; bool version_outer_loop = false; ifc_bbs = NULL; + bool any_mask_load_store = false; - if (!if_convertible_loop_p (loop) + if (!if_convertible_loop_p (loop, &any_mask_load_store) || !dbg_cnt (if_conversion_tree)) goto cleanup; + if (any_mask_load_store + && ((!flag_tree_loop_vectorize && !loop->force_vect) + || loop->dont_vectorize)) + goto cleanup; + if ((flag_tree_loop_vectorize || loop->force_vect) - && flag_tree_loop_if_convert == -1 + && (flag_tree_loop_if_convert == -1 || any_mask_load_store) && !version_loop_for_if_conversion (loop, &version_outer_loop)) goto cleanup; /* Now all statements are if-convertible. Combine all the basic blocks into one huge basic block doing the if-conversion on-the-fly. */ - combine_blocks (loop); + combine_blocks (loop, any_mask_load_store); todo |= TODO_cleanup_cfg; - if (flag_tree_loop_if_convert_stores) + if (flag_tree_loop_if_convert_stores || any_mask_load_store) { mark_virtual_operands_for_renaming (cfun); todo |= TODO_update_ssa_only_virtuals; --- gcc/optabs.def.jj 2013-10-23 14:43:09.908919315 +0200 +++ gcc/optabs.def 2013-10-23 18:50:33.296952852 +0200 @@ -248,6 +248,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") OPTAB_D (udot_prod_optab, "udot_prod$I$a") OPTAB_D (usum_widen_optab, "widen_usum$I$a3") +OPTAB_D (maskload_optab, "maskload$a") +OPTAB_D (maskstore_optab, "maskstore$a") OPTAB_D (vec_extract_optab, "vec_extract$a") OPTAB_D (vec_init_optab, "vec_init$a") OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") --- gcc/tree-vect-data-refs.c.jj 2013-10-23 14:43:15.986887975 +0200 +++ gcc/tree-vect-data-refs.c 2013-10-23 18:50:33.297952847 +0200 @@ -2747,6 +2747,24 @@ vect_check_gather (gimple stmt, loop_vec enum machine_mode pmode; int punsignedp, pvolatilep; + base = DR_REF (dr); + /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF, + see if we can use the def stmt of the address. */ + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt) + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + && TREE_CODE (base) == MEM_REF + && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME + && integer_zerop (TREE_OPERAND (base, 1)) + && !expr_invariant_in_loop_p (loop, TREE_OPERAND (base, 0))) + { + gimple def_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (base, 0)); + if (is_gimple_assign (def_stmt) + && gimple_assign_rhs_code (def_stmt) == ADDR_EXPR) + base = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0); + } + /* The gather builtins need address of the form loop_invariant + vector * {1, 2, 4, 8} or @@ -2759,7 +2777,7 @@ vect_check_gather (gimple stmt, loop_vec vectorized. 
The following code attempts to find such a preexistng SSA_NAME OFF and put the loop invariants into a tree BASE that can be gimplified before the loop. */ - base = get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &off, + base = get_inner_reference (base, &pbitsize, &pbitpos, &off, &pmode, &punsignedp, &pvolatilep, false); gcc_assert (base != NULL_TREE && (pbitpos % BITS_PER_UNIT) == 0); @@ -3205,7 +3223,10 @@ again: offset = unshare_expr (DR_OFFSET (dr)); init = unshare_expr (DR_INIT (dr)); - if (is_gimple_call (stmt)) + if (is_gimple_call (stmt) + && (!gimple_call_internal_p (stmt) + || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD + && gimple_call_internal_fn (stmt) != IFN_MASK_STORE))) { if (dump_enabled_p ()) { @@ -4856,6 +4877,14 @@ vect_supportable_dr_alignment (struct da if (aligned_access_p (dr) && !check_aligned_accesses) return dr_aligned; + /* For now assume all conditional loads/stores support unaligned + access without any special code. */ + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt) + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) + return dr_unaligned_supported; + if (loop_vinfo) { vect_loop = LOOP_VINFO_LOOP (loop_vinfo); --- gcc/tree-vect-loop.c.jj 2013-10-23 14:43:15.984887985 +0200 +++ gcc/tree-vect-loop.c 2013-10-23 18:50:33.299952836 +0200 @@ -364,7 +364,11 @@ vect_determine_vectorization_factor (loo analyze_pattern_stmt = false; } - if (gimple_get_lhs (stmt) == NULL_TREE) + if (gimple_get_lhs (stmt) == NULL_TREE + /* MASK_STORE has no lhs, but is ok. */ + && (!is_gimple_call (stmt) + || !gimple_call_internal_p (stmt) + || gimple_call_internal_fn (stmt) != IFN_MASK_STORE)) { if (dump_enabled_p ()) { @@ -403,7 +407,12 @@ vect_determine_vectorization_factor (loo else { gcc_assert (!STMT_VINFO_DATA_REF (stmt_info)); - scalar_type = TREE_TYPE (gimple_get_lhs (stmt)); + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt) + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3)); + else + scalar_type = TREE_TYPE (gimple_get_lhs (stmt)); if (dump_enabled_p ()) { dump_printf_loc (MSG_NOTE, vect_location, --- gcc/testsuite/gcc.target/i386/avx2-gather-5.c.jj 2013-10-23 18:50:33.299952836 +0200 +++ gcc/testsuite/gcc.target/i386/avx2-gather-5.c 2013-10-23 18:50:33.299952836 +0200 @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx2 } */ +/* { dg-options "-O3 -mavx2 -fno-common" } */ + +#include "avx2-check.h" + +#define N 1024 +float vf1[N+16], vf2[N], vf3[N]; +int k[N]; + +__attribute__((noinline, noclone)) void +foo (void) +{ + int i; + for (i = 0; i < N; i++) + { + float f; + if (vf3[i] < 0.0f) + f = vf1[k[i]]; + else + f = 7.0f; + vf2[i] = f; + } +} + +static void +avx2_test (void) +{ + int i; + for (i = 0; i < N + 16; i++) + { + vf1[i] = 5.5f * i; + if (i >= N) + continue; + vf2[i] = 2.0f; + vf3[i] = (i & 1) ? i : -i - 1; + k[i] = (i & 1) ? ((i & 2) ? -i : N / 2 + i) : (i * 7) % N; + asm (""); + } + foo (); + for (i = 0; i < N; i++) + if (vf1[i] != 5.5 * i + || vf2[i] != ((i & 1) ? 7.0f : 5.5f * ((i * 7) % N)) + || vf3[i] != ((i & 1) ? i : -i - 1) + || k[i] != ((i & 1) ? ((i & 2) ? 
-i : N / 2 + i) : ((i * 7) % N))) + abort (); +} --- gcc/testsuite/gcc.target/i386/avx2-gather-6.c.jj 2013-10-23 18:50:33.299952836 +0200 +++ gcc/testsuite/gcc.target/i386/avx2-gather-6.c 2013-10-23 18:50:33.299952836 +0200 @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details" } */ + +#include "avx2-gather-5.c" + +/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops in function" 1 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c.jj 2013-10-23 18:50:33.300952831 +0200 +++ gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c 2013-10-23 18:50:33.300952831 +0200 @@ -0,0 +1,50 @@ +/* { dg-do run } */ +/* { dg-additional-options "-Ofast -fno-common" } */ +/* { dg-additional-options "-Ofast -fno-common -mavx" { target avx_runtime } } */ + +#include +#include "tree-vect.h" + +__attribute__((noinline, noclone)) void +foo (float *__restrict x, float *__restrict y, float *__restrict z) +{ + float *__restrict p = __builtin_assume_aligned (x, 32); + float *__restrict q = __builtin_assume_aligned (y, 32); + float *__restrict r = __builtin_assume_aligned (z, 32); + int i; + for (i = 0; i < 1024; i++) + { + if (p[i] < 0.0f) + q[i] = p[i] + 2.0f; + else + p[i] = r[i] + 3.0f; + } +} + +float a[1024] __attribute__((aligned (32))); +float b[1024] __attribute__((aligned (32))); +float c[1024] __attribute__((aligned (32))); + +int +main () +{ + int i; + check_vect (); + for (i = 0; i < 1024; i++) + { + a[i] = (i & 1) ? -i : i; + b[i] = 7 * i; + c[i] = a[i] - 3.0f; + asm (""); + } + foo (a, b, c); + for (i = 0; i < 1024; i++) + if (a[i] != ((i & 1) ? -i : i) + || b[i] != ((i & 1) ? a[i] + 2.0f : 7 * i) + || c[i] != a[i] - 3.0f) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops" 1 "vect" { target avx_runtime } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c.jj 2013-10-23 18:50:33.300952831 +0200 +++ gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c 2013-10-23 18:50:33.300952831 +0200 @@ -0,0 +1,52 @@ +/* { dg-do run } */ +/* { dg-additional-options "-Ofast -fno-common" } */ +/* { dg-additional-options "-Ofast -fno-common -mavx" { target avx_runtime } } */ + +#include +#include "tree-vect.h" + +__attribute__((noinline, noclone)) void +foo (double *x, double *y) +{ + double *p = __builtin_assume_aligned (x, 16); + double *q = __builtin_assume_aligned (y, 16); + double z, h; + int i; + for (i = 0; i < 1024; i++) + { + if (p[i] < 0.0) + z = q[i], h = q[i] * 7.0 + 3.0; + else + z = p[i] + 6.0, h = p[1024 + i]; + p[i] = z + 2.0 * h; + } +} + +double a[2048] __attribute__((aligned (16))); +double b[1024] __attribute__((aligned (16))); + +int +main () +{ + int i; + check_vect (); + for (i = 0; i < 1024; i++) + { + a[i] = (i & 1) ? -i : 2 * i; + a[i + 1024] = i; + b[i] = 7 * i; + asm (""); + } + foo (a, b); + for (i = 0; i < 1024; i++) + if (a[i] != ((i & 1) + ? 7 * i + 2.0 * (7 * i * 7.0 + 3.0) + : 2 * i + 6.0 + 2.0 * i) + || b[i] != 7 * i + || a[i + 1024] != i) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops" 1 "vect" { target avx_runtime } } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/gimple.h.jj 2013-10-23 14:43:09.916919274 +0200 +++ gcc/gimple.h 2013-10-23 18:50:33.301952826 +0200 @@ -5331,7 +5331,13 @@ gimple_expr_type (const_gimple stmt) useless conversion involved. 
That means returning the original RHS type as far as we can reconstruct it. */ if (code == GIMPLE_CALL) - type = gimple_call_return_type (stmt); + { + if (gimple_call_internal_p (stmt) + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + type = TREE_TYPE (gimple_call_arg (stmt, 3)); + else + type = gimple_call_return_type (stmt); + } else switch (gimple_assign_rhs_code (stmt)) { --- gcc/tree-vect-stmts.c.jj 2013-10-23 14:43:16.036887717 +0200 +++ gcc/tree-vect-stmts.c 2013-10-23 18:50:33.303952817 +0200 @@ -223,7 +223,7 @@ vect_mark_relevant (vec *worklis /* This use is out of pattern use, if LHS has other uses that are pattern uses, we should mark the stmt itself, and not the pattern stmt. */ - if (TREE_CODE (lhs) == SSA_NAME) + if (lhs && TREE_CODE (lhs) == SSA_NAME) FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs) { if (is_gimple_debug (USE_STMT (use_p))) @@ -381,7 +381,27 @@ exist_non_indexing_operands_for_use_p (t first case, and whether var corresponds to USE. */ if (!gimple_assign_copy_p (stmt)) - return false; + { + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) + { + case IFN_MASK_STORE: + operand = gimple_call_arg (stmt, 3); + if (operand == use) + return true; + /* FALLTHRU */ + case IFN_MASK_LOAD: + operand = gimple_call_arg (stmt, 2); + if (operand == use) + return true; + break; + default: + break; + } + return false; + } + if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME) return false; operand = gimple_assign_rhs1 (stmt); @@ -1709,6 +1729,401 @@ vectorizable_function (gimple call, tree vectype_in); } + +static tree permute_vec_elements (tree, tree, tree, gimple, + gimple_stmt_iterator *); + + +static bool +vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi, + gimple *vec_stmt, slp_tree slp_node) +{ + tree vec_dest = NULL; + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + stmt_vec_info prev_stmt_info; + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + tree elem_type; + gimple new_stmt; + tree dummy; + tree dataref_ptr = NULL_TREE; + gimple ptr_incr; + int nunits = TYPE_VECTOR_SUBPARTS (vectype); + int ncopies; + int i, j; + bool inv_p; + tree gather_base = NULL_TREE, gather_off = NULL_TREE; + tree gather_off_vectype = NULL_TREE, gather_decl = NULL_TREE; + int gather_scale = 1; + enum vect_def_type gather_dt = vect_unknown_def_type; + bool is_store; + tree mask; + gimple def_stmt; + tree def; + enum vect_def_type dt; + + if (slp_node != NULL) + return false; + + ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; + gcc_assert (ncopies >= 1); + + is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE; + mask = gimple_call_arg (stmt, 2); + if (TYPE_PRECISION (TREE_TYPE (mask)) + != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype)))) + return false; + + /* FORNOW. This restriction should be relaxed. 
*/ + if (nested_in_vect_loop && ncopies > 1) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "multiple types in nested loop."); + return false; + } + + if (!STMT_VINFO_RELEVANT_P (stmt_info)) + return false; + + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def) + return false; + + if (!STMT_VINFO_DATA_REF (stmt_info)) + return false; + + elem_type = TREE_TYPE (vectype); + + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) + return false; + + if (STMT_VINFO_STRIDE_LOAD_P (stmt_info)) + return false; + + if (STMT_VINFO_GATHER_P (stmt_info)) + { + gimple def_stmt; + tree def; + gather_decl = vect_check_gather (stmt, loop_vinfo, &gather_base, + &gather_off, &gather_scale); + gcc_assert (gather_decl); + if (!vect_is_simple_use_1 (gather_off, NULL, loop_vinfo, NULL, + &def_stmt, &def, &gather_dt, + &gather_off_vectype)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "gather index use not simple."); + return false; + } + } + else if (tree_int_cst_compare (nested_in_vect_loop + ? STMT_VINFO_DR_STEP (stmt_info) + : DR_STEP (dr), size_zero_node) < 0) + return false; + else if (optab_handler (is_store ? maskstore_optab : maskload_optab, + TYPE_MODE (vectype)) == CODE_FOR_nothing) + return false; + + if (TREE_CODE (mask) != SSA_NAME) + return false; + + if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL, + &def_stmt, &def, &dt)) + return false; + + if (is_store) + { + tree rhs = gimple_call_arg (stmt, 3); + if (!vect_is_simple_use (rhs, stmt, loop_vinfo, NULL, + &def_stmt, &def, &dt)) + return false; + } + + if (!vec_stmt) /* transformation not required. */ + { + STMT_VINFO_TYPE (stmt_info) = call_vec_info_type; + return true; + } + + /** Transform. **/ + + if (STMT_VINFO_GATHER_P (stmt_info)) + { + tree vec_oprnd0 = NULL_TREE, op; + tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl)); + tree rettype, srctype, ptrtype, idxtype, masktype, scaletype; + tree ptr, vec_mask = NULL_TREE, mask_op, var, scale; + tree perm_mask = NULL_TREE, prev_res = NULL_TREE; + edge pe = loop_preheader_edge (loop); + gimple_seq seq; + basic_block new_bb; + enum { NARROW, NONE, WIDEN } modifier; + int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gather_off_vectype); + + if (nunits == gather_off_nunits) + modifier = NONE; + else if (nunits == gather_off_nunits / 2) + { + unsigned char *sel = XALLOCAVEC (unsigned char, gather_off_nunits); + modifier = WIDEN; + + for (i = 0; i < gather_off_nunits; ++i) + sel[i] = i | nunits; + + perm_mask = vect_gen_perm_mask (gather_off_vectype, sel); + gcc_assert (perm_mask != NULL_TREE); + } + else if (nunits == gather_off_nunits * 2) + { + unsigned char *sel = XALLOCAVEC (unsigned char, nunits); + modifier = NARROW; + + for (i = 0; i < nunits; ++i) + sel[i] = i < gather_off_nunits + ? 
i : i + nunits - gather_off_nunits; + + perm_mask = vect_gen_perm_mask (vectype, sel); + gcc_assert (perm_mask != NULL_TREE); + ncopies *= 2; + } + else + gcc_unreachable (); + + rettype = TREE_TYPE (TREE_TYPE (gather_decl)); + srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + scaletype = TREE_VALUE (arglist); + gcc_checking_assert (types_compatible_p (srctype, rettype) + && types_compatible_p (srctype, masktype)); + + vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype); + + ptr = fold_convert (ptrtype, gather_base); + if (!is_gimple_min_invariant (ptr)) + { + ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE); + new_bb = gsi_insert_seq_on_edge_immediate (pe, seq); + gcc_assert (!new_bb); + } + + scale = build_int_cst (scaletype, gather_scale); + + prev_stmt_info = NULL; + for (j = 0; j < ncopies; ++j) + { + if (modifier == WIDEN && (j & 1)) + op = permute_vec_elements (vec_oprnd0, vec_oprnd0, + perm_mask, stmt, gsi); + else if (j == 0) + op = vec_oprnd0 + = vect_get_vec_def_for_operand (gather_off, stmt, NULL); + else + op = vec_oprnd0 + = vect_get_vec_def_for_stmt_copy (gather_dt, vec_oprnd0); + + if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) + { + gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)) + == TYPE_VECTOR_SUBPARTS (idxtype)); + var = vect_get_new_vect_var (idxtype, vect_simple_var, NULL); + var = make_ssa_name (var, NULL); + op = build1 (VIEW_CONVERT_EXPR, idxtype, op); + new_stmt + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, + op, NULL_TREE); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + op = var; + } + + if (j == 0) + vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL); + else + { + vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt, + &def, &dt); + vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); + } + + mask_op = vec_mask; + if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask))) + { + gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op)) + == TYPE_VECTOR_SUBPARTS (masktype)); + var = vect_get_new_vect_var (masktype, vect_simple_var, NULL); + var = make_ssa_name (var, NULL); + mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op); + new_stmt + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, + mask_op, NULL_TREE); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + mask_op = var; + } + + new_stmt + = gimple_build_call (gather_decl, 5, mask_op, ptr, op, mask_op, + scale); + + if (!useless_type_conversion_p (vectype, rettype)) + { + gcc_assert (TYPE_VECTOR_SUBPARTS (vectype) + == TYPE_VECTOR_SUBPARTS (rettype)); + var = vect_get_new_vect_var (rettype, vect_simple_var, NULL); + op = make_ssa_name (var, new_stmt); + gimple_call_set_lhs (new_stmt, op); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + var = make_ssa_name (vec_dest, NULL); + op = build1 (VIEW_CONVERT_EXPR, vectype, op); + new_stmt + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, op, + NULL_TREE); + } + else + { + var = make_ssa_name (vec_dest, new_stmt); + gimple_call_set_lhs (new_stmt, var); + } + + vect_finish_stmt_generation (stmt, new_stmt, gsi); + + if (modifier == NARROW) + { + if ((j & 1) == 0) + { + prev_res = var; + continue; + } + var = permute_vec_elements (prev_res, var, + perm_mask, stmt, gsi); + new_stmt = SSA_NAME_DEF_STMT (var); + } + + if (prev_stmt_info == NULL) + 
STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + return true; + } + else if (is_store) + { + tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE; + prev_stmt_info = NULL; + for (i = 0; i < ncopies; i++) + { + unsigned align, misalign; + + if (i == 0) + { + tree rhs = gimple_call_arg (stmt, 3); + vec_rhs = vect_get_vec_def_for_operand (rhs, stmt, NULL); + vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL); + /* We should have catched mismatched types earlier. */ + gcc_assert (useless_type_conversion_p (vectype, + TREE_TYPE (vec_rhs))); + dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL, + NULL_TREE, &dummy, gsi, + &ptr_incr, false, &inv_p); + gcc_assert (!inv_p); + } + else + { + vect_is_simple_use (vec_rhs, NULL, loop_vinfo, NULL, &def_stmt, + &def, &dt); + vec_rhs = vect_get_vec_def_for_stmt_copy (dt, vec_rhs); + vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt, + &def, &dt); + vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); + dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, + TYPE_SIZE_UNIT (vectype)); + } + + align = TYPE_ALIGN_UNIT (vectype); + if (aligned_access_p (dr)) + misalign = 0; + else if (DR_MISALIGNMENT (dr) == -1) + { + align = TYPE_ALIGN_UNIT (elem_type); + misalign = 0; + } + else + misalign = DR_MISALIGNMENT (dr); + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, + misalign); + new_stmt + = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr, + gimple_call_arg (stmt, 1), + vec_mask, vec_rhs); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + if (i == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + } + else + { + tree vec_mask = NULL_TREE; + prev_stmt_info = NULL; + vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype); + for (i = 0; i < ncopies; i++) + { + unsigned align, misalign; + + if (i == 0) + { + vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL); + dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL, + NULL_TREE, &dummy, gsi, + &ptr_incr, false, &inv_p); + gcc_assert (!inv_p); + } + else + { + vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt, + &def, &dt); + vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); + dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, + TYPE_SIZE_UNIT (vectype)); + } + + align = TYPE_ALIGN_UNIT (vectype); + if (aligned_access_p (dr)) + misalign = 0; + else if (DR_MISALIGNMENT (dr) == -1) + { + align = TYPE_ALIGN_UNIT (elem_type); + misalign = 0; + } + else + misalign = DR_MISALIGNMENT (dr); + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, + misalign); + new_stmt + = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr, + gimple_call_arg (stmt, 1), + vec_mask); + gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest, NULL)); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + if (i == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + } + + return true; +} + + /* Function vectorizable_call. Check if STMT performs a function call that can be vectorized. 
@@ -1751,10 +2166,16 @@ vectorizable_call (gimple stmt, gimple_s if (!is_gimple_call (stmt)) return false; - if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME) + if (stmt_can_throw_internal (stmt)) return false; - if (stmt_can_throw_internal (stmt)) + if (gimple_call_internal_p (stmt) + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) + return vectorizable_mask_load_store (stmt, gsi, vec_stmt, + slp_node); + + if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME) return false; vectype_out = STMT_VINFO_VECTYPE (stmt_info); @@ -3474,10 +3895,6 @@ vectorizable_shift (gimple stmt, gimple_ } -static tree permute_vec_elements (tree, tree, tree, gimple, - gimple_stmt_iterator *); - - /* Function vectorizable_operation. Check if STMT performs a binary, unary or ternary operation that can @@ -5988,6 +6405,10 @@ vect_transform_stmt (gimple stmt, gimple case call_vec_info_type: done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node); stmt = gsi_stmt (*gsi); + if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt) + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) + is_store = true; break; case reduc_vec_info_type: --- gcc/internal-fn.def.jj 2013-10-23 18:29:24.188348927 +0200 +++ gcc/internal-fn.def 2013-10-23 18:50:33.304952811 +0200 @@ -44,3 +44,5 @@ DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOV DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (MASK_LOAD, ECF_PURE | ECF_LEAF) +DEF_INTERNAL_FN (MASK_STORE, ECF_LEAF) --- gcc/tree-vectorizer.h.jj 2013-10-23 14:43:09.667920558 +0200 +++ gcc/tree-vectorizer.h 2013-10-23 18:29:24.187348942 +0200 @@ -314,6 +314,10 @@ typedef struct _loop_vec_info { fix it up. */ bool operands_swapped; + /* If if-conversion versioned this loop before conversion, this is the + loop version without if-conversion. */ + struct loop *scalar_loop; + } *loop_vec_info; /* Access Functions. */ @@ -345,6 +349,7 @@ typedef struct _loop_vec_info { #define LOOP_VINFO_TARGET_COST_DATA(L) (L)->target_cost_data #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps #define LOOP_VINFO_OPERANDS_SWAPPED(L) (L)->operands_swapped +#define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ (L)->may_misalign_stmts.length () > 0 @@ -899,7 +904,8 @@ extern LOC vect_location; in tree-vect-loop-manip.c. 
*/ extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree); extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge); -struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *, edge); +struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *, + struct loop *, edge); extern void vect_loop_versioning (loop_vec_info, unsigned int, bool); extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *, unsigned int, bool); --- gcc/internal-fn.def.jj 2013-10-23 14:43:09.560921110 +0200 +++ gcc/internal-fn.def 2013-10-23 18:29:24.188348927 +0200 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) --- gcc/tree-if-conv.c.jj 2013-10-23 14:43:15.981888001 +0200 +++ gcc/tree-if-conv.c 2013-10-23 18:38:00.739772777 +0200 @@ -184,39 +184,48 @@ init_bb_predicate (basic_block bb) set_bb_predicate (bb, boolean_true_node); } -/* Free the predicate of basic block BB. */ +/* Release the SSA_NAMEs associated with the predicate of basic block BB, + but don't actually free it. */ static inline void -free_bb_predicate (basic_block bb) +release_bb_predicate (basic_block bb) { - gimple_seq stmts; - - if (!bb_has_predicate (bb)) - return; - - /* Release the SSA_NAMEs created for the gimplification of the - predicate. */ - stmts = bb_predicate_gimplified_stmts (bb); + gimple_seq stmts = bb_predicate_gimplified_stmts (bb); if (stmts) { gimple_stmt_iterator i; for (i = gsi_start (stmts); !gsi_end_p (i); gsi_next (&i)) free_stmt_operands (gsi_stmt (i)); + set_bb_predicate_gimplified_stmts (bb, NULL); } +} +/* Free the predicate of basic block BB. */ + +static inline void +free_bb_predicate (basic_block bb) +{ + if (!bb_has_predicate (bb)) + return; + + release_bb_predicate (bb); free (bb->aux); bb->aux = NULL; } -/* Free the predicate of BB and reinitialize it with the true - predicate. */ +/* Reinitialize predicate of BB with the true predicate. */ static inline void reset_bb_predicate (basic_block bb) { - free_bb_predicate (bb); - init_bb_predicate (bb); + if (!bb_has_predicate (bb)) + init_bb_predicate (bb); + else + { + release_bb_predicate (bb); + set_bb_predicate (bb, boolean_true_node); + } } /* Returns a new SSA_NAME of type TYPE that is assigned the value of @@ -974,7 +983,7 @@ get_loop_body_in_if_conv_order (const st S1 will be predicated with "x", and S2 will be predicated with "!x". */ -static bool +static void predicate_bbs (loop_p loop) { unsigned int i; @@ -986,7 +995,7 @@ predicate_bbs (loop_p loop) { basic_block bb = ifc_bbs[i]; tree cond; - gimple_stmt_iterator itr; + gimple stmt; /* The loop latch is always executed and has no extra conditions to be processed: skip it. */ @@ -996,53 +1005,38 @@ predicate_bbs (loop_p loop) continue; } + /* If dominance tells us this basic block is always executed, force + the condition to be true, this might help simplify other + conditions. 
*/ + if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) + reset_bb_predicate (bb); cond = bb_predicate (bb); - - for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr)) + stmt = last_stmt (bb); + if (stmt && gimple_code (stmt) == GIMPLE_COND) { - gimple stmt = gsi_stmt (itr); - - switch (gimple_code (stmt)) - { - case GIMPLE_LABEL: - case GIMPLE_ASSIGN: - case GIMPLE_CALL: - case GIMPLE_DEBUG: - break; - - case GIMPLE_COND: - { - tree c2; - edge true_edge, false_edge; - location_t loc = gimple_location (stmt); - tree c = fold_build2_loc (loc, gimple_cond_code (stmt), - boolean_type_node, - gimple_cond_lhs (stmt), - gimple_cond_rhs (stmt)); - - /* Add new condition into destination's predicate list. */ - extract_true_false_edges_from_block (gimple_bb (stmt), - &true_edge, &false_edge); - - /* If C is true, then TRUE_EDGE is taken. */ - add_to_dst_predicate_list (loop, true_edge, - unshare_expr (cond), - unshare_expr (c)); - - /* If C is false, then FALSE_EDGE is taken. */ - c2 = build1_loc (loc, TRUTH_NOT_EXPR, - boolean_type_node, unshare_expr (c)); - add_to_dst_predicate_list (loop, false_edge, - unshare_expr (cond), c2); + tree c2; + edge true_edge, false_edge; + location_t loc = gimple_location (stmt); + tree c = fold_build2_loc (loc, gimple_cond_code (stmt), + boolean_type_node, + gimple_cond_lhs (stmt), + gimple_cond_rhs (stmt)); + + /* Add new condition into destination's predicate list. */ + extract_true_false_edges_from_block (gimple_bb (stmt), + &true_edge, &false_edge); + + /* If C is true, then TRUE_EDGE is taken. */ + add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond), + unshare_expr (c)); + + /* If C is false, then FALSE_EDGE is taken. */ + c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node, + unshare_expr (c)); + add_to_dst_predicate_list (loop, false_edge, + unshare_expr (cond), c2); - cond = NULL_TREE; - break; - } - - default: - /* Not handled yet in if-conversion. */ - return false; - } + cond = NULL_TREE; } /* If current bb has only one successor, then consider it as an @@ -1065,8 +1059,6 @@ predicate_bbs (loop_p loop) reset_bb_predicate (loop->header); gcc_assert (bb_predicate_gimplified_stmts (loop->header) == NULL && bb_predicate_gimplified_stmts (loop->latch) == NULL); - - return true; } /* Return true when LOOP is if-convertible. This is a helper function @@ -1111,9 +1103,24 @@ if_convertible_loop_p_1 (struct loop *lo exit_bb = bb; } - res = predicate_bbs (loop); - if (!res) - return false; + for (i = 0; i < loop->num_nodes; i++) + { + basic_block bb = ifc_bbs[i]; + gimple_stmt_iterator gsi; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + switch (gimple_code (gsi_stmt (gsi))) + { + case GIMPLE_LABEL: + case GIMPLE_ASSIGN: + case GIMPLE_CALL: + case GIMPLE_DEBUG: + case GIMPLE_COND: + break; + default: + return false; + } + } if (flag_tree_loop_if_convert_stores) { @@ -1125,6 +1132,7 @@ if_convertible_loop_p_1 (struct loop *lo DR_WRITTEN_AT_LEAST_ONCE (dr) = -1; DR_RW_UNCONDITIONALLY (dr) = -1; } + predicate_bbs (loop); } for (i = 0; i < loop->num_nodes; i++) @@ -1137,12 +1145,16 @@ if_convertible_loop_p_1 (struct loop *lo return false; /* Check the if-convertibility of statements in predicated BBs. 
*/ - if (is_predicated (bb)) + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr)) if (!if_convertible_stmt_p (gsi_stmt (itr), *refs)) return false; } + if (flag_tree_loop_if_convert_stores) + for (i = 0; i < loop->num_nodes; i++) + free_bb_predicate (ifc_bbs[i]); + if (dump_file) fprintf (dump_file, "Applying if-conversion\n"); @@ -1397,7 +1409,8 @@ insert_gimplified_predicates (loop_p loo basic_block bb = ifc_bbs[i]; gimple_seq stmts; - if (!is_predicated (bb)) + if (!is_predicated (bb) + || dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) { /* Do not insert statements for a basic block that is not predicated. Also make sure that the predicate of the @@ -1648,6 +1661,7 @@ combine_blocks (struct loop *loop) edge e; edge_iterator ei; + predicate_bbs (loop); remove_conditions_and_labels (loop); insert_gimplified_predicates (loop); predicate_all_scalar_phis (loop); @@ -1742,28 +1756,72 @@ combine_blocks (struct loop *loop) ifc_bbs = NULL; } -/* If-convert LOOP when it is legal. For the moment this pass has no - profitability analysis. Returns true when something changed. */ +/* Version LOOP before if-converting it, the original loop + will be then if-converted, the new copy of the loop will not, + and the LOOP_VECTORIZED internal call will be guarding which + loop to execute. The vectorizer pass will fold this + internal call into either true or false. */ static bool +version_loop_for_if_conversion (struct loop *loop) +{ + basic_block cond_bb; + tree cond = make_ssa_name (boolean_type_node, NULL); + struct loop *new_loop; + gimple g; + gimple_stmt_iterator gsi; + + g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2, + build_int_cst (integer_type_node, loop->num), + integer_zero_node); + gimple_call_set_lhs (g, cond); + + initialize_original_copy_tables (); + new_loop = loop_version (loop, cond, &cond_bb, + REG_BR_PROB_BASE, REG_BR_PROB_BASE, + REG_BR_PROB_BASE, true); + free_original_copy_tables (); + if (new_loop == NULL) + return false; + new_loop->dont_vectorize = true; + new_loop->force_vect = false; + gsi = gsi_last_bb (cond_bb); + gimple_call_set_arg (g, 1, build_int_cst (integer_type_node, new_loop->num)); + gsi_insert_before (&gsi, g, GSI_SAME_STMT); + update_ssa (TODO_update_ssa); + return true; +} + +/* If-convert LOOP when it is legal. For the moment this pass has no + profitability analysis. Returns non-zero todo flags when something + changed. */ + +static unsigned int tree_if_conversion (struct loop *loop) { - bool changed = false; + unsigned int todo = 0; ifc_bbs = NULL; if (!if_convertible_loop_p (loop) || !dbg_cnt (if_conversion_tree)) goto cleanup; + if ((flag_tree_loop_vectorize || loop->force_vect) + && flag_tree_loop_if_convert == -1 + && !version_loop_for_if_conversion (loop)) + goto cleanup; + /* Now all statements are if-convertible. Combine all the basic blocks into one huge basic block doing the if-conversion on-the-fly. */ combine_blocks (loop); + todo |= TODO_cleanup_cfg; if (flag_tree_loop_if_convert_stores) - mark_virtual_operands_for_renaming (cfun); - - changed = true; + { + mark_virtual_operands_for_renaming (cfun); + todo |= TODO_update_ssa_only_virtuals; + } cleanup: if (ifc_bbs) @@ -1777,7 +1835,7 @@ tree_if_conversion (struct loop *loop) ifc_bbs = NULL; } - return changed; + return todo; } /* Tree if-conversion pass management. 
*/ @@ -1787,7 +1845,6 @@ main_tree_if_conversion (void) { loop_iterator li; struct loop *loop; - bool changed = false; unsigned todo = 0; if (number_of_loops (cfun) <= 1) @@ -1795,16 +1852,9 @@ main_tree_if_conversion (void) FOR_EACH_LOOP (li, loop, 0) if (flag_tree_loop_if_convert == 1 - || flag_tree_loop_if_convert_stores == 1 - || flag_tree_loop_vectorize - || loop->force_vect) - changed |= tree_if_conversion (loop); - - if (changed) - todo |= TODO_cleanup_cfg; - - if (changed && flag_tree_loop_if_convert_stores) - todo |= TODO_update_ssa_only_virtuals; + || ((flag_tree_loop_vectorize || loop->force_vect) + && !loop->dont_vectorize)) + todo |= tree_if_conversion (loop); #ifdef ENABLE_CHECKING { @@ -1824,8 +1874,7 @@ gate_tree_if_conversion (void) { return (((flag_tree_loop_vectorize || cfun->has_force_vect_loops) && flag_tree_loop_if_convert != 0) - || flag_tree_loop_if_convert == 1 - || flag_tree_loop_if_convert_stores == 1); + || flag_tree_loop_if_convert == 1); } namespace { --- gcc/internal-fn.c.jj 2013-10-23 14:43:09.579921012 +0200 +++ gcc/internal-fn.c 2013-10-23 18:29:24.189348915 +0200 @@ -133,6 +133,14 @@ expand_GOMP_SIMD_LAST_LANE (gimple stmt gcc_unreachable (); } +/* This should get folded in tree-vectorizer.c. */ + +static void +expand_LOOP_VECTORIZED (gimple stmt ATTRIBUTE_UNUSED) +{ + gcc_unreachable (); +} + /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: --- gcc/tree-vectorizer.c.jj 2013-10-23 14:43:15.978888016 +0200 +++ gcc/tree-vectorizer.c 2013-10-23 18:30:29.914021368 +0200 @@ -68,6 +68,7 @@ along with GCC; see the file COPYING3. #include "tree-phinodes.h" #include "ssa-iterators.h" #include "tree-ssa-loop.h" +#include "tree-cfg.h" #include "cfgloop.h" #include "tree-vectorizer.h" #include "tree-pass.h" @@ -311,6 +312,43 @@ vect_destroy_datarefs (loop_vec_info loo } +/* If LOOP has been versioned during ifcvt, return the internal call + guarding it. */ + +static gimple +vect_loop_vectorized_call (struct loop *loop) +{ + basic_block bb = loop_preheader_edge (loop)->src; + gimple g; + do + { + g = last_stmt (bb); + if (g) + break; + if (!single_pred_p (bb)) + break; + bb = single_pred (bb); + } + while (1); + if (g && gimple_code (g) == GIMPLE_COND) + { + gimple_stmt_iterator gsi = gsi_for_stmt (g); + gsi_prev (&gsi); + if (!gsi_end_p (gsi)) + { + g = gsi_stmt (gsi); + if (is_gimple_call (g) + && gimple_call_internal_p (g) + && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED + && (tree_low_cst (gimple_call_arg (g, 0), 0) == loop->num + || tree_low_cst (gimple_call_arg (g, 1), 0) == loop->num)) + return g; + } + } + return NULL; +} + + /* Function vectorize_loops. Entry point to loop vectorization phase. */ @@ -325,6 +363,8 @@ vectorize_loops (void) struct loop *loop; hash_table simduid_to_vf_htab; hash_table simd_array_to_simduid_htab; + bool any_ifcvt_loops = false; + unsigned ret = 0; vect_loops_num = number_of_loops (cfun); @@ -347,8 +387,11 @@ vectorize_loops (void) than all previously defined loops. This fact allows us to run only over initial loops skipping newly generated ones. 
*/ FOR_EACH_LOOP (li, loop, 0) - if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop)) - || loop->force_vect) + if (loop->dont_vectorize) + any_ifcvt_loops = true; + else if ((flag_tree_loop_vectorize + && optimize_loop_nest_for_speed_p (loop)) + || loop->force_vect) { loop_vec_info loop_vinfo; vect_location = find_loop_location (loop); @@ -366,6 +409,38 @@ vectorize_loops (void) if (!dbg_cnt (vect_loop)) break; + gimple loop_vectorized_call = vect_loop_vectorized_call (loop); + if (loop_vectorized_call) + { + tree arg = gimple_call_arg (loop_vectorized_call, 1); + basic_block *bbs; + unsigned int i; + struct loop *scalar_loop = get_loop (cfun, tree_low_cst (arg, 0)); + + LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop; + gcc_checking_assert (vect_loop_vectorized_call + (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)) + == loop_vectorized_call); + bbs = get_loop_body (scalar_loop); + for (i = 0; i < scalar_loop->num_nodes; i++) + { + basic_block bb = bbs[i]; + gimple_stmt_iterator gsi; + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple phi = gsi_stmt (gsi); + gimple_set_uid (phi, 0); + } + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, 0); + } + } + free (bbs); + } if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOC && dump_enabled_p ()) dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, @@ -386,6 +461,26 @@ vectorize_loops (void) *simduid_to_vf_htab.find_slot (simduid_to_vf_data, INSERT) = simduid_to_vf_data; } + + if (loop_vectorized_call) + { + gimple g = loop_vectorized_call; + tree lhs = gimple_call_lhs (g); + gimple_stmt_iterator gsi = gsi_for_stmt (g); + gimplify_and_update_call_from_tree (&gsi, boolean_true_node); + gsi_next (&gsi); + if (!gsi_end_p (gsi)) + { + g = gsi_stmt (gsi); + if (gimple_code (g) == GIMPLE_COND + && gimple_cond_lhs (g) == lhs) + { + gimple_cond_set_lhs (g, boolean_true_node); + update_stmt (g); + ret |= TODO_cleanup_cfg; + } + } + } } vect_location = UNKNOWN_LOC; @@ -399,6 +494,34 @@ vectorize_loops (void) /* ----------- Finalize. ----------- */ + if (any_ifcvt_loops) + for (i = 1; i < vect_loops_num; i++) + { + loop = get_loop (cfun, i); + if (loop && loop->dont_vectorize) + { + gimple g = vect_loop_vectorized_call (loop); + if (g) + { + tree lhs = gimple_call_lhs (g); + gimple_stmt_iterator gsi = gsi_for_stmt (g); + gimplify_and_update_call_from_tree (&gsi, boolean_false_node); + gsi_next (&gsi); + if (!gsi_end_p (gsi)) + { + g = gsi_stmt (gsi); + if (gimple_code (g) == GIMPLE_COND + && gimple_cond_lhs (g) == lhs) + { + gimple_cond_set_lhs (g, boolean_false_node); + update_stmt (g); + ret |= TODO_cleanup_cfg; + } + } + } + } + } + for (i = 1; i < vect_loops_num; i++) { loop_vec_info loop_vinfo; @@ -456,7 +579,7 @@ vectorize_loops (void) return TODO_cleanup_cfg; } - return 0; + return ret; } --- gcc/tree-vect-loop-manip.c.jj 2013-10-23 14:43:12.791904450 +0200 +++ gcc/tree-vect-loop-manip.c 2013-10-23 18:29:24.190348902 +0200 @@ -696,12 +696,42 @@ slpeel_make_loop_iterate_ntimes (struct loop->nb_iterations = niters; } +/* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg. + For all PHI arguments in FROM->dest and TO->dest from those + edges ensure that TO->dest PHI arguments have current_def + to that in from. 
*/ + +static void +slpeel_duplicate_current_defs_from_edges (edge from, edge to) +{ + gimple_stmt_iterator gsi_from, gsi_to; + + for (gsi_from = gsi_start_phis (from->dest), + gsi_to = gsi_start_phis (to->dest); + !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to); + gsi_next (&gsi_from), gsi_next (&gsi_to)) + { + gimple from_phi = gsi_stmt (gsi_from); + gimple to_phi = gsi_stmt (gsi_to); + tree from_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, from); + tree to_arg = PHI_ARG_DEF_FROM_EDGE (to_phi, to); + if (TREE_CODE (from_arg) == SSA_NAME + && TREE_CODE (to_arg) == SSA_NAME + && get_current_def (to_arg) == NULL_TREE) + set_current_def (to_arg, get_current_def (from_arg)); + } +} + /* Given LOOP this function generates a new copy of it and puts it - on E which is either the entry or exit of LOOP. */ + on E which is either the entry or exit of LOOP. If SCALAR_LOOP is + non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the + basic blocks from SCALAR_LOOP instead of LOOP, but to either the + entry or exit of LOOP. */ struct loop * -slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop, edge e) +slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop, + struct loop *scalar_loop, edge e) { struct loop *new_loop; basic_block *new_bbs, *bbs; @@ -715,19 +745,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg ( if (!at_exit && e != loop_preheader_edge (loop)) return NULL; - bbs = XNEWVEC (basic_block, loop->num_nodes + 1); - get_loop_body_with_size (loop, bbs, loop->num_nodes); + if (scalar_loop == NULL) + scalar_loop = loop; + + bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1); + get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes); /* Check whether duplication is possible. */ - if (!can_copy_bbs_p (bbs, loop->num_nodes)) + if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes)) { free (bbs); return NULL; } /* Generate new loop structure. */ - new_loop = duplicate_loop (loop, loop_outer (loop)); - duplicate_subloops (loop, new_loop); + new_loop = duplicate_loop (scalar_loop, loop_outer (scalar_loop)); + duplicate_subloops (scalar_loop, new_loop); exit_dest = exit->dest; was_imm_dom = (get_immediate_dominator (CDI_DOMINATORS, @@ -737,35 +770,80 @@ slpeel_tree_duplicate_loop_to_edge_cfg ( /* Also copy the pre-header, this avoids jumping through hoops to duplicate the loop entry PHI arguments. Create an empty pre-header unconditionally for this. */ - basic_block preheader = split_edge (loop_preheader_edge (loop)); + basic_block preheader = split_edge (loop_preheader_edge (scalar_loop)); edge entry_e = single_pred_edge (preheader); - bbs[loop->num_nodes] = preheader; - new_bbs = XNEWVEC (basic_block, loop->num_nodes + 1); + bbs[scalar_loop->num_nodes] = preheader; + new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1); - copy_bbs (bbs, loop->num_nodes + 1, new_bbs, + exit = single_exit (scalar_loop); + copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs, &exit, 1, &new_exit, NULL, e->src, true); - basic_block new_preheader = new_bbs[loop->num_nodes]; + exit = single_exit (loop); + basic_block new_preheader = new_bbs[scalar_loop->num_nodes]; - add_phi_args_after_copy (new_bbs, loop->num_nodes + 1, NULL); + add_phi_args_after_copy (new_bbs, scalar_loop->num_nodes + 1, NULL); + + if (scalar_loop != loop) + { + /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from + SCALAR_LOOP will have current_def set to SSA_NAMEs in the new_loop, + but LOOP will not. 
slpeel_update_phi_nodes_for_guard{1,2} expects + the LOOP SSA_NAMEs (on the exit edge and edge from latch to + header) to have current_def set, so copy them over. */ + slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop), + exit); + slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch, + 0), + EDGE_SUCC (loop->latch, 0)); + } if (at_exit) /* Add the loop copy at exit. */ { + if (scalar_loop != loop) + { + gimple_stmt_iterator gsi; + new_exit = redirect_edge_and_branch (new_exit, exit_dest); + + for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple phi = gsi_stmt (gsi); + tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e); + location_t orig_locus + = gimple_phi_arg_location_from_edge (phi, e); + + add_phi_arg (phi, orig_arg, new_exit, orig_locus); + } + } redirect_edge_and_branch_force (e, new_preheader); flush_pending_stmts (e); set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src); if (was_imm_dom) - set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_loop->header); + set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src); /* And remove the non-necessary forwarder again. Keep the other one so we have a proper pre-header for the loop at the exit edge. */ - redirect_edge_pred (single_succ_edge (preheader), single_pred (preheader)); + redirect_edge_pred (single_succ_edge (preheader), + single_pred (preheader)); delete_basic_block (preheader); - set_immediate_dominator (CDI_DOMINATORS, loop->header, - loop_preheader_edge (loop)->src); + set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header, + loop_preheader_edge (scalar_loop)->src); } else /* Add the copy at entry. */ { + if (scalar_loop != loop) + { + /* Remove the non-necessary forwarder of scalar_loop again. */ + redirect_edge_pred (single_succ_edge (preheader), + single_pred (preheader)); + delete_basic_block (preheader); + set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header, + loop_preheader_edge (scalar_loop)->src); + preheader = split_edge (loop_preheader_edge (loop)); + entry_e = single_pred_edge (preheader); + } + redirect_edge_and_branch_force (entry_e, new_preheader); flush_pending_stmts (entry_e); set_immediate_dominator (CDI_DOMINATORS, new_preheader, entry_e->src); @@ -776,15 +854,39 @@ slpeel_tree_duplicate_loop_to_edge_cfg ( /* And remove the non-necessary forwarder again. Keep the other one so we have a proper pre-header for the loop at the exit edge. */ - redirect_edge_pred (single_succ_edge (new_preheader), single_pred (new_preheader)); + redirect_edge_pred (single_succ_edge (new_preheader), + single_pred (new_preheader)); delete_basic_block (new_preheader); set_immediate_dominator (CDI_DOMINATORS, new_loop->header, loop_preheader_edge (new_loop)->src); } - for (unsigned i = 0; i < loop->num_nodes+1; i++) + for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++) rename_variables_in_bb (new_bbs[i]); + if (scalar_loop != loop) + { + /* Update new_loop->header PHIs, so that on the preheader + edge they are the ones from loop rather than scalar_loop. 
*/ + gimple_stmt_iterator gsi_orig, gsi_new; + edge orig_e = loop_preheader_edge (loop); + edge new_e = loop_preheader_edge (new_loop); + + for (gsi_orig = gsi_start_phis (loop->header), + gsi_new = gsi_start_phis (new_loop->header); + !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new); + gsi_next (&gsi_orig), gsi_next (&gsi_new)) + { + gimple orig_phi = gsi_stmt (gsi_orig); + gimple new_phi = gsi_stmt (gsi_new); + tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e); + location_t orig_locus + = gimple_phi_arg_location_from_edge (orig_phi, orig_e); + + add_phi_arg (new_phi, orig_arg, new_e, orig_locus); + } + } + free (new_bbs); free (bbs); @@ -995,6 +1097,8 @@ set_prologue_iterations (basic_block bb_ Input: - LOOP: the loop to be peeled. + - SCALAR_LOOP: if non-NULL, the alternate loop from which basic blocks + should be copied. - E: the exit or entry edge of LOOP. If it is the entry edge, we peel the first iterations of LOOP. In this case first-loop is LOOP, and second-loop is the newly created loop. @@ -1036,8 +1140,8 @@ set_prologue_iterations (basic_block bb_ FORNOW the resulting code will not be in loop-closed-ssa form. */ -static struct loop* -slpeel_tree_peel_loop_to_edge (struct loop *loop, +static struct loop * +slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop, edge e, tree *first_niters, tree niters, bool update_first_loop_count, unsigned int th, bool check_profitability, @@ -1122,7 +1226,8 @@ slpeel_tree_peel_loop_to_edge (struct lo orig_exit_bb: */ - if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e))) + if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, + e))) { loop_loc = find_loop_location (loop); dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc, @@ -1763,6 +1868,7 @@ vect_do_peeling_for_loop_bound (loop_vec { tree ni_name, ratio_mult_vf_name; struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); struct loop *new_loop; edge update_e; basic_block preheader; @@ -1788,11 +1894,12 @@ vect_do_peeling_for_loop_bound (loop_vec loop_num = loop->num; - new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop), - &ratio_mult_vf_name, ni_name, false, - th, check_profitability, - cond_expr, cond_expr_stmt_list, - 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo)); + new_loop + = slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop), + &ratio_mult_vf_name, ni_name, false, + th, check_profitability, + cond_expr, cond_expr_stmt_list, + 0, LOOP_VINFO_VECT_FACTOR (loop_vinfo)); gcc_assert (new_loop); gcc_assert (loop_num == loop->num); #ifdef ENABLE_CHECKING @@ -2025,6 +2132,7 @@ vect_do_peeling_for_alignment (loop_vec_ unsigned int th, bool check_profitability) { struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); tree niters_of_prolog_loop, ni_name; tree n_iters; tree wide_prolog_niters; @@ -2046,11 +2154,11 @@ vect_do_peeling_for_alignment (loop_vec_ /* Peel the prolog loop and iterate it niters_of_prolog_loop. 
*/ new_loop = - slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop), + slpeel_tree_peel_loop_to_edge (loop, scalar_loop, + loop_preheader_edge (loop), &niters_of_prolog_loop, ni_name, true, th, check_profitability, NULL_TREE, NULL, - bound, - 0); + bound, 0); gcc_assert (new_loop); #ifdef ENABLE_CHECKING @@ -2406,6 +2514,7 @@ vect_loop_versioning (loop_vec_info loop unsigned int th, bool check_profitability) { struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); basic_block condition_bb; gimple_stmt_iterator gsi, cond_exp_gsi; basic_block merge_bb; @@ -2441,8 +2550,43 @@ vect_loop_versioning (loop_vec_info loop gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list); initialize_original_copy_tables (); - loop_version (loop, cond_expr, &condition_bb, - prob, prob, REG_BR_PROB_BASE - prob, true); + if (scalar_loop) + { + edge scalar_e; + basic_block preheader, scalar_preheader; + + /* We don't want to scale SCALAR_LOOP's frequencies, we need to + scale LOOP's frequencies instead. */ + loop_version (scalar_loop, cond_expr, &condition_bb, + prob, REG_BR_PROB_BASE, REG_BR_PROB_BASE - prob, true); + scale_loop_frequencies (loop, prob, REG_BR_PROB_BASE); + /* CONDITION_BB was created above SCALAR_LOOP's preheader, + while we need to move it above LOOP's preheader. */ + e = loop_preheader_edge (loop); + scalar_e = loop_preheader_edge (scalar_loop); + gcc_assert (empty_block_p (e->src) + && single_pred_p (e->src)); + gcc_assert (empty_block_p (scalar_e->src) + && single_pred_p (scalar_e->src)); + gcc_assert (single_pred_p (condition_bb)); + preheader = e->src; + scalar_preheader = scalar_e->src; + scalar_e = find_edge (condition_bb, scalar_preheader); + e = single_pred_edge (preheader); + redirect_edge_and_branch_force (single_pred_edge (condition_bb), + scalar_preheader); + redirect_edge_and_branch_force (scalar_e, preheader); + redirect_edge_and_branch_force (e, condition_bb); + set_immediate_dominator (CDI_DOMINATORS, condition_bb, + single_pred (condition_bb)); + set_immediate_dominator (CDI_DOMINATORS, scalar_preheader, + single_pred (scalar_preheader)); + set_immediate_dominator (CDI_DOMINATORS, preheader, + condition_bb); + } + else + loop_version (loop, cond_expr, &condition_bb, + prob, prob, REG_BR_PROB_BASE - prob, true); if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOC && dump_enabled_p ()) @@ -2465,24 +2609,29 @@ vect_loop_versioning (loop_vec_info loop basic block (i.e. it has two predecessors). Just in order to simplify following transformations in the vectorizer, we fix this situation here by adding a new (empty) block on the exit-edge of the loop, - with the proper loop-exit phis to maintain loop-closed-form. */ + with the proper loop-exit phis to maintain loop-closed-form. + If loop versioning wasn't done from loop, but scalar_loop instead, + merge_bb will have already just a single successor. 
*/ merge_bb = single_exit (loop)->dest; - gcc_assert (EDGE_COUNT (merge_bb->preds) == 2); - new_exit_bb = split_edge (single_exit (loop)); - new_exit_e = single_exit (loop); - e = EDGE_SUCC (new_exit_bb, 0); - - for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi)) - { - tree new_res; - orig_phi = gsi_stmt (gsi); - new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL); - new_phi = create_phi_node (new_res, new_exit_bb); - arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e); - add_phi_arg (new_phi, arg, new_exit_e, - gimple_phi_arg_location_from_edge (orig_phi, e)); - adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); + if (scalar_loop == NULL || EDGE_COUNT (merge_bb->preds) >= 2) + { + gcc_assert (EDGE_COUNT (merge_bb->preds) >= 2); + new_exit_bb = split_edge (single_exit (loop)); + new_exit_e = single_exit (loop); + e = EDGE_SUCC (new_exit_bb, 0); + + for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + tree new_res; + orig_phi = gsi_stmt (gsi); + new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL); + new_phi = create_phi_node (new_res, new_exit_bb); + arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e); + add_phi_arg (new_phi, arg, new_exit_e, + gimple_phi_arg_location_from_edge (orig_phi, e)); + adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); + } } --- gcc/cfgloop.h.jj 2013-10-23 14:43:09.538921223 +0200 +++ gcc/cfgloop.h 2013-10-23 18:29:24.191348892 +0200 @@ -176,6 +176,9 @@ struct GTY ((chain_next ("%h.next"))) lo /* True if we should try harder to vectorize this loop. */ bool force_vect; + /* True if this loop should never be vectorized. */ + bool dont_vectorize; + /* For SIMD loops, this is a unique identifier of the loop, referenced by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE builtins. */ --- gcc/tree-loop-distribution.c.jj 2013-10-23 14:43:12.757904625 +0200 +++ gcc/tree-loop-distribution.c 2013-10-23 18:29:24.192348882 +0200 @@ -671,7 +671,7 @@ copy_loop_before (struct loop *loop) edge preheader = loop_preheader_edge (loop); initialize_original_copy_tables (); - res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, preheader); + res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader); gcc_assert (res != NULL); free_original_copy_tables (); delete_update_ssa (); --- gcc/passes.def.jj 2013-10-23 14:43:09.915919279 +0200 +++ gcc/passes.def 2013-10-23 18:29:24.192348882 +0200 @@ -213,6 +213,8 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); NEXT_PASS (pass_if_conversion); + /* pass_vectorize must immediately follow pass_if_conversion. + Please do not add any other passes in between. 
*/ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) NEXT_PASS (pass_dce_loop); --- gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c.jj 2013-10-23 14:43:09.694920419 +0200 +++ gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c 2013-10-23 18:29:24.192348882 +0200 @@ -1,4 +1,5 @@ /* { dg-require-effective-target vect_condition } */ +/* { dg-additional-options "-ftree-loop-if-convert" } */ #include "tree-vect.h" --- gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c.jj 2013-10-23 14:43:09.695920414 +0200 +++ gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c 2013-10-23 18:29:24.192348882 +0200 @@ -1,4 +1,5 @@ /* { dg-require-effective-target vect_condition } */ +/* { dg-additional-options "-ftree-loop-if-convert" } */ #include "tree-vect.h" --- gcc/testsuite/gcc.dg/vect/vect-cond-11.c.jj 2013-10-23 18:29:24.193348876 +0200 +++ gcc/testsuite/gcc.dg/vect/vect-cond-11.c 2013-10-23 18:29:24.193348876 +0200 @@ -0,0 +1,116 @@ +#include "tree-vect.h" + +#define N 1024 +typedef int V __attribute__((vector_size (4))); +unsigned int a[N * 2] __attribute__((aligned)); +unsigned int b[N * 2] __attribute__((aligned)); +V c[N]; + +__attribute__((noinline, noclone)) unsigned int +foo (unsigned int *a, unsigned int *b) +{ + int i; + unsigned int r = 0; + for (i = 0; i < N; i++) + { + unsigned int x = a[i], y = b[i]; + if (x < 32) + { + x = x + 127; + y = y * 2; + } + else + { + x = x - 16; + y = y + 1; + } + a[i] = x; + b[i] = y; + r += x; + } + return r; +} + +__attribute__((noinline, noclone)) unsigned int +bar (unsigned int *a, unsigned int *b) +{ + int i; + unsigned int r = 0; + for (i = 0; i < N; i++) + { + unsigned int x = a[i], y = b[i]; + if (x < 32) + { + x = x + 127; + y = y * 2; + } + else + { + x = x - 16; + y = y + 1; + } + a[i] = x; + b[i] = y; + c[i] = c[i] + 1; + r += x; + } + return r; +} + +void +baz (unsigned int *a, unsigned int *b, + unsigned int (*fn) (unsigned int *, unsigned int *)) +{ + int i; + for (i = -64; i < 0; i++) + { + a[i] = 19; + b[i] = 17; + } + for (; i < N; i++) + { + a[i] = i - 512; + b[i] = i; + } + for (; i < N + 64; i++) + { + a[i] = 27; + b[i] = 19; + } + if (fn (a, b) != -512U - (N - 32) * 16U + 32 * 127U) + __builtin_abort (); + for (i = -64; i < 0; i++) + if (a[i] != 19 || b[i] != 17) + __builtin_abort (); + for (; i < N; i++) + if (a[i] != (i - 512U < 32U ? i - 512U + 127 : i - 512U - 16) + || b[i] != (i - 512U < 32U ? i * 2U : i + 1U)) + __builtin_abort (); + for (; i < N + 64; i++) + if (a[i] != 27 || b[i] != 19) + __builtin_abort (); +} + +int +main () +{ + int i; + check_vect (); + baz (a + 512, b + 512, foo); + baz (a + 512, b + 512, bar); + baz (a + 512 + 1, b + 512 + 1, foo); + baz (a + 512 + 1, b + 512 + 1, bar); + baz (a + 512 + 31, b + 512 + 31, foo); + baz (a + 512 + 31, b + 512 + 31, bar); + baz (a + 512 + 1, b + 512, foo); + baz (a + 512 + 1, b + 512, bar); + baz (a + 512 + 31, b + 512, foo); + baz (a + 512 + 31, b + 512, bar); + baz (a + 512, b + 512 + 1, foo); + baz (a + 512, b + 512 + 1, bar); + baz (a + 512, b + 512 + 31, foo); + baz (a + 512, b + 512 + 31, bar); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */