Message ID | 20121120111049.GT2315@tucnak.redhat.com
---|---
State | New
On Tue, 20 Nov 2012, Jakub Jelinek wrote:
> 2012-11-20  Jakub Jelinek  <jakub@redhat.com>
>
> 	* Makefile.in (tree-if-conv.o): Depend on $(TARGET_H), $(EXPR_H)
> 	and $(OPTABS_H).
> 	* config/i386/sse.md (maskload<mode>, maskstore<mode>): New
> 	expanders.

(etc., new patterns, but nothing for md.texi)

Missing documentation?

brgds, H-P
Hi Jakub,

I assume that you should go ahead and commit your patch.

About your example: I only know that this loop is vectorized by the icc
compiler for AVX. I will investigate the problem you reported.

Best regards.
Yuri.

2012/11/20 Jakub Jelinek <jakub@redhat.com>:
> On Tue, Nov 20, 2012 at 02:14:43PM +0400, Yuri Rumyantsev wrote:
>> As example of missed vectorization with chain of conditions I can
>> propose to look at 462.libquantum.
>
> That is roughly:
>
> struct T
> {
>   float __complex__ t1;
>   unsigned long long t2;
> };
> struct S
> {
>   int s1;
>   struct T *s2;
> };
>
> void
> foo (struct S *s, int x, int y, int z)
> {
>   int i;
>   for (i = 0; i < s->s1; i++)
>     {
>       if (s->s2[i].t2 & (1ULL << x))
>         if (s->s2[i].t2 & (1ULL << y))
>           s->s2[i].t2 ^= (1ULL << z);
>     }
> }
>
> isn't it?  There aren't after optimizations two conditions, but just one,
> (1ULL << x) | (1ULL << y) (and also 1ULL << z) are hoisted before the loop
> by PRE, so the loop just does
>   if (s->s2[i].t2 & something) s->s2[i].t2 ^= somethingelse;
>
> This isn't vectorized, but not because of the if-conv part which actually
> puts there a masked store, but because of data refs analysis issues:
> Creating dr for _10->t2
> analyze_innermost: success.
> 	base_address: pretmp_28
> 	offset from base address: 0
> 	constant offset from base address: 8
> 	step: 16
> 	aligned to: 256
> 	base_object: *pretmp_28
> 	Access function 0: 64
> 	Access function 1: {0B, +, 16}_1
> Creating dr for MEM[(struct T *)_23]
> analyze_innermost: success.
> 	base_address: pretmp_28
> 	offset from base address: 0
> 	constant offset from base address: 8
> 	step: 16
> 	aligned to: 256
> 	base_object: MEM[(struct T *)(long long unsigned int *) pretmp_28]
> 	Access function 0: {8B, +, 16}_1
> (compute_affine_dependence
>   stmt_a: _11 = _10->t2;
>   stmt_b: MASK_STORE (_23, 0B, _ifc__25, _20);
>
> (no idea why, _23 is _23 = &_10->t2; and so it should hopefully figure out
> that the two do (if written at all) overlap, and then
> 16: === vect_analyze_data_ref_accesses ===
> 16: Detected single element interleaving _10->t2 step 16
> 16: Data access with gaps requires scalar epilogue loop
> 16: not consecutive access MASK_STORE (_23, 0B, _ifc__25, _20);
>
> 16: not vectorized: complicated access pattern.
> 16: bad data access.
>
> The current masked load/store code isn't prepared to handle masked
> loads/stores with gaps, but vectorize_masked_load_store isn't even called
> in this case, it is shot down somewhere in tree-vect-data-refs.c.
>
> That said, is vectorization actually a win on this loop?  I mean, pre-AVX
> it can't be, it is working on every second DImode value, and with AVX (even
> with that it could use vxorpd/vandpd) and with AVX2, it would mean vpmaskmov
> with DImode for every second DImode, so vectorization factor 2, but with the
> higher cost of conditional store.
>
> Slightly adjusted testcase above (with the float __complex__ t1;
> field removed) gets us further, it is actually vectorized, but with
> versioning for alias:
> 15: versioning for alias required: can't determine dependence between _10->t2 and MEM[(struct T *)_23]
> 15: mark for run-time aliasing test between _10->t2 and MEM[(struct T *)_23]
> where obviously the two do alias (but it is access to the exact same memory
> location and the (conditional) store comes after the load), thus while we
> still emit the vectorized loop at expand time, it is optimized away later
> on.
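The loop shape Jakub describes after PRE can be written out as a standalone C sketch. This is illustrative, not code from the thread: the combined test uses `== test` because both bits must be set, a detail the shorthand `t2 & something` elides.

```c
#include <assert.h>

struct T { float _Complex t1; unsigned long long t2; };
struct S { int s1; struct T *s2; };

/* Shape of the libquantum loop after PRE has hoisted the shift
   expressions out of the loop: one combined condition guarding one
   conditional XOR store -- the form if-conversion then turns into a
   MASK_STORE.  */
void
foo (struct S *s, int x, int y, int z)
{
  unsigned long long test = (1ULL << x) | (1ULL << y); /* hoisted by PRE */
  unsigned long long flip = 1ULL << z;                 /* hoisted by PRE */
  for (int i = 0; i < s->s1; i++)
    if ((s->s2[i].t2 & test) == test)   /* both original bits set */
      s->s2[i].t2 ^= flip;
}
```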
>
> I'm attaching updated version of the patch, as the older one no longer
> applied after Diego's vec.h changes.
>
> 2012-11-20  Jakub Jelinek  <jakub@redhat.com>
>
> 	* Makefile.in (tree-if-conv.o): Depend on $(TARGET_H), $(EXPR_H)
> 	and $(OPTABS_H).
> 	* config/i386/sse.md (maskload<mode>, maskstore<mode>): New expanders.
> 	* tree-data-ref.c (struct data_ref_loc_d): Replace pos field with ref.
> 	(get_references_in_stmt): Don't record operand addresses, but
> 	operands themselves.  Handle MASK_LOAD and MASK_STORE.
> 	(find_data_references_in_stmt, graphite_find_data_references_in_stmt,
> 	create_rdg_vertices): Adjust users of pos field of data_ref_loc_d.
> 	* internal-fn.def (MASK_LOAD, MASK_STORE): New internal fns.
> 	* tree-if-conv.c: Add target.h, expr.h and optabs.h includes.
> 	(if_convertible_phi_p, insert_gimplified_predicates): Add
> 	any_mask_load_store argument, if true, handle it like
> 	flag_tree_loop_if_convert_stores.
> 	(ifcvt_can_use_mask_load_store): New function.
> 	(if_convertible_gimple_assign_stmt_p): Add any_mask_load_store
> 	argument, check if some conditional loads or stores can't be
> 	converted into MASK_LOAD or MASK_STORE.
> 	(if_convertible_stmt_p): Add any_mask_load_store argument,
> 	pass it down to if_convertible_gimple_assign_stmt_p.
> 	(if_convertible_loop_p_1): Add any_mask_load_store argument,
> 	pass it down to if_convertible_stmt_p and if_convertible_phi_p,
> 	call if_convertible_phi_p only after all if_convertible_stmt_p
> 	calls.
> 	(if_convertible_loop_p): Add any_mask_load_store argument,
> 	pass it down to if_convertible_loop_p_1.
> 	(predicate_mem_writes): Emit MASK_LOAD and/or MASK_STORE calls.
> 	(combine_blocks): Add any_mask_load_store argument, pass
> 	it down to insert_gimplified_predicates and call predicate_mem_writes
> 	if it is set.
> 	(tree_if_conversion): Add any_mask_load_store_p argument,
> 	adjust if_convertible_loop_p, combine_blocks calls and gather
> 	whether any mask loads/stores have been generated.
> 	(need_if_unconversion): New variable.
> 	(main_tree_if_conversion): Adjust tree_if_conversion caller,
> 	if any masked loads/stores have been created, set
> 	need_if_unconversion and return TODO_update_ssa_only_virtuals.
> 	(gate_tree_if_unconversion, main_tree_if_unconversion): New
> 	functions.
> 	(pass_if_unconversion): New pass descriptor.
> 	* tree-vect-data-refs.c (vect_check_gather): Handle
> 	MASK_LOAD/MASK_STORE.
> 	(vect_analyze_data_refs, vect_supportable_dr_alignment): Likewise.
> 	* gimple.h (gimple_expr_type): Handle MASK_STORE.
> 	* internal-fn.c (expand_MASK_LOAD, expand_MASK_STORE): New functions.
> 	* tree-vect-loop.c (vect_determine_vectorization_factor): Handle
> 	MASK_STORE.
> 	* passes.c (init_optimization_passes): Add pass_if_unconversion.
> 	* optabs.def (maskload_optab, maskstore_optab): New optabs.
> 	* tree-pass.h (pass_if_unconversion): New extern decl.
> 	* tree-vect-stmts.c (vect_mark_relevant): Don't crash if lhs
> 	is NULL.
> 	(exist_non_indexing_operands_for_use_p): Handle MASK_LOAD
> 	and MASK_STORE.
> 	(vectorizable_mask_load_store): New function.
> 	(vectorizable_call): Call it for MASK_LOAD or MASK_STORE.
> 	(vect_transform_stmt): Handle MASK_STORE.
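For readers following the ChangeLog, the scalar meaning of the two new internal functions can be summarized in plain C. This is an illustrative emulation, not GCC code (the helper names are mine): the mask is an all-ones/all-zeros integer of the access width as built by predicate_mem_writes, the store only happens under the predicate, and a masked-out load yields zero, which is the value the if-unconversion pass materializes on the false edge of its PHI.

```c
#include <stdint.h>

/* Scalar emulation of IFN_MASK_LOAD: load only when the mask is set,
   otherwise produce zero (matching the PHI the unconversion pass
   rebuilds).  Hypothetical helper, not a GCC API.  */
static uint64_t
mask_load (const uint64_t *addr, uint64_t mask)
{
  return mask ? *addr : 0;
}

/* Scalar emulation of IFN_MASK_STORE: write only under the predicate,
   so a trapping or read-only location is never touched when the
   condition is false.  Hypothetical helper, not a GCC API.  */
static void
mask_store (uint64_t *addr, uint64_t mask, uint64_t value)
{
  if (mask)
    *addr = value;
}
```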
> > --- gcc/Makefile.in.jj 2012-11-19 14:41:26.182898959 +0100 > +++ gcc/Makefile.in 2012-11-20 11:36:51.527174629 +0100 > @@ -2398,7 +2398,7 @@ tree-nested.o: tree-nested.c $(CONFIG_H) > tree-if-conv.o: tree-if-conv.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ > $(TREE_H) $(FLAGS_H) $(BASIC_BLOCK_H) $(TREE_FLOW_H) \ > $(CFGLOOP_H) $(TREE_DATA_REF_H) $(TREE_PASS_H) $(DIAGNOSTIC_H) \ > - $(DBGCNT_H) $(GIMPLE_PRETTY_PRINT_H) > + $(DBGCNT_H) $(GIMPLE_PRETTY_PRINT_H) $(TARGET_H) $(EXPR_H) $(OPTABS_H) > tree-iterator.o : tree-iterator.c $(CONFIG_H) $(SYSTEM_H) $(TREE_H) \ > coretypes.h $(GGC_H) tree-iterator.h $(GIMPLE_H) gt-tree-iterator.h > tree-dfa.o : tree-dfa.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \ > --- gcc/config/i386/sse.md.jj 2012-11-16 12:39:17.489959499 +0100 > +++ gcc/config/i386/sse.md 2012-11-20 11:36:51.530174926 +0100 > @@ -11080,6 +11080,23 @@ (define_insn "<avx_avx2>_maskstore<ssemo > (set_attr "prefix" "vex") > (set_attr "mode" "<sseinsnmode>")]) > > +(define_expand "maskload<mode>" > + [(set (match_operand:V48_AVX2 0 "register_operand") > + (unspec:V48_AVX2 > + [(match_operand:<sseintvecmode> 2 "register_operand") > + (match_operand:V48_AVX2 1 "memory_operand")] > + UNSPEC_MASKMOV))] > + "TARGET_AVX") > + > +(define_expand "maskstore<mode>" > + [(set (match_operand:V48_AVX2 0 "memory_operand") > + (unspec:V48_AVX2 > + [(match_operand:<sseintvecmode> 2 "register_operand") > + (match_operand:V48_AVX2 1 "register_operand") > + (match_dup 0)] > + UNSPEC_MASKMOV))] > + "TARGET_AVX") > + > (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" > [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") > (unspec:AVX256MODE2P > --- gcc/tree-data-ref.c.jj 2012-11-20 09:29:59.390775042 +0100 > +++ gcc/tree-data-ref.c 2012-11-20 11:40:26.407912003 +0100 > @@ -4275,11 +4275,11 @@ compute_all_dependences (vec<data_refere > > typedef struct data_ref_loc_d > { > - /* Position of the memory reference. 
*/ > - tree *pos; > + /* The memory reference. */ > + tree ref; > > - /* True if the memory reference is read. */ > - bool is_read; > + /* True if the memory reference is read. */ > + bool is_read; > } data_ref_loc; > > > @@ -4291,7 +4291,7 @@ get_references_in_stmt (gimple stmt, vec > { > bool clobbers_memory = false; > data_ref_loc ref; > - tree *op0, *op1; > + tree op0, op1; > enum gimple_code stmt_code = gimple_code (stmt); > > references->create (0); > @@ -4300,7 +4300,10 @@ get_references_in_stmt (gimple stmt, vec > As we cannot model data-references to not spelled out > accesses give up if they may occur. */ > if ((stmt_code == GIMPLE_CALL > - && !(gimple_call_flags (stmt) & ECF_CONST)) > + && !(gimple_call_flags (stmt) & ECF_CONST) > + && (!gimple_call_internal_p (stmt) > + || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD > + && gimple_call_internal_fn (stmt) != IFN_MASK_STORE))) > || (stmt_code == GIMPLE_ASM > && (gimple_asm_volatile_p (stmt) || gimple_vuse (stmt)))) > clobbers_memory = true; > @@ -4311,15 +4314,15 @@ get_references_in_stmt (gimple stmt, vec > if (stmt_code == GIMPLE_ASSIGN) > { > tree base; > - op0 = gimple_assign_lhs_ptr (stmt); > - op1 = gimple_assign_rhs1_ptr (stmt); > + op0 = gimple_assign_lhs (stmt); > + op1 = gimple_assign_rhs1 (stmt); > > - if (DECL_P (*op1) > - || (REFERENCE_CLASS_P (*op1) > - && (base = get_base_address (*op1)) > + if (DECL_P (op1) > + || (REFERENCE_CLASS_P (op1) > + && (base = get_base_address (op1)) > && TREE_CODE (base) != SSA_NAME)) > { > - ref.pos = op1; > + ref.ref = op1; > ref.is_read = true; > references->safe_push (ref); > } > @@ -4328,16 +4331,35 @@ get_references_in_stmt (gimple stmt, vec > { > unsigned i, n; > > - op0 = gimple_call_lhs_ptr (stmt); > + ref.is_read = false; > + if (gimple_call_internal_p (stmt)) > + switch (gimple_call_internal_fn (stmt)) > + { > + case IFN_MASK_LOAD: > + ref.is_read = true; > + case IFN_MASK_STORE: > + ref.ref = build2 (MEM_REF, > + ref.is_read > + ? 
TREE_TYPE (gimple_call_lhs (stmt)) > + : TREE_TYPE (gimple_call_arg (stmt, 3)), > + gimple_call_arg (stmt, 0), > + gimple_call_arg (stmt, 1)); > + references->safe_push (ref); > + return false; > + default: > + break; > + } > + > + op0 = gimple_call_lhs (stmt); > n = gimple_call_num_args (stmt); > for (i = 0; i < n; i++) > { > - op1 = gimple_call_arg_ptr (stmt, i); > + op1 = gimple_call_arg (stmt, i); > > - if (DECL_P (*op1) > - || (REFERENCE_CLASS_P (*op1) && get_base_address (*op1))) > + if (DECL_P (op1) > + || (REFERENCE_CLASS_P (op1) && get_base_address (op1))) > { > - ref.pos = op1; > + ref.ref = op1; > ref.is_read = true; > references->safe_push (ref); > } > @@ -4346,11 +4368,11 @@ get_references_in_stmt (gimple stmt, vec > else > return clobbers_memory; > > - if (*op0 > - && (DECL_P (*op0) > - || (REFERENCE_CLASS_P (*op0) && get_base_address (*op0)))) > + if (op0 > + && (DECL_P (op0) > + || (REFERENCE_CLASS_P (op0) && get_base_address (op0)))) > { > - ref.pos = op0; > + ref.ref = op0; > ref.is_read = false; > references->safe_push (ref); > } > @@ -4380,7 +4402,7 @@ find_data_references_in_stmt (struct loo > FOR_EACH_VEC_ELT (references, i, ref) > { > dr = create_data_ref (nest, loop_containing_stmt (stmt), > - *ref->pos, stmt, ref->is_read); > + ref->ref, stmt, ref->is_read); > gcc_assert (dr != NULL); > datarefs->safe_push (dr); > } > @@ -4412,7 +4434,7 @@ graphite_find_data_references_in_stmt (l > > FOR_EACH_VEC_ELT (references, i, ref) > { > - dr = create_data_ref (nest, loop, *ref->pos, stmt, ref->is_read); > + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); > gcc_assert (dr != NULL); > datarefs->safe_push (dr); > } > @@ -5048,7 +5070,7 @@ create_rdg_vertices (struct graph *rdg, > else > RDGV_HAS_MEM_READS (v) = true; > dr = create_data_ref (loop, loop_containing_stmt (stmt), > - *ref->pos, stmt, ref->is_read); > + ref->ref, stmt, ref->is_read); > if (dr) > RDGV_DATAREFS (v).safe_push (dr); > } > --- gcc/internal-fn.def.jj 2012-11-07 
08:42:08.225683975 +0100 > +++ gcc/internal-fn.def 2012-11-20 11:36:51.535175388 +0100 > @@ -1,5 +1,5 @@ > /* Internal functions. > - Copyright (C) 2011 Free Software Foundation, Inc. > + Copyright (C) 2011, 2012 Free Software Foundation, Inc. > > This file is part of GCC. > > @@ -40,3 +40,5 @@ along with GCC; see the file COPYING3. > > DEF_INTERNAL_FN (LOAD_LANES, ECF_CONST | ECF_LEAF) > DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF) > +DEF_INTERNAL_FN (MASK_LOAD, ECF_PURE | ECF_LEAF) > +DEF_INTERNAL_FN (MASK_STORE, ECF_LEAF) > --- gcc/tree-if-conv.c.jj 2012-11-19 14:41:23.762912063 +0100 > +++ gcc/tree-if-conv.c 2012-11-20 11:39:10.913356780 +0100 > @@ -96,6 +96,9 @@ along with GCC; see the file COPYING3. > #include "tree-scalar-evolution.h" > #include "tree-pass.h" > #include "dbgcnt.h" > +#include "target.h" > +#include "expr.h" > +#include "optabs.h" > > /* List of basic blocks in if-conversion-suitable order. */ > static basic_block *ifc_bbs; > @@ -448,7 +451,8 @@ bb_with_exit_edge_p (struct loop *loop, > - there is a virtual PHI in a BB other than the loop->header. */ > > static bool > -if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi) > +if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi, > + bool any_mask_load_store) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > { > @@ -463,7 +467,7 @@ if_convertible_phi_p (struct loop *loop, > return false; > } > > - if (flag_tree_loop_if_convert_stores) > + if (flag_tree_loop_if_convert_stores || any_mask_load_store) > return true; > > /* When the flag_tree_loop_if_convert_stores is not set, check > @@ -679,6 +683,84 @@ ifcvt_could_trap_p (gimple stmt, vec<dat > return gimple_could_trap_p (stmt); > } > > +/* Return true if STMT could be converted into a masked load or store > + (conditional load or store based on a mask computed from bb predicate). 
*/ > + > +static bool > +ifcvt_can_use_mask_load_store (gimple stmt) > +{ > + tree lhs, ref; > + enum machine_mode mode, vmode; > + optab op; > + basic_block bb; > + unsigned int vector_sizes; > + > + if (!flag_tree_vectorize > + || !gimple_assign_single_p (stmt) > + || gimple_has_volatile_ops (stmt)) > + return false; > + > + /* Avoid creating mask loads/stores if we'd need to chain > + conditions, to make it easier to undo them. */ > + bb = gimple_bb (stmt); > + if (!single_pred_p (bb) > + || is_predicated (single_pred (bb))) > + return false; > + > + /* Check whether this is a load or store. */ > + lhs = gimple_assign_lhs (stmt); > + if (TREE_CODE (lhs) != SSA_NAME) > + { > + if (!is_gimple_val (gimple_assign_rhs1 (stmt))) > + return false; > + op = maskstore_optab; > + ref = lhs; > + } > + else if (gimple_assign_load_p (stmt)) > + { > + op = maskload_optab; > + ref = gimple_assign_rhs1 (stmt); > + } > + else > + return false; > + > + /* And whether REF isn't a MEM_REF with non-addressable decl. */ > + if (TREE_CODE (ref) == MEM_REF > + && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR > + && DECL_P (TREE_OPERAND (TREE_OPERAND (ref, 0), 0)) > + && !TREE_ADDRESSABLE (TREE_OPERAND (TREE_OPERAND (ref, 0), 0))) > + return false; > + > + /* Mask should be integer mode of the same size as the load/store > + mode. */ > + mode = TYPE_MODE (TREE_TYPE (lhs)); > + if (int_mode_for_mode (mode) == BLKmode) > + return false; > + > + /* See if there is any chance the mask load or store might be > + vectorized. If not, punt. 
*/ > + vmode = targetm.vectorize.preferred_simd_mode (mode); > + if (!VECTOR_MODE_P (vmode)) > + return false; > + > + if (optab_handler (op, vmode) != CODE_FOR_nothing) > + return true; > + > + vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); > + while (vector_sizes != 0) > + { > + unsigned int cur = 1 << floor_log2 (vector_sizes); > + vector_sizes &= ~cur; > + if (cur <= GET_MODE_SIZE (mode)) > + continue; > + vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode)); > + if (VECTOR_MODE_P (vmode) > + && optab_handler (op, vmode) != CODE_FOR_nothing) > + return true; > + } > + return false; > +} > + > /* Return true when STMT is if-convertible. > > GIMPLE_ASSIGN statement is not if-convertible if, > @@ -688,7 +770,8 @@ ifcvt_could_trap_p (gimple stmt, vec<dat > > static bool > if_convertible_gimple_assign_stmt_p (gimple stmt, > - vec<data_reference_p> refs) > + vec<data_reference_p> refs, > + bool *any_mask_load_store) > { > tree lhs = gimple_assign_lhs (stmt); > basic_block bb; > @@ -714,10 +797,18 @@ if_convertible_gimple_assign_stmt_p (gim > return false; > } > > + gimple_set_plf (stmt, GF_PLF_1, false); > + > if (flag_tree_loop_if_convert_stores) > { > if (ifcvt_could_trap_p (stmt, refs)) > { > + if (ifcvt_can_use_mask_load_store (stmt)) > + { > + gimple_set_plf (stmt, GF_PLF_1, true); > + *any_mask_load_store = true; > + return true; > + } > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, "tree could trap...\n"); > return false; > @@ -727,6 +818,12 @@ if_convertible_gimple_assign_stmt_p (gim > > if (gimple_assign_rhs_could_trap_p (stmt)) > { > + if (ifcvt_can_use_mask_load_store (stmt)) > + { > + gimple_set_plf (stmt, GF_PLF_1, true); > + *any_mask_load_store = true; > + return true; > + } > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, "tree could trap...\n"); > return false; > @@ -738,6 +835,12 @@ if_convertible_gimple_assign_stmt_p (gim > && bb != bb->loop_father->header > && !bb_with_exit_edge_p 
(bb->loop_father, bb)) > { > + if (ifcvt_can_use_mask_load_store (stmt)) > + { > + gimple_set_plf (stmt, GF_PLF_1, true); > + *any_mask_load_store = true; > + return true; > + } > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "LHS is not var\n"); > @@ -756,7 +859,8 @@ if_convertible_gimple_assign_stmt_p (gim > - it is a GIMPLE_LABEL or a GIMPLE_COND. */ > > static bool > -if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs) > +if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs, > + bool *any_mask_load_store) > { > switch (gimple_code (stmt)) > { > @@ -766,7 +870,8 @@ if_convertible_stmt_p (gimple stmt, vec< > return true; > > case GIMPLE_ASSIGN: > - return if_convertible_gimple_assign_stmt_p (stmt, refs); > + return if_convertible_gimple_assign_stmt_p (stmt, refs, > + any_mask_load_store); > > case GIMPLE_CALL: > { > @@ -1072,7 +1177,7 @@ static bool > if_convertible_loop_p_1 (struct loop *loop, > vec<loop_p> *loop_nest, > vec<data_reference_p> *refs, > - vec<ddr_p> *ddrs) > + vec<ddr_p> *ddrs, bool *any_mask_load_store) > { > bool res; > unsigned int i; > @@ -1128,17 +1233,27 @@ if_convertible_loop_p_1 (struct loop *lo > basic_block bb = ifc_bbs[i]; > gimple_stmt_iterator itr; > > - for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr)) > - if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr))) > - return false; > - > /* Check the if-convertibility of statements in predicated BBs. */ > if (is_predicated (bb)) > for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr)) > - if (!if_convertible_stmt_p (gsi_stmt (itr), *refs)) > + if (!if_convertible_stmt_p (gsi_stmt (itr), *refs, > + any_mask_load_store)) > return false; > } > > + /* Checking PHIs needs to be done after stmts, as the fact whether there > + are any masked loads or stores affects the tests. 
*/ > + for (i = 0; i < loop->num_nodes; i++) > + { > + basic_block bb = ifc_bbs[i]; > + gimple_stmt_iterator itr; > + > + for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr)) > + if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr), > + *any_mask_load_store)) > + return false; > + } > + > if (dump_file) > fprintf (dump_file, "Applying if-conversion\n"); > > @@ -1154,7 +1269,7 @@ if_convertible_loop_p_1 (struct loop *lo > - if its basic blocks and phi nodes are if convertible. */ > > static bool > -if_convertible_loop_p (struct loop *loop) > +if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store) > { > edge e; > edge_iterator ei; > @@ -1196,7 +1311,8 @@ if_convertible_loop_p (struct loop *loop > refs.create (5); > ddrs.create (25); > loop_nest.create (3); > - res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs); > + res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs, > + any_mask_load_store); > > if (flag_tree_loop_if_convert_stores) > { > @@ -1414,7 +1530,7 @@ predicate_all_scalar_phis (struct loop * > gimplification of the predicates. 
*/ > > static void > -insert_gimplified_predicates (loop_p loop) > +insert_gimplified_predicates (loop_p loop, bool any_mask_load_store) > { > unsigned int i; > > @@ -1435,7 +1551,8 @@ insert_gimplified_predicates (loop_p loo > stmts = bb_predicate_gimplified_stmts (bb); > if (stmts) > { > - if (flag_tree_loop_if_convert_stores) > + if (flag_tree_loop_if_convert_stores > + || any_mask_load_store) > { > /* Insert the predicate of the BB just after the label, > as the if-conversion of memory writes will use this > @@ -1594,9 +1711,49 @@ predicate_mem_writes (loop_p loop) > } > > for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > - if ((stmt = gsi_stmt (gsi)) > - && gimple_assign_single_p (stmt) > - && gimple_vdef (stmt)) > + if ((stmt = gsi_stmt (gsi)) == NULL > + || !gimple_assign_single_p (stmt)) > + continue; > + else if (gimple_plf (stmt, GF_PLF_1)) > + { > + tree lhs = gimple_assign_lhs (stmt); > + tree rhs = gimple_assign_rhs1 (stmt); > + tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask; > + gimple new_stmt; > + int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs))); > + > + masktype = build_nonstandard_integer_type (bitsize, 1); > + mask_op0 = build_int_cst (masktype, swap ? 0 : -1); > + mask_op1 = build_int_cst (masktype, swap ? -1 : 0); > + ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; > + addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref), > + true, NULL_TREE, true, > + GSI_SAME_STMT); > + cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond), > + is_gimple_condexpr, NULL_TREE, > + true, GSI_SAME_STMT); > + mask = fold_build_cond_expr (masktype, unshare_expr (cond), > + mask_op0, mask_op1); > + mask = ifc_temp_var (masktype, mask, &gsi); > + ptr = build_int_cst (reference_alias_ptr_type (ref), 0); > + /* Copy points-to info if possible. 
*/ > + if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr)) > + copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr), > + ref); > + if (TREE_CODE (lhs) == SSA_NAME) > + { > + new_stmt > + = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, > + ptr, mask); > + gimple_call_set_lhs (new_stmt, lhs); > + } > + else > + new_stmt > + = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr, > + mask, rhs); > + gsi_replace (&gsi, new_stmt, false); > + } > + else if (gimple_vdef (stmt)) > { > tree lhs = gimple_assign_lhs (stmt); > tree rhs = gimple_assign_rhs1 (stmt); > @@ -1666,7 +1823,7 @@ remove_conditions_and_labels (loop_p loo > blocks. Replace PHI nodes with conditional modify expressions. */ > > static void > -combine_blocks (struct loop *loop) > +combine_blocks (struct loop *loop, bool any_mask_load_store) > { > basic_block bb, exit_bb, merge_target_bb; > unsigned int orig_loop_num_nodes = loop->num_nodes; > @@ -1675,10 +1832,10 @@ combine_blocks (struct loop *loop) > edge_iterator ei; > > remove_conditions_and_labels (loop); > - insert_gimplified_predicates (loop); > + insert_gimplified_predicates (loop, any_mask_load_store); > predicate_all_scalar_phis (loop); > > - if (flag_tree_loop_if_convert_stores) > + if (flag_tree_loop_if_convert_stores || any_mask_load_store) > predicate_mem_writes (loop); > > /* Merge basic blocks: first remove all the edges in the loop, > @@ -1775,23 +1932,25 @@ combine_blocks (struct loop *loop) > profitability analysis. Returns true when something changed. */ > > static bool > -tree_if_conversion (struct loop *loop) > +tree_if_conversion (struct loop *loop, bool *any_mask_load_store_p) > { > bool changed = false; > ifc_bbs = NULL; > + bool any_mask_load_store = false; > > - if (!if_convertible_loop_p (loop) > + if (!if_convertible_loop_p (loop, &any_mask_load_store) > || !dbg_cnt (if_conversion_tree)) > goto cleanup; > > /* Now all statements are if-convertible. 
Combine all the basic > blocks into one huge basic block doing the if-conversion > on-the-fly. */ > - combine_blocks (loop); > + combine_blocks (loop, any_mask_load_store); > > - if (flag_tree_loop_if_convert_stores) > + if (flag_tree_loop_if_convert_stores || any_mask_load_store) > mark_virtual_operands_for_renaming (cfun); > > + *any_mask_load_store_p |= any_mask_load_store; > changed = true; > > cleanup: > @@ -1809,6 +1968,9 @@ tree_if_conversion (struct loop *loop) > return changed; > } > > +/* Flag whether if-unconversion pass will be needed afterwards. */ > +static bool need_if_unconversion; > + > /* Tree if-conversion pass management. */ > > static unsigned int > @@ -1818,17 +1980,20 @@ main_tree_if_conversion (void) > struct loop *loop; > bool changed = false; > unsigned todo = 0; > + bool any_mask_load_store = false; > > if (number_of_loops () <= 1) > return 0; > > FOR_EACH_LOOP (li, loop, 0) > - changed |= tree_if_conversion (loop); > + changed |= tree_if_conversion (loop, &any_mask_load_store); > + > + need_if_unconversion = any_mask_load_store; > > if (changed) > todo |= TODO_cleanup_cfg; > > - if (changed && flag_tree_loop_if_convert_stores) > + if (changed && (flag_tree_loop_if_convert_stores || any_mask_load_store)) > todo |= TODO_update_ssa_only_virtuals; > > free_dominance_info (CDI_POST_DOMINATORS); > @@ -1865,6 +2030,139 @@ struct gimple_opt_pass pass_if_conversio > NULL, /* sub */ > NULL, /* next */ > 0, /* static_pass_number */ > + TV_NONE, /* tv_id */ > + PROP_cfg | PROP_ssa, /* properties_required */ > + 0, /* properties_provided */ > + 0, /* properties_destroyed */ > + 0, /* todo_flags_start */ > + TODO_verify_stmts | TODO_verify_flow > + /* todo_flags_finish */ > + } > +}; > + > +/* Undo creation of MASK_LOAD or MASK_STORE, if it hasn't > + been successfully vectorized. 
*/ > + > +static bool > +gate_tree_if_unconversion (void) > +{ > + return need_if_unconversion; > +} > + > +static unsigned int > +main_tree_if_unconversion (void) > +{ > + basic_block bb; > + gimple_stmt_iterator gsi; > + > + need_if_unconversion = false; > + FOR_EACH_BB (bb) > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > + { > + gimple stmt = gsi_stmt (gsi); > + if (is_gimple_call (stmt) > + && gimple_call_internal_p (stmt) > + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD > + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE) > + && INTEGRAL_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, 2)))) > + { > + tree cond = gimple_call_arg (stmt, 2), mem, type; > + edge e1, e2, e3; > + bool swapped_p = false; > + gimple cond_stmt, new_stmt; > + > + if (TREE_CODE (cond) == SSA_NAME > + && !SSA_NAME_IS_DEFAULT_DEF (cond)) > + { > + gimple def_stmt = SSA_NAME_DEF_STMT (cond); > + if (is_gimple_assign (def_stmt) > + && gimple_bb (def_stmt) == bb > + && gimple_assign_rhs_code (def_stmt) == COND_EXPR) > + { > + tree rhs2 = gimple_assign_rhs2 (def_stmt); > + tree rhs3 = gimple_assign_rhs3 (def_stmt); > + if (integer_all_onesp (rhs2) && integer_zerop (rhs3)) > + cond = gimple_assign_rhs1 (def_stmt); > + else if (integer_zerop (rhs2) && integer_all_onesp (rhs3)) > + { > + cond = gimple_assign_rhs1 (def_stmt); > + swapped_p = true; > + } > + } > + } > + gsi_prev (&gsi); > + e1 = split_block (bb, gsi_stmt (gsi)); > + e2 = split_block (e1->dest, stmt); > + e3 = make_edge (e1->src, e2->dest, > + swapped_p ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE); > + e1->flags = (e1->flags & ~EDGE_FALLTHRU) > + | (swapped_p ? 
EDGE_FALSE_VALUE : EDGE_TRUE_VALUE); > + set_immediate_dominator (CDI_DOMINATORS, e2->dest, e1->src); > + if (cond == gimple_call_arg (stmt, 2)) > + cond_stmt > + = gimple_build_cond (NE_EXPR, cond, > + build_int_cst (TREE_TYPE (cond), 0), > + NULL_TREE, NULL_TREE); > + else > + cond_stmt > + = gimple_build_cond_from_tree (cond, NULL_TREE, NULL_TREE); > + gsi = gsi_last_bb (e1->src); > + gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT); > + if (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD) > + type = TREE_TYPE (gimple_call_lhs (stmt)); > + else > + type = TREE_TYPE (gimple_call_arg (stmt, 3)); > + mem = build2 (MEM_REF, type, gimple_call_arg (stmt, 0), > + gimple_call_arg (stmt, 1)); > + if (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD) > + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), > + mem); > + else > + new_stmt = gimple_build_assign (mem, gimple_call_arg (stmt, 3)); > + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); > + if (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD) > + { > + gimple phi; > + tree res = gimple_assign_lhs (new_stmt); > + tree tem = make_ssa_name (TREE_TYPE (res), NULL); > + tree zero = build_zero_cst (TREE_TYPE (res)); > + gimple_assign_set_lhs (new_stmt, tem); > + gimple_call_set_lhs (stmt, NULL_TREE); > + phi = create_phi_node (res, e2->dest); > + add_phi_arg (phi, tem, e2, gimple_location (stmt)); > + add_phi_arg (phi, zero, e3, gimple_location (stmt)); > + SSA_NAME_DEF_STMT (res) = phi; > + } > + else > + { > + gimple phi; > + tree new_vdef = copy_ssa_name (gimple_vuse (stmt), new_stmt); > + gimple_set_vdef (new_stmt, new_vdef); > + phi = create_phi_node (gimple_vdef (stmt), e2->dest); > + add_phi_arg (phi, new_vdef, e2, UNKNOWN_LOCATION); > + add_phi_arg (phi, gimple_vuse (stmt), e3, UNKNOWN_LOCATION); > + SSA_NAME_DEF_STMT (gimple_vdef (stmt)) = phi; > + } > + gsi = gsi_for_stmt (stmt); > + gsi_replace (&gsi, new_stmt, false); > + gsi = gsi_for_stmt (cond_stmt); > + } > + } > + > + return 0; > +} > + > +struct 
gimple_opt_pass pass_if_unconversion = > +{ > + { > + GIMPLE_PASS, > + "ifuncvt", /* name */ > + OPTGROUP_NONE, /* optinfo_flags */ > + gate_tree_if_unconversion, /* gate */ > + main_tree_if_unconversion, /* execute */ > + NULL, /* sub */ > + NULL, /* next */ > + 0, /* static_pass_number */ > TV_NONE, /* tv_id */ > PROP_cfg | PROP_ssa, /* properties_required */ > 0, /* properties_provided */ > --- gcc/tree-vect-data-refs.c.jj 2012-11-19 14:41:23.766912043 +0100 > +++ gcc/tree-vect-data-refs.c 2012-11-20 11:36:51.587179427 +0100 > @@ -2705,6 +2705,24 @@ vect_check_gather (gimple stmt, loop_vec > enum machine_mode pmode; > int punsignedp, pvolatilep; > > + base = DR_REF (dr); > + /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF, > + see if we can use the def stmt of the address. */ > + if (is_gimple_call (stmt) > + && gimple_call_internal_p (stmt) > + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD > + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE) > + && TREE_CODE (base) == MEM_REF > + && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME > + && integer_zerop (TREE_OPERAND (base, 1)) > + && !expr_invariant_in_loop_p (loop, TREE_OPERAND (base, 0))) > + { > + gimple def_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (base, 0)); > + if (is_gimple_assign (def_stmt) > + && gimple_assign_rhs_code (def_stmt) == ADDR_EXPR) > + base = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0); > + } > + > /* The gather builtins need address of the form > loop_invariant + vector * {1, 2, 4, 8} > or > @@ -2717,7 +2735,7 @@ vect_check_gather (gimple stmt, loop_vec > vectorized. The following code attempts to find such a preexistng > SSA_NAME OFF and put the loop invariants into a tree BASE > that can be gimplified before the loop. 
*/ > - base = get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &off, > + base = get_inner_reference (base, &pbitsize, &pbitpos, &off, > &pmode, &punsignedp, &pvolatilep, false); > gcc_assert (base != NULL_TREE && (pbitpos % BITS_PER_UNIT) == 0); > > @@ -3185,7 +3203,10 @@ vect_analyze_data_refs (loop_vec_info lo > offset = unshare_expr (DR_OFFSET (dr)); > init = unshare_expr (DR_INIT (dr)); > > - if (is_gimple_call (stmt)) > + if (is_gimple_call (stmt) > + && (!gimple_call_internal_p (stmt) > + || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD > + && gimple_call_internal_fn (stmt) != IFN_MASK_STORE))) > { > if (dump_enabled_p ()) > { > @@ -4892,6 +4913,14 @@ vect_supportable_dr_alignment (struct da > if (aligned_access_p (dr) && !check_aligned_accesses) > return dr_aligned; > > + /* For now assume all conditional loads/stores support unaligned > + access without any special code. */ > + if (is_gimple_call (stmt) > + && gimple_call_internal_p (stmt) > + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD > + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) > + return dr_unaligned_supported; > + > if (loop_vinfo) > { > vect_loop = LOOP_VINFO_LOOP (loop_vinfo); > --- gcc/gimple.h.jj 2012-11-19 14:41:26.184898949 +0100 > +++ gcc/gimple.h 2012-11-20 11:36:51.588179472 +0100 > @@ -4938,7 +4938,13 @@ gimple_expr_type (const_gimple stmt) > useless conversion involved. That means returning the > original RHS type as far as we can reconstruct it. */ > if (code == GIMPLE_CALL) > - type = gimple_call_return_type (stmt); > + { > + if (gimple_call_internal_p (stmt) > + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) > + type = TREE_TYPE (gimple_call_arg (stmt, 3)); > + else > + type = gimple_call_return_type (stmt); > + } > else > switch (gimple_assign_rhs_code (stmt)) > { > --- gcc/internal-fn.c.jj 2012-11-07 08:42:08.534682161 +0100 > +++ gcc/internal-fn.c 2012-11-20 11:36:51.589179516 +0100 > @@ -1,5 +1,5 @@ > /* Internal functions. 
> - Copyright (C) 2011 Free Software Foundation, Inc. > + Copyright (C) 2011, 2012 Free Software Foundation, Inc. > > This file is part of GCC. > > @@ -109,6 +109,52 @@ expand_STORE_LANES (gimple stmt) > expand_insn (get_multi_vector_move (type, vec_store_lanes_optab), 2, ops); > } > > +static void > +expand_MASK_LOAD (gimple stmt) > +{ > + struct expand_operand ops[3]; > + tree type, lhs, rhs, maskt; > + rtx mem, target, mask; > + > + maskt = gimple_call_arg (stmt, 2); > + lhs = gimple_call_lhs (stmt); > + type = TREE_TYPE (lhs); > + rhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0), > + gimple_call_arg (stmt, 1)); > + > + mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > + gcc_assert (MEM_P (mem)); > + mask = expand_normal (maskt); > + target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > + create_output_operand (&ops[0], target, TYPE_MODE (type)); > + create_fixed_operand (&ops[1], mem); > + create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); > + expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); > +} > + > +static void > +expand_MASK_STORE (gimple stmt) > +{ > + struct expand_operand ops[3]; > + tree type, lhs, rhs, maskt; > + rtx mem, reg, mask; > + > + maskt = gimple_call_arg (stmt, 2); > + rhs = gimple_call_arg (stmt, 3); > + type = TREE_TYPE (rhs); > + lhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0), > + gimple_call_arg (stmt, 1)); > + > + mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > + gcc_assert (MEM_P (mem)); > + mask = expand_normal (maskt); > + reg = expand_normal (rhs); > + create_fixed_operand (&ops[0], mem); > + create_input_operand (&ops[1], reg, TYPE_MODE (type)); > + create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); > + expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops); > +} > + > /* Routines to expand each internal function, indexed by function number. 
> Each routine has the prototype: > > --- gcc/tree-vect-loop.c.jj 2012-11-19 14:41:23.763912058 +0100 > +++ gcc/tree-vect-loop.c 2012-11-20 11:36:51.591179598 +0100 > @@ -351,7 +351,11 @@ vect_determine_vectorization_factor (loo > analyze_pattern_stmt = false; > } > > - if (gimple_get_lhs (stmt) == NULL_TREE) > + if (gimple_get_lhs (stmt) == NULL_TREE > + /* MASK_STORE has no lhs, but is ok. */ > + && (!is_gimple_call (stmt) > + || !gimple_call_internal_p (stmt) > + || gimple_call_internal_fn (stmt) != IFN_MASK_STORE)) > { > if (dump_enabled_p ()) > { > @@ -388,7 +392,12 @@ vect_determine_vectorization_factor (loo > else > { > gcc_assert (!STMT_VINFO_DATA_REF (stmt_info)); > - scalar_type = TREE_TYPE (gimple_get_lhs (stmt)); > + if (is_gimple_call (stmt) > + && gimple_call_internal_p (stmt) > + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) > + scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3)); > + else > + scalar_type = TREE_TYPE (gimple_get_lhs (stmt)); > if (dump_enabled_p ()) > { > dump_printf_loc (MSG_NOTE, vect_location, > --- gcc/passes.c.jj 2012-11-19 14:41:26.185898944 +0100 > +++ gcc/passes.c 2012-11-20 11:36:51.593179673 +0100 > @@ -1478,6 +1478,7 @@ init_optimization_passes (void) > struct opt_pass **p = &pass_vectorize.pass.sub; > NEXT_PASS (pass_dce_loop); > } > + NEXT_PASS (pass_if_unconversion); > NEXT_PASS (pass_predcom); > NEXT_PASS (pass_complete_unroll); > NEXT_PASS (pass_slp_vectorize); > --- gcc/optabs.def.jj 2012-11-19 14:41:14.487962283 +0100 > +++ gcc/optabs.def 2012-11-20 11:36:51.593179673 +0100 > @@ -248,6 +248,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a > OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") > OPTAB_D (udot_prod_optab, "udot_prod$I$a") > OPTAB_D (usum_widen_optab, "widen_usum$I$a3") > +OPTAB_D (maskload_optab, "maskload$a") > +OPTAB_D (maskstore_optab, "maskstore$a") > OPTAB_D (vec_extract_optab, "vec_extract$a") > OPTAB_D (vec_init_optab, "vec_init$a") > OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") > 
--- gcc/tree-pass.h.jj 2012-11-14 08:13:26.039860547 +0100 > +++ gcc/tree-pass.h 2012-11-20 11:36:51.594179709 +0100 > @@ -1,5 +1,5 @@ > /* Definitions for describing one tree-ssa optimization pass. > - Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 > + Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 > Free Software Foundation, Inc. > Contributed by Richard Henderson <rth@redhat.com> > > @@ -286,6 +286,7 @@ extern struct gimple_opt_pass pass_recor > extern struct gimple_opt_pass pass_graphite; > extern struct gimple_opt_pass pass_graphite_transforms; > extern struct gimple_opt_pass pass_if_conversion; > +extern struct gimple_opt_pass pass_if_unconversion; > extern struct gimple_opt_pass pass_loop_distribution; > extern struct gimple_opt_pass pass_vectorize; > extern struct gimple_opt_pass pass_slp_vectorize; > --- gcc/tree-vect-stmts.c.jj 2012-11-19 14:41:26.174898997 +0100 > +++ gcc/tree-vect-stmts.c 2012-11-20 11:36:51.596179777 +0100 > @@ -218,7 +218,7 @@ vect_mark_relevant (vec<gimple> *worklis > /* This use is out of pattern use, if LHS has other uses that are > pattern uses, we should mark the stmt itself, and not the pattern > stmt. */ > - if (TREE_CODE (lhs) == SSA_NAME) > + if (lhs && TREE_CODE (lhs) == SSA_NAME) > FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs) > { > if (is_gimple_debug (USE_STMT (use_p))) > @@ -376,7 +376,27 @@ exist_non_indexing_operands_for_use_p (t > first case, and whether var corresponds to USE. 
*/ > > if (!gimple_assign_copy_p (stmt)) > - return false; > + { > + if (is_gimple_call (stmt) > + && gimple_call_internal_p (stmt)) > + switch (gimple_call_internal_fn (stmt)) > + { > + case IFN_MASK_STORE: > + operand = gimple_call_arg (stmt, 3); > + if (operand == use) > + return true; > + /* FALLTHRU */ > + case IFN_MASK_LOAD: > + operand = gimple_call_arg (stmt, 2); > + if (operand == use) > + return true; > + break; > + default: > + break; > + } > + return false; > + } > + > if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME) > return false; > operand = gimple_assign_rhs1 (stmt); > @@ -1695,6 +1715,401 @@ vectorizable_function (gimple call, tree > vectype_in); > } > > + > +static tree permute_vec_elements (tree, tree, tree, gimple, > + gimple_stmt_iterator *); > + > + > +static bool > +vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi, > + gimple *vec_stmt, slp_tree slp_node) > +{ > + tree vec_dest = NULL; > + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > + stmt_vec_info prev_stmt_info; > + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > + bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt); > + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); > + tree vectype = STMT_VINFO_VECTYPE (stmt_info); > + tree elem_type; > + gimple new_stmt; > + tree dummy; > + tree dataref_ptr = NULL_TREE; > + gimple ptr_incr; > + int nunits = TYPE_VECTOR_SUBPARTS (vectype); > + int ncopies; > + int i, j; > + bool inv_p; > + tree gather_base = NULL_TREE, gather_off = NULL_TREE; > + tree gather_off_vectype = NULL_TREE, gather_decl = NULL_TREE; > + int gather_scale = 1; > + enum vect_def_type gather_dt = vect_unknown_def_type; > + bool is_store; > + tree mask; > + gimple def_stmt; > + tree def; > + enum vect_def_type dt; > + > + if (slp_node != NULL) > + return false; > + > + ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; > + gcc_assert (ncopies >= 1); > + > + 
is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE; > + mask = gimple_call_arg (stmt, 2); > + if (TYPE_PRECISION (TREE_TYPE (mask)) > + != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype)))) > + return false; > + > + /* FORNOW. This restriction should be relaxed. */ > + if (nested_in_vect_loop && ncopies > 1) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "multiple types in nested loop."); > + return false; > + } > + > + if (!STMT_VINFO_RELEVANT_P (stmt_info)) > + return false; > + > + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def) > + return false; > + > + if (!STMT_VINFO_DATA_REF (stmt_info)) > + return false; > + > + elem_type = TREE_TYPE (vectype); > + > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > + return false; > + > + if (STMT_VINFO_STRIDE_LOAD_P (stmt_info)) > + return false; > + > + if (STMT_VINFO_GATHER_P (stmt_info)) > + { > + gimple def_stmt; > + tree def; > + gather_decl = vect_check_gather (stmt, loop_vinfo, &gather_base, > + &gather_off, &gather_scale); > + gcc_assert (gather_decl); > + if (!vect_is_simple_use_1 (gather_off, NULL, loop_vinfo, NULL, > + &def_stmt, &def, &gather_dt, > + &gather_off_vectype)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "gather index use not simple."); > + return false; > + } > + } > + else if (tree_int_cst_compare (nested_in_vect_loop > + ? STMT_VINFO_DR_STEP (stmt_info) > + : DR_STEP (dr), size_zero_node) < 0) > + return false; > + else if (optab_handler (is_store ? 
maskstore_optab : maskload_optab, > + TYPE_MODE (vectype)) == CODE_FOR_nothing) > + return false; > + > + if (TREE_CODE (mask) != SSA_NAME) > + return false; > + > + if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL, > + &def_stmt, &def, &dt)) > + return false; > + > + if (is_store) > + { > + tree rhs = gimple_call_arg (stmt, 3); > + if (!vect_is_simple_use (rhs, stmt, loop_vinfo, NULL, > + &def_stmt, &def, &dt)) > + return false; > + } > + > + if (!vec_stmt) /* transformation not required. */ > + { > + STMT_VINFO_TYPE (stmt_info) = call_vec_info_type; > + return true; > + } > + > + /** Transform. **/ > + > + if (STMT_VINFO_GATHER_P (stmt_info)) > + { > + tree vec_oprnd0 = NULL_TREE, op; > + tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl)); > + tree rettype, srctype, ptrtype, idxtype, masktype, scaletype; > + tree ptr, vec_mask = NULL_TREE, mask_op, var, scale; > + tree perm_mask = NULL_TREE, prev_res = NULL_TREE; > + edge pe = loop_preheader_edge (loop); > + gimple_seq seq; > + basic_block new_bb; > + enum { NARROW, NONE, WIDEN } modifier; > + int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gather_off_vectype); > + > + if (nunits == gather_off_nunits) > + modifier = NONE; > + else if (nunits == gather_off_nunits / 2) > + { > + unsigned char *sel = XALLOCAVEC (unsigned char, gather_off_nunits); > + modifier = WIDEN; > + > + for (i = 0; i < gather_off_nunits; ++i) > + sel[i] = i | nunits; > + > + perm_mask = vect_gen_perm_mask (gather_off_vectype, sel); > + gcc_assert (perm_mask != NULL_TREE); > + } > + else if (nunits == gather_off_nunits * 2) > + { > + unsigned char *sel = XALLOCAVEC (unsigned char, nunits); > + modifier = NARROW; > + > + for (i = 0; i < nunits; ++i) > + sel[i] = i < gather_off_nunits > + ? 
i : i + nunits - gather_off_nunits; > + > + perm_mask = vect_gen_perm_mask (vectype, sel); > + gcc_assert (perm_mask != NULL_TREE); > + ncopies *= 2; > + } > + else > + gcc_unreachable (); > + > + rettype = TREE_TYPE (TREE_TYPE (gather_decl)); > + srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); > + ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); > + idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); > + masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); > + scaletype = TREE_VALUE (arglist); > + gcc_checking_assert (types_compatible_p (srctype, rettype) > + && types_compatible_p (srctype, masktype)); > + > + vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype); > + > + ptr = fold_convert (ptrtype, gather_base); > + if (!is_gimple_min_invariant (ptr)) > + { > + ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE); > + new_bb = gsi_insert_seq_on_edge_immediate (pe, seq); > + gcc_assert (!new_bb); > + } > + > + scale = build_int_cst (scaletype, gather_scale); > + > + prev_stmt_info = NULL; > + for (j = 0; j < ncopies; ++j) > + { > + if (modifier == WIDEN && (j & 1)) > + op = permute_vec_elements (vec_oprnd0, vec_oprnd0, > + perm_mask, stmt, gsi); > + else if (j == 0) > + op = vec_oprnd0 > + = vect_get_vec_def_for_operand (gather_off, stmt, NULL); > + else > + op = vec_oprnd0 > + = vect_get_vec_def_for_stmt_copy (gather_dt, vec_oprnd0); > + > + if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) > + { > + gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)) > + == TYPE_VECTOR_SUBPARTS (idxtype)); > + var = vect_get_new_vect_var (idxtype, vect_simple_var, NULL); > + var = make_ssa_name (var, NULL); > + op = build1 (VIEW_CONVERT_EXPR, idxtype, op); > + new_stmt > + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, > + op, NULL_TREE); > + vect_finish_stmt_generation (stmt, new_stmt, gsi); > + op = var; > + } > + > + if (j == 0) > + vec_mask = vect_get_vec_def_for_operand (mask, stmt, 
NULL); > + else > + { > + vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt, > + &def, &dt); > + vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); > + } > + > + mask_op = vec_mask; > + if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask))) > + { > + gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op)) > + == TYPE_VECTOR_SUBPARTS (masktype)); > + var = vect_get_new_vect_var (masktype, vect_simple_var, NULL); > + var = make_ssa_name (var, NULL); > + mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op); > + new_stmt > + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, > + mask_op, NULL_TREE); > + vect_finish_stmt_generation (stmt, new_stmt, gsi); > + mask_op = var; > + } > + > + new_stmt > + = gimple_build_call (gather_decl, 5, mask_op, ptr, op, mask_op, > + scale); > + > + if (!useless_type_conversion_p (vectype, rettype)) > + { > + gcc_assert (TYPE_VECTOR_SUBPARTS (vectype) > + == TYPE_VECTOR_SUBPARTS (rettype)); > + var = vect_get_new_vect_var (rettype, vect_simple_var, NULL); > + op = make_ssa_name (var, new_stmt); > + gimple_call_set_lhs (new_stmt, op); > + vect_finish_stmt_generation (stmt, new_stmt, gsi); > + var = make_ssa_name (vec_dest, NULL); > + op = build1 (VIEW_CONVERT_EXPR, vectype, op); > + new_stmt > + = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, op, > + NULL_TREE); > + } > + else > + { > + var = make_ssa_name (vec_dest, new_stmt); > + gimple_call_set_lhs (new_stmt, var); > + } > + > + vect_finish_stmt_generation (stmt, new_stmt, gsi); > + > + if (modifier == NARROW) > + { > + if ((j & 1) == 0) > + { > + prev_res = var; > + continue; > + } > + var = permute_vec_elements (prev_res, var, > + perm_mask, stmt, gsi); > + new_stmt = SSA_NAME_DEF_STMT (var); > + } > + > + if (prev_stmt_info == NULL) > + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; > + else > + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; > + prev_stmt_info = vinfo_for_stmt (new_stmt); > + } > + return true; > + } > + 
else if (is_store)
> +    {
> +      tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
> +      prev_stmt_info = NULL;
> +      for (i = 0; i < ncopies; i++)
> +	{
> +	  unsigned align, misalign;
> +
> +	  if (i == 0)
> +	    {
> +	      tree rhs = gimple_call_arg (stmt, 3);
> +	      vec_rhs = vect_get_vec_def_for_operand (rhs, stmt, NULL);
> +	      vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
> +	      /* We should have caught mismatched types earlier.  */
> +	      gcc_assert (useless_type_conversion_p (vectype,
> +						     TREE_TYPE (vec_rhs)));
> +	      dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL,
> +						      NULL_TREE, &dummy, gsi,
> +						      &ptr_incr, false, &inv_p);
> +	      gcc_assert (!inv_p);
> +	    }
> +	  else
> +	    {
> +	      vect_is_simple_use (vec_rhs, NULL, loop_vinfo, NULL, &def_stmt,
> +				  &def, &dt);
> +	      vec_rhs = vect_get_vec_def_for_stmt_copy (dt, vec_rhs);
> +	      vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
> +				  &def, &dt);
> +	      vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
> +	      dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
> +					     TYPE_SIZE_UNIT (vectype));
> +	    }
> +
> +	  align = TYPE_ALIGN_UNIT (vectype);
> +	  if (aligned_access_p (dr))
> +	    misalign = 0;
> +	  else if (DR_MISALIGNMENT (dr) == -1)
> +	    {
> +	      align = TYPE_ALIGN_UNIT (elem_type);
> +	      misalign = 0;
> +	    }
> +	  else
> +	    misalign = DR_MISALIGNMENT (dr);
> +	  set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
> +				  misalign);
> +	  new_stmt
> +	    = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr,
> +					  gimple_call_arg (stmt, 1),
> +					  vec_mask, vec_rhs);
> +	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +	  if (i == 0)
> +	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> +	  else
> +	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> +	  prev_stmt_info = vinfo_for_stmt (new_stmt);
> +	}
> +    }
> +  else
> +    {
> +      tree vec_mask = NULL_TREE;
> +      prev_stmt_info = NULL;
> +      vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype);
> +      for (i = 0; i < ncopies; i++)
> +	{
> +	  unsigned
align, misalign; > + > + if (i == 0) > + { > + vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL); > + dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL, > + NULL_TREE, &dummy, gsi, > + &ptr_incr, false, &inv_p); > + gcc_assert (!inv_p); > + } > + else > + { > + vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt, > + &def, &dt); > + vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask); > + dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, > + TYPE_SIZE_UNIT (vectype)); > + } > + > + align = TYPE_ALIGN_UNIT (vectype); > + if (aligned_access_p (dr)) > + misalign = 0; > + else if (DR_MISALIGNMENT (dr) == -1) > + { > + align = TYPE_ALIGN_UNIT (elem_type); > + misalign = 0; > + } > + else > + misalign = DR_MISALIGNMENT (dr); > + set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, > + misalign); > + new_stmt > + = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr, > + gimple_call_arg (stmt, 1), > + vec_mask); > + gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest, NULL)); > + vect_finish_stmt_generation (stmt, new_stmt, gsi); > + if (i == 0) > + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; > + else > + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; > + prev_stmt_info = vinfo_for_stmt (new_stmt); > + } > + } > + > + return true; > +} > + > + > /* Function vectorizable_call. > > Check if STMT performs a function call that can be vectorized. 
> @@ -1737,10 +2152,16 @@ vectorizable_call (gimple stmt, gimple_s > if (!is_gimple_call (stmt)) > return false; > > - if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME) > + if (stmt_can_throw_internal (stmt)) > return false; > > - if (stmt_can_throw_internal (stmt)) > + if (gimple_call_internal_p (stmt) > + && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD > + || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) > + return vectorizable_mask_load_store (stmt, gsi, vec_stmt, > + slp_node); > + > + if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME) > return false; > > vectype_out = STMT_VINFO_VECTYPE (stmt_info); > @@ -3426,10 +3847,6 @@ vectorizable_shift (gimple stmt, gimple_ > } > > > -static tree permute_vec_elements (tree, tree, tree, gimple, > - gimple_stmt_iterator *); > - > - > /* Function vectorizable_operation. > > Check if STMT performs a binary, unary or ternary operation that can > @@ -5831,6 +6248,10 @@ vect_transform_stmt (gimple stmt, gimple > case call_vec_info_type: > done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node); > stmt = gsi_stmt (*gsi); > + if (is_gimple_call (stmt) > + && gimple_call_internal_p (stmt) > + && gimple_call_internal_fn (stmt) == IFN_MASK_STORE) > + is_store = true; > break; > > case reduc_vec_info_type: > > > Jakub
--- gcc/Makefile.in.jj 2012-11-19 14:41:26.182898959 +0100 +++ gcc/Makefile.in 2012-11-20 11:36:51.527174629 +0100 @@ -2398,7 +2398,7 @@ tree-nested.o: tree-nested.c $(CONFIG_H) tree-if-conv.o: tree-if-conv.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(TREE_H) $(FLAGS_H) $(BASIC_BLOCK_H) $(TREE_FLOW_H) \ $(CFGLOOP_H) $(TREE_DATA_REF_H) $(TREE_PASS_H) $(DIAGNOSTIC_H) \ - $(DBGCNT_H) $(GIMPLE_PRETTY_PRINT_H) + $(DBGCNT_H) $(GIMPLE_PRETTY_PRINT_H) $(TARGET_H) $(EXPR_H) $(OPTABS_H) tree-iterator.o : tree-iterator.c $(CONFIG_H) $(SYSTEM_H) $(TREE_H) \ coretypes.h $(GGC_H) tree-iterator.h $(GIMPLE_H) gt-tree-iterator.h tree-dfa.o : tree-dfa.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \ --- gcc/config/i386/sse.md.jj 2012-11-16 12:39:17.489959499 +0100 +++ gcc/config/i386/sse.md 2012-11-20 11:36:51.530174926 +0100 @@ -11080,6 +11080,23 @@ (define_insn "<avx_avx2>_maskstore<ssemo (set_attr "prefix" "vex") (set_attr "mode" "<sseinsnmode>")]) +(define_expand "maskload<mode>" + [(set (match_operand:V48_AVX2 0 "register_operand") + (unspec:V48_AVX2 + [(match_operand:<sseintvecmode> 2 "register_operand") + (match_operand:V48_AVX2 1 "memory_operand")] + UNSPEC_MASKMOV))] + "TARGET_AVX") + +(define_expand "maskstore<mode>" + [(set (match_operand:V48_AVX2 0 "memory_operand") + (unspec:V48_AVX2 + [(match_operand:<sseintvecmode> 2 "register_operand") + (match_operand:V48_AVX2 1 "register_operand") + (match_dup 0)] + UNSPEC_MASKMOV))] + "TARGET_AVX") + (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") (unspec:AVX256MODE2P --- gcc/tree-data-ref.c.jj 2012-11-20 09:29:59.390775042 +0100 +++ gcc/tree-data-ref.c 2012-11-20 11:40:26.407912003 +0100 @@ -4275,11 +4275,11 @@ compute_all_dependences (vec<data_refere typedef struct data_ref_loc_d { - /* Position of the memory reference. */ - tree *pos; + /* The memory reference. */ + tree ref; - /* True if the memory reference is read. 
*/ - bool is_read; + /* True if the memory reference is read. */ + bool is_read; } data_ref_loc; @@ -4291,7 +4291,7 @@ get_references_in_stmt (gimple stmt, vec { bool clobbers_memory = false; data_ref_loc ref; - tree *op0, *op1; + tree op0, op1; enum gimple_code stmt_code = gimple_code (stmt); references->create (0); @@ -4300,7 +4300,10 @@ get_references_in_stmt (gimple stmt, vec As we cannot model data-references to not spelled out accesses give up if they may occur. */ if ((stmt_code == GIMPLE_CALL - && !(gimple_call_flags (stmt) & ECF_CONST)) + && !(gimple_call_flags (stmt) & ECF_CONST) + && (!gimple_call_internal_p (stmt) + || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD + && gimple_call_internal_fn (stmt) != IFN_MASK_STORE))) || (stmt_code == GIMPLE_ASM && (gimple_asm_volatile_p (stmt) || gimple_vuse (stmt)))) clobbers_memory = true; @@ -4311,15 +4314,15 @@ get_references_in_stmt (gimple stmt, vec if (stmt_code == GIMPLE_ASSIGN) { tree base; - op0 = gimple_assign_lhs_ptr (stmt); - op1 = gimple_assign_rhs1_ptr (stmt); + op0 = gimple_assign_lhs (stmt); + op1 = gimple_assign_rhs1 (stmt); - if (DECL_P (*op1) - || (REFERENCE_CLASS_P (*op1) - && (base = get_base_address (*op1)) + if (DECL_P (op1) + || (REFERENCE_CLASS_P (op1) + && (base = get_base_address (op1)) && TREE_CODE (base) != SSA_NAME)) { - ref.pos = op1; + ref.ref = op1; ref.is_read = true; references->safe_push (ref); } @@ -4328,16 +4331,35 @@ get_references_in_stmt (gimple stmt, vec { unsigned i, n; - op0 = gimple_call_lhs_ptr (stmt); + ref.is_read = false; + if (gimple_call_internal_p (stmt)) + switch (gimple_call_internal_fn (stmt)) + { + case IFN_MASK_LOAD: + ref.is_read = true; + case IFN_MASK_STORE: + ref.ref = build2 (MEM_REF, + ref.is_read + ? 
TREE_TYPE (gimple_call_lhs (stmt)) + : TREE_TYPE (gimple_call_arg (stmt, 3)), + gimple_call_arg (stmt, 0), + gimple_call_arg (stmt, 1)); + references->safe_push (ref); + return false; + default: + break; + } + + op0 = gimple_call_lhs (stmt); n = gimple_call_num_args (stmt); for (i = 0; i < n; i++) { - op1 = gimple_call_arg_ptr (stmt, i); + op1 = gimple_call_arg (stmt, i); - if (DECL_P (*op1) - || (REFERENCE_CLASS_P (*op1) && get_base_address (*op1))) + if (DECL_P (op1) + || (REFERENCE_CLASS_P (op1) && get_base_address (op1))) { - ref.pos = op1; + ref.ref = op1; ref.is_read = true; references->safe_push (ref); } @@ -4346,11 +4368,11 @@ get_references_in_stmt (gimple stmt, vec else return clobbers_memory; - if (*op0 - && (DECL_P (*op0) - || (REFERENCE_CLASS_P (*op0) && get_base_address (*op0)))) + if (op0 + && (DECL_P (op0) + || (REFERENCE_CLASS_P (op0) && get_base_address (op0)))) { - ref.pos = op0; + ref.ref = op0; ref.is_read = false; references->safe_push (ref); } @@ -4380,7 +4402,7 @@ find_data_references_in_stmt (struct loo FOR_EACH_VEC_ELT (references, i, ref) { dr = create_data_ref (nest, loop_containing_stmt (stmt), - *ref->pos, stmt, ref->is_read); + ref->ref, stmt, ref->is_read); gcc_assert (dr != NULL); datarefs->safe_push (dr); } @@ -4412,7 +4434,7 @@ graphite_find_data_references_in_stmt (l FOR_EACH_VEC_ELT (references, i, ref) { - dr = create_data_ref (nest, loop, *ref->pos, stmt, ref->is_read); + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); gcc_assert (dr != NULL); datarefs->safe_push (dr); } @@ -5048,7 +5070,7 @@ create_rdg_vertices (struct graph *rdg, else RDGV_HAS_MEM_READS (v) = true; dr = create_data_ref (loop, loop_containing_stmt (stmt), - *ref->pos, stmt, ref->is_read); + ref->ref, stmt, ref->is_read); if (dr) RDGV_DATAREFS (v).safe_push (dr); } --- gcc/internal-fn.def.jj 2012-11-07 08:42:08.225683975 +0100 +++ gcc/internal-fn.def 2012-11-20 11:36:51.535175388 +0100 @@ -1,5 +1,5 @@ /* Internal functions. 
- Copyright (C) 2011 Free Software Foundation, Inc. + Copyright (C) 2011, 2012 Free Software Foundation, Inc. This file is part of GCC. @@ -40,3 +40,5 @@ along with GCC; see the file COPYING3. DEF_INTERNAL_FN (LOAD_LANES, ECF_CONST | ECF_LEAF) DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF) +DEF_INTERNAL_FN (MASK_LOAD, ECF_PURE | ECF_LEAF) +DEF_INTERNAL_FN (MASK_STORE, ECF_LEAF) --- gcc/tree-if-conv.c.jj 2012-11-19 14:41:23.762912063 +0100 +++ gcc/tree-if-conv.c 2012-11-20 11:39:10.913356780 +0100 @@ -96,6 +96,9 @@ along with GCC; see the file COPYING3. #include "tree-scalar-evolution.h" #include "tree-pass.h" #include "dbgcnt.h" +#include "target.h" +#include "expr.h" +#include "optabs.h" /* List of basic blocks in if-conversion-suitable order. */ static basic_block *ifc_bbs; @@ -448,7 +451,8 @@ bb_with_exit_edge_p (struct loop *loop, - there is a virtual PHI in a BB other than the loop->header. */ static bool -if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi) +if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi, + bool any_mask_load_store) { if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -463,7 +467,7 @@ if_convertible_phi_p (struct loop *loop, return false; } - if (flag_tree_loop_if_convert_stores) + if (flag_tree_loop_if_convert_stores || any_mask_load_store) return true; /* When the flag_tree_loop_if_convert_stores is not set, check @@ -679,6 +683,84 @@ ifcvt_could_trap_p (gimple stmt, vec<dat return gimple_could_trap_p (stmt); } +/* Return true if STMT could be converted into a masked load or store + (conditional load or store based on a mask computed from bb predicate). 
*/
+
+static bool
+ifcvt_can_use_mask_load_store (gimple stmt)
+{
+  tree lhs, ref;
+  enum machine_mode mode, vmode;
+  optab op;
+  basic_block bb;
+  unsigned int vector_sizes;
+
+  if (!flag_tree_vectorize
+      || !gimple_assign_single_p (stmt)
+      || gimple_has_volatile_ops (stmt))
+    return false;
+
+  /* Avoid creating mask loads/stores if we'd need to chain
+     conditions, to make it easier to undo them.  */
+  bb = gimple_bb (stmt);
+  if (!single_pred_p (bb)
+      || is_predicated (single_pred (bb)))
+    return false;
+
+  /* Check whether this is a load or store.  */
+  lhs = gimple_assign_lhs (stmt);
+  if (TREE_CODE (lhs) != SSA_NAME)
+    {
+      if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
+        return false;
+      op = maskstore_optab;
+      ref = lhs;
+    }
+  else if (gimple_assign_load_p (stmt))
+    {
+      op = maskload_optab;
+      ref = gimple_assign_rhs1 (stmt);
+    }
+  else
+    return false;
+
+  /* And whether REF isn't a MEM_REF with non-addressable decl.  */
+  if (TREE_CODE (ref) == MEM_REF
+      && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
+      && DECL_P (TREE_OPERAND (TREE_OPERAND (ref, 0), 0))
+      && !TREE_ADDRESSABLE (TREE_OPERAND (TREE_OPERAND (ref, 0), 0)))
+    return false;
+
+  /* Mask should be integer mode of the same size as the load/store
+     mode.  */
+  mode = TYPE_MODE (TREE_TYPE (lhs));
+  if (int_mode_for_mode (mode) == BLKmode)
+    return false;
+
+  /* See if there is any chance the mask load or store might be
+     vectorized.  If not, punt.  */
+  vmode = targetm.vectorize.preferred_simd_mode (mode);
+  if (!VECTOR_MODE_P (vmode))
+    return false;
+
+  if (optab_handler (op, vmode) != CODE_FOR_nothing)
+    return true;
+
+  vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
+  while (vector_sizes != 0)
+    {
+      unsigned int cur = 1 << floor_log2 (vector_sizes);
+      vector_sizes &= ~cur;
+      if (cur <= GET_MODE_SIZE (mode))
+        continue;
+      vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
+      if (VECTOR_MODE_P (vmode)
+          && optab_handler (op, vmode) != CODE_FOR_nothing)
+        return true;
+    }
+  return false;
+}
+
 /* Return true when STMT is if-convertible.
 
    GIMPLE_ASSIGN statement is not if-convertible if,
@@ -688,7 +770,8 @@ ifcvt_could_trap_p (gimple stmt, vec<dat
 
 static bool
 if_convertible_gimple_assign_stmt_p (gimple stmt,
-                                     vec<data_reference_p> refs)
+                                     vec<data_reference_p> refs,
+                                     bool *any_mask_load_store)
 {
   tree lhs = gimple_assign_lhs (stmt);
   basic_block bb;
@@ -714,10 +797,18 @@ if_convertible_gimple_assign_stmt_p (gim
       return false;
     }
 
+  gimple_set_plf (stmt, GF_PLF_1, false);
+
   if (flag_tree_loop_if_convert_stores)
     {
       if (ifcvt_could_trap_p (stmt, refs))
        {
+         if (ifcvt_can_use_mask_load_store (stmt))
+           {
+             gimple_set_plf (stmt, GF_PLF_1, true);
+             *any_mask_load_store = true;
+             return true;
+           }
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "tree could trap...\n");
          return false;
@@ -727,6 +818,12 @@ if_convertible_gimple_assign_stmt_p (gim
 
   if (gimple_assign_rhs_could_trap_p (stmt))
     {
+      if (ifcvt_can_use_mask_load_store (stmt))
+        {
+          gimple_set_plf (stmt, GF_PLF_1, true);
+          *any_mask_load_store = true;
+          return true;
+        }
       if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "tree could trap...\n");
       return false;
@@ -738,6 +835,12 @@ if_convertible_gimple_assign_stmt_p (gim
       && bb != bb->loop_father->header
      && !bb_with_exit_edge_p (bb->loop_father, bb))
     {
+      if (ifcvt_can_use_mask_load_store (stmt))
+        {
+          gimple_set_plf (stmt, GF_PLF_1, true);
+          *any_mask_load_store = true;
+          return true;
+        }
       if (dump_file && (dump_flags & TDF_DETAILS))
        {
          fprintf (dump_file, "LHS is not var\n");
@@ -756,7 +859,8 @@ if_convertible_gimple_assign_stmt_p (gim
    - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs)
+if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
+                       bool *any_mask_load_store)
 {
   switch (gimple_code (stmt))
     {
@@ -766,7 +870,8 @@ if_convertible_stmt_p (gimple stmt, vec<
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (stmt, refs);
+      return if_convertible_gimple_assign_stmt_p (stmt, refs,
+                                                  any_mask_load_store);
 
     case GIMPLE_CALL:
       {
@@ -1072,7 +1177,7 @@ static bool
 if_convertible_loop_p_1 (struct loop *loop,
                         vec<loop_p> *loop_nest,
                         vec<data_reference_p> *refs,
-                        vec<ddr_p> *ddrs)
+                        vec<ddr_p> *ddrs, bool *any_mask_load_store)
 {
   bool res;
   unsigned int i;
@@ -1128,17 +1233,27 @@ if_convertible_loop_p_1 (struct loop *lo
       basic_block bb = ifc_bbs[i];
       gimple_stmt_iterator itr;
 
-      for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr))
-       if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr)))
-         return false;
-
       /* Check the if-convertibility of statements in predicated BBs.  */
       if (is_predicated (bb))
       for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-         if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
+         if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+                                     any_mask_load_store))
           return false;
     }
 
+  /* Checking PHIs needs to be done after stmts, as the fact whether there
+     are any masked loads or stores affects the tests.  */
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = ifc_bbs[i];
+      gimple_stmt_iterator itr;
+
+      for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr))
+        if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr),
+                                   *any_mask_load_store))
+          return false;
+    }
+
   if (dump_file)
     fprintf (dump_file, "Applying if-conversion\n");
 
@@ -1154,7 +1269,7 @@ if_convertible_loop_p_1 (struct loop *lo
    - if its basic blocks and phi nodes are if convertible.  */
 
 static bool
-if_convertible_loop_p (struct loop *loop)
+if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
 {
   edge e;
   edge_iterator ei;
@@ -1196,7 +1311,8 @@ if_convertible_loop_p (struct loop *loop
   refs.create (5);
   ddrs.create (25);
   loop_nest.create (3);
-  res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs);
+  res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs,
+                                 any_mask_load_store);
 
   if (flag_tree_loop_if_convert_stores)
     {
@@ -1414,7 +1530,7 @@ predicate_all_scalar_phis (struct loop *
    gimplification of the predicates.  */
 
 static void
-insert_gimplified_predicates (loop_p loop)
+insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
 {
   unsigned int i;
 
@@ -1435,7 +1551,8 @@ insert_gimplified_predicates (loop_p loo
       stmts = bb_predicate_gimplified_stmts (bb);
       if (stmts)
       {
-         if (flag_tree_loop_if_convert_stores)
+         if (flag_tree_loop_if_convert_stores
+             || any_mask_load_store)
           {
             /* Insert the predicate of the BB just after the label,
               as the if-conversion of memory writes will use this
@@ -1594,9 +1711,49 @@ predicate_mem_writes (loop_p loop)
       }
 
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-       if ((stmt = gsi_stmt (gsi))
-           && gimple_assign_single_p (stmt)
-           && gimple_vdef (stmt))
+       if ((stmt = gsi_stmt (gsi)) == NULL
+           || !gimple_assign_single_p (stmt))
+         continue;
+       else if (gimple_plf (stmt, GF_PLF_1))
+         {
+           tree lhs = gimple_assign_lhs (stmt);
+           tree rhs = gimple_assign_rhs1 (stmt);
+           tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
+           gimple new_stmt;
+           int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
+
+           masktype = build_nonstandard_integer_type (bitsize, 1);
+           mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
+           mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
+           ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
+           addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
+                                            true, NULL_TREE, true,
+                                            GSI_SAME_STMT);
+           cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+                                              is_gimple_condexpr, NULL_TREE,
+                                              true, GSI_SAME_STMT);
+           mask = fold_build_cond_expr (masktype, unshare_expr (cond),
+                                        mask_op0, mask_op1);
+           mask = ifc_temp_var (masktype, mask, &gsi);
+           ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
+           /* Copy points-to info if possible.  */
+           if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
+             copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr),
+                            ref);
+           if (TREE_CODE (lhs) == SSA_NAME)
+             {
+               new_stmt
+                 = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr,
+                                               ptr, mask);
+               gimple_call_set_lhs (new_stmt, lhs);
+             }
+           else
+             new_stmt
+               = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr,
+                                             mask, rhs);
+           gsi_replace (&gsi, new_stmt, false);
+         }
+       else if (gimple_vdef (stmt))
         {
           tree lhs = gimple_assign_lhs (stmt);
           tree rhs = gimple_assign_rhs1 (stmt);
@@ -1666,7 +1823,7 @@ remove_conditions_and_labels (loop_p loo
    blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (struct loop *loop)
+combine_blocks (struct loop *loop, bool any_mask_load_store)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -1675,10 +1832,10 @@ combine_blocks (struct loop *loop)
   edge_iterator ei;
 
   remove_conditions_and_labels (loop);
-  insert_gimplified_predicates (loop);
+  insert_gimplified_predicates (loop, any_mask_load_store);
   predicate_all_scalar_phis (loop);
 
-  if (flag_tree_loop_if_convert_stores)
+  if (flag_tree_loop_if_convert_stores || any_mask_load_store)
     predicate_mem_writes (loop);
 
   /* Merge basic blocks: first remove all the edges in the loop,
@@ -1775,23 +1932,25 @@ combine_blocks (struct loop *loop)
    profitability analysis.  Returns true when something changed.  */
 
 static bool
-tree_if_conversion (struct loop *loop)
+tree_if_conversion (struct loop *loop, bool *any_mask_load_store_p)
 {
   bool changed = false;
   ifc_bbs = NULL;
+  bool any_mask_load_store = false;
 
-  if (!if_convertible_loop_p (loop)
+  if (!if_convertible_loop_p (loop, &any_mask_load_store)
      || !dbg_cnt (if_conversion_tree))
    goto cleanup;
 
  /* Now all statements are if-convertible.  Combine all the basic
    blocks into one huge basic block doing the if-conversion
    on-the-fly.  */
-  combine_blocks (loop);
+  combine_blocks (loop, any_mask_load_store);
 
-  if (flag_tree_loop_if_convert_stores)
+  if (flag_tree_loop_if_convert_stores || any_mask_load_store)
    mark_virtual_operands_for_renaming (cfun);
 
+  *any_mask_load_store_p |= any_mask_load_store;
  changed = true;
 
 cleanup:
@@ -1809,6 +1968,9 @@ tree_if_conversion (struct loop *loop)
   return changed;
 }
 
+/* Flag whether if-unconversion pass will be needed afterwards.  */
+static bool need_if_unconversion;
+
 /* Tree if-conversion pass management.  */
 
 static unsigned int
@@ -1818,17 +1980,20 @@ main_tree_if_conversion (void)
   struct loop *loop;
   bool changed = false;
   unsigned todo = 0;
+  bool any_mask_load_store = false;
 
   if (number_of_loops () <= 1)
     return 0;
 
   FOR_EACH_LOOP (li, loop, 0)
-    changed |= tree_if_conversion (loop);
+    changed |= tree_if_conversion (loop, &any_mask_load_store);
+
+  need_if_unconversion = any_mask_load_store;
 
   if (changed)
     todo |= TODO_cleanup_cfg;
 
-  if (changed && flag_tree_loop_if_convert_stores)
+  if (changed && (flag_tree_loop_if_convert_stores || any_mask_load_store))
     todo |= TODO_update_ssa_only_virtuals;
 
   free_dominance_info (CDI_POST_DOMINATORS);
@@ -1865,6 +2030,139 @@ struct gimple_opt_pass pass_if_conversio
   NULL,                                 /* sub */
   NULL,                                 /* next */
   0,                                    /* static_pass_number */
+  TV_NONE,                              /* tv_id */
+  PROP_cfg | PROP_ssa,                  /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_verify_stmts | TODO_verify_flow
+                                        /* todo_flags_finish */
+ }
+};
+
+/* Undo creation of MASK_LOAD or MASK_STORE, if it hasn't
+   been successfully vectorized.  */
+
+static bool
+gate_tree_if_unconversion (void)
+{
+  return need_if_unconversion;
+}
+
+static unsigned int
+main_tree_if_unconversion (void)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+
+  need_if_unconversion = false;
+  FOR_EACH_BB (bb)
+    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+       gimple stmt = gsi_stmt (gsi);
+       if (is_gimple_call (stmt)
+           && gimple_call_internal_p (stmt)
+           && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+               || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+           && INTEGRAL_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, 2))))
+         {
+           tree cond = gimple_call_arg (stmt, 2), mem, type;
+           edge e1, e2, e3;
+           bool swapped_p = false;
+           gimple cond_stmt, new_stmt;
+
+           if (TREE_CODE (cond) == SSA_NAME
+               && !SSA_NAME_IS_DEFAULT_DEF (cond))
+             {
+               gimple def_stmt = SSA_NAME_DEF_STMT (cond);
+               if (is_gimple_assign (def_stmt)
+                   && gimple_bb (def_stmt) == bb
+                   && gimple_assign_rhs_code (def_stmt) == COND_EXPR)
+                 {
+                   tree rhs2 = gimple_assign_rhs2 (def_stmt);
+                   tree rhs3 = gimple_assign_rhs3 (def_stmt);
+                   if (integer_all_onesp (rhs2) && integer_zerop (rhs3))
+                     cond = gimple_assign_rhs1 (def_stmt);
+                   else if (integer_zerop (rhs2) && integer_all_onesp (rhs3))
+                     {
+                       cond = gimple_assign_rhs1 (def_stmt);
+                       swapped_p = true;
+                     }
+                 }
+             }
+           gsi_prev (&gsi);
+           e1 = split_block (bb, gsi_stmt (gsi));
+           e2 = split_block (e1->dest, stmt);
+           e3 = make_edge (e1->src, e2->dest,
+                           swapped_p ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE);
+           e1->flags = (e1->flags & ~EDGE_FALLTHRU)
+                       | (swapped_p ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE);
+           set_immediate_dominator (CDI_DOMINATORS, e2->dest, e1->src);
+           if (cond == gimple_call_arg (stmt, 2))
+             cond_stmt
+               = gimple_build_cond (NE_EXPR, cond,
+                                    build_int_cst (TREE_TYPE (cond), 0),
+                                    NULL_TREE, NULL_TREE);
+           else
+             cond_stmt
+               = gimple_build_cond_from_tree (cond, NULL_TREE, NULL_TREE);
+           gsi = gsi_last_bb (e1->src);
+           gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
+           if (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD)
+             type = TREE_TYPE (gimple_call_lhs (stmt));
+           else
+             type = TREE_TYPE (gimple_call_arg (stmt, 3));
+           mem = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
+                         gimple_call_arg (stmt, 1));
+           if (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD)
+             new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+                                             mem);
+           else
+             new_stmt = gimple_build_assign (mem, gimple_call_arg (stmt, 3));
+           gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+           if (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD)
+             {
+               gimple phi;
+               tree res = gimple_assign_lhs (new_stmt);
+               tree tem = make_ssa_name (TREE_TYPE (res), NULL);
+               tree zero = build_zero_cst (TREE_TYPE (res));
+               gimple_assign_set_lhs (new_stmt, tem);
+               gimple_call_set_lhs (stmt, NULL_TREE);
+               phi = create_phi_node (res, e2->dest);
+               add_phi_arg (phi, tem, e2, gimple_location (stmt));
+               add_phi_arg (phi, zero, e3, gimple_location (stmt));
+               SSA_NAME_DEF_STMT (res) = phi;
+             }
+           else
+             {
+               gimple phi;
+               tree new_vdef = copy_ssa_name (gimple_vuse (stmt), new_stmt);
+               gimple_set_vdef (new_stmt, new_vdef);
+               phi = create_phi_node (gimple_vdef (stmt), e2->dest);
+               add_phi_arg (phi, new_vdef, e2, UNKNOWN_LOCATION);
+               add_phi_arg (phi, gimple_vuse (stmt), e3, UNKNOWN_LOCATION);
+               SSA_NAME_DEF_STMT (gimple_vdef (stmt)) = phi;
+             }
+           gsi = gsi_for_stmt (stmt);
+           gsi_replace (&gsi, new_stmt, false);
+           gsi = gsi_for_stmt (cond_stmt);
+         }
+      }
+
+  return 0;
+}
+
+struct gimple_opt_pass pass_if_unconversion =
+{
+ {
+  GIMPLE_PASS,
+  "ifuncvt",                            /* name */
+  OPTGROUP_NONE,                        /* optinfo_flags */
+  gate_tree_if_unconversion,            /* gate */
+  main_tree_if_unconversion,            /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
  TV_NONE,                              /* tv_id */
  PROP_cfg | PROP_ssa,                  /* properties_required */
  0,                                    /* properties_provided */
--- gcc/tree-vect-data-refs.c.jj	2012-11-19 14:41:23.766912043 +0100
+++ gcc/tree-vect-data-refs.c	2012-11-20 11:36:51.587179427 +0100
@@ -2705,6 +2705,24 @@ vect_check_gather (gimple stmt, loop_vec
   enum machine_mode pmode;
   int punsignedp, pvolatilep;
 
+  base = DR_REF (dr);
+  /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF,
+     see if we can use the def stmt of the address.  */
+  if (is_gimple_call (stmt)
+      && gimple_call_internal_p (stmt)
+      && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+          || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+      && TREE_CODE (base) == MEM_REF
+      && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
+      && integer_zerop (TREE_OPERAND (base, 1))
+      && !expr_invariant_in_loop_p (loop, TREE_OPERAND (base, 0)))
+    {
+      gimple def_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (base, 0));
+      if (is_gimple_assign (def_stmt)
+          && gimple_assign_rhs_code (def_stmt) == ADDR_EXPR)
+        base = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0);
+    }
+
   /* The gather builtins need address of the form
      loop_invariant + vector * {1, 2, 4, 8}
     or
@@ -2717,7 +2735,7 @@ vect_check_gather (gimple stmt, loop_vec
      vectorized.  The following code attempts to find such a
     preexistng SSA_NAME OFF and put the loop invariants into a tree
     BASE that can be gimplified before the loop.  */
-  base = get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &off,
+  base = get_inner_reference (base, &pbitsize, &pbitpos, &off,
                              &pmode, &punsignedp, &pvolatilep, false);
   gcc_assert (base != NULL_TREE && (pbitpos % BITS_PER_UNIT) == 0);
 
@@ -3185,7 +3203,10 @@ vect_analyze_data_refs (loop_vec_info lo
       offset = unshare_expr (DR_OFFSET (dr));
       init = unshare_expr (DR_INIT (dr));
 
-      if (is_gimple_call (stmt))
+      if (is_gimple_call (stmt)
+          && (!gimple_call_internal_p (stmt)
+              || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD
+                  && gimple_call_internal_fn (stmt) != IFN_MASK_STORE)))
       {
         if (dump_enabled_p ())
           {
@@ -4892,6 +4913,14 @@ vect_supportable_dr_alignment (struct da
   if (aligned_access_p (dr) && !check_aligned_accesses)
     return dr_aligned;
 
+  /* For now assume all conditional loads/stores support unaligned
+     access without any special code.  */
+  if (is_gimple_call (stmt)
+      && gimple_call_internal_p (stmt)
+      && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+          || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
+    return dr_unaligned_supported;
+
   if (loop_vinfo)
     {
       vect_loop = LOOP_VINFO_LOOP (loop_vinfo);
--- gcc/gimple.h.jj	2012-11-19 14:41:26.184898949 +0100
+++ gcc/gimple.h	2012-11-20 11:36:51.588179472 +0100
@@ -4938,7 +4938,13 @@ gimple_expr_type (const_gimple stmt)
      useless conversion involved.  That means returning the
     original RHS type as far as we can reconstruct it.  */
   if (code == GIMPLE_CALL)
-    type = gimple_call_return_type (stmt);
+    {
+      if (gimple_call_internal_p (stmt)
+          && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+        type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else
+        type = gimple_call_return_type (stmt);
+    }
   else
     switch (gimple_assign_rhs_code (stmt))
       {
--- gcc/internal-fn.c.jj	2012-11-07 08:42:08.534682161 +0100
+++ gcc/internal-fn.c	2012-11-20 11:36:51.589179516 +0100
@@ -1,5 +1,5 @@
 /* Internal functions.
-   Copyright (C) 2011 Free Software Foundation, Inc.
+   Copyright (C) 2011, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -109,6 +109,52 @@ expand_STORE_LANES (gimple stmt)
   expand_insn (get_multi_vector_move (type, vec_store_lanes_optab), 2, ops);
 }
 
+static void
+expand_MASK_LOAD (gimple stmt)
+{
+  struct expand_operand ops[3];
+  tree type, lhs, rhs, maskt;
+  rtx mem, target, mask;
+
+  maskt = gimple_call_arg (stmt, 2);
+  lhs = gimple_call_lhs (stmt);
+  type = TREE_TYPE (lhs);
+  rhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
+                gimple_call_arg (stmt, 1));
+
+  mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  gcc_assert (MEM_P (mem));
+  mask = expand_normal (maskt);
+  target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, TYPE_MODE (type));
+  create_fixed_operand (&ops[1], mem);
+  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
+  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+}
+
+static void
+expand_MASK_STORE (gimple stmt)
+{
+  struct expand_operand ops[3];
+  tree type, lhs, rhs, maskt;
+  rtx mem, reg, mask;
+
+  maskt = gimple_call_arg (stmt, 2);
+  rhs = gimple_call_arg (stmt, 3);
+  type = TREE_TYPE (rhs);
+  lhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
+                gimple_call_arg (stmt, 1));
+
+  mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  gcc_assert (MEM_P (mem));
+  mask = expand_normal (maskt);
+  reg = expand_normal (rhs);
+  create_fixed_operand (&ops[0], mem);
+  create_input_operand (&ops[1], reg, TYPE_MODE (type));
+  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
+  expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+}
+
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
--- gcc/tree-vect-loop.c.jj	2012-11-19 14:41:23.763912058 +0100
+++ gcc/tree-vect-loop.c	2012-11-20 11:36:51.591179598 +0100
@@ -351,7 +351,11 @@ vect_determine_vectorization_factor (loo
              analyze_pattern_stmt = false;
            }
 
-         if (gimple_get_lhs (stmt) == NULL_TREE)
+         if (gimple_get_lhs (stmt) == NULL_TREE
+             /* MASK_STORE has no lhs, but is ok.  */
+             && (!is_gimple_call (stmt)
+                 || !gimple_call_internal_p (stmt)
+                 || gimple_call_internal_fn (stmt) != IFN_MASK_STORE))
           {
             if (dump_enabled_p ())
               {
@@ -388,7 +392,12 @@ vect_determine_vectorization_factor (loo
         else
           {
             gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
-             scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+             if (is_gimple_call (stmt)
+                 && gimple_call_internal_p (stmt)
+                 && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+               scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+             else
+               scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
             if (dump_enabled_p ())
               {
                 dump_printf_loc (MSG_NOTE, vect_location,
--- gcc/passes.c.jj	2012-11-19 14:41:26.185898944 +0100
+++ gcc/passes.c	2012-11-20 11:36:51.593179673 +0100
@@ -1478,6 +1478,7 @@ init_optimization_passes (void)
              struct opt_pass **p = &pass_vectorize.pass.sub;
              NEXT_PASS (pass_dce_loop);
            }
+         NEXT_PASS (pass_if_unconversion);
         NEXT_PASS (pass_predcom);
         NEXT_PASS (pass_complete_unroll);
         NEXT_PASS (pass_slp_vectorize);
--- gcc/optabs.def.jj	2012-11-19 14:41:14.487962283 +0100
+++ gcc/optabs.def	2012-11-20 11:36:51.593179673 +0100
@@ -248,6 +248,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
+OPTAB_D (maskload_optab, "maskload$a")
+OPTAB_D (maskstore_optab, "maskstore$a")
 OPTAB_D (vec_extract_optab, "vec_extract$a")
 OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
--- gcc/tree-pass.h.jj	2012-11-14 08:13:26.039860547 +0100
+++ gcc/tree-pass.h	2012-11-20 11:36:51.594179709 +0100
@@ -1,5 +1,5 @@
 /* Definitions for describing one tree-ssa optimization pass.
-   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
+   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
    Free Software Foundation, Inc.
    Contributed by Richard Henderson <rth@redhat.com>
 
@@ -286,6 +286,7 @@ extern struct gimple_opt_pass pass_recor
 extern struct gimple_opt_pass pass_graphite;
 extern struct gimple_opt_pass pass_graphite_transforms;
 extern struct gimple_opt_pass pass_if_conversion;
+extern struct gimple_opt_pass pass_if_unconversion;
 extern struct gimple_opt_pass pass_loop_distribution;
 extern struct gimple_opt_pass pass_vectorize;
 extern struct gimple_opt_pass pass_slp_vectorize;
--- gcc/tree-vect-stmts.c.jj	2012-11-19 14:41:26.174898997 +0100
+++ gcc/tree-vect-stmts.c	2012-11-20 11:36:51.596179777 +0100
@@ -218,7 +218,7 @@ vect_mark_relevant (vec<gimple> *worklis
         /* This use is out of pattern use, if LHS has other uses that are
           pattern uses, we should mark the stmt itself, and not the pattern
           stmt.  */
-       if (TREE_CODE (lhs) == SSA_NAME)
+       if (lhs && TREE_CODE (lhs) == SSA_NAME)
         FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
           {
             if (is_gimple_debug (USE_STMT (use_p)))
@@ -376,7 +376,27 @@ exist_non_indexing_operands_for_use_p (t
      first case, and whether var corresponds to USE.  */
 
   if (!gimple_assign_copy_p (stmt))
-    return false;
+    {
+      if (is_gimple_call (stmt)
+          && gimple_call_internal_p (stmt))
+        switch (gimple_call_internal_fn (stmt))
+          {
+          case IFN_MASK_STORE:
+            operand = gimple_call_arg (stmt, 3);
+            if (operand == use)
+              return true;
+            /* FALLTHRU */
+          case IFN_MASK_LOAD:
+            operand = gimple_call_arg (stmt, 2);
+            if (operand == use)
+              return true;
+            break;
+          default:
+            break;
+          }
+      return false;
+    }
+
   if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
     return false;
   operand = gimple_assign_rhs1 (stmt);
@@ -1695,6 +1715,401 @@ vectorizable_function (gimple call, tree
                                vectype_in);
 }
 
+
+static tree permute_vec_elements (tree, tree, tree, gimple,
+                                  gimple_stmt_iterator *);
+
+
+static bool
+vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
+                              gimple *vec_stmt, slp_tree slp_node)
+{
+  tree vec_dest = NULL;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  stmt_vec_info prev_stmt_info;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree elem_type;
+  gimple new_stmt;
+  tree dummy;
+  tree dataref_ptr = NULL_TREE;
+  gimple ptr_incr;
+  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  int ncopies;
+  int i, j;
+  bool inv_p;
+  tree gather_base = NULL_TREE, gather_off = NULL_TREE;
+  tree gather_off_vectype = NULL_TREE, gather_decl = NULL_TREE;
+  int gather_scale = 1;
+  enum vect_def_type gather_dt = vect_unknown_def_type;
+  bool is_store;
+  tree mask;
+  gimple def_stmt;
+  tree def;
+  enum vect_def_type dt;
+
+  if (slp_node != NULL)
+    return false;
+
+  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+  gcc_assert (ncopies >= 1);
+
+  is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
+  mask = gimple_call_arg (stmt, 2);
+  if (TYPE_PRECISION (TREE_TYPE (mask))
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+    return false;
+
+  /* FORNOW. This restriction should be relaxed.  */
+  if (nested_in_vect_loop && ncopies > 1)
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                         "multiple types in nested loop.");
+      return false;
+    }
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+    return false;
+
+  if (!STMT_VINFO_DATA_REF (stmt_info))
+    return false;
+
+  elem_type = TREE_TYPE (vectype);
+
+  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+    return false;
+
+  if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
+    return false;
+
+  if (STMT_VINFO_GATHER_P (stmt_info))
+    {
+      gimple def_stmt;
+      tree def;
+      gather_decl = vect_check_gather (stmt, loop_vinfo, &gather_base,
+                                       &gather_off, &gather_scale);
+      gcc_assert (gather_decl);
+      if (!vect_is_simple_use_1 (gather_off, NULL, loop_vinfo, NULL,
+                                 &def_stmt, &def, &gather_dt,
+                                 &gather_off_vectype))
+        {
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                             "gather index use not simple.");
+          return false;
+        }
+    }
+  else if (tree_int_cst_compare (nested_in_vect_loop
+                                 ? STMT_VINFO_DR_STEP (stmt_info)
+                                 : DR_STEP (dr), size_zero_node) < 0)
+    return false;
+  else if (optab_handler (is_store ? maskstore_optab : maskload_optab,
+                          TYPE_MODE (vectype)) == CODE_FOR_nothing)
+    return false;
+
+  if (TREE_CODE (mask) != SSA_NAME)
+    return false;
+
+  if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
+                           &def_stmt, &def, &dt))
+    return false;
+
+  if (is_store)
+    {
+      tree rhs = gimple_call_arg (stmt, 3);
+      if (!vect_is_simple_use (rhs, stmt, loop_vinfo, NULL,
+                               &def_stmt, &def, &dt))
+        return false;
+    }
+
+  if (!vec_stmt) /* transformation not required.  */
+    {
+      STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+      return true;
+    }
+
+  /** Transform.  **/
+
+  if (STMT_VINFO_GATHER_P (stmt_info))
+    {
+      tree vec_oprnd0 = NULL_TREE, op;
+      tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl));
+      tree rettype, srctype, ptrtype, idxtype, masktype, scaletype;
+      tree ptr, vec_mask = NULL_TREE, mask_op, var, scale;
+      tree perm_mask = NULL_TREE, prev_res = NULL_TREE;
+      edge pe = loop_preheader_edge (loop);
+      gimple_seq seq;
+      basic_block new_bb;
+      enum { NARROW, NONE, WIDEN } modifier;
+      int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gather_off_vectype);
+
+      if (nunits == gather_off_nunits)
+        modifier = NONE;
+      else if (nunits == gather_off_nunits / 2)
+        {
+          unsigned char *sel = XALLOCAVEC (unsigned char, gather_off_nunits);
+          modifier = WIDEN;
+
+          for (i = 0; i < gather_off_nunits; ++i)
+            sel[i] = i | nunits;
+
+          perm_mask = vect_gen_perm_mask (gather_off_vectype, sel);
+          gcc_assert (perm_mask != NULL_TREE);
+        }
+      else if (nunits == gather_off_nunits * 2)
+        {
+          unsigned char *sel = XALLOCAVEC (unsigned char, nunits);
+          modifier = NARROW;
+
+          for (i = 0; i < nunits; ++i)
+            sel[i] = i < gather_off_nunits
+                     ? i : i + nunits - gather_off_nunits;
+
+          perm_mask = vect_gen_perm_mask (vectype, sel);
+          gcc_assert (perm_mask != NULL_TREE);
+          ncopies *= 2;
+        }
+      else
+        gcc_unreachable ();
+
+      rettype = TREE_TYPE (TREE_TYPE (gather_decl));
+      srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      scaletype = TREE_VALUE (arglist);
+      gcc_checking_assert (types_compatible_p (srctype, rettype)
+                           && types_compatible_p (srctype, masktype));
+
+      vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype);
+
+      ptr = fold_convert (ptrtype, gather_base);
+      if (!is_gimple_min_invariant (ptr))
+        {
+          ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE);
+          new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+          gcc_assert (!new_bb);
+        }
+
+      scale = build_int_cst (scaletype, gather_scale);
+
+      prev_stmt_info = NULL;
+      for (j = 0; j < ncopies; ++j)
+        {
+          if (modifier == WIDEN && (j & 1))
+            op = permute_vec_elements (vec_oprnd0, vec_oprnd0,
+                                       perm_mask, stmt, gsi);
+          else if (j == 0)
+            op = vec_oprnd0
+              = vect_get_vec_def_for_operand (gather_off, stmt, NULL);
+          else
+            op = vec_oprnd0
+              = vect_get_vec_def_for_stmt_copy (gather_dt, vec_oprnd0);
+
+          if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
+            {
+              gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
+                          == TYPE_VECTOR_SUBPARTS (idxtype));
+              var = vect_get_new_vect_var (idxtype, vect_simple_var, NULL);
+              var = make_ssa_name (var, NULL);
+              op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
+              new_stmt
+                = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var,
+                                                op, NULL_TREE);
+              vect_finish_stmt_generation (stmt, new_stmt, gsi);
+              op = var;
+            }
+
+          if (j == 0)
+            vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+          else
+            {
+              vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+                                  &def, &dt);
+              vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+            }
+
+          mask_op = vec_mask;
+          if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask)))
+            {
+              gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op))
+                          == TYPE_VECTOR_SUBPARTS (masktype));
+              var = vect_get_new_vect_var (masktype, vect_simple_var, NULL);
+              var = make_ssa_name (var, NULL);
+              mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op);
+              new_stmt
+                = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var,
+                                                mask_op, NULL_TREE);
+              vect_finish_stmt_generation (stmt, new_stmt, gsi);
+              mask_op = var;
+            }
+
+          new_stmt
+            = gimple_build_call (gather_decl, 5, mask_op, ptr, op, mask_op,
+                                 scale);
+
+          if (!useless_type_conversion_p (vectype, rettype))
+            {
+              gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
+                          == TYPE_VECTOR_SUBPARTS (rettype));
+              var = vect_get_new_vect_var (rettype, vect_simple_var, NULL);
+              op = make_ssa_name (var, new_stmt);
+              gimple_call_set_lhs (new_stmt, op);
+              vect_finish_stmt_generation (stmt, new_stmt, gsi);
+              var = make_ssa_name (vec_dest, NULL);
+              op = build1 (VIEW_CONVERT_EXPR, vectype, op);
+              new_stmt
+                = gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, op,
+                                                NULL_TREE);
+            }
+          else
+            {
+              var = make_ssa_name (vec_dest, new_stmt);
+              gimple_call_set_lhs (new_stmt, var);
+            }
+
+          vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+          if (modifier == NARROW)
+            {
+              if ((j & 1) == 0)
+                {
+                  prev_res = var;
+                  continue;
+                }
+              var = permute_vec_elements (prev_res, var,
+                                          perm_mask, stmt, gsi);
+              new_stmt = SSA_NAME_DEF_STMT (var);
+            }
+
+          if (prev_stmt_info == NULL)
+            STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+          else
+            STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+          prev_stmt_info = vinfo_for_stmt (new_stmt);
+        }
+      return true;
+    }
+  else if (is_store)
+    {
+      tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
+      prev_stmt_info = NULL;
+      for (i = 0; i < ncopies; i++)
+        {
+          unsigned align, misalign;
+
+          if (i == 0)
+            {
+              tree rhs = gimple_call_arg (stmt, 3);
+              vec_rhs = vect_get_vec_def_for_operand (rhs, stmt, NULL);
+              vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+              /* We should have caught mismatched types earlier.  */
+              gcc_assert (useless_type_conversion_p (vectype,
+                                                     TREE_TYPE (vec_rhs)));
+              dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL,
+                                                      NULL_TREE, &dummy, gsi,
+                                                      &ptr_incr, false, &inv_p);
+              gcc_assert (!inv_p);
+            }
+          else
+            {
+              vect_is_simple_use (vec_rhs, NULL, loop_vinfo, NULL, &def_stmt,
+                                  &def, &dt);
+              vec_rhs = vect_get_vec_def_for_stmt_copy (dt, vec_rhs);
+              vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+                                  &def, &dt);
+              vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+              dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
+                                             TYPE_SIZE_UNIT (vectype));
+            }
+
+          align = TYPE_ALIGN_UNIT (vectype);
+          if (aligned_access_p (dr))
+            misalign = 0;
+          else if (DR_MISALIGNMENT (dr) == -1)
+            {
+              align = TYPE_ALIGN_UNIT (elem_type);
+              misalign = 0;
+            }
+          else
+            misalign = DR_MISALIGNMENT (dr);
+          set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
+                                  misalign);
+          new_stmt
+            = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr,
+                                          gimple_call_arg (stmt, 1),
+                                          vec_mask, vec_rhs);
+          vect_finish_stmt_generation (stmt, new_stmt, gsi);
+          if (i == 0)
+            STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+          else
+            STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+          prev_stmt_info = vinfo_for_stmt (new_stmt);
+        }
+    }
+  else
+    {
+      tree vec_mask = NULL_TREE;
+      prev_stmt_info = NULL;
+      vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype);
+      for (i = 0; i < ncopies; i++)
+        {
+          unsigned align, misalign;
+
+          if (i == 0)
+            {
+              vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+              dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL,
+                                                      NULL_TREE, &dummy, gsi,
+                                                      &ptr_incr, false, &inv_p);
+              gcc_assert (!inv_p);
+            }
+          else
+            {
+              vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+                                  &def, &dt);
+              vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+              dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
                                             TYPE_SIZE_UNIT (vectype));
+            }
+
+          align = TYPE_ALIGN_UNIT (vectype);
+          if (aligned_access_p (dr))
+            misalign = 0;
+          else if (DR_MISALIGNMENT (dr) == -1)
+            {
+              align = TYPE_ALIGN_UNIT (elem_type);
+              misalign = 0;
+            }
+          else
+            misalign = DR_MISALIGNMENT (dr);
+          set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
+                                  misalign);
+          new_stmt
+            = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr,
+                                          gimple_call_arg (stmt, 1),
+                                          vec_mask);
+          gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest, NULL));
+          vect_finish_stmt_generation (stmt, new_stmt, gsi);
+          if (i == 0)
+            STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+          else
+            STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+          prev_stmt_info = vinfo_for_stmt (new_stmt);
+        }
+    }
+
+  return true;
+}
+
+
 /* Function vectorizable_call.
 
    Check if STMT performs a function call that can be vectorized.
@@ -1737,10 +2152,16 @@ vectorizable_call (gimple stmt, gimple_s
   if (!is_gimple_call (stmt))
     return false;
 
-  if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
+  if (stmt_can_throw_internal (stmt))
     return false;
 
-  if (stmt_can_throw_internal (stmt))
+  if (gimple_call_internal_p (stmt)
+      && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+          || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
+    return vectorizable_mask_load_store (stmt, gsi, vec_stmt,
+                                         slp_node);
+
+  if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
     return false;
 
   vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -3426,10 +3847,6 @@ vectorizable_shift (gimple stmt, gimple_
 }
 
-static tree permute_vec_elements (tree, tree, tree, gimple,
-                                  gimple_stmt_iterator *);
-
-
 /* Function vectorizable_operation.
 
    Check if STMT performs a binary, unary or ternary operation that can
@@ -5831,6 +6248,10 @@ vect_transform_stmt (gimple stmt, gimple
     case call_vec_info_type:
       done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
       stmt = gsi_stmt (*gsi);
+      if (is_gimple_call (stmt)
+          && gimple_call_internal_p (stmt)
+          && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+        is_store = true;
      break;
 
    case reduc_vec_info_type: