
[RFC] By default if-convert only basic blocks that will be vectorized (take 4)

Message ID 20131023172220.GW30970@tucnak.zalov.cz
State New

Commit Message

Jakub Jelinek Oct. 23, 2013, 5:22 p.m. UTC
On Tue, Oct 22, 2013 at 08:27:54PM +0400, Sergey Ostanevich wrote:
> still fails on 403 et al.

Ok, reproduced.  Unfortunately the pending stmt sequences already pretty much
assume that they will end up in a single combined basic block.  I went
through various alternatives: deferring the update_ssa (TODO_update_ssa)
call until after combine_blocks doesn't work, because it is unhappy about
basic blocks being removed; temporarily putting all the stmts into the latch
doesn't work either, because there are no PHIs for it in the loop.  So the
final fix, as discussed with Richard on IRC, is not to call predicate_bbs
early before versioning (without -ftree-loop-if-convert-stores that is
easily achievable by just using a better dominance check), or, for the
stores case, to do it and free the predicates again (at least for now).

The predicate_bbs stuff would certainly appreciate more TLC in the future.

Attaching a whole new patchset; the above mentioned fix is mostly in the
first patch (which also contains a tree-cfg.h include that is needed for
today's header reshuffling), and the other two patches are just tweaked to
apply on top of that.  All 3 patches together have been
bootstrapped/regtested on x86_64-linux and i686-linux, the first one
and first+second just compile-time tested.

	Jakub
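
For illustration only (this example is not from the patch or its new
testcases), the kind of loop this versioning is aimed at looks roughly like:

  void
  foo (int *a, const int *b, const int *c, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      {
        int t = c[i];
        if (b[i] > 0)
          t = b[i] * 2;
        a[i] = t;  /* Unconditional store, only the value is conditional.  */
      }
  }

The conditional assignment keeps the vectorizer from handling the loop until
if-conversion collapses the body into a single basic block; with this
patchset the if-converted copy is only kept when the vectorizer actually
succeeds on it, otherwise the untouched scalar loop is used.
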
2013-10-23  Jakub Jelinek  <jakub@redhat.com>

	* tree-vectorizer.h (struct _loop_vec_info): Add scalar_loop field.
	(LOOP_VINFO_SCALAR_LOOP): Define.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
	* internal-fn.def (LOOP_VECTORIZED): New internal fn code.
	* tree-if-conv.c (release_bb_predicate): New function.
	(free_bb_predicate): Use it.
	(reset_bb_predicate): Likewise.  Don't unallocate bb->aux
	just to immediately allocate it again.
	(predicate_bbs): Don't return bool, only check if the last stmt
	of a basic block is GIMPLE_COND and handle that.  For basic blocks
	that dominate loop->latch assume they don't need to be predicated.
	(if_convertible_loop_p_1): Only call predicate_bbs if
	flag_tree_loop_if_convert_stores and free_bb_predicate in that case
	afterwards, check gimple_code of stmts here.  Replace is_predicated
	check with dominance check.
	(insert_gimplified_predicates): If bb dominates loop->latch, call
	reset_bb_predicate.
	(combine_blocks): Call predicate_bbs.
	(version_loop_for_if_conversion): New function.
	(tree_if_conversion): Return todo flags instead of bool, call
	version_loop_for_if_conversion if if-conversion should be just
	for the vectorized loops and nothing else.
	(main_tree_if_conversion): Adjust caller.  Don't call
	tree_if_conversion if flag_tree_loop_if_convert isn't 1 and
	the loop isn't going to be vectorized.
	(gate_tree_if_conversion): Don't turn on if-conversion just because of
	flag_tree_loop_if_convert_stores == 1.
	* internal-fn.c (expand_LOOP_VECTORIZED): New function.
	* tree-vectorizer.c (vect_loop_vectorized_call): New function.
	(vectorize_loops): Don't try to vectorize loops with
	loop->dont_vectorize set.  Set LOOP_VINFO_SCALAR_LOOP for if-converted
	loops, fold LOOP_VECTORIZED internal call depending on if loop
	has been vectorized or not.
	* tree-vect-loop-manip.c (slpeel_duplicate_current_defs_from_edges):
	New function.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
	If non-NULL, copy basic blocks from scalar_loop instead of loop, but
	still to loop's entry or exit edge.
	(slpeel_tree_peel_loop_to_edge): Add scalar_loop argument, pass it
	down to slpeel_tree_duplicate_loop_to_edge_cfg.
	(vect_do_peeling_for_loop_bound, vect_do_peeling_for_loop_alignment):
	Adjust callers.
	(vect_loop_versioning): If LOOP_VINFO_SCALAR_LOOP, perform loop
	versioning from that loop instead of LOOP_VINFO_LOOP, move it to the
	right place in the CFG afterwards.
	* cfgloop.h (struct loop): Add dont_vectorize field.
	* tree-loop-distribution.c (copy_loop_before): Adjust
	slpeel_tree_duplicate_loop_to_edge_cfg caller.
	* passes.def: Add a note that pass_vectorize must immediately follow
	pass_if_conversion.

	* gcc.dg/vect/bb-slp-cond-1.c: Add dg-additional-options
	-ftree-loop-if-convert.
	* gcc.dg/vect/bb-slp-pattern-2.c: Likewise.
	* gcc.dg/vect/vect-cond-11.c: New testcase.
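
Roughly (a sketch, not the literal GIMPLE the pass emits), the versioning
done by version_loop_for_if_conversion leaves the code looking like:

  _1 = LOOP_VECTORIZED (<num of loop to be if-converted>, <num of scalar copy>);
  if (_1)
    {
      /* Loop that will be if-converted and, if possible, vectorized.  */
    }
  else
    {
      /* Untouched scalar copy of the loop.  */
    }

The vectorizer later folds the LOOP_VECTORIZED call to true or false
depending on whether it managed to vectorize the if-converted loop, and CFG
cleanup then removes whichever copy became unreachable.
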
2013-10-23  Jakub Jelinek  <jakub@redhat.com>

	* tree-if-conv.c (version_loop_for_if_conversion): Add DO_OUTER
	argument.  Store into what it points to whether the outer loop
	should be versioned.  If it is NULL, optimize the LOOP_VECTORIZED
	internal call in loop into boolean_true_node and in new_loop
	update its arguments to the inner loop copies and set the
	dont_vectorize flag.
	(tree_if_conversion): Adjust caller.  If the flag was set,
	call version_loop_for_if_conversion once again on the outer loop
	at the end.
	* tree-vectorizer.c (vect_loop_select): New function.
	(vectorize_loops): Use it to attempt to vectorize an if-converted
	loop before its non-if-converted counterpart.  If outer loop
	vectorization is successful in that case, ensure the inner loops
	of the soon to be dead non-if-converted loop are not vectorized.

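As a (again purely illustrative) example of the outer loop case the second
patch is about, consider a nest like:

  #define M 64
  #define N 1024
  float a[M][N], b[M][N];

  void
  foo (void)
  {
    int i, j;
    for (i = 0; i < M; i++)
      for (j = 0; j < N; j++)
        {
          float t = b[i][j];
          if (t < 0.0f)
            t = -t;
          a[i][j] = t;
        }
  }

Outer loop vectorization of the i loop can only be attempted once the
conditional in the j loop has been if-converted, so the outer loop is
versioned as well and vect_loop_select makes the vectorizer try the
if-converted copy before the inner loop of its non-if-converted counterpart.
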
--- gcc/tree-if-conv.c.jj	2013-10-23 18:38:00.739772777 +0200
+++ gcc/tree-if-conv.c	2013-10-23 18:46:09.334284914 +0200
@@ -1763,14 +1763,30 @@ combine_blocks (struct loop *loop)
    internal call into either true or false.  */
 
 static bool
-version_loop_for_if_conversion (struct loop *loop)
+version_loop_for_if_conversion (struct loop *loop, bool *do_outer)
 {
+  struct loop *outer = loop_outer (loop);
   basic_block cond_bb;
   tree cond = make_ssa_name (boolean_type_node, NULL);
   struct loop *new_loop;
   gimple g;
   gimple_stmt_iterator gsi;
 
+  if (do_outer)
+    {
+      *do_outer = false;
+      if (loop->inner == NULL
+	  && outer->inner == loop
+	  && loop->next == NULL
+	  && loop_outer (outer)
+	  && outer->num_nodes == 3 + loop->num_nodes
+	  && loop_preheader_edge (loop)->src == outer->header
+	  && single_exit (loop)
+	  && outer->latch
+	  && single_exit (loop)->dest == EDGE_PRED (outer->latch, 0)->src)
+	*do_outer = true;
+    }
+
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
 				  build_int_cst (integer_type_node, loop->num),
 				  integer_zero_node);
@@ -1789,6 +1805,58 @@ version_loop_for_if_conversion (struct l
   gimple_call_set_arg (g, 1, build_int_cst (integer_type_node, new_loop->num));
   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
   update_ssa (TODO_update_ssa);
+  if (do_outer == NULL)
+    {
+      gcc_assert (single_succ_p (loop->header));
+      gsi = gsi_last_bb (single_succ (loop->header));
+      gimple cond_stmt = gsi_stmt (gsi);
+      gsi_prev (&gsi);
+      g = gsi_stmt (gsi);
+      gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND
+		  && is_gimple_call (g)
+		  && gimple_call_internal_p (g)
+		  && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED
+		  && gimple_cond_lhs (cond_stmt) == gimple_call_lhs (g));
+      gimple_cond_set_lhs (cond_stmt, boolean_true_node);
+      update_stmt (cond_stmt);
+      gcc_assert (has_zero_uses (gimple_call_lhs (g)));
+      gsi_remove (&gsi, false);
+      gcc_assert (single_succ_p (new_loop->header));
+      gsi = gsi_last_bb (single_succ (new_loop->header));
+      cond_stmt = gsi_stmt (gsi);
+      gsi_prev (&gsi);
+      g = gsi_stmt (gsi);
+      gcc_assert (gimple_code (cond_stmt) == GIMPLE_COND
+		  && is_gimple_call (g)
+		  && gimple_call_internal_p (g)
+		  && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED
+		  && gimple_cond_lhs (cond_stmt) == gimple_call_lhs (g)
+		  && new_loop->inner
+		  && new_loop->inner->next
+		  && new_loop->inner->next->next == NULL);
+      struct loop *inner = new_loop->inner;
+      basic_block empty_bb = loop_preheader_edge (inner)->src;
+      gcc_assert (empty_block_p (empty_bb)
+		  && single_pred_p (empty_bb)
+		  && single_succ_p (empty_bb)
+		  && single_pred (empty_bb) == single_succ (new_loop->header));
+      if (single_pred_edge (empty_bb)->flags & EDGE_TRUE_VALUE)
+	{
+	  gimple_call_set_arg (g, 0, build_int_cst (integer_type_node,
+						    inner->num));
+	  gimple_call_set_arg (g, 1, build_int_cst (integer_type_node,
+						    inner->next->num));
+	  inner->next->dont_vectorize = true;
+	}
+      else
+	{
+	  gimple_call_set_arg (g, 0, build_int_cst (integer_type_node,
+						    inner->next->num));
+	  gimple_call_set_arg (g, 1, build_int_cst (integer_type_node,
+						    inner->num));
+	  inner->dont_vectorize = true;
+	}
+    }
   return true;
 }
 
@@ -1800,6 +1868,7 @@ static unsigned int
 tree_if_conversion (struct loop *loop)
 {
   unsigned int todo = 0;
+  bool version_outer_loop = false;
   ifc_bbs = NULL;
 
   if (!if_convertible_loop_p (loop)
@@ -1808,7 +1877,7 @@ tree_if_conversion (struct loop *loop)
 
   if ((flag_tree_loop_vectorize || loop->force_vect)
       && flag_tree_loop_if_convert == -1
-      && !version_loop_for_if_conversion (loop))
+      && !version_loop_for_if_conversion (loop, &version_outer_loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -1835,6 +1904,15 @@ tree_if_conversion (struct loop *loop)
       ifc_bbs = NULL;
     }
 
+  if (todo && version_outer_loop)
+    {
+      if (todo & TODO_update_ssa_only_virtuals)
+	{
+	  update_ssa (TODO_update_ssa_only_virtuals);
+	  todo &= ~TODO_update_ssa_only_virtuals;
+	}
+      version_loop_for_if_conversion (loop_outer (loop), NULL);
+    }
   return todo;
 }
 
--- gcc/tree-vectorizer.c.jj	2013-10-23 18:30:29.914021368 +0200
+++ gcc/tree-vectorizer.c	2013-10-23 18:41:49.143612005 +0200
@@ -348,6 +348,31 @@ vect_loop_vectorized_call (struct loop *
   return NULL;
 }
 
+/* Helper function of vectorize_loops.  If LOOP is non-if-converted
+   loop that has if-converted counterpart, return the if-converted
+   counterpart, so that we try vectorizing if-converted loops before
+   inner loops of non-if-converted loops.  */
+
+static struct loop *
+vect_loop_select (struct loop *loop)
+{
+  if (!loop->dont_vectorize)
+    return loop;
+
+  gimple g = vect_loop_vectorized_call (loop);
+  if (g == NULL)
+    return loop;
+
+  if (tree_low_cst (gimple_call_arg (g, 1), 0) != loop->num)
+    return loop;
+
+  struct loop *ifcvt_loop
+    = get_loop (cfun, tree_low_cst (gimple_call_arg (g, 0), 0));
+  if (ifcvt_loop && !ifcvt_loop->dont_vectorize)
+    return ifcvt_loop;
+  return loop;
+}
+
 
 /* Function vectorize_loops.
 
@@ -360,7 +385,7 @@ vectorize_loops (void)
   unsigned int num_vectorized_loops = 0;
   unsigned int vect_loops_num;
   loop_iterator li;
-  struct loop *loop;
+  struct loop *loop, *iloop;
   hash_table <simduid_to_vf> simduid_to_vf_htab;
   hash_table <simd_array_to_simduid> simd_array_to_simduid_htab;
   bool any_ifcvt_loops = false;
@@ -386,8 +411,8 @@ vectorize_loops (void)
   /* If some loop was duplicated, it gets bigger number
      than all previously defined loops.  This fact allows us to run
      only over initial loops skipping newly generated ones.  */
-  FOR_EACH_LOOP (li, loop, 0)
-    if (loop->dont_vectorize)
+  FOR_EACH_LOOP (li, iloop, 0)
+    if ((loop = vect_loop_select (iloop))->dont_vectorize)
       any_ifcvt_loops = true;
     else if ((flag_tree_loop_vectorize
 	      && optimize_loop_nest_for_speed_p (loop))
@@ -400,6 +425,10 @@ vectorize_loops (void)
 	  dump_printf (MSG_NOTE, "\nAnalyzing loop at %s:%d\n",
                        LOC_FILE (vect_location), LOC_LINE (vect_location));
 
+	/* Make sure we don't try to vectorize this loop
+	   more than once.  */
+	loop->dont_vectorize = true;
+
 	loop_vinfo = vect_analyze_loop (loop);
 	loop->aux = loop_vinfo;
 
@@ -416,6 +445,7 @@ vectorize_loops (void)
 	    basic_block *bbs;
 	    unsigned int i;
 	    struct loop *scalar_loop = get_loop (cfun, tree_low_cst (arg, 0));
+	    struct loop *inner;
 
 	    LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
 	    gcc_checking_assert (vect_loop_vectorized_call
@@ -440,6 +470,11 @@ vectorize_loops (void)
 		  }
 	      }
 	    free (bbs);
+	    /* If we have successfully vectorized an if-converted outer
+	       loop, don't attempt to vectorize the if-converted inner
+	       loop of the alternate loop.  */
+	    for (inner = scalar_loop->inner; inner; inner = inner->next)
+	      inner->dont_vectorize = true;
 	  }
         if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOC
 	    && dump_enabled_p ())
@@ -482,6 +517,8 @@ vectorize_loops (void)
 	      }
 	  }
       }
+    else
+      loop->dont_vectorize = true;
 
   vect_location = UNKNOWN_LOC;
2013-10-23  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/sse.md (maskload<mode>, maskstore<mode>): New expanders.
	* tree-data-ref.c (struct data_ref_loc_d): Replace pos field with ref.
	(get_references_in_stmt): Don't record operand addresses, but
	operands themselves.  Handle MASK_LOAD and MASK_STORE.
	(find_data_references_in_stmt, graphite_find_data_references_in_stmt):
	Adjust.
	* internal-fn.def (MASK_LOAD, MASK_STORE): New internal fns.
	* tree-if-conv.c: Include target.h, expr.h, optabs.h and
	tree-ssa-address.h.
	(if_convertible_phi_p, insert_gimplified_predicates): Add
	any_mask_load_store argument, if true, handle it like
	flag_tree_loop_if_convert_stores.
	(ifcvt_can_use_mask_load_store): New function.
	(if_convertible_gimple_assign_stmt_p): Add any_mask_load_store
	argument, check if some conditional loads or stores can't be
	converted into MASK_LOAD or MASK_STORE.
	(if_convertible_stmt_p): Add any_mask_load_store argument,
	pass it down to if_convertible_gimple_assign_stmt_p.
	(if_convertible_loop_p_1): Add any_mask_load_store argument,
	pass it down to if_convertible_stmt_p and if_convertible_phi_p,
	call if_convertible_phi_p only after all if_convertible_stmt_p
	calls.
	(if_convertible_loop_p): Add any_mask_load_store argument,
	pass it down to if_convertible_loop_p_1.
	(predicate_mem_writes): Emit MASK_LOAD and/or MASK_STORE calls.
	(combine_blocks): Add any_mask_load_store argument, pass
	it down to insert_gimplified_predicates and call predicate_mem_writes
	if it is set.
	(tree_if_conversion): Adjust if_convertible_loop_p and combine_blocks
	calls, set TODO_update_ssa_only_virtuals in todo also if
	any_mask_load_store has been set for the loop.
	* tree-vect-data-refs.c (vect_check_gather): Handle
	MASK_LOAD/MASK_STORE.
	(vect_analyze_data_refs, vect_supportable_dr_alignment): Likewise.
	* gimple.h (gimple_expr_type): Handle MASK_STORE.
	* internal-fn.c (expand_MASK_LOAD, expand_MASK_STORE): New functions.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Handle
	MASK_STORE.
	* optabs.def (maskload_optab, maskstore_optab): New optabs.
	* tree-vect-stmts.c (vect_mark_relevant): Don't crash if lhs
	is NULL.
	(exist_non_indexing_operands_for_use_p): Handle MASK_LOAD
	and MASK_STORE.
	(vectorizable_mask_load_store): New function.
	(vectorizable_call): Call it for MASK_LOAD or MASK_STORE.
	(vect_transform_stmt): Handle MASK_STORE.

	* gcc.target/i386/avx2-gather-5.c: New test.
	* gcc.target/i386/avx2-gather-6.c: New test.
	* gcc.dg/vect/vect-mask-loadstore-1.c: New test.
	* gcc.dg/vect/vect-mask-load-1.c: New test.

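For illustration (the new testcases in this patch exercise the same idea),
take a conditional store like:

  void
  foo (float *__restrict a, const float *__restrict b, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      if (b[i] > 0.0f)
        a[i] = b[i];
  }

Previously this needed -ftree-loop-if-convert-stores (which turns the store
into an unconditional one) to be if-converted at all.  With this patch, when
the target provides maskload<mode>/maskstore<mode> patterns,
predicate_mem_writes instead emits roughly

  mask = b[i] > 0.0f ? -1 : 0;
  MASK_STORE (&a[i], <zero of the alias pointer type>, mask, b[i]);

and the vectorizer turns the scalar MASK_STORE calls into vector ones, which
expand_MASK_STORE then expands through the new maskstore optab (the vmaskmov
style instructions on i386 with AVX/AVX2).
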
--- gcc/config/i386/sse.md.jj	2013-10-23 14:43:09.660920594 +0200
+++ gcc/config/i386/sse.md	2013-10-23 18:50:33.292952867 +0200
@@ -12391,6 +12391,23 @@ (define_insn "<avx_avx2>_maskstore<ssemo
    (set_attr "btver2_decode" "vector") 
    (set_attr "mode" "<sseinsnmode>")])
 
+(define_expand "maskload<mode>"
+  [(set (match_operand:V48_AVX2 0 "register_operand")
+	(unspec:V48_AVX2
+	  [(match_operand:<sseintvecmode> 2 "register_operand")
+	   (match_operand:V48_AVX2 1 "memory_operand")]
+	  UNSPEC_MASKMOV))]
+  "TARGET_AVX")
+
+(define_expand "maskstore<mode>"
+  [(set (match_operand:V48_AVX2 0 "memory_operand")
+	(unspec:V48_AVX2
+	  [(match_operand:<sseintvecmode> 2 "register_operand")
+	   (match_operand:V48_AVX2 1 "register_operand")
+	   (match_dup 0)]
+	  UNSPEC_MASKMOV))]
+  "TARGET_AVX")
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P
--- gcc/internal-fn.c.jj	2013-10-23 18:29:24.189348915 +0200
+++ gcc/internal-fn.c	2013-10-23 18:50:33.293952865 +0200
@@ -141,6 +141,52 @@ expand_LOOP_VECTORIZED (gimple stmt ATTR
   gcc_unreachable ();
 }
 
+static void
+expand_MASK_LOAD (gimple stmt)
+{
+  struct expand_operand ops[3];
+  tree type, lhs, rhs, maskt;
+  rtx mem, target, mask;
+
+  maskt = gimple_call_arg (stmt, 2);
+  lhs = gimple_call_lhs (stmt);
+  type = TREE_TYPE (lhs);
+  rhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
+		gimple_call_arg (stmt, 1));
+
+  mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  gcc_assert (MEM_P (mem));
+  mask = expand_normal (maskt);
+  target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, TYPE_MODE (type));
+  create_fixed_operand (&ops[1], mem);
+  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
+  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+}
+
+static void
+expand_MASK_STORE (gimple stmt)
+{
+  struct expand_operand ops[3];
+  tree type, lhs, rhs, maskt;
+  rtx mem, reg, mask;
+
+  maskt = gimple_call_arg (stmt, 2);
+  rhs = gimple_call_arg (stmt, 3);
+  type = TREE_TYPE (rhs);
+  lhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
+		gimple_call_arg (stmt, 1));
+
+  mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  gcc_assert (MEM_P (mem));
+  mask = expand_normal (maskt);
+  reg = expand_normal (rhs);
+  create_fixed_operand (&ops[0], mem);
+  create_input_operand (&ops[1], reg, TYPE_MODE (type));
+  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
+  expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+}
+
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
 
--- gcc/tree-data-ref.c.jj	2013-10-23 14:43:15.985887980 +0200
+++ gcc/tree-data-ref.c	2013-10-23 18:50:33.294952863 +0200
@@ -4312,8 +4312,8 @@ compute_all_dependences (vec<data_refere
 
 typedef struct data_ref_loc_d
 {
-  /* Position of the memory reference.  */
-  tree *pos;
+  /* The memory reference.  */
+  tree ref;
 
   /* True if the memory reference is read.  */
   bool is_read;
@@ -4328,7 +4328,7 @@ get_references_in_stmt (gimple stmt, vec
 {
   bool clobbers_memory = false;
   data_ref_loc ref;
-  tree *op0, *op1;
+  tree op0, op1;
   enum gimple_code stmt_code = gimple_code (stmt);
 
   /* ASM_EXPR and CALL_EXPR may embed arbitrary side effects.
@@ -4338,16 +4338,26 @@ get_references_in_stmt (gimple stmt, vec
       && !(gimple_call_flags (stmt) & ECF_CONST))
     {
       /* Allow IFN_GOMP_SIMD_LANE in their own loops.  */
-      if (gimple_call_internal_p (stmt)
-	  && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
-	{
-	  struct loop *loop = gimple_bb (stmt)->loop_father;
-	  tree uid = gimple_call_arg (stmt, 0);
-	  gcc_assert (TREE_CODE (uid) == SSA_NAME);
-	  if (loop == NULL
-	      || loop->simduid != SSA_NAME_VAR (uid))
+      if (gimple_call_internal_p (stmt))
+	switch (gimple_call_internal_fn (stmt))
+	  {
+	  case IFN_GOMP_SIMD_LANE:
+	    {
+	      struct loop *loop = gimple_bb (stmt)->loop_father;
+	      tree uid = gimple_call_arg (stmt, 0);
+	      gcc_assert (TREE_CODE (uid) == SSA_NAME);
+	      if (loop == NULL
+		  || loop->simduid != SSA_NAME_VAR (uid))
+		clobbers_memory = true;
+	      break;
+	    }
+	  case IFN_MASK_LOAD:
+	  case IFN_MASK_STORE:
+	    break;
+	  default:
 	    clobbers_memory = true;
-	}
+	    break;
+	  }
       else
 	clobbers_memory = true;
     }
@@ -4361,15 +4371,15 @@ get_references_in_stmt (gimple stmt, vec
   if (stmt_code == GIMPLE_ASSIGN)
     {
       tree base;
-      op0 = gimple_assign_lhs_ptr (stmt);
-      op1 = gimple_assign_rhs1_ptr (stmt);
+      op0 = gimple_assign_lhs (stmt);
+      op1 = gimple_assign_rhs1 (stmt);
 
-      if (DECL_P (*op1)
-	  || (REFERENCE_CLASS_P (*op1)
-	      && (base = get_base_address (*op1))
+      if (DECL_P (op1)
+	  || (REFERENCE_CLASS_P (op1)
+	      && (base = get_base_address (op1))
 	      && TREE_CODE (base) != SSA_NAME))
 	{
-	  ref.pos = op1;
+	  ref.ref = op1;
 	  ref.is_read = true;
 	  references->safe_push (ref);
 	}
@@ -4378,16 +4388,35 @@ get_references_in_stmt (gimple stmt, vec
     {
       unsigned i, n;
 
-      op0 = gimple_call_lhs_ptr (stmt);
+      ref.is_read = false;
+      if (gimple_call_internal_p (stmt))
+	switch (gimple_call_internal_fn (stmt))
+	  {
+	  case IFN_MASK_LOAD:
+	    ref.is_read = true;
+	  case IFN_MASK_STORE:
+	    ref.ref = build2 (MEM_REF,
+			      ref.is_read
+			      ? TREE_TYPE (gimple_call_lhs (stmt))
+			      : TREE_TYPE (gimple_call_arg (stmt, 3)),
+			      gimple_call_arg (stmt, 0),
+			      gimple_call_arg (stmt, 1));
+	    references->safe_push (ref);
+	    return false;
+	  default:
+	    break;
+	  }
+
+      op0 = gimple_call_lhs (stmt);
       n = gimple_call_num_args (stmt);
       for (i = 0; i < n; i++)
 	{
-	  op1 = gimple_call_arg_ptr (stmt, i);
+	  op1 = gimple_call_arg (stmt, i);
 
-	  if (DECL_P (*op1)
-	      || (REFERENCE_CLASS_P (*op1) && get_base_address (*op1)))
+	  if (DECL_P (op1)
+	      || (REFERENCE_CLASS_P (op1) && get_base_address (op1)))
 	    {
-	      ref.pos = op1;
+	      ref.ref = op1;
 	      ref.is_read = true;
 	      references->safe_push (ref);
 	    }
@@ -4396,11 +4425,11 @@ get_references_in_stmt (gimple stmt, vec
   else
     return clobbers_memory;
 
-  if (*op0
-      && (DECL_P (*op0)
-	  || (REFERENCE_CLASS_P (*op0) && get_base_address (*op0))))
+  if (op0
+      && (DECL_P (op0)
+	  || (REFERENCE_CLASS_P (op0) && get_base_address (op0))))
     {
-      ref.pos = op0;
+      ref.ref = op0;
       ref.is_read = false;
       references->safe_push (ref);
     }
@@ -4431,7 +4460,7 @@ find_data_references_in_stmt (struct loo
   FOR_EACH_VEC_ELT (references, i, ref)
     {
       dr = create_data_ref (nest, loop_containing_stmt (stmt),
-			    *ref->pos, stmt, ref->is_read);
+			    ref->ref, stmt, ref->is_read);
       gcc_assert (dr != NULL);
       datarefs->safe_push (dr);
     }
@@ -4464,7 +4493,7 @@ graphite_find_data_references_in_stmt (l
 
   FOR_EACH_VEC_ELT (references, i, ref)
     {
-      dr = create_data_ref (nest, loop, *ref->pos, stmt, ref->is_read);
+      dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read);
       gcc_assert (dr != NULL);
       datarefs->safe_push (dr);
     }
--- gcc/tree-if-conv.c.jj	2013-10-23 18:46:09.334284914 +0200
+++ gcc/tree-if-conv.c	2013-10-23 19:02:30.235317171 +0200
@@ -100,8 +100,12 @@ along with GCC; see the file COPYING3.
 #include "tree-chrec.h"
 #include "tree-data-ref.h"
 #include "tree-scalar-evolution.h"
+#include "tree-ssa-address.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "target.h"
+#include "expr.h"
+#include "optabs.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
@@ -463,7 +467,8 @@ bb_with_exit_edge_p (struct loop *loop,
    - there is a virtual PHI in a BB other than the loop->header.  */
 
 static bool
-if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
+if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
+		      bool any_mask_load_store)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -478,7 +483,7 @@ if_convertible_phi_p (struct loop *loop,
       return false;
     }
 
-  if (flag_tree_loop_if_convert_stores)
+  if (flag_tree_loop_if_convert_stores || any_mask_load_store)
     return true;
 
   /* When the flag_tree_loop_if_convert_stores is not set, check
@@ -694,6 +699,78 @@ ifcvt_could_trap_p (gimple stmt, vec<dat
   return gimple_could_trap_p (stmt);
 }
 
+/* Return true if STMT could be converted into a masked load or store
+   (conditional load or store based on a mask computed from bb predicate).  */
+
+static bool
+ifcvt_can_use_mask_load_store (gimple stmt)
+{
+  tree lhs, ref;
+  enum machine_mode mode, vmode;
+  optab op;
+  basic_block bb = gimple_bb (stmt);
+  unsigned int vector_sizes;
+
+  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vect)
+      || bb->loop_father->dont_vectorize
+      || !gimple_assign_single_p (stmt)
+      || gimple_has_volatile_ops (stmt))
+    return false;
+
+  /* Check whether this is a load or store.  */
+  lhs = gimple_assign_lhs (stmt);
+  if (TREE_CODE (lhs) != SSA_NAME)
+    {
+      if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
+	return false;
+      op = maskstore_optab;
+      ref = lhs;
+    }
+  else if (gimple_assign_load_p (stmt))
+    {
+      op = maskload_optab;
+      ref = gimple_assign_rhs1 (stmt);
+    }
+  else
+    return false;
+
+  /* And whether REF isn't a MEM_REF with non-addressable decl.  */
+  if (TREE_CODE (ref) == MEM_REF
+      && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
+      && DECL_P (TREE_OPERAND (TREE_OPERAND (ref, 0), 0))
+      && !TREE_ADDRESSABLE (TREE_OPERAND (TREE_OPERAND (ref, 0), 0)))
+    return false;
+
+  /* Mask should be integer mode of the same size as the load/store
+     mode.  */
+  mode = TYPE_MODE (TREE_TYPE (lhs));
+  if (int_mode_for_mode (mode) == BLKmode)
+    return false;
+
+  /* See if there is any chance the mask load or store might be
+     vectorized.  If not, punt.  */
+  vmode = targetm.vectorize.preferred_simd_mode (mode);
+  if (!VECTOR_MODE_P (vmode))
+    return false;
+
+  if (optab_handler (op, vmode) != CODE_FOR_nothing)
+    return true;
+
+  vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
+  while (vector_sizes != 0)
+    {
+      unsigned int cur = 1 << floor_log2 (vector_sizes);
+      vector_sizes &= ~cur;
+      if (cur <= GET_MODE_SIZE (mode))
+	continue;
+      vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
+      if (VECTOR_MODE_P (vmode)
+	  && optab_handler (op, vmode) != CODE_FOR_nothing)
+	return true;
+    }
+  return false;
+}
+
 /* Return true when STMT is if-convertible.
 
    GIMPLE_ASSIGN statement is not if-convertible if,
@@ -703,7 +780,8 @@ ifcvt_could_trap_p (gimple stmt, vec<dat
 
 static bool
 if_convertible_gimple_assign_stmt_p (gimple stmt,
-				     vec<data_reference_p> refs)
+				     vec<data_reference_p> refs,
+				     bool *any_mask_load_store)
 {
   tree lhs = gimple_assign_lhs (stmt);
   basic_block bb;
@@ -729,10 +807,21 @@ if_convertible_gimple_assign_stmt_p (gim
       return false;
     }
 
+  /* tree-into-ssa.c uses GF_PLF_1, so avoid it, because
+     in between if_convertible_loop_p and combine_blocks
+     we can perform loop versioning.  */
+  gimple_set_plf (stmt, GF_PLF_2, false);
+
   if (flag_tree_loop_if_convert_stores)
     {
       if (ifcvt_could_trap_p (stmt, refs))
 	{
+	  if (ifcvt_can_use_mask_load_store (stmt))
+	    {
+	      gimple_set_plf (stmt, GF_PLF_2, true);
+	      *any_mask_load_store = true;
+	      return true;
+	    }
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "tree could trap...\n");
 	  return false;
@@ -742,6 +831,12 @@ if_convertible_gimple_assign_stmt_p (gim
 
   if (gimple_assign_rhs_could_trap_p (stmt))
     {
+      if (ifcvt_can_use_mask_load_store (stmt))
+	{
+	  gimple_set_plf (stmt, GF_PLF_2, true);
+	  *any_mask_load_store = true;
+	  return true;
+	}
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "tree could trap...\n");
       return false;
@@ -753,6 +848,12 @@ if_convertible_gimple_assign_stmt_p (gim
       && bb != bb->loop_father->header
       && !bb_with_exit_edge_p (bb->loop_father, bb))
     {
+      if (ifcvt_can_use_mask_load_store (stmt))
+	{
+	  gimple_set_plf (stmt, GF_PLF_2, true);
+	  *any_mask_load_store = true;
+	  return true;
+	}
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "LHS is not var\n");
@@ -771,7 +872,8 @@ if_convertible_gimple_assign_stmt_p (gim
    - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs)
+if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
+		       bool *any_mask_load_store)
 {
   switch (gimple_code (stmt))
     {
@@ -781,7 +883,8 @@ if_convertible_stmt_p (gimple stmt, vec<
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (stmt, refs);
+      return if_convertible_gimple_assign_stmt_p (stmt, refs,
+						  any_mask_load_store);
 
     case GIMPLE_CALL:
       {
@@ -1069,7 +1172,7 @@ static bool
 if_convertible_loop_p_1 (struct loop *loop,
 			 vec<loop_p> *loop_nest,
 			 vec<data_reference_p> *refs,
-			 vec<ddr_p> *ddrs)
+			 vec<ddr_p> *ddrs, bool *any_mask_load_store)
 {
   bool res;
   unsigned int i;
@@ -1140,14 +1243,11 @@ if_convertible_loop_p_1 (struct loop *lo
       basic_block bb = ifc_bbs[i];
       gimple_stmt_iterator itr;
 
-      for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr))
-	if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr)))
-	  return false;
-
       /* Check the if-convertibility of statements in predicated BBs.  */
       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
 	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
+	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+				      any_mask_load_store))
 	    return false;
     }
 
@@ -1155,6 +1255,19 @@ if_convertible_loop_p_1 (struct loop *lo
     for (i = 0; i < loop->num_nodes; i++)
       free_bb_predicate (ifc_bbs[i]);
 
+  /* Checking PHIs needs to be done after stmts, as the fact whether there
+     are any masked loads or stores affects the tests.  */
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = ifc_bbs[i];
+      gimple_stmt_iterator itr;
+
+      for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr))
+	if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr),
+				   *any_mask_load_store))
+	  return false;
+    }
+
   if (dump_file)
     fprintf (dump_file, "Applying if-conversion\n");
 
@@ -1170,7 +1283,7 @@ if_convertible_loop_p_1 (struct loop *lo
    - if its basic blocks and phi nodes are if convertible.  */
 
 static bool
-if_convertible_loop_p (struct loop *loop)
+if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
 {
   edge e;
   edge_iterator ei;
@@ -1212,7 +1325,8 @@ if_convertible_loop_p (struct loop *loop
   refs.create (5);
   ddrs.create (25);
   loop_nest.create (3);
-  res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs);
+  res = if_convertible_loop_p_1 (loop, &loop_nest, &refs, &ddrs,
+				 any_mask_load_store);
 
   if (flag_tree_loop_if_convert_stores)
     {
@@ -1400,7 +1514,7 @@ predicate_all_scalar_phis (struct loop *
    gimplification of the predicates.  */
 
 static void
-insert_gimplified_predicates (loop_p loop)
+insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
 {
   unsigned int i;
 
@@ -1422,7 +1536,8 @@ insert_gimplified_predicates (loop_p loo
       stmts = bb_predicate_gimplified_stmts (bb);
       if (stmts)
 	{
-	  if (flag_tree_loop_if_convert_stores)
+	  if (flag_tree_loop_if_convert_stores
+	      || any_mask_load_store)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -1581,9 +1696,49 @@ predicate_mem_writes (loop_p loop)
 	}
 
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	if ((stmt = gsi_stmt (gsi))
-	    && gimple_assign_single_p (stmt)
-	    && gimple_vdef (stmt))
+	if ((stmt = gsi_stmt (gsi)) == NULL
+	    || !gimple_assign_single_p (stmt))
+	  continue;
+	else if (gimple_plf (stmt, GF_PLF_2))
+	  {
+	    tree lhs = gimple_assign_lhs (stmt);
+	    tree rhs = gimple_assign_rhs1 (stmt);
+	    tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
+	    gimple new_stmt;
+	    int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
+
+	    masktype = build_nonstandard_integer_type (bitsize, 1);
+	    mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
+	    mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
+	    ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
+	    addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
+					     true, NULL_TREE, true,
+					     GSI_SAME_STMT);
+	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+					       is_gimple_condexpr, NULL_TREE,
+					       true, GSI_SAME_STMT);
+	    mask = fold_build_cond_expr (masktype, unshare_expr (cond),
+					 mask_op0, mask_op1);
+	    mask = ifc_temp_var (masktype, mask, &gsi);
+	    ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
+	    /* Copy points-to info if possible.  */
+	    if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
+	      copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr),
+			     ref);
+	    if (TREE_CODE (lhs) == SSA_NAME)
+	      {
+		new_stmt
+		  = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr,
+						ptr, mask);
+		gimple_call_set_lhs (new_stmt, lhs);
+	      }
+	    else
+	      new_stmt
+		= gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr,
+					      mask, rhs);
+	    gsi_replace (&gsi, new_stmt, false);
+	  }
+	else if (gimple_vdef (stmt))
 	  {
 	    tree lhs = gimple_assign_lhs (stmt);
 	    tree rhs = gimple_assign_rhs1 (stmt);
@@ -1653,7 +1808,7 @@ remove_conditions_and_labels (loop_p loo
    blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (struct loop *loop)
+combine_blocks (struct loop *loop, bool any_mask_load_store)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -1663,10 +1818,10 @@ combine_blocks (struct loop *loop)
 
   predicate_bbs (loop);
   remove_conditions_and_labels (loop);
-  insert_gimplified_predicates (loop);
+  insert_gimplified_predicates (loop, any_mask_load_store);
   predicate_all_scalar_phis (loop);
 
-  if (flag_tree_loop_if_convert_stores)
+  if (flag_tree_loop_if_convert_stores || any_mask_load_store)
     predicate_mem_writes (loop);
 
   /* Merge basic blocks: first remove all the edges in the loop,
@@ -1870,23 +2025,29 @@ tree_if_conversion (struct loop *loop)
   unsigned int todo = 0;
   bool version_outer_loop = false;
   ifc_bbs = NULL;
+  bool any_mask_load_store = false;
 
-  if (!if_convertible_loop_p (loop)
+  if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
 
+  if (any_mask_load_store
+      && ((!flag_tree_loop_vectorize && !loop->force_vect)
+	  || loop->dont_vectorize))
+    goto cleanup;
+
   if ((flag_tree_loop_vectorize || loop->force_vect)
-      && flag_tree_loop_if_convert == -1
+      && (flag_tree_loop_if_convert == -1 || any_mask_load_store)
       && !version_loop_for_if_conversion (loop, &version_outer_loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
-  combine_blocks (loop);
+  combine_blocks (loop, any_mask_load_store);
 
   todo |= TODO_cleanup_cfg;
-  if (flag_tree_loop_if_convert_stores)
+  if (flag_tree_loop_if_convert_stores || any_mask_load_store)
     {
       mark_virtual_operands_for_renaming (cfun);
       todo |= TODO_update_ssa_only_virtuals;
--- gcc/optabs.def.jj	2013-10-23 14:43:09.908919315 +0200
+++ gcc/optabs.def	2013-10-23 18:50:33.296952852 +0200
@@ -248,6 +248,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
+OPTAB_D (maskload_optab, "maskload$a")
+OPTAB_D (maskstore_optab, "maskstore$a")
 OPTAB_D (vec_extract_optab, "vec_extract$a")
 OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
--- gcc/tree-vect-data-refs.c.jj	2013-10-23 14:43:15.986887975 +0200
+++ gcc/tree-vect-data-refs.c	2013-10-23 18:50:33.297952847 +0200
@@ -2747,6 +2747,24 @@ vect_check_gather (gimple stmt, loop_vec
   enum machine_mode pmode;
   int punsignedp, pvolatilep;
 
+  base = DR_REF (dr);
+  /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF,
+     see if we can use the def stmt of the address.  */
+  if (is_gimple_call (stmt)
+      && gimple_call_internal_p (stmt)
+      && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+	  || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+      && TREE_CODE (base) == MEM_REF
+      && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
+      && integer_zerop (TREE_OPERAND (base, 1))
+      && !expr_invariant_in_loop_p (loop, TREE_OPERAND (base, 0)))
+    {
+      gimple def_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (base, 0));
+      if (is_gimple_assign (def_stmt)
+	  && gimple_assign_rhs_code (def_stmt) == ADDR_EXPR)
+	base = TREE_OPERAND (gimple_assign_rhs1 (def_stmt), 0);
+    }
+
   /* The gather builtins need address of the form
      loop_invariant + vector * {1, 2, 4, 8}
      or
@@ -2759,7 +2777,7 @@ vect_check_gather (gimple stmt, loop_vec
      vectorized.  The following code attempts to find such a preexistng
      SSA_NAME OFF and put the loop invariants into a tree BASE
      that can be gimplified before the loop.  */
-  base = get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &off,
+  base = get_inner_reference (base, &pbitsize, &pbitpos, &off,
 			      &pmode, &punsignedp, &pvolatilep, false);
   gcc_assert (base != NULL_TREE && (pbitpos % BITS_PER_UNIT) == 0);
 
@@ -3205,7 +3223,10 @@ again:
       offset = unshare_expr (DR_OFFSET (dr));
       init = unshare_expr (DR_INIT (dr));
 
-      if (is_gimple_call (stmt))
+      if (is_gimple_call (stmt)
+	  && (!gimple_call_internal_p (stmt)
+	      || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD
+		  && gimple_call_internal_fn (stmt) != IFN_MASK_STORE)))
 	{
 	  if (dump_enabled_p ())
 	    {
@@ -4856,6 +4877,14 @@ vect_supportable_dr_alignment (struct da
   if (aligned_access_p (dr) && !check_aligned_accesses)
     return dr_aligned;
 
+  /* For now assume all conditional loads/stores support unaligned
+     access without any special code.  */
+  if (is_gimple_call (stmt)
+      && gimple_call_internal_p (stmt)
+      && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+	  || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
+    return dr_unaligned_supported;
+
   if (loop_vinfo)
     {
       vect_loop = LOOP_VINFO_LOOP (loop_vinfo);
--- gcc/tree-vect-loop.c.jj	2013-10-23 14:43:15.984887985 +0200
+++ gcc/tree-vect-loop.c	2013-10-23 18:50:33.299952836 +0200
@@ -364,7 +364,11 @@ vect_determine_vectorization_factor (loo
 		analyze_pattern_stmt = false;
 	    }
 
-	  if (gimple_get_lhs (stmt) == NULL_TREE)
+	  if (gimple_get_lhs (stmt) == NULL_TREE
+	      /* MASK_STORE has no lhs, but is ok.  */
+	      && (!is_gimple_call (stmt)
+		  || !gimple_call_internal_p (stmt)
+		  || gimple_call_internal_fn (stmt) != IFN_MASK_STORE))
 	    {
 	      if (dump_enabled_p ())
 		{
@@ -403,7 +407,12 @@ vect_determine_vectorization_factor (loo
 	  else
 	    {
 	      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
-	      scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+	      if (is_gimple_call (stmt)
+		  && gimple_call_internal_p (stmt)
+		  && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+		scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+	      else
+		scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 	      if (dump_enabled_p ())
 		{
 		  dump_printf_loc (MSG_NOTE, vect_location,
--- gcc/testsuite/gcc.target/i386/avx2-gather-5.c.jj	2013-10-23 18:50:33.299952836 +0200
+++ gcc/testsuite/gcc.target/i386/avx2-gather-5.c	2013-10-23 18:50:33.299952836 +0200
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-O3 -mavx2 -fno-common" } */
+
+#include "avx2-check.h"
+
+#define N 1024
+float vf1[N+16], vf2[N], vf3[N];
+int k[N];
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    {
+      float f;
+      if (vf3[i] < 0.0f)
+	f = vf1[k[i]];
+      else
+	f = 7.0f;
+      vf2[i] = f;
+    }
+}
+
+static void
+avx2_test (void)
+{
+  int i;
+  for (i = 0; i < N + 16; i++)
+    {
+      vf1[i] = 5.5f * i;
+      if (i >= N)
+	continue;
+      vf2[i] = 2.0f;
+      vf3[i] = (i & 1) ? i : -i - 1;
+      k[i] = (i & 1) ? ((i & 2) ? -i : N / 2 + i) : (i * 7) % N;
+      asm ("");
+    }
+  foo ();
+  for (i = 0; i < N; i++)
+    if (vf1[i] != 5.5 * i
+	|| vf2[i] != ((i & 1) ? 7.0f : 5.5f * ((i * 7) % N))
+	|| vf3[i] != ((i & 1) ? i : -i - 1)
+	|| k[i] != ((i & 1) ? ((i & 2) ? -i : N / 2 + i) : ((i * 7) % N)))
+      abort ();
+}
--- gcc/testsuite/gcc.target/i386/avx2-gather-6.c.jj	2013-10-23 18:50:33.299952836 +0200
+++ gcc/testsuite/gcc.target/i386/avx2-gather-6.c	2013-10-23 18:50:33.299952836 +0200
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -fno-common -fdump-tree-vect-details" } */
+
+#include "avx2-gather-5.c"
+
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops in function" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c.jj	2013-10-23 18:50:33.300952831 +0200
+++ gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c	2013-10-23 18:50:33.300952831 +0200
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-additional-options "-Ofast -fno-common" } */
+/* { dg-additional-options "-Ofast -fno-common -mavx" { target avx_runtime } } */
+
+#include <stdlib.h>
+#include "tree-vect.h"
+
+__attribute__((noinline, noclone)) void
+foo (float *__restrict x, float *__restrict y, float *__restrict z)
+{
+  float *__restrict p = __builtin_assume_aligned (x, 32);
+  float *__restrict q = __builtin_assume_aligned (y, 32);
+  float *__restrict r = __builtin_assume_aligned (z, 32);
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      if (p[i] < 0.0f)
+	q[i] = p[i] + 2.0f;
+      else
+	p[i] = r[i] + 3.0f;
+    }
+}
+
+float a[1024] __attribute__((aligned (32)));
+float b[1024] __attribute__((aligned (32)));
+float c[1024] __attribute__((aligned (32)));
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  for (i = 0; i < 1024; i++)
+    {
+      a[i] = (i & 1) ? -i : i;
+      b[i] = 7 * i;
+      c[i] = a[i] - 3.0f;
+      asm ("");
+    }
+  foo (a, b, c);
+  for (i = 0; i < 1024; i++)
+    if (a[i] != ((i & 1) ? -i : i)
+	|| b[i] != ((i & 1) ? a[i] + 2.0f : 7 * i)
+	|| c[i] != a[i] - 3.0f)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops" 1 "vect" { target avx_runtime } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c.jj	2013-10-23 18:50:33.300952831 +0200
+++ gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c	2013-10-23 18:50:33.300952831 +0200
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-additional-options "-Ofast -fno-common" } */
+/* { dg-additional-options "-Ofast -fno-common -mavx" { target avx_runtime } } */
+
+#include <stdlib.h>
+#include "tree-vect.h"
+
+__attribute__((noinline, noclone)) void
+foo (double *x, double *y)
+{
+  double *p = __builtin_assume_aligned (x, 16);
+  double *q = __builtin_assume_aligned (y, 16);
+  double z, h;
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      if (p[i] < 0.0)
+        z = q[i], h = q[i] * 7.0 + 3.0;
+      else
+        z = p[i] + 6.0, h = p[1024 + i];
+      p[i] = z + 2.0 * h;
+    }
+}
+
+double a[2048] __attribute__((aligned (16)));
+double b[1024] __attribute__((aligned (16)));
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  for (i = 0; i < 1024; i++)
+    {
+      a[i] = (i & 1) ? -i : 2 * i;
+      a[i + 1024] = i;
+      b[i] = 7 * i;
+      asm ("");
+    }
+  foo (a, b);
+  for (i = 0; i < 1024; i++)
+    if (a[i] != ((i & 1)
+		 ? 7 * i + 2.0 * (7 * i * 7.0 + 3.0)
+		 : 2 * i + 6.0 + 2.0 * i)
+        || b[i] != 7 * i
+        || a[i + 1024] != i)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loops" 1 "vect" { target avx_runtime } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/gimple.h.jj	2013-10-23 14:43:09.916919274 +0200
+++ gcc/gimple.h	2013-10-23 18:50:33.301952826 +0200
@@ -5331,7 +5331,13 @@ gimple_expr_type (const_gimple stmt)
 	 useless conversion involved.  That means returning the
 	 original RHS type as far as we can reconstruct it.  */
       if (code == GIMPLE_CALL)
-	type = gimple_call_return_type (stmt);
+	{
+	  if (gimple_call_internal_p (stmt)
+	      && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+	    type = TREE_TYPE (gimple_call_arg (stmt, 3));
+	  else
+	    type = gimple_call_return_type (stmt);
+	}
       else
 	switch (gimple_assign_rhs_code (stmt))
 	  {
--- gcc/tree-vect-stmts.c.jj	2013-10-23 14:43:16.036887717 +0200
+++ gcc/tree-vect-stmts.c	2013-10-23 18:50:33.303952817 +0200
@@ -223,7 +223,7 @@ vect_mark_relevant (vec<gimple> *worklis
           /* This use is out of pattern use, if LHS has other uses that are
              pattern uses, we should mark the stmt itself, and not the pattern
              stmt.  */
-	  if (TREE_CODE (lhs) == SSA_NAME)
+	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
 	      {
 		if (is_gimple_debug (USE_STMT (use_p)))
@@ -381,7 +381,27 @@ exist_non_indexing_operands_for_use_p (t
      first case, and whether var corresponds to USE.  */
 
   if (!gimple_assign_copy_p (stmt))
-    return false;
+    {
+      if (is_gimple_call (stmt)
+	  && gimple_call_internal_p (stmt))
+	switch (gimple_call_internal_fn (stmt))
+	  {
+	  case IFN_MASK_STORE:
+	    operand = gimple_call_arg (stmt, 3);
+	    if (operand == use)
+	      return true;
+	    /* FALLTHRU */
+	  case IFN_MASK_LOAD:
+	    operand = gimple_call_arg (stmt, 2);
+	    if (operand == use)
+	      return true;
+	    break;
+	  default:
+	    break;
+	  }
+      return false;
+    }
+
   if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
     return false;
   operand = gimple_assign_rhs1 (stmt);
@@ -1709,6 +1729,401 @@ vectorizable_function (gimple call, tree
 						        vectype_in);
 }
 
+
+static tree permute_vec_elements (tree, tree, tree, gimple,
+				  gimple_stmt_iterator *);
+
+
+static bool
+vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
+			      gimple *vec_stmt, slp_tree slp_node)
+{
+  tree vec_dest = NULL;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  stmt_vec_info prev_stmt_info;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree elem_type;
+  gimple new_stmt;
+  tree dummy;
+  tree dataref_ptr = NULL_TREE;
+  gimple ptr_incr;
+  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  int ncopies;
+  int i, j;
+  bool inv_p;
+  tree gather_base = NULL_TREE, gather_off = NULL_TREE;
+  tree gather_off_vectype = NULL_TREE, gather_decl = NULL_TREE;
+  int gather_scale = 1;
+  enum vect_def_type gather_dt = vect_unknown_def_type;
+  bool is_store;
+  tree mask;
+  gimple def_stmt;
+  tree def;
+  enum vect_def_type dt;
+
+  if (slp_node != NULL)
+    return false;
+
+  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+  gcc_assert (ncopies >= 1);
+
+  is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
+  mask = gimple_call_arg (stmt, 2);
+  if (TYPE_PRECISION (TREE_TYPE (mask))
+      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+    return false;
+
+  /* FORNOW. This restriction should be relaxed.  */
+  if (nested_in_vect_loop && ncopies > 1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "multiple types in nested loop.");
+      return false;
+    }
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+    return false;
+
+  if (!STMT_VINFO_DATA_REF (stmt_info))
+    return false;
+
+  elem_type = TREE_TYPE (vectype);
+
+  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+    return false;
+
+  if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
+    return false;
+
+  if (STMT_VINFO_GATHER_P (stmt_info))
+    {
+      gimple def_stmt;
+      tree def;
+      gather_decl = vect_check_gather (stmt, loop_vinfo, &gather_base,
+				       &gather_off, &gather_scale);
+      gcc_assert (gather_decl);
+      if (!vect_is_simple_use_1 (gather_off, NULL, loop_vinfo, NULL,
+				 &def_stmt, &def, &gather_dt,
+				 &gather_off_vectype))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "gather index use not simple.");
+	  return false;
+	}
+    }
+  else if (tree_int_cst_compare (nested_in_vect_loop
+				 ? STMT_VINFO_DR_STEP (stmt_info)
+				 : DR_STEP (dr), size_zero_node) < 0)
+    return false;
+  else if (optab_handler (is_store ? maskstore_optab : maskload_optab,
+			  TYPE_MODE (vectype)) == CODE_FOR_nothing)
+    return false;
+
+  if (TREE_CODE (mask) != SSA_NAME)
+    return false;
+
+  if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
+			   &def_stmt, &def, &dt))
+    return false;
+
+  if (is_store)
+    {
+      tree rhs = gimple_call_arg (stmt, 3);
+      if (!vect_is_simple_use (rhs, stmt, loop_vinfo, NULL,
+			       &def_stmt, &def, &dt))
+	return false;
+    }
+
+  if (!vec_stmt) /* transformation not required.  */
+    {
+      STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+      return true;
+    }
+
+  /** Transform.  **/
+
+  if (STMT_VINFO_GATHER_P (stmt_info))
+    {
+      tree vec_oprnd0 = NULL_TREE, op;
+      tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gather_decl));
+      tree rettype, srctype, ptrtype, idxtype, masktype, scaletype;
+      tree ptr, vec_mask = NULL_TREE, mask_op, var, scale;
+      tree perm_mask = NULL_TREE, prev_res = NULL_TREE;
+      edge pe = loop_preheader_edge (loop);
+      gimple_seq seq;
+      basic_block new_bb;
+      enum { NARROW, NONE, WIDEN } modifier;
+      int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gather_off_vectype);
+
+      if (nunits == gather_off_nunits)
+	modifier = NONE;
+      else if (nunits == gather_off_nunits / 2)
+	{
+	  unsigned char *sel = XALLOCAVEC (unsigned char, gather_off_nunits);
+	  modifier = WIDEN;
+
+	  for (i = 0; i < gather_off_nunits; ++i)
+	    sel[i] = i | nunits;
+
+	  perm_mask = vect_gen_perm_mask (gather_off_vectype, sel);
+	  gcc_assert (perm_mask != NULL_TREE);
+	}
+      else if (nunits == gather_off_nunits * 2)
+	{
+	  unsigned char *sel = XALLOCAVEC (unsigned char, nunits);
+	  modifier = NARROW;
+
+	  for (i = 0; i < nunits; ++i)
+	    sel[i] = i < gather_off_nunits
+		     ? i : i + nunits - gather_off_nunits;
+
+	  perm_mask = vect_gen_perm_mask (vectype, sel);
+	  gcc_assert (perm_mask != NULL_TREE);
+	  ncopies *= 2;
+	}
+      else
+	gcc_unreachable ();
+
+      rettype = TREE_TYPE (TREE_TYPE (gather_decl));
+      srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+      scaletype = TREE_VALUE (arglist);
+      gcc_checking_assert (types_compatible_p (srctype, rettype)
+			   && types_compatible_p (srctype, masktype));
+
+      vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype);
+
+      ptr = fold_convert (ptrtype, gather_base);
+      if (!is_gimple_min_invariant (ptr))
+	{
+	  ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE);
+	  new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
+	  gcc_assert (!new_bb);
+	}
+
+      scale = build_int_cst (scaletype, gather_scale);
+
+      prev_stmt_info = NULL;
+      for (j = 0; j < ncopies; ++j)
+	{
+	  if (modifier == WIDEN && (j & 1))
+	    op = permute_vec_elements (vec_oprnd0, vec_oprnd0,
+				       perm_mask, stmt, gsi);
+	  else if (j == 0)
+	    op = vec_oprnd0
+	      = vect_get_vec_def_for_operand (gather_off, stmt, NULL);
+	  else
+	    op = vec_oprnd0
+	      = vect_get_vec_def_for_stmt_copy (gather_dt, vec_oprnd0);
+
+	  if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
+	    {
+	      gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
+			  == TYPE_VECTOR_SUBPARTS (idxtype));
+	      var = vect_get_new_vect_var (idxtype, vect_simple_var, NULL);
+	      var = make_ssa_name (var, NULL);
+	      op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
+	      new_stmt
+		= gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var,
+						op, NULL_TREE);
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      op = var;
+	    }
+
+	  if (j == 0)
+	    vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+	  else
+	    {
+	      vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+				  &def, &dt);
+	      vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+	    }
+
+	  mask_op = vec_mask;
+	  if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask)))
+	    {
+	      gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op))
+			  == TYPE_VECTOR_SUBPARTS (masktype));
+	      var = vect_get_new_vect_var (masktype, vect_simple_var, NULL);
+	      var = make_ssa_name (var, NULL);
+	      mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op);
+	      new_stmt
+		= gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var,
+						mask_op, NULL_TREE);
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      mask_op = var;
+	    }
+
+	  new_stmt
+	    = gimple_build_call (gather_decl, 5, mask_op, ptr, op, mask_op,
+				 scale);
+
+	  if (!useless_type_conversion_p (vectype, rettype))
+	    {
+	      gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
+			  == TYPE_VECTOR_SUBPARTS (rettype));
+	      var = vect_get_new_vect_var (rettype, vect_simple_var, NULL);
+	      op = make_ssa_name (var, new_stmt);
+	      gimple_call_set_lhs (new_stmt, op);
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      var = make_ssa_name (vec_dest, NULL);
+	      op = build1 (VIEW_CONVERT_EXPR, vectype, op);
+	      new_stmt
+		= gimple_build_assign_with_ops (VIEW_CONVERT_EXPR, var, op,
+						NULL_TREE);
+	    }
+	  else
+	    {
+	      var = make_ssa_name (vec_dest, new_stmt);
+	      gimple_call_set_lhs (new_stmt, var);
+	    }
+
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+	  if (modifier == NARROW)
+	    {
+	      if ((j & 1) == 0)
+		{
+		  prev_res = var;
+		  continue;
+		}
+	      var = permute_vec_elements (prev_res, var,
+					  perm_mask, stmt, gsi);
+	      new_stmt = SSA_NAME_DEF_STMT (var);
+	    }
+
+	  if (prev_stmt_info == NULL)
+	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+	}
+      return true;
+    }
+  else if (is_store)
+    {
+      tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
+      prev_stmt_info = NULL;
+      for (i = 0; i < ncopies; i++)
+	{
+	  unsigned align, misalign;
+
+	  if (i == 0)
+	    {
+	      tree rhs = gimple_call_arg (stmt, 3);
+	      vec_rhs = vect_get_vec_def_for_operand (rhs, stmt, NULL);
+	      vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+	      /* We should have caught mismatched types earlier.  */
+	      gcc_assert (useless_type_conversion_p (vectype,
+						     TREE_TYPE (vec_rhs)));
+	      dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL,
+						      NULL_TREE, &dummy, gsi,
+						      &ptr_incr, false, &inv_p);
+	      gcc_assert (!inv_p);
+	    }
+	  else
+	    {
+	      vect_is_simple_use (vec_rhs, NULL, loop_vinfo, NULL, &def_stmt,
+				  &def, &dt);
+	      vec_rhs = vect_get_vec_def_for_stmt_copy (dt, vec_rhs);
+	      vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+				  &def, &dt);
+	      vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+	      dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
+					     TYPE_SIZE_UNIT (vectype));
+	    }
+
+	  align = TYPE_ALIGN_UNIT (vectype);
+	  if (aligned_access_p (dr))
+	    misalign = 0;
+	  else if (DR_MISALIGNMENT (dr) == -1)
+	    {
+	      align = TYPE_ALIGN_UNIT (elem_type);
+	      misalign = 0;
+	    }
+	  else
+	    misalign = DR_MISALIGNMENT (dr);
+	  set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
+				  misalign);
+	  new_stmt
+	    = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr,
+					  gimple_call_arg (stmt, 1),
+					  vec_mask, vec_rhs);
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  if (i == 0)
+	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+	}
+    }
+  else
+    {
+      tree vec_mask = NULL_TREE;
+      prev_stmt_info = NULL;
+      vec_dest = vect_create_destination_var (gimple_call_lhs (stmt), vectype);
+      for (i = 0; i < ncopies; i++)
+	{
+	  unsigned align, misalign;
+
+	  if (i == 0)
+	    {
+	      vec_mask = vect_get_vec_def_for_operand (mask, stmt, NULL);
+	      dataref_ptr = vect_create_data_ref_ptr (stmt, vectype, NULL,
+						      NULL_TREE, &dummy, gsi,
+						      &ptr_incr, false, &inv_p);
+	      gcc_assert (!inv_p);
+	    }
+	  else
+	    {
+	      vect_is_simple_use (vec_mask, NULL, loop_vinfo, NULL, &def_stmt,
+				  &def, &dt);
+	      vec_mask = vect_get_vec_def_for_stmt_copy (dt, vec_mask);
+	      dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
+					     TYPE_SIZE_UNIT (vectype));
+	    }
+
+	  align = TYPE_ALIGN_UNIT (vectype);
+	  if (aligned_access_p (dr))
+	    misalign = 0;
+	  else if (DR_MISALIGNMENT (dr) == -1)
+	    {
+	      align = TYPE_ALIGN_UNIT (elem_type);
+	      misalign = 0;
+	    }
+	  else
+	    misalign = DR_MISALIGNMENT (dr);
+	  set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
+				  misalign);
+	  new_stmt
+	    = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr,
+					  gimple_call_arg (stmt, 1),
+					  vec_mask);
+	  gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest, NULL));
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  if (i == 0)
+	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+	}
+    }
+
+  return true;
+}
+
+
 /* Function vectorizable_call.
 
    Check if STMT performs a function call that can be vectorized.
@@ -1751,10 +2166,16 @@ vectorizable_call (gimple stmt, gimple_s
   if (!is_gimple_call (stmt))
     return false;
 
-  if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
+  if (stmt_can_throw_internal (stmt))
     return false;
 
-  if (stmt_can_throw_internal (stmt))
+  if (gimple_call_internal_p (stmt)
+      && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
+	  || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
+    return vectorizable_mask_load_store (stmt, gsi, vec_stmt,
+					 slp_node);
+
+  if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
     return false;
 
   vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -3474,10 +3895,6 @@ vectorizable_shift (gimple stmt, gimple_
 }
 
 
-static tree permute_vec_elements (tree, tree, tree, gimple,
-				  gimple_stmt_iterator *);
-
-
 /* Function vectorizable_operation.
 
    Check if STMT performs a binary, unary or ternary operation that can
@@ -5988,6 +6405,10 @@ vect_transform_stmt (gimple stmt, gimple
     case call_vec_info_type:
       done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
       stmt = gsi_stmt (*gsi);
+      if (is_gimple_call (stmt)
+	  && gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+	is_store = true;
       break;
 
     case reduc_vec_info_type:
--- gcc/internal-fn.def.jj	2013-10-23 18:29:24.188348927 +0200
+++ gcc/internal-fn.def	2013-10-23 18:50:33.304952811 +0200
@@ -44,3 +44,5 @@ DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOV
 DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
 DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
 DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
+DEF_INTERNAL_FN (MASK_LOAD, ECF_PURE | ECF_LEAF)
+DEF_INTERNAL_FN (MASK_STORE, ECF_LEAF)

Comments

Jakub Jelinek Oct. 24, 2013, 11:14 a.m. UTC | #1
On Wed, Oct 23, 2013 at 07:22:20PM +0200, Jakub Jelinek wrote:
> On Tue, Oct 22, 2013 at 08:27:54PM +0400, Sergey Ostanevich wrote:
> > still fails on 403 et al.
> 
> Ok, reproduced, unfortunately the pending stmt sequences already pretty much
> assume that they will end up in a single combined basic block.  I went
> through various alternatives (deferring update_ssa (TODO_update_ssa) call
> until after combine_blocks - doesn't work, because it is unhappy about
> basic blocks being removed, temporarily putting all the stmts into latch
> (doesn't work, because there are no PHIs for it in the loop), so the final
> fix as discussed with Richard on IRC is not to predicate_bbs early before
> versioning (unless -ftree-loop-if-convert-stores it is easily achievable
> by just using a better dominance check for that), or for the stores stuff
> doing it and freeing again (at least for now).
> 
> The predicate_bbs stuff would certainly appreciate more TLC in the future.
> 
> Attaching whole new patchset, the above mentioned fix is mostly in the
> first patch (which also contains a tree-cfg.h include that is needed for
> today's header reshufling), the other two patches are just tweaked to
> apply on top of that.  All 3 patches together have been
> bootstrapped/regtested on x86_64-linux and i686-linux, the first one
> and first+second just compile time tested.

BTW, to test the effect of not disabling if-conversion for non-vectorized
loops unless there were conditional loads/stores, instead of patching
you could just test with an additional -ftree-loop-if-convert, I guess.
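
E.g. (a made-up standalone example, nothing from the patch or the
testsuite) one could build a small kernel like the following both with
and without the extra option and compare the resulting code:

/* t.c (hypothetical): a loop with a conditionally updated value.
   Compare e.g.:
     gcc -O3 -S t.c
     gcc -O3 -ftree-loop-if-convert -S t.c  */
int
foo (int *a, int *b, int n)
{
  int i, r = 0;
  for (i = 0; i < n; i++)
    {
      int x = a[i];
      if (x > b[i])
	x = b[i];
      r += x;
    }
  return r;
}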

	Jakub
diff mbox

Patch

--- gcc/tree-vectorizer.h.jj	2013-10-23 14:43:09.667920558 +0200
+++ gcc/tree-vectorizer.h	2013-10-23 18:29:24.187348942 +0200
@@ -314,6 +314,10 @@  typedef struct _loop_vec_info {
      fix it up.  */
   bool operands_swapped;
 
+  /* If if-conversion versioned this loop before conversion, this is the
+     loop version without if-conversion.  */
+  struct loop *scalar_loop;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -345,6 +349,7 @@  typedef struct _loop_vec_info {
 #define LOOP_VINFO_TARGET_COST_DATA(L)     (L)->target_cost_data
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_OPERANDS_SWAPPED(L)     (L)->operands_swapped
+#define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
 (L)->may_misalign_stmts.length () > 0
@@ -899,7 +904,8 @@  extern LOC vect_location;
    in tree-vect-loop-manip.c.  */
 extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
 extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
-struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *, edge);
+struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
+						     struct loop *, edge);
 extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
 extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *,
 					    unsigned int, bool);
--- gcc/internal-fn.def.jj	2013-10-23 14:43:09.560921110 +0200
+++ gcc/internal-fn.def	2013-10-23 18:29:24.188348927 +0200
@@ -43,3 +43,4 @@  DEF_INTERNAL_FN (STORE_LANES, ECF_CONST
 DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
 DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
 DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
+DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
--- gcc/tree-if-conv.c.jj	2013-10-23 14:43:15.981888001 +0200
+++ gcc/tree-if-conv.c	2013-10-23 18:38:00.739772777 +0200
@@ -184,39 +184,48 @@  init_bb_predicate (basic_block bb)
   set_bb_predicate (bb, boolean_true_node);
 }
 
-/* Free the predicate of basic block BB.  */
+/* Release the SSA_NAMEs associated with the predicate of basic block BB,
+   but don't actually free it.  */
 
 static inline void
-free_bb_predicate (basic_block bb)
+release_bb_predicate (basic_block bb)
 {
-  gimple_seq stmts;
-
-  if (!bb_has_predicate (bb))
-    return;
-
-  /* Release the SSA_NAMEs created for the gimplification of the
-     predicate.  */
-  stmts = bb_predicate_gimplified_stmts (bb);
+  gimple_seq stmts = bb_predicate_gimplified_stmts (bb);
   if (stmts)
     {
       gimple_stmt_iterator i;
 
       for (i = gsi_start (stmts); !gsi_end_p (i); gsi_next (&i))
 	free_stmt_operands (gsi_stmt (i));
+      set_bb_predicate_gimplified_stmts (bb, NULL);
     }
+}
 
+/* Free the predicate of basic block BB.  */
+
+static inline void
+free_bb_predicate (basic_block bb)
+{
+  if (!bb_has_predicate (bb))
+    return;
+
+  release_bb_predicate (bb);
   free (bb->aux);
   bb->aux = NULL;
 }
 
-/* Free the predicate of BB and reinitialize it with the true
-   predicate.  */
+/* Reinitialize predicate of BB with the true predicate.  */
 
 static inline void
 reset_bb_predicate (basic_block bb)
 {
-  free_bb_predicate (bb);
-  init_bb_predicate (bb);
+  if (!bb_has_predicate (bb))
+    init_bb_predicate (bb);
+  else
+    {
+      release_bb_predicate (bb);
+      set_bb_predicate (bb, boolean_true_node);
+    }
 }
 
 /* Returns a new SSA_NAME of type TYPE that is assigned the value of
@@ -974,7 +983,7 @@  get_loop_body_in_if_conv_order (const st
    S1 will be predicated with "x", and
    S2 will be predicated with "!x".  */
 
-static bool
+static void
 predicate_bbs (loop_p loop)
 {
   unsigned int i;
@@ -986,7 +995,7 @@  predicate_bbs (loop_p loop)
     {
       basic_block bb = ifc_bbs[i];
       tree cond;
-      gimple_stmt_iterator itr;
+      gimple stmt;
 
       /* The loop latch is always executed and has no extra conditions
 	 to be processed: skip it.  */
@@ -996,53 +1005,38 @@  predicate_bbs (loop_p loop)
 	  continue;
 	}
 
+      /* If dominance tells us this basic block is always executed, force
+	 the condition to be true, this might help simplify other
+	 conditions.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+	reset_bb_predicate (bb);
       cond = bb_predicate (bb);
-
-      for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
+      stmt = last_stmt (bb);
+      if (stmt && gimple_code (stmt) == GIMPLE_COND)
 	{
-	  gimple stmt = gsi_stmt (itr);
-
-	  switch (gimple_code (stmt))
-	    {
-	    case GIMPLE_LABEL:
-	    case GIMPLE_ASSIGN:
-	    case GIMPLE_CALL:
-	    case GIMPLE_DEBUG:
-	      break;
-
-	    case GIMPLE_COND:
-	      {
-		tree c2;
-		edge true_edge, false_edge;
-		location_t loc = gimple_location (stmt);
-		tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
-					  boolean_type_node,
-					  gimple_cond_lhs (stmt),
-					  gimple_cond_rhs (stmt));
-
-		/* Add new condition into destination's predicate list.  */
-		extract_true_false_edges_from_block (gimple_bb (stmt),
-						     &true_edge, &false_edge);
-
-		/* If C is true, then TRUE_EDGE is taken.  */
-		add_to_dst_predicate_list (loop, true_edge,
-					   unshare_expr (cond),
-					   unshare_expr (c));
-
-		/* If C is false, then FALSE_EDGE is taken.  */
-		c2 = build1_loc (loc, TRUTH_NOT_EXPR,
-				 boolean_type_node, unshare_expr (c));
-		add_to_dst_predicate_list (loop, false_edge,
-					   unshare_expr (cond), c2);
+	  tree c2;
+	  edge true_edge, false_edge;
+	  location_t loc = gimple_location (stmt);
+	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
+				    boolean_type_node,
+				    gimple_cond_lhs (stmt),
+				    gimple_cond_rhs (stmt));
+
+	  /* Add new condition into destination's predicate list.  */
+	  extract_true_false_edges_from_block (gimple_bb (stmt),
+					       &true_edge, &false_edge);
+
+	  /* If C is true, then TRUE_EDGE is taken.  */
+	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
+				     unshare_expr (c));
+
+	  /* If C is false, then FALSE_EDGE is taken.  */
+	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
+			   unshare_expr (c));
+	  add_to_dst_predicate_list (loop, false_edge,
+				     unshare_expr (cond), c2);
 
-		cond = NULL_TREE;
-		break;
-	      }
-
-	    default:
-	      /* Not handled yet in if-conversion.  */
-	      return false;
-	    }
+	  cond = NULL_TREE;
 	}
 
       /* If current bb has only one successor, then consider it as an
@@ -1065,8 +1059,6 @@  predicate_bbs (loop_p loop)
   reset_bb_predicate (loop->header);
   gcc_assert (bb_predicate_gimplified_stmts (loop->header) == NULL
 	      && bb_predicate_gimplified_stmts (loop->latch) == NULL);
-
-  return true;
 }
 
 /* Return true when LOOP is if-convertible.  This is a helper function
@@ -1111,9 +1103,24 @@  if_convertible_loop_p_1 (struct loop *lo
 	exit_bb = bb;
     }
 
-  res = predicate_bbs (loop);
-  if (!res)
-    return false;
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      basic_block bb = ifc_bbs[i];
+      gimple_stmt_iterator gsi;
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	switch (gimple_code (gsi_stmt (gsi)))
+	  {
+	  case GIMPLE_LABEL:
+	  case GIMPLE_ASSIGN:
+	  case GIMPLE_CALL:
+	  case GIMPLE_DEBUG:
+	  case GIMPLE_COND:
+	    break;
+	  default:
+	    return false;
+	  }
+    }
 
   if (flag_tree_loop_if_convert_stores)
     {
@@ -1125,6 +1132,7 @@  if_convertible_loop_p_1 (struct loop *lo
 	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
 	  DR_RW_UNCONDITIONALLY (dr) = -1;
 	}
+      predicate_bbs (loop);
     }
 
   for (i = 0; i < loop->num_nodes; i++)
@@ -1137,12 +1145,16 @@  if_convertible_loop_p_1 (struct loop *lo
 	  return false;
 
       /* Check the if-convertibility of statements in predicated BBs.  */
-      if (is_predicated (bb))
+      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
 	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
 	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
 	    return false;
     }
 
+  if (flag_tree_loop_if_convert_stores)
+    for (i = 0; i < loop->num_nodes; i++)
+      free_bb_predicate (ifc_bbs[i]);
+
   if (dump_file)
     fprintf (dump_file, "Applying if-conversion\n");
 
@@ -1397,7 +1409,8 @@  insert_gimplified_predicates (loop_p loo
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
 
-      if (!is_predicated (bb))
+      if (!is_predicated (bb)
+	  || dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
 	{
 	  /* Do not insert statements for a basic block that is not
 	     predicated.  Also make sure that the predicate of the
@@ -1648,6 +1661,7 @@  combine_blocks (struct loop *loop)
   edge e;
   edge_iterator ei;
 
+  predicate_bbs (loop);
   remove_conditions_and_labels (loop);
   insert_gimplified_predicates (loop);
   predicate_all_scalar_phis (loop);
@@ -1742,28 +1756,72 @@  combine_blocks (struct loop *loop)
   ifc_bbs = NULL;
 }
 
-/* If-convert LOOP when it is legal.  For the moment this pass has no
-   profitability analysis.  Returns true when something changed.  */
+/* Version LOOP before if-converting it; the original loop will
+   then be if-converted, the new copy of the loop will not, and
+   the LOOP_VECTORIZED internal call will guard which loop to
+   execute.  The vectorizer pass will fold this internal call
+   into either true or false.  */
 
 static bool
+version_loop_for_if_conversion (struct loop *loop)
+{
+  basic_block cond_bb;
+  tree cond = make_ssa_name (boolean_type_node, NULL);
+  struct loop *new_loop;
+  gimple g;
+  gimple_stmt_iterator gsi;
+
+  g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
+				  build_int_cst (integer_type_node, loop->num),
+				  integer_zero_node);
+  gimple_call_set_lhs (g, cond);
+
+  initialize_original_copy_tables ();
+  new_loop = loop_version (loop, cond, &cond_bb,
+			   REG_BR_PROB_BASE, REG_BR_PROB_BASE,
+			   REG_BR_PROB_BASE, true);
+  free_original_copy_tables ();
+  if (new_loop == NULL)
+    return false;
+  new_loop->dont_vectorize = true;
+  new_loop->force_vect = false;
+  gsi = gsi_last_bb (cond_bb);
+  gimple_call_set_arg (g, 1, build_int_cst (integer_type_node, new_loop->num));
+  gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+  update_ssa (TODO_update_ssa);
+  return true;
+}
+
+/* If-convert LOOP when it is legal.  For the moment this pass has no
+   profitability analysis.  Returns non-zero todo flags when something
+   changed.  */
+
+static unsigned int
 tree_if_conversion (struct loop *loop)
 {
-  bool changed = false;
+  unsigned int todo = 0;
   ifc_bbs = NULL;
 
   if (!if_convertible_loop_p (loop)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
 
+  if ((flag_tree_loop_vectorize || loop->force_vect)
+      && flag_tree_loop_if_convert == -1
+      && !version_loop_for_if_conversion (loop))
+    goto cleanup;
+
   /* Now all statements are if-convertible.  Combine all the basic
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
   combine_blocks (loop);
 
+  todo |= TODO_cleanup_cfg;
   if (flag_tree_loop_if_convert_stores)
-    mark_virtual_operands_for_renaming (cfun);
-
-  changed = true;
+    {
+      mark_virtual_operands_for_renaming (cfun);
+      todo |= TODO_update_ssa_only_virtuals;
+    }
 
  cleanup:
   if (ifc_bbs)
@@ -1777,7 +1835,7 @@  tree_if_conversion (struct loop *loop)
       ifc_bbs = NULL;
     }
 
-  return changed;
+  return todo;
 }
 
 /* Tree if-conversion pass management.  */
@@ -1787,7 +1845,6 @@  main_tree_if_conversion (void)
 {
   loop_iterator li;
   struct loop *loop;
-  bool changed = false;
   unsigned todo = 0;
 
   if (number_of_loops (cfun) <= 1)
@@ -1795,16 +1852,9 @@  main_tree_if_conversion (void)
 
   FOR_EACH_LOOP (li, loop, 0)
     if (flag_tree_loop_if_convert == 1
-	|| flag_tree_loop_if_convert_stores == 1
-	|| flag_tree_loop_vectorize
-	|| loop->force_vect)
-    changed |= tree_if_conversion (loop);
-
-  if (changed)
-    todo |= TODO_cleanup_cfg;
-
-  if (changed && flag_tree_loop_if_convert_stores)
-    todo |= TODO_update_ssa_only_virtuals;
+	|| ((flag_tree_loop_vectorize || loop->force_vect)
+	    && !loop->dont_vectorize))
+      todo |= tree_if_conversion (loop);
 
 #ifdef ENABLE_CHECKING
   {
@@ -1824,8 +1874,7 @@  gate_tree_if_conversion (void)
 {
   return (((flag_tree_loop_vectorize || cfun->has_force_vect_loops)
 	   && flag_tree_loop_if_convert != 0)
-	  || flag_tree_loop_if_convert == 1
-	  || flag_tree_loop_if_convert_stores == 1);
+	  || flag_tree_loop_if_convert == 1);
 }
 
 namespace {
--- gcc/internal-fn.c.jj	2013-10-23 14:43:09.579921012 +0200
+++ gcc/internal-fn.c	2013-10-23 18:29:24.189348915 +0200
@@ -133,6 +133,14 @@  expand_GOMP_SIMD_LAST_LANE (gimple stmt
   gcc_unreachable ();
 }
 
+/* This should get folded in tree-vectorizer.c.  */
+
+static void
+expand_LOOP_VECTORIZED (gimple stmt ATTRIBUTE_UNUSED)
+{
+  gcc_unreachable ();
+}
+
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
 
--- gcc/tree-vectorizer.c.jj	2013-10-23 14:43:15.978888016 +0200
+++ gcc/tree-vectorizer.c	2013-10-23 18:30:29.914021368 +0200
@@ -68,6 +68,7 @@  along with GCC; see the file COPYING3.
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
 #include "tree-ssa-loop.h"
+#include "tree-cfg.h"
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "tree-pass.h"
@@ -311,6 +312,43 @@  vect_destroy_datarefs (loop_vec_info loo
 }
 
 
+/* If LOOP has been versioned during ifcvt, return the internal call
+   guarding it.  */
+
+static gimple
+vect_loop_vectorized_call (struct loop *loop)
+{
+  basic_block bb = loop_preheader_edge (loop)->src;
+  gimple g;
+  do
+    {
+      g = last_stmt (bb);
+      if (g)
+	break;
+      if (!single_pred_p (bb))
+	break;
+      bb = single_pred (bb);
+    }
+  while (1);
+  if (g && gimple_code (g) == GIMPLE_COND)
+    {
+      gimple_stmt_iterator gsi = gsi_for_stmt (g);
+      gsi_prev (&gsi);
+      if (!gsi_end_p (gsi))
+	{
+	  g = gsi_stmt (gsi);
+	  if (is_gimple_call (g)
+	      && gimple_call_internal_p (g)
+	      && gimple_call_internal_fn (g) == IFN_LOOP_VECTORIZED
+	      && (tree_low_cst (gimple_call_arg (g, 0), 0) == loop->num
+		  || tree_low_cst (gimple_call_arg (g, 1), 0) == loop->num))
+	    return g;
+	}
+    }
+  return NULL;
+}
+
+
 /* Function vectorize_loops.
 
    Entry point to loop vectorization phase.  */
@@ -325,6 +363,8 @@  vectorize_loops (void)
   struct loop *loop;
   hash_table <simduid_to_vf> simduid_to_vf_htab;
   hash_table <simd_array_to_simduid> simd_array_to_simduid_htab;
+  bool any_ifcvt_loops = false;
+  unsigned ret = 0;
 
   vect_loops_num = number_of_loops (cfun);
 
@@ -347,8 +387,11 @@  vectorize_loops (void)
      than all previously defined loops.  This fact allows us to run
      only over initial loops skipping newly generated ones.  */
   FOR_EACH_LOOP (li, loop, 0)
-    if ((flag_tree_loop_vectorize && optimize_loop_nest_for_speed_p (loop))
-	|| loop->force_vect)
+    if (loop->dont_vectorize)
+      any_ifcvt_loops = true;
+    else if ((flag_tree_loop_vectorize
+	      && optimize_loop_nest_for_speed_p (loop))
+	     || loop->force_vect)
       {
 	loop_vec_info loop_vinfo;
 	vect_location = find_loop_location (loop);
@@ -366,6 +409,38 @@  vectorize_loops (void)
         if (!dbg_cnt (vect_loop))
 	  break;
 
+	gimple loop_vectorized_call = vect_loop_vectorized_call (loop);
+	if (loop_vectorized_call)
+	  {
+	    tree arg = gimple_call_arg (loop_vectorized_call, 1);
+	    basic_block *bbs;
+	    unsigned int i;
+	    struct loop *scalar_loop = get_loop (cfun, tree_low_cst (arg, 0));
+
+	    LOOP_VINFO_SCALAR_LOOP (loop_vinfo) = scalar_loop;
+	    gcc_checking_assert (vect_loop_vectorized_call
+					(LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
+				 == loop_vectorized_call);
+	    bbs = get_loop_body (scalar_loop);
+	    for (i = 0; i < scalar_loop->num_nodes; i++)
+	      {
+		basic_block bb = bbs[i];
+		gimple_stmt_iterator gsi;
+		for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
+		     gsi_next (&gsi))
+		  {
+		    gimple phi = gsi_stmt (gsi);
+		    gimple_set_uid (phi, 0);
+		  }
+		for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+		     gsi_next (&gsi))
+		  {
+		    gimple stmt = gsi_stmt (gsi);
+		    gimple_set_uid (stmt, 0);
+		  }
+	      }
+	    free (bbs);
+	  }
         if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOC
 	    && dump_enabled_p ())
           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
@@ -386,6 +461,26 @@  vectorize_loops (void)
 	    *simduid_to_vf_htab.find_slot (simduid_to_vf_data, INSERT)
 	      = simduid_to_vf_data;
 	  }
+
+	if (loop_vectorized_call)
+	  {
+	    gimple g = loop_vectorized_call;
+	    tree lhs = gimple_call_lhs (g);
+	    gimple_stmt_iterator gsi = gsi_for_stmt (g);
+	    gimplify_and_update_call_from_tree (&gsi, boolean_true_node);
+	    gsi_next (&gsi);
+	    if (!gsi_end_p (gsi))
+	      {
+		g = gsi_stmt (gsi);
+		if (gimple_code (g) == GIMPLE_COND
+		    && gimple_cond_lhs (g) == lhs)
+		  {
+		    gimple_cond_set_lhs (g, boolean_true_node);
+		    update_stmt (g);
+		    ret |= TODO_cleanup_cfg;
+		  }
+	      }
+	  }
       }
 
   vect_location = UNKNOWN_LOC;
@@ -399,6 +494,34 @@  vectorize_loops (void)
 
   /*  ----------- Finalize. -----------  */
 
+  if (any_ifcvt_loops)
+    for (i = 1; i < vect_loops_num; i++)
+      {
+	loop = get_loop (cfun, i);
+	if (loop && loop->dont_vectorize)
+	  {
+	    gimple g = vect_loop_vectorized_call (loop);
+	    if (g)
+	      {
+		tree lhs = gimple_call_lhs (g);
+		gimple_stmt_iterator gsi = gsi_for_stmt (g);
+		gimplify_and_update_call_from_tree (&gsi, boolean_false_node);
+		gsi_next (&gsi);
+		if (!gsi_end_p (gsi))
+		  {
+		    g = gsi_stmt (gsi);
+		    if (gimple_code (g) == GIMPLE_COND
+			&& gimple_cond_lhs (g) == lhs)
+		      {
+			gimple_cond_set_lhs (g, boolean_false_node);
+			update_stmt (g);
+			ret |= TODO_cleanup_cfg;
+		      }
+		  }
+	      }
+	  }
+      }
+
   for (i = 1; i < vect_loops_num; i++)
     {
       loop_vec_info loop_vinfo;
@@ -456,7 +579,7 @@  vectorize_loops (void)
       return TODO_cleanup_cfg;
     }
 
-  return 0;
+  return ret;
 }
 
 
--- gcc/tree-vect-loop-manip.c.jj	2013-10-23 14:43:12.791904450 +0200
+++ gcc/tree-vect-loop-manip.c	2013-10-23 18:29:24.190348902 +0200
@@ -696,12 +696,42 @@  slpeel_make_loop_iterate_ntimes (struct
   loop->nb_iterations = niters;
 }
 
+/* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg.
+   For all PHI arguments in FROM->dest and TO->dest coming from
+   those edges, ensure that each TO->dest PHI argument has its
+   current_def set to that of the matching FROM->dest argument.  */
+
+static void
+slpeel_duplicate_current_defs_from_edges (edge from, edge to)
+{
+  gimple_stmt_iterator gsi_from, gsi_to;
+
+  for (gsi_from = gsi_start_phis (from->dest),
+       gsi_to = gsi_start_phis (to->dest);
+       !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+       gsi_next (&gsi_from), gsi_next (&gsi_to))
+    {
+      gimple from_phi = gsi_stmt (gsi_from);
+      gimple to_phi = gsi_stmt (gsi_to);
+      tree from_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, from);
+      tree to_arg = PHI_ARG_DEF_FROM_EDGE (to_phi, to);
+      if (TREE_CODE (from_arg) == SSA_NAME
+	  && TREE_CODE (to_arg) == SSA_NAME
+	  && get_current_def (to_arg) == NULL_TREE)
+	set_current_def (to_arg, get_current_def (from_arg));
+    }
+}
+
 
 /* Given LOOP this function generates a new copy of it and puts it
-   on E which is either the entry or exit of LOOP.  */
+   on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
+   non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
+   basic blocks from SCALAR_LOOP instead of LOOP, but to either the
+   entry or exit of LOOP.  */
 
 struct loop *
-slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop, edge e)
+slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
+					struct loop *scalar_loop, edge e)
 {
   struct loop *new_loop;
   basic_block *new_bbs, *bbs;
@@ -715,19 +745,22 @@  slpeel_tree_duplicate_loop_to_edge_cfg (
   if (!at_exit && e != loop_preheader_edge (loop))
     return NULL;
 
-  bbs = XNEWVEC (basic_block, loop->num_nodes + 1);
-  get_loop_body_with_size (loop, bbs, loop->num_nodes);
+  if (scalar_loop == NULL)
+    scalar_loop = loop;
+
+  bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
+  get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
 
   /* Check whether duplication is possible.  */
-  if (!can_copy_bbs_p (bbs, loop->num_nodes))
+  if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
     {
       free (bbs);
       return NULL;
     }
 
   /* Generate new loop structure.  */
-  new_loop = duplicate_loop (loop, loop_outer (loop));
-  duplicate_subloops (loop, new_loop);
+  new_loop = duplicate_loop (scalar_loop, loop_outer (scalar_loop));
+  duplicate_subloops (scalar_loop, new_loop);
 
   exit_dest = exit->dest;
   was_imm_dom = (get_immediate_dominator (CDI_DOMINATORS,
@@ -737,35 +770,80 @@  slpeel_tree_duplicate_loop_to_edge_cfg (
   /* Also copy the pre-header, this avoids jumping through hoops to
      duplicate the loop entry PHI arguments.  Create an empty
      pre-header unconditionally for this.  */
-  basic_block preheader = split_edge (loop_preheader_edge (loop));
+  basic_block preheader = split_edge (loop_preheader_edge (scalar_loop));
   edge entry_e = single_pred_edge (preheader);
-  bbs[loop->num_nodes] = preheader;
-  new_bbs = XNEWVEC (basic_block, loop->num_nodes + 1);
+  bbs[scalar_loop->num_nodes] = preheader;
+  new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
 
-  copy_bbs (bbs, loop->num_nodes + 1, new_bbs,
+  exit = single_exit (scalar_loop);
+  copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
 	    &exit, 1, &new_exit, NULL,
 	    e->src, true);
-  basic_block new_preheader = new_bbs[loop->num_nodes];
+  exit = single_exit (loop);
+  basic_block new_preheader = new_bbs[scalar_loop->num_nodes];
 
-  add_phi_args_after_copy (new_bbs, loop->num_nodes + 1, NULL);
+  add_phi_args_after_copy (new_bbs, scalar_loop->num_nodes + 1, NULL);
+
+  if (scalar_loop != loop)
+    {
+      /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
+	 SCALAR_LOOP will have current_def set to SSA_NAMEs in the new_loop,
+	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
+	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
+	 header) to have current_def set, so copy them over.  */
+      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
+						exit);
+      slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
+							   0),
+						EDGE_SUCC (loop->latch, 0));
+    }
 
   if (at_exit) /* Add the loop copy at exit.  */
     {
+      if (scalar_loop != loop)
+	{
+	  gimple_stmt_iterator gsi;
+	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
+
+	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
+	       gsi_next (&gsi))
+	    {
+	      gimple phi = gsi_stmt (gsi);
+	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+	      location_t orig_locus
+		= gimple_phi_arg_location_from_edge (phi, e);
+
+	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
+	    }
+	}
       redirect_edge_and_branch_force (e, new_preheader);
       flush_pending_stmts (e);
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
       if (was_imm_dom)
-	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_loop->header);
+	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
          one so we have a proper pre-header for the loop at the exit edge.  */
-      redirect_edge_pred (single_succ_edge (preheader), single_pred (preheader));
+      redirect_edge_pred (single_succ_edge (preheader),
+			  single_pred (preheader));
       delete_basic_block (preheader);
-      set_immediate_dominator (CDI_DOMINATORS, loop->header,
-			       loop_preheader_edge (loop)->src);
+      set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
+			       loop_preheader_edge (scalar_loop)->src);
     }
   else /* Add the copy at entry.  */
     {
+      if (scalar_loop != loop)
+	{
+	  /* Remove the non-necessary forwarder of scalar_loop again.  */
+	  redirect_edge_pred (single_succ_edge (preheader),
+			      single_pred (preheader));
+	  delete_basic_block (preheader);
+	  set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
+				   loop_preheader_edge (scalar_loop)->src);
+	  preheader = split_edge (loop_preheader_edge (loop));
+	  entry_e = single_pred_edge (preheader);
+	}
+
       redirect_edge_and_branch_force (entry_e, new_preheader);
       flush_pending_stmts (entry_e);
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, entry_e->src);
@@ -776,15 +854,39 @@  slpeel_tree_duplicate_loop_to_edge_cfg (
 
       /* And remove the non-necessary forwarder again.  Keep the other
          one so we have a proper pre-header for the loop at the exit edge.  */
-      redirect_edge_pred (single_succ_edge (new_preheader), single_pred (new_preheader));
+      redirect_edge_pred (single_succ_edge (new_preheader),
+			  single_pred (new_preheader));
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
     }
 
-  for (unsigned i = 0; i < loop->num_nodes+1; i++)
+  for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++)
     rename_variables_in_bb (new_bbs[i]);
 
+  if (scalar_loop != loop)
+    {
+      /* Update new_loop->header PHIs, so that on the preheader
+	 edge they are the ones from loop rather than scalar_loop.  */
+      gimple_stmt_iterator gsi_orig, gsi_new;
+      edge orig_e = loop_preheader_edge (loop);
+      edge new_e = loop_preheader_edge (new_loop);
+
+      for (gsi_orig = gsi_start_phis (loop->header),
+	   gsi_new = gsi_start_phis (new_loop->header);
+	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
+	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
+	{
+	  gimple orig_phi = gsi_stmt (gsi_orig);
+	  gimple new_phi = gsi_stmt (gsi_new);
+	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
+	  location_t orig_locus
+	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
+
+	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
+	}
+    }
+
   free (new_bbs);
   free (bbs);
 
@@ -995,6 +1097,8 @@  set_prologue_iterations (basic_block bb_
 
    Input:
    - LOOP: the loop to be peeled.
+   - SCALAR_LOOP: if non-NULL, the alternate loop from which basic blocks
+	should be copied.
    - E: the exit or entry edge of LOOP.
         If it is the entry edge, we peel the first iterations of LOOP. In this
         case first-loop is LOOP, and second-loop is the newly created loop.
@@ -1036,8 +1140,8 @@  set_prologue_iterations (basic_block bb_
    FORNOW the resulting code will not be in loop-closed-ssa form.
 */
 
-static struct loop*
-slpeel_tree_peel_loop_to_edge (struct loop *loop,
+static struct loop *
+slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
 			       edge e, tree *first_niters,
 			       tree niters, bool update_first_loop_count,
 			       unsigned int th, bool check_profitability,
@@ -1122,7 +1226,8 @@  slpeel_tree_peel_loop_to_edge (struct lo
         orig_exit_bb:
    */
 
-  if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e)))
+  if (!(new_loop = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
+							   e)))
     {
       loop_loc = find_loop_location (loop);
       dump_printf_loc (MSG_MISSED_OPTIMIZATION, loop_loc,
@@ -1763,6 +1868,7 @@  vect_do_peeling_for_loop_bound (loop_vec
 {
   tree ni_name, ratio_mult_vf_name;
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
   struct loop *new_loop;
   edge update_e;
   basic_block preheader;
@@ -1788,11 +1894,12 @@  vect_do_peeling_for_loop_bound (loop_vec
 
   loop_num  = loop->num;
 
-  new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
-                                            &ratio_mult_vf_name, ni_name, false,
-                                            th, check_profitability,
-					    cond_expr, cond_expr_stmt_list,
-					    0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+  new_loop
+    = slpeel_tree_peel_loop_to_edge (loop, scalar_loop, single_exit (loop),
+				     &ratio_mult_vf_name, ni_name, false,
+				     th, check_profitability,
+				     cond_expr, cond_expr_stmt_list,
+				     0, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
   gcc_assert (new_loop);
   gcc_assert (loop_num == loop->num);
 #ifdef ENABLE_CHECKING
@@ -2025,6 +2132,7 @@  vect_do_peeling_for_alignment (loop_vec_
 			       unsigned int th, bool check_profitability)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
   tree niters_of_prolog_loop, ni_name;
   tree n_iters;
   tree wide_prolog_niters;
@@ -2046,11 +2154,11 @@  vect_do_peeling_for_alignment (loop_vec_
 
   /* Peel the prolog loop and iterate it niters_of_prolog_loop.  */
   new_loop =
-    slpeel_tree_peel_loop_to_edge (loop, loop_preheader_edge (loop),
+    slpeel_tree_peel_loop_to_edge (loop, scalar_loop,
+				   loop_preheader_edge (loop),
 				   &niters_of_prolog_loop, ni_name, true,
 				   th, check_profitability, NULL_TREE, NULL,
-				   bound,
-				   0);
+				   bound, 0);
 
   gcc_assert (new_loop);
 #ifdef ENABLE_CHECKING
@@ -2406,6 +2514,7 @@  vect_loop_versioning (loop_vec_info loop
 		      unsigned int th, bool check_profitability)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
   basic_block condition_bb;
   gimple_stmt_iterator gsi, cond_exp_gsi;
   basic_block merge_bb;
@@ -2441,8 +2550,43 @@  vect_loop_versioning (loop_vec_info loop
   gimple_seq_add_seq (&cond_expr_stmt_list, gimplify_stmt_list);
 
   initialize_original_copy_tables ();
-  loop_version (loop, cond_expr, &condition_bb,
-		prob, prob, REG_BR_PROB_BASE - prob, true);
+  if (scalar_loop)
+    {
+      edge scalar_e;
+      basic_block preheader, scalar_preheader;
+
+      /* We don't want to scale SCALAR_LOOP's frequencies, we need to
+	 scale LOOP's frequencies instead.  */
+      loop_version (scalar_loop, cond_expr, &condition_bb,
+		    prob, REG_BR_PROB_BASE, REG_BR_PROB_BASE - prob, true);
+      scale_loop_frequencies (loop, prob, REG_BR_PROB_BASE);
+      /* CONDITION_BB was created above SCALAR_LOOP's preheader,
+	 while we need to move it above LOOP's preheader.  */
+      e = loop_preheader_edge (loop);
+      scalar_e = loop_preheader_edge (scalar_loop);
+      gcc_assert (empty_block_p (e->src)
+		  && single_pred_p (e->src));
+      gcc_assert (empty_block_p (scalar_e->src)
+		  && single_pred_p (scalar_e->src));
+      gcc_assert (single_pred_p (condition_bb));
+      preheader = e->src;
+      scalar_preheader = scalar_e->src;
+      scalar_e = find_edge (condition_bb, scalar_preheader);
+      e = single_pred_edge (preheader);
+      redirect_edge_and_branch_force (single_pred_edge (condition_bb),
+				      scalar_preheader);
+      redirect_edge_and_branch_force (scalar_e, preheader);
+      redirect_edge_and_branch_force (e, condition_bb);
+      set_immediate_dominator (CDI_DOMINATORS, condition_bb,
+			       single_pred (condition_bb));
+      set_immediate_dominator (CDI_DOMINATORS, scalar_preheader,
+			       single_pred (scalar_preheader));
+      set_immediate_dominator (CDI_DOMINATORS, preheader,
+			       condition_bb);
+    }
+  else
+    loop_version (loop, cond_expr, &condition_bb,
+		  prob, prob, REG_BR_PROB_BASE - prob, true);
 
   if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOC
       && dump_enabled_p ())
@@ -2465,24 +2609,29 @@  vect_loop_versioning (loop_vec_info loop
      basic block (i.e. it has two predecessors). Just in order to simplify
      following transformations in the vectorizer, we fix this situation
      here by adding a new (empty) block on the exit-edge of the loop,
-     with the proper loop-exit phis to maintain loop-closed-form.  */
+     with the proper loop-exit phis to maintain loop-closed-form.
+     If loop versioning wasn't done from loop, but scalar_loop instead,
+     merge_bb will already have just a single predecessor.  */
 
   merge_bb = single_exit (loop)->dest;
-  gcc_assert (EDGE_COUNT (merge_bb->preds) == 2);
-  new_exit_bb = split_edge (single_exit (loop));
-  new_exit_e = single_exit (loop);
-  e = EDGE_SUCC (new_exit_bb, 0);
-
-  for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    {
-      tree new_res;
-      orig_phi = gsi_stmt (gsi);
-      new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL);
-      new_phi = create_phi_node (new_res, new_exit_bb);
-      arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e);
-      add_phi_arg (new_phi, arg, new_exit_e,
-		   gimple_phi_arg_location_from_edge (orig_phi, e));
-      adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
+  if (scalar_loop == NULL || EDGE_COUNT (merge_bb->preds) >= 2)
+    {
+      gcc_assert (EDGE_COUNT (merge_bb->preds) >= 2);
+      new_exit_bb = split_edge (single_exit (loop));
+      new_exit_e = single_exit (loop);
+      e = EDGE_SUCC (new_exit_bb, 0);
+
+      for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  tree new_res;
+	  orig_phi = gsi_stmt (gsi);
+	  new_res = copy_ssa_name (PHI_RESULT (orig_phi), NULL);
+	  new_phi = create_phi_node (new_res, new_exit_bb);
+	  arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, e);
+	  add_phi_arg (new_phi, arg, new_exit_e,
+		       gimple_phi_arg_location_from_edge (orig_phi, e));
+	  adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
+	}
     }
 
 
--- gcc/cfgloop.h.jj	2013-10-23 14:43:09.538921223 +0200
+++ gcc/cfgloop.h	2013-10-23 18:29:24.191348892 +0200
@@ -176,6 +176,9 @@  struct GTY ((chain_next ("%h.next"))) lo
   /* True if we should try harder to vectorize this loop.  */
   bool force_vect;
 
+  /* True if this loop should never be vectorized.  */
+  bool dont_vectorize;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
--- gcc/tree-loop-distribution.c.jj	2013-10-23 14:43:12.757904625 +0200
+++ gcc/tree-loop-distribution.c	2013-10-23 18:29:24.192348882 +0200
@@ -671,7 +671,7 @@  copy_loop_before (struct loop *loop)
   edge preheader = loop_preheader_edge (loop);
 
   initialize_original_copy_tables ();
-  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, preheader);
+  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
   gcc_assert (res != NULL);
   free_original_copy_tables ();
   delete_update_ssa ();
--- gcc/passes.def.jj	2013-10-23 14:43:09.915919279 +0200
+++ gcc/passes.def	2013-10-23 18:29:24.192348882 +0200
@@ -213,6 +213,8 @@  along with GCC; see the file COPYING3.
 	  NEXT_PASS (pass_iv_canon);
 	  NEXT_PASS (pass_parallelize_loops);
 	  NEXT_PASS (pass_if_conversion);
+	  /* pass_vectorize must immediately follow pass_if_conversion.
+	     Please do not add any other passes in between.  */
 	  NEXT_PASS (pass_vectorize);
           PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
 	      NEXT_PASS (pass_dce_loop);
--- gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c.jj	2013-10-23 14:43:09.694920419 +0200
+++ gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c	2013-10-23 18:29:24.192348882 +0200
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_condition } */
+/* { dg-additional-options "-ftree-loop-if-convert" } */
 
 #include "tree-vect.h"
 
--- gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c.jj	2013-10-23 14:43:09.695920414 +0200
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c	2013-10-23 18:29:24.192348882 +0200
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_condition } */
+/* { dg-additional-options "-ftree-loop-if-convert" } */
 
 #include "tree-vect.h"
 
--- gcc/testsuite/gcc.dg/vect/vect-cond-11.c.jj	2013-10-23 18:29:24.193348876 +0200
+++ gcc/testsuite/gcc.dg/vect/vect-cond-11.c	2013-10-23 18:29:24.193348876 +0200
@@ -0,0 +1,116 @@ 
+#include "tree-vect.h"
+
+#define N 1024
+typedef int V __attribute__((vector_size (4)));
+unsigned int a[N * 2] __attribute__((aligned));
+unsigned int b[N * 2] __attribute__((aligned));
+V c[N];
+
+__attribute__((noinline, noclone)) unsigned int
+foo (unsigned int *a, unsigned int *b)
+{
+  int i;
+  unsigned int r = 0;
+  for (i = 0; i < N; i++)
+    {
+      unsigned int x = a[i], y = b[i];
+      if (x < 32)
+	{
+	  x = x + 127;
+	  y = y * 2;
+	}
+      else
+	{
+	  x = x - 16;
+	  y = y + 1;
+	}
+      a[i] = x;
+      b[i] = y;
+      r += x;
+    }
+  return r;
+}
+
+__attribute__((noinline, noclone)) unsigned int
+bar (unsigned int *a, unsigned int *b)
+{
+  int i;
+  unsigned int r = 0;
+  for (i = 0; i < N; i++)
+    {
+      unsigned int x = a[i], y = b[i];
+      if (x < 32)
+	{
+	  x = x + 127;
+	  y = y * 2;
+	}
+      else
+	{
+	  x = x - 16;
+	  y = y + 1;
+	}
+      a[i] = x;
+      b[i] = y;
+      c[i] = c[i] + 1;
+      r += x;
+    }
+  return r;
+}
+
+void
+baz (unsigned int *a, unsigned int *b,
+     unsigned int (*fn) (unsigned int *, unsigned int *))
+{
+  int i;
+  for (i = -64; i < 0; i++)
+    {
+      a[i] = 19;
+      b[i] = 17;
+    }
+  for (; i < N; i++)
+    {
+      a[i] = i - 512;
+      b[i] = i;
+    }
+  for (; i < N + 64; i++)
+    {
+      a[i] = 27;
+      b[i] = 19;
+    }
+  if (fn (a, b) != -512U - (N - 32) * 16U + 32 * 127U)
+    __builtin_abort ();
+  for (i = -64; i < 0; i++)
+    if (a[i] != 19 || b[i] != 17)
+      __builtin_abort ();
+  for (; i < N; i++)
+    if (a[i] != (i - 512U < 32U ? i - 512U + 127 : i - 512U - 16)
+	|| b[i] != (i - 512U < 32U ? i * 2U : i + 1U))
+      __builtin_abort ();
+  for (; i < N + 64; i++)
+    if (a[i] != 27 || b[i] != 19)
+      __builtin_abort ();
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  baz (a + 512, b + 512, foo);
+  baz (a + 512, b + 512, bar);
+  baz (a + 512 + 1, b + 512 + 1, foo);
+  baz (a + 512 + 1, b + 512 + 1, bar);
+  baz (a + 512 + 31, b + 512 + 31, foo);
+  baz (a + 512 + 31, b + 512 + 31, bar);
+  baz (a + 512 + 1, b + 512, foo);
+  baz (a + 512 + 1, b + 512, bar);
+  baz (a + 512 + 31, b + 512, foo);
+  baz (a + 512 + 31, b + 512, bar);
+  baz (a + 512, b + 512 + 1, foo);
+  baz (a + 512, b + 512 + 1, bar);
+  baz (a + 512, b + 512 + 31, foo);
+  baz (a + 512, b + 512 + 31, bar);
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */