diff mbox series

[4/6] Make the vectoriser do its own DCE

Message ID 87d0u2raz3.fsf@arm.com
State New
Headers show
Series Make vector pattern statements less special | expand

Commit Message

Richard Sandiford Aug. 28, 2018, 11:23 a.m. UTC
The vectoriser normally leaves a later DCE pass to remove the scalar
code, but we've accumulated various special cases for things that
DCE can't handle, such as removing the scalar stores that have been
replaced by vector stores, and the scalar calls to internal functions.
(The latter must be removed for correctness, since no underlying scalar
optabs exist for those calls.)

Now that vec_basic_block gives us an easy way of iterating over the
original scalar code (ignoring any new code inserted by the vectoriser),
it seems easier to do the DCE directly.  This involves marking the few
cases in which the vector code needs part of the original scalar code
to be kept around.


2018-08-28  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vectorizer.h (_stmt_vec_info::used_by_vector_code_p): New
	member variable.
	(vect_mark_used_by_vector_code): Declare.
	(vect_remove_dead_scalar_stmts): Likewise.
	(vect_transform_stmt): Return void.
	(vect_remove_stores): Delete.
	* tree-vectorizer.c (vec_info::remove_stmt): Handle phis.
	* tree-vect-stmts.c (vect_mark_used_by_vector_code): New function.
	(vectorizable_call, vectorizable_simd_clone_call): Don't remove
	scalar calls here.
	(vectorizable_shift): Mark shift amounts in a vector-scalar shift
	as being used by the vector code.
	(vectorizable_load): Mark unhoisted scalar loads that feed a
	load-and-broadcast operation as being needed by the vector code.
	(vect_transform_stmt): Return void.
	(vect_remove_stores): Delete.
	(vect_maybe_remove_scalar_stmt): New function.
	(vect_remove_dead_scalar_stmts): Likewise.
	* tree-vect-slp.c (vect_slp_bb): Call vect_remove_dead_scalar_stmts.
	(vect_remove_slp_scalar_calls): Delete.
	(vect_schedule_slp): Don't call it.  Don't remove scalar stores here.
	* tree-vect-loop.c (vectorizable_reduction): Mark scalar phis that
	are retained by the vector code.
	(vectorizable_live_operation): Mark scalar live-out statements that
	are retained by the vector code.
	(vect_transform_loop_stmt): Remove seen_store argument.  Mark gconds
	in nested loops as being needed by the vector code.  Replace the
	outer loop's gcond with a dummy condition.
	(vect_transform_loop): Update calls accordingly.  Don't remove
	scalar stores or calls here; call vect_remove_dead_scalar_stmts
	instead.
	* tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Don't
	remove PHIs that were classified as reductions but never actually
	vectorized.

Comments

Jeff Law Aug. 28, 2018, 11:01 p.m. UTC | #1
On 08/28/2018 05:23 AM, Richard Sandiford wrote:
> The vectoriser normally leaves a later DCE pass to remove the scalar
> code, but we've accumulated various special cases for things that
> DCE can't handle, such as removing the scalar stores that have been
> replaced by vector stores, and the scalar calls to internal functions.
> (The latter must be removed for correctness, since no underlying scalar
> optabs exist for those calls.)
> 
> Now that vec_basic_block gives us an easy way of iterating over the
> original scalar code (ignoring any new code inserted by the vectoriser),
> it seems easier to do the DCE directly.  This involves marking the few
> cases in which the vector code needs part of the original scalar code
> to be kept around.
> 
> 
> 2018-08-28  Richard Sandiford  <richard.sandiford@arm.com>
> 
> gcc/
> 	* tree-vectorizer.h (_stmt_vec_info::used_by_vector_code_p): New
> 	member variable.
> 	(vect_mark_used_by_vector_code): Declare.
> 	(vect_remove_dead_scalar_stmts): Likewise.
> 	(vect_transform_stmt): Return void.
> 	(vect_remove_stores): Delete.
> 	* tree-vectorizer.c (vec_info::remove_stmt): Handle phis.
> 	* tree-vect-stmts.c (vect_mark_used_by_vector_code): New function.
> 	(vectorizable_call, vectorizable_simd_clone_call): Don't remove
> 	scalar calls here.
> 	(vectorizable_shift): Mark shift amounts in a vector-scalar shift
> 	as being used by the vector code.
> 	(vectorizable_load): Mark unhoisted scalar loads that feed a
> 	load-and-broadcast operation as being needed by the vector code.
> 	(vect_transform_stmt): Return void.
> 	(vect_remove_stores): Delete.
> 	(vect_maybe_remove_scalar_stmt): New function.
> 	(vect_remove_dead_scalar_stmts): Likewise.
> 	* tree-vect-slp.c (vect_slp_bb): Call vect_remove_dead_scalar_stmts.
> 	(vect_remove_slp_scalar_calls): Delete.
> 	(vect_schedule_slp): Don't call it.  Don't remove scalar stores here.
> 	* tree-vect-loop.c (vectorizable_reduction): Mark scalar phis that
> 	are retained by the vector code.
> 	(vectorizable_live_operation): Mark scalar live-out statements that
> 	are retained by the vector code.
> 	(vect_transform_loop_stmt): Remove seen_store argument.  Mark gconds
> 	in nested loops as being needed by the vector code.  Replace the
> 	outer loop's gcond with a dummy condition.
> 	(vect_transform_loop): Update calls accordingly.  Don't remove
> 	scalar stores or calls here; call vect_remove_dead_scalar_stmts
> 	instead.
> 	* tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Don't
> 	remove PHIs that were classified as reductions but never actually
> 	vectorized.
Presumably you'll look at killing the actual DCE pass we're running as a
follow-up.

I'm hoping this will help the tendency of the vectorizer to leak
SSA_NAMEs.  Though it's late enough in the pipeline that leaking isn't
that bad.





> Index: gcc/tree-vectorizer.c
> ===================================================================
> --- gcc/tree-vectorizer.c	2018-08-28 12:05:11.466982927 +0100
> +++ gcc/tree-vectorizer.c	2018-08-28 12:05:14.014961439 +0100
> @@ -654,8 +654,13 @@ vec_info::remove_stmt (stmt_vec_info stm
>    set_vinfo_for_stmt (stmt_info->stmt, NULL);
>    gimple_stmt_iterator si = gsi_for_stmt (stmt_info->stmt);
>    unlink_stmt_vdef (stmt_info->stmt);
> -  gsi_remove (&si, true);
> -  release_defs (stmt_info->stmt);
> +  if (is_a <gphi *> (stmt_info->stmt))
> +    remove_phi_node (&si, true);
> +  else
> +    {
> +      gsi_remove (&si, true);
> +      release_defs (stmt_info->stmt);
> +    }
More than once I've wondered if we should factor this into its own
little function.  I'm pretty sure I've seen this style of code
elsewhere.  Your call whether or not to clean that up.

OK with or without creating a little helper for the code above.
jeff
Richard Biener Aug. 29, 2018, 7:15 a.m. UTC | #2
On Wed, Aug 29, 2018 at 1:01 AM Jeff Law <law@redhat.com> wrote:
>
> On 08/28/2018 05:23 AM, Richard Sandiford wrote:
> > The vectoriser normally leaves a later DCE pass to remove the scalar
> > code, but we've accumulated various special cases for things that
> > DCE can't handle, such as removing the scalar stores that have been
> > replaced by vector stores, and the scalar calls to internal functions.
> > (The latter must be removed for correctness, since no underlying scalar
> > optabs exist for those calls.)
> >
> > Now that vec_basic_block gives us an easy way of iterating over the
> > original scalar code (ignoring any new code inserted by the vectoriser),
> > it seems easier to do the DCE directly.  This involves marking the few
> > cases in which the vector code needs part of the original scalar code
> > to be kept around.
> >
> >
> > 2018-08-28  Richard Sandiford  <richard.sandiford@arm.com>
> >
> > gcc/
> >       * tree-vectorizer.h (_stmt_vec_info::used_by_vector_code_p): New
> >       member variable.
> >       (vect_mark_used_by_vector_code): Declare.
> >       (vect_remove_dead_scalar_stmts): Likewise.
> >       (vect_transform_stmt): Return void.
> >       (vect_remove_stores): Delete.
> >       * tree-vectorizer.c (vec_info::remove_stmt): Handle phis.
> >       * tree-vect-stmts.c (vect_mark_used_by_vector_code): New function.
> >       (vectorizable_call, vectorizable_simd_clone_call): Don't remove
> >       scalar calls here.
> >       (vectorizable_shift): Mark shift amounts in a vector-scalar shift
> >       as being used by the vector code.
> >       (vectorizable_load): Mark unhoisted scalar loads that feed a
> >       load-and-broadcast operation as being needed by the vector code.
> >       (vect_transform_stmt): Return void.
> >       (vect_remove_stores): Delete.
> >       (vect_maybe_remove_scalar_stmt): New function.
> >       (vect_remove_dead_scalar_stmts): Likewise.
> >       * tree-vect-slp.c (vect_slp_bb): Call vect_remove_dead_scalar_stmts.
> >       (vect_remove_slp_scalar_calls): Delete.
> >       (vect_schedule_slp): Don't call it.  Don't remove scalar stores here.
> >       * tree-vect-loop.c (vectorizable_reduction): Mark scalar phis that
> >       are retained by the vector code.
> >       (vectorizable_live_operation): Mark scalar live-out statements that
> >       are retained by the vector code.
> >       (vect_transform_loop_stmt): Remove seen_store argument.  Mark gconds
> >       in nested loops as being needed by the vector code.  Replace the
> >       outer loop's gcond with a dummy condition.
> >       (vect_transform_loop): Update calls accordingly.  Don't remove
> >       scalar stores or calls here; call vect_remove_dead_scalar_stmts
> >       instead.
> >       * tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Don't
> >       remove PHIs that were classified as reductions but never actually
> >       vectorized.
> Presumably you'll look at killing the actual DCE pass we're running as a
> follow-up.
>
> I'm hoping this will help the tendency of the vectorizer to leak
> SSA_NAMEs.  Though it's late enough in the pipeline that leaking isn't
> that bad.
>
>
>
>
>
> > Index: gcc/tree-vectorizer.c
> > ===================================================================
> > --- gcc/tree-vectorizer.c     2018-08-28 12:05:11.466982927 +0100
> > +++ gcc/tree-vectorizer.c     2018-08-28 12:05:14.014961439 +0100
> > @@ -654,8 +654,13 @@ vec_info::remove_stmt (stmt_vec_info stm
> >    set_vinfo_for_stmt (stmt_info->stmt, NULL);
> >    gimple_stmt_iterator si = gsi_for_stmt (stmt_info->stmt);
> >    unlink_stmt_vdef (stmt_info->stmt);
> > -  gsi_remove (&si, true);
> > -  release_defs (stmt_info->stmt);
> > +  if (is_a <gphi *> (stmt_info->stmt))
> > +    remove_phi_node (&si, true);
> > +  else
> > +    {
> > +      gsi_remove (&si, true);
> > +      release_defs (stmt_info->stmt);
> > +    }
> More than once I've wondered if we should factor this into its own
> little function.  I'm pretty sure I've seen this style of code
> elsewhere.  Your call whether or not to clean that up.

I think "merging" remove_phi_node and gsi_remove would make more sense.
There's the non-obvious semantic difference of the boolean argument so that
would need cleaning up.  Usually removing also involves calling
unlink_stmt_vdef.

Richard.

> OK with or without creating a little helper for the code above.
> jeff
diff mbox series

Patch

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2018-08-28 12:05:11.466982927 +0100
+++ gcc/tree-vectorizer.h	2018-08-28 12:05:14.014961439 +0100
@@ -925,6 +925,10 @@  struct _stmt_vec_info {
   /* For both loads and stores.  */
   bool simd_lane_access_p;
 
+  /* True if the vectorized code keeps this statement in its current form.
+     Only meaningful for statements that were in the original scalar code.  */
+  bool used_by_vector_code_p;
+
   /* Classifies how the load or store is going to be implemented
      for loop vectorization.  */
   vect_memory_access_type memory_access_type;
@@ -1522,6 +1526,7 @@  extern stmt_vec_info vect_finish_replace
 extern stmt_vec_info vect_finish_stmt_generation (stmt_vec_info, gimple *,
 						  gimple_stmt_iterator *);
 extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
+extern void vect_mark_used_by_vector_code (stmt_vec_info);
 extern tree vect_get_store_rhs (stmt_vec_info);
 extern tree vect_get_vec_def_for_operand_1 (stmt_vec_info, enum vect_def_type);
 extern tree vect_get_vec_def_for_operand (tree, stmt_vec_info, tree = NULL);
@@ -1532,9 +1537,8 @@  extern void vect_get_vec_defs_for_stmt_c
 extern tree vect_init_vector (stmt_vec_info, tree, tree,
                               gimple_stmt_iterator *);
 extern tree vect_get_vec_def_for_stmt_copy (vec_info *, tree);
-extern bool vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
+extern void vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
 				 slp_tree, slp_instance);
-extern void vect_remove_stores (stmt_vec_info);
 extern bool vect_analyze_stmt (stmt_vec_info, bool *, slp_tree, slp_instance,
 			       stmt_vector_for_cost *);
 extern bool vectorizable_condition (stmt_vec_info, gimple_stmt_iterator *,
@@ -1554,6 +1558,7 @@  extern gcall *vect_gen_while (tree, tree
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 extern bool vect_get_vector_types_for_stmt (stmt_vec_info, tree *, tree *);
 extern tree vect_get_mask_type_for_stmt (stmt_vec_info);
+extern void vect_remove_dead_scalar_stmts (vec_info *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c	2018-08-28 12:05:11.466982927 +0100
+++ gcc/tree-vectorizer.c	2018-08-28 12:05:14.014961439 +0100
@@ -654,8 +654,13 @@  vec_info::remove_stmt (stmt_vec_info stm
   set_vinfo_for_stmt (stmt_info->stmt, NULL);
   gimple_stmt_iterator si = gsi_for_stmt (stmt_info->stmt);
   unlink_stmt_vdef (stmt_info->stmt);
-  gsi_remove (&si, true);
-  release_defs (stmt_info->stmt);
+  if (is_a <gphi *> (stmt_info->stmt))
+    remove_phi_node (&si, true);
+  else
+    {
+      gsi_remove (&si, true);
+      release_defs (stmt_info->stmt);
+    }
   stmt_info->block->remove (stmt_info);
   free_stmt_vec_info (stmt_info);
 }
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2018-08-28 12:05:11.466982927 +0100
+++ gcc/tree-vect-stmts.c	2018-08-28 12:05:14.014961439 +0100
@@ -766,6 +766,34 @@  vect_mark_stmts_to_be_vectorized (loop_v
   return true;
 }
 
+/* Record that scalar statement STMT_INFO is needed by the vectorized code.  */
+
+void
+vect_mark_used_by_vector_code (stmt_vec_info stmt_info)
+{
+  vec_info *vinfo = stmt_info->vinfo;
+  auto_vec<stmt_vec_info, 16> worklist;
+  worklist.quick_push (stmt_info);
+  stmt_info->used_by_vector_code_p = true;
+  do
+    {
+      stmt_info = worklist.pop ();
+      ssa_op_iter iter;
+      use_operand_p use;
+      FOR_EACH_PHI_OR_STMT_USE (use, stmt_info->stmt, iter, SSA_OP_USE)
+	{
+	  tree op = USE_FROM_PTR (use);
+	  stmt_vec_info def_stmt_info = vinfo->lookup_def (op);
+	  if (def_stmt_info && !def_stmt_info->used_by_vector_code_p)
+	    {
+	      def_stmt_info->used_by_vector_code_p = 1;
+	      worklist.safe_push (def_stmt_info);
+	    }
+	}
+    }
+  while (!worklist.is_empty ());
+}
+
 /* Compute the prologue cost for invariant or constant operands.  */
 
 static unsigned
@@ -3081,7 +3109,6 @@  vectorizable_call (stmt_vec_info stmt_in
   auto_vec<tree, 8> orig_vargs;
   enum { NARROW, NONE, WIDEN } modifier;
   size_t i, nargs;
-  tree lhs;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -3579,22 +3606,6 @@  vectorizable_call (stmt_vec_info stmt_in
     return false;
 
   vargs.release ();
-
-  /* The call in STMT might prevent it from being removed in dce.
-     We however cannot remove it here, due to the way the ssa name
-     it defines is mapped to the new definition.  So just replace
-     rhs of the statement with something harmless.  */
-
-  if (slp_node)
-    return true;
-
-  stmt_info = vect_orig_stmt (stmt_info);
-  lhs = gimple_get_lhs (stmt_info->stmt);
-
-  gassign *new_stmt
-    = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
-  vinfo->replace_stmt (gsi, stmt_info, new_stmt);
-
   return true;
 }
 
@@ -3703,7 +3714,7 @@  vectorizable_simd_clone_call (stmt_vec_i
 {
   tree vec_dest;
   tree scalar_dest;
-  tree op, type;
+  tree op;
   tree vec_oprnd0 = NULL_TREE;
   stmt_vec_info prev_stmt_info;
   tree vectype;
@@ -3717,7 +3728,7 @@  vectorizable_simd_clone_call (stmt_vec_i
   auto_vec<simd_call_arg_info> arginfo;
   vec<tree> vargs = vNULL;
   size_t i, nargs;
-  tree lhs, rtype, ratype;
+  tree rtype, ratype;
   vec<constructor_elt, va_gc> *ret_ctor_elts = NULL;
 
   /* Is STMT a vectorizable call?   */
@@ -4310,27 +4321,6 @@  vectorizable_simd_clone_call (stmt_vec_i
     }
 
   vargs.release ();
-
-  /* The call in STMT might prevent it from being removed in dce.
-     We however cannot remove it here, due to the way the ssa name
-     it defines is mapped to the new definition.  So just replace
-     rhs of the statement with something harmless.  */
-
-  if (slp_node)
-    return true;
-
-  gimple *new_stmt;
-  if (scalar_dest)
-    {
-      type = TREE_TYPE (scalar_dest);
-      lhs = gimple_call_lhs (vect_orig_stmt (stmt_info)->stmt);
-      new_stmt = gimple_build_assign (lhs, build_zero_cst (type));
-    }
-  else
-    new_stmt = gimple_build_nop ();
-  vinfo->replace_stmt (gsi, vect_orig_stmt (stmt_info), new_stmt);
-  unlink_stmt_vdef (stmt);
-
   return true;
 }
 
@@ -5643,6 +5633,9 @@  vectorizable_shift (stmt_vec_info stmt_i
     dump_printf_loc (MSG_NOTE, vect_location,
                      "transform binary/unary operation.\n");
 
+  if (scalar_shift_arg && op1_def_stmt_info)
+    vect_mark_used_by_vector_code (op1_def_stmt_info);
+
   /* Handle def.  */
   vec_dest = vect_create_destination_var (scalar_dest, vectype);
 
@@ -7649,6 +7642,8 @@  vectorizable_load (stmt_vec_info stmt_in
 	    (loop_preheader_edge (loop),
 	     gimple_build_assign (scalar_dest, rhs));
 	}
+      else
+	vect_mark_used_by_vector_code (stmt_info);
       /* These copies are all equivalent, but currently the representation
 	 requires a separate STMT_VINFO_VEC_STMT for each one.  */
       prev_stmt_info = NULL;
@@ -9629,12 +9624,11 @@  vect_analyze_stmt (stmt_vec_info stmt_in
 
    Create a vectorized stmt to replace STMT_INFO, and insert it at BSI.  */
 
-bool
+void
 vect_transform_stmt (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 		     slp_tree slp_node, slp_instance slp_node_instance)
 {
   vec_info *vinfo = stmt_info->vinfo;
-  bool is_store = false;
   stmt_vec_info vec_stmt = NULL;
   bool done;
 
@@ -9689,18 +9683,6 @@  vect_transform_stmt (stmt_vec_info stmt_
     case store_vec_info_type:
       done = vectorizable_store (stmt_info, gsi, &vec_stmt, slp_node, NULL);
       gcc_assert (done);
-      if (STMT_VINFO_GROUPED_ACCESS (stmt_info) && !slp_node)
-	{
-	  /* In case of interleaving, the whole chain is vectorized when the
-	     last store in the chain is reached.  Store stmts before the last
-	     one are skipped, and there vec_stmt_info shouldn't be freed
-	     meanwhile.  */
-	  stmt_vec_info group_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-	  if (DR_GROUP_STORE_COUNT (group_info) == DR_GROUP_SIZE (group_info))
-	    is_store = true;
-	}
-      else
-	is_store = true;
       break;
 
     case condition_vec_info_type:
@@ -9791,30 +9773,9 @@  vect_transform_stmt (stmt_vec_info stmt_
 
   if (vec_stmt)
     STMT_VINFO_VEC_STMT (stmt_info) = vec_stmt;
-
-  return is_store;
 }
 
 
-/* Remove a group of stores (for SLP or interleaving), free their
-   stmt_vec_info.  */
-
-void
-vect_remove_stores (stmt_vec_info first_stmt_info)
-{
-  vec_info *vinfo = first_stmt_info->vinfo;
-  stmt_vec_info next_stmt_info = first_stmt_info;
-
-  while (next_stmt_info)
-    {
-      stmt_vec_info tmp = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
-      next_stmt_info = vect_orig_stmt (next_stmt_info);
-      /* Free the attached stmt_vec_info and remove the stmt.  */
-      vinfo->remove_stmt (next_stmt_info);
-      next_stmt_info = tmp;
-    }
-}
-
 /* Function get_vectype_for_scalar_type_and_size.
 
    Returns the vector type corresponding to SCALAR_TYPE  and SIZE as supported
@@ -10852,3 +10813,118 @@  vect_get_mask_type_for_stmt (stmt_vec_in
     }
   return mask_type;
 }
+
+/* Handle vect_remove_dead_scalar_stmts for statement STMT_INFO.  */
+
+static void
+vect_maybe_remove_scalar_stmt (stmt_vec_info stmt_info)
+{
+  vec_info *vinfo = stmt_info->vinfo;
+  bool bb_p = is_a <bb_vec_info> (vinfo);
+
+  /* Keep scalar statements that are needed by the vectorized code,
+     such as the phi in a "fold left" reduction or the load in a
+     load-and-broadcast operation.  */
+  if (stmt_info->used_by_vector_code_p)
+    return;
+
+  /* Check for types of statement that we can handle.  */
+  if (!is_a <gphi *> (stmt_info->stmt)
+      && !is_a <gassign *> (stmt_info->stmt)
+      && !is_a <gcall *> (stmt_info->stmt))
+    return;
+
+  /* Check whether the function still contains uses of the statement.  */
+  tree lhs = gimple_get_lhs (stmt_info->stmt);
+  if (lhs && TREE_CODE (lhs) == SSA_NAME && !has_zero_uses (lhs))
+    {
+      /* We now have to decide whether the statement is dead despite
+	 still having uses.  We cannot prove this for BB vectorization,
+	 since the vectorization region includes unrelated statements
+	 that might well not be dead.  We also need to keep virtual
+	 operand phis, since except in degenerate cases, the vector
+	 loop will contain virtual operands whenever the scalar loop did.
+
+	 In all other cases, !used_by_vector_code_p guarantees that
+	 the statement is dead for loop vectorization.  */
+      if (bb_p || virtual_operand_p (lhs))
+	return;
+
+      /* Check and process all uses of the lhs.  */
+      imm_use_iterator imm_iter;
+      gimple *use_stmt;
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	{
+	  if (is_gimple_debug (use_stmt))
+	    continue;
+
+	  /* The use must be either a loop phi (which we'll delete later)
+	     or an exit phi.  */
+	  gphi *use_phi = dyn_cast <gphi *> (use_stmt);
+	  gcc_assert (use_phi);
+	  if (!vinfo->lookup_stmt (use_phi))
+	    {
+	      /* It's an exit phi, and the phi must itself be dead code.  */
+	      gcc_assert (has_zero_uses (gimple_phi_result (use_phi)));
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_NOTE, vect_location,
+				   "deleting exit phi: ");
+		  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, use_phi, 0);
+		}
+	      gimple_stmt_iterator gsi = gsi_for_stmt (use_phi);
+	      remove_phi_node (&gsi, true);
+	    }
+	}
+    }
+
+  /* The loop vectorizer decides for each statement whether the statement
+     should be vectorized, dropped or kept as-is.  We dealt with the last
+     case above, so anything left can be removed.  However, the region
+     used in BB vectorization includes unrelated statements that we should
+     only drop if we can prove they are dead.  */
+  if (bb_p
+      && ((lhs && TREE_CODE (lhs) != SSA_NAME)
+	  || gimple_vdef (stmt_info->stmt)
+	  || gimple_has_side_effects (stmt_info->stmt)))
+    {
+      stmt_vec_info final_info = vect_stmt_to_vectorize (stmt_info);
+      /* Keep the statement unless it was replaced by a vector version.
+
+	 Note that RELEVANT_P is only meaningful if the statement is still
+	 part of an SLP instance.  It can be set on other statements that
+	 were tentatively part of an SLP instance that we had to abandon.  */
+      if (STMT_VINFO_NUM_SLP_USES (final_info) == 0
+	  || !STMT_VINFO_RELEVANT_P (final_info))
+	return;
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "deleting scalar stmt: ");
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);
+    }
+  vinfo->remove_stmt (stmt_info);
+}
+
+/* Called after vectorizing VINFO.  Remove any scalar statements that
+   are no longer needed.  */
+
+void
+vect_remove_dead_scalar_stmts (vec_info *vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_remove_dead_scalar_stmts");
+
+  unsigned int i;
+  vec_basic_block *vec_bb;
+  FOR_EACH_VEC_ELT_REVERSE (vinfo->blocks, i, vec_bb)
+    {
+      stmt_vec_info prev_stmt_info;
+      for (stmt_vec_info stmt_info = vec_bb->last (); stmt_info;
+	   stmt_info = prev_stmt_info)
+	{
+	  prev_stmt_info = stmt_info->prev;
+	  vect_maybe_remove_scalar_stmt (stmt_info);
+	}
+    }
+}
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-08-28 12:05:11.462982962 +0100
+++ gcc/tree-vect-slp.c	2018-08-28 12:05:14.010961472 +0100
@@ -2975,6 +2975,7 @@  vect_slp_bb (basic_block bb)
 
 	  bb_vinfo->shared->check_datarefs ();
 	  vect_schedule_slp (bb_vinfo);
+	  vect_remove_dead_scalar_stmts (bb_vinfo);
 
 	  unsigned HOST_WIDE_INT bytes;
 	  if (current_vector_size.is_constant (&bytes))
@@ -3954,43 +3955,6 @@  vect_schedule_slp_instance (slp_tree nod
       }
 }
 
-/* Replace scalar calls from SLP node NODE with setting of their lhs to zero.
-   For loop vectorization this is done in vectorizable_call, but for SLP
-   it needs to be deferred until end of vect_schedule_slp, because multiple
-   SLP instances may refer to the same scalar stmt.  */
-
-static void
-vect_remove_slp_scalar_calls (slp_tree node)
-{
-  gimple *new_stmt;
-  gimple_stmt_iterator gsi;
-  int i;
-  slp_tree child;
-  tree lhs;
-  stmt_vec_info stmt_info;
-
-  if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
-    return;
-
-  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-    vect_remove_slp_scalar_calls (child);
-
-  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
-    {
-      gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt);
-      if (!stmt || gimple_bb (stmt) == NULL)
-	continue;
-      if (is_pattern_stmt_p (stmt_info)
-	  || !PURE_SLP_STMT (stmt_info))
-	continue;
-      lhs = gimple_call_lhs (stmt);
-      new_stmt = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
-      gsi = gsi_for_stmt (stmt);
-      stmt_info->vinfo->replace_stmt (&gsi, stmt_info, new_stmt);
-      SSA_NAME_DEF_STMT (gimple_assign_lhs (new_stmt)) = new_stmt;
-    }
-}
-
 /* Generate vector code for all SLP instances in the loop/basic block.  */
 
 void
@@ -4013,32 +3977,4 @@  vect_schedule_slp (vec_info *vinfo)
                          "vectorizing stmts using SLP.\n");
     }
   delete bst_map;
-
-  FOR_EACH_VEC_ELT (slp_instances, i, instance)
-    {
-      slp_tree root = SLP_INSTANCE_TREE (instance);
-      stmt_vec_info store_info;
-      unsigned int j;
-
-      /* Remove scalar call stmts.  Do not do this for basic-block
-	 vectorization as not all uses may be vectorized.
-	 ???  Why should this be necessary?  DCE should be able to
-	 remove the stmts itself.
-	 ???  For BB vectorization we can as well remove scalar
-	 stmts starting from the SLP tree root if they have no
-	 uses.  */
-      if (is_a <loop_vec_info> (vinfo))
-	vect_remove_slp_scalar_calls (root);
-
-      for (j = 0; SLP_TREE_SCALAR_STMTS (root).iterate (j, &store_info)
-                  && j < SLP_INSTANCE_GROUP_SIZE (instance); j++)
-        {
-	  if (!STMT_VINFO_DATA_REF (store_info))
-	    break;
-
-	  store_info = vect_orig_stmt (store_info);
-	  /* Free the attached stmt_vec_info and remove the stmt.  */
-	  vinfo->remove_stmt (store_info);
-        }
-    }
 }
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2018-08-28 12:05:11.458982995 +0100
+++ gcc/tree-vect-loop.c	2018-08-28 12:05:14.010961472 +0100
@@ -6086,18 +6086,24 @@  vectorizable_reduction (stmt_vec_info st
 	}
 
       if (STMT_VINFO_REDUC_TYPE (stmt_info) == FOLD_LEFT_REDUCTION)
-	/* Leave the scalar phi in place.  Note that checking
-	   STMT_VINFO_VEC_REDUCTION_TYPE (as below) only works
-	   for reductions involving a single statement.  */
-	return true;
+	{
+	  /* Leave the scalar phi in place.  Note that checking
+	     STMT_VINFO_VEC_REDUCTION_TYPE (as below) only works
+	     for reductions involving a single statement.  */
+	  vect_mark_used_by_vector_code (stmt_info);
+	  return true;
+	}
 
       stmt_vec_info reduc_stmt_info = STMT_VINFO_REDUC_DEF (stmt_info);
       reduc_stmt_info = vect_stmt_to_vectorize (reduc_stmt_info);
 
       if (STMT_VINFO_VEC_REDUCTION_TYPE (reduc_stmt_info)
 	  == EXTRACT_LAST_REDUCTION)
-	/* Leave the scalar phi in place.  */
-	return true;
+	{
+	  /* Leave the scalar phi in place.  */
+	  vect_mark_used_by_vector_code (stmt_info);
+	  return true;
+	}
 
       gassign *reduc_stmt = as_a <gassign *> (reduc_stmt_info->stmt);
       for (unsigned k = 1; k < gimple_num_ops (reduc_stmt); ++k)
@@ -7790,6 +7796,7 @@  vectorizable_live_operation (stmt_vec_in
 	dump_printf_loc (MSG_NOTE, vect_location,
 			 "statement is simple and uses invariant.  Leaving in "
 			 "place.\n");
+      vect_mark_used_by_vector_code (stmt_info);
       return true;
     }
 
@@ -8158,13 +8165,12 @@  scale_profile_for_vect_loop (struct loop
     scale_bbs_frequencies (&loop->latch, 1, exit_l->probability / prob);
 }
 
-/* Vectorize STMT_INFO if relevant, inserting any new instructions before GSI.
-   When vectorizing STMT_INFO as a store, set *SEEN_STORE to its
-   stmt_vec_info.  */
+/* Vectorize STMT_INFO if relevant, inserting any new instructions
+   before GSI.  */
 
 static void
 vect_transform_loop_stmt (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
-			  gimple_stmt_iterator *gsi, stmt_vec_info *seen_store)
+			  gimple_stmt_iterator *gsi)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
@@ -8179,6 +8185,21 @@  vect_transform_loop_stmt (loop_vec_info
   if (MAY_HAVE_DEBUG_BIND_STMTS && !STMT_VINFO_LIVE_P (stmt_info))
     vect_loop_kill_debug_uses (loop, stmt_info);
 
+  if (gcond *cond = dyn_cast <gcond *> (stmt_info->stmt))
+    {
+      if (nested_in_vect_loop_p (loop, stmt_info))
+	/* The gconds for inner loops aren't changed by vectorization.  */
+	vect_mark_used_by_vector_code (stmt_info);
+      else
+	{
+	  /* Replace the scalar gcond with a dummy condition for now,
+	     so that all the scalar inputs to it are dead.  We'll later
+	     replace the condition with the new vector-based one.  */
+	  gimple_cond_make_false (cond);
+	  update_stmt (cond);
+	}
+    }
+
   if (!STMT_VINFO_RELEVANT_P (stmt_info)
       && !STMT_VINFO_LIVE_P (stmt_info))
     return;
@@ -8203,8 +8224,7 @@  vect_transform_loop_stmt (loop_vec_info
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "transform statement.\n");
 
-  if (vect_transform_stmt (stmt_info, gsi, NULL, NULL))
-    *seen_store = stmt_info;
+  vect_transform_stmt (stmt_info, gsi, NULL, NULL);
 }
 
 /* Function vect_transform_loop.
@@ -8389,7 +8409,6 @@  vect_transform_loop (loop_vec_info loop_
 	    loop_vinfo->remove_stmt (stmt_info);
 	  else
 	    {
-	      stmt_vec_info seen_store = NULL;
 	      gimple_stmt_iterator si = gsi_for_stmt (stmt_info->stmt);
 	      if (STMT_VINFO_IN_PATTERN_P (stmt_info))
 		{
@@ -8400,49 +8419,19 @@  vect_transform_loop (loop_vec_info loop_
 		      stmt_vec_info pat_stmt_info
 			= loop_vinfo->lookup_stmt (gsi_stmt (subsi));
 		      vect_transform_loop_stmt (loop_vinfo, pat_stmt_info,
-						&si, &seen_store);
+						&si);
 		    }
 		  stmt_vec_info pat_stmt_info
 		    = STMT_VINFO_RELATED_STMT (stmt_info);
-		  vect_transform_loop_stmt (loop_vinfo, pat_stmt_info, &si,
-					    &seen_store);
-		}
-	      vect_transform_loop_stmt (loop_vinfo, stmt_info, &si,
-					&seen_store);
-	      if (seen_store)
-		{
-		  if (STMT_VINFO_GROUPED_ACCESS (seen_store))
-		    /* Interleaving.  If IS_STORE is TRUE, the
-		       vectorization of the interleaving chain was
-		       completed - free all the stores in the chain.  */
-		    vect_remove_stores (DR_GROUP_FIRST_ELEMENT (seen_store));
-		  else
-		    /* Free the attached stmt_vec_info and remove the stmt.  */
-		    loop_vinfo->remove_stmt (stmt_info);
-		}
-	    }
-	}
-
-      /* Stub out scalar statements that must not survive vectorization.
-	 Doing this here helps with grouped statements, or statements that
-	 are involved in patterns.  */
-      for (gimple_stmt_iterator gsi = gsi_start_bb (vec_bb->bb ());
-	   !gsi_end_p (gsi); gsi_next (&gsi))
-	{
-	  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
-	  if (call && gimple_call_internal_p (call, IFN_MASK_LOAD))
-	    {
-	      tree lhs = gimple_get_lhs (call);
-	      if (!VECTOR_TYPE_P (TREE_TYPE (lhs)))
-		{
-		  tree zero = build_zero_cst (TREE_TYPE (lhs));
-		  gimple *new_stmt = gimple_build_assign (lhs, zero);
-		  gsi_replace (&gsi, new_stmt, true);
+		  vect_transform_loop_stmt (loop_vinfo, pat_stmt_info, &si);
 		}
+	      vect_transform_loop_stmt (loop_vinfo, stmt_info, &si);
 	    }
 	}
     }				/* BBs in loop */
 
+  vect_remove_dead_scalar_stmts (loop_vinfo);
+
   /* The vectorization factor is always > 1, so if we use an IV increment of 1.
      a zero NITERS becomes a nonzero NITERS_VECTOR.  */
   if (integer_onep (step_vector))
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c	2018-07-31 15:26:15.986653204 +0100
+++ gcc/tree-vect-loop-manip.c	2018-08-28 12:05:14.010961472 +0100
@@ -1515,6 +1515,14 @@  vect_update_ivs_after_vectorizer (loop_v
       /* Skip reduction and virtual phis.  */
       if (!iv_phi_p (phi_info))
 	{
+	  /* ??? We can sometimes see dead PHI loops that get classified
+	     as reductions but are never actually vectorized.  In this case
+	     it should be safe to delete the code or use an arbitrary
+	     initial value, but the safest thing is just to keep it
+	     and let a later pass clean it up.  */
+	  if (!virtual_operand_p (PHI_RESULT (phi))
+	      && !STMT_VINFO_RELEVANT_P (phi_info))
+	    vect_mark_used_by_vector_code (phi_info);
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
 			     "reduc or virtual phi. skip.\n");