diff mbox series

[3/21] middle-end: Implement code motion and dependency analysis for early breaks

Message ID ZUiX060rI5bvYKzL@arm.com
State New
Headers show
Series Support early break/return auto-vectorization | expand

Commit Message

Tamar Christina Nov. 6, 2023, 7:37 a.m. UTC
Hi All,

When performing early break vectorization we need to be sure that the vector
operations are safe to perform.  A simple example is e.g.

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_a[i] = x;
 }

where the store to vect_b is not allowed to be executed unconditionally since
if we exit through the early break it wouldn't have been done for the full VF
iteration.

Effective the code motion determines:
  - is it safe/possible to vectorize the function
  - what updates to the VUSES should be performed if we do
  - Which statements need to be moved
  - Which statements can't be moved:
    * values that are live must be reachable through all exits
    * values that aren't single use and shared by the use/def chain of the cond
  - The final insertion point of the instructions.  In the cases we have
    multiple early exist statements this should be the one closest to the loop
    latch itself.

After motion the loop above is:

 for (int i = 0; i < N; i++)
 {
   ... y = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_b[i] = y;
   vect_a[i] = x;

 }

The operation is split into two, during data ref analysis we determine
validity of the operation and generate a worklist of actions to perform if we
vectorize.

After peeling and just before statetement tranformation we replay this worklist
which moves the statements and updates book keeping only in the main loop that's
to be vectorized.  This includes updating of USES in exit blocks.

At the moment we don't support this for epilog nomasks since the additional
vectorized epilog's stmt UIDs are not found.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
	(vect_analyze_early_break_dependences): New.
	(vect_analyze_data_ref_dependences): Use them.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	early_breaks.
	(move_early_exit_stmts): New.
	(vect_transform_loop): use it/
	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(class _loop_vec_info): Add early_breaks, early_break_conflict,
	early_break_vuses.
	(LOOP_VINFO_EARLY_BREAKS): New.
	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
	(LOOP_VINFO_EARLY_BRK_VUSES): New.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60 100644




--
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence. Returns True i
+   possible, otherwise False.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implemementation is very conservative. Any overlappig loads/stores
+     that take place before the early break statement gets rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is which is the common case.
+
+   Arguments:
+     - LOOP_VINFO: loop information for the current loop.
+     - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable from
+	      one or more cond conditions.  If this set overlaps with CHAIN then FIXED
+	      takes precedence.  This deals with non-single use cases.
+     - LOADS: List of all loads found during traversal.
+     - BASES: List of all load data references found during traversal.
+     - GSTMT: Current position to inspect for validity.  The sequence
+	      will be moved upwards from this point.
+     - REACHING_VUSE: The dominating VUSE found so far.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
+			   hash_set<tree> *fixed, vec<tree> *loads,
+			   vec<data_reference *> *bases, tree *reaching_vuse,
+			   gimple_stmt_iterator *gstmt)
+{
+  if (gsi_end_p (*gstmt))
+    return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  /* ?? Do I need to move debug statements? not quite sure..  */
+  if (gimple_has_ops (stmt)
+      && !is_gimple_debug (stmt))
+    {
+      tree dest = NULL_TREE;
+      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
+	 use the LHS, if not, assume that the first argument of a call is the
+	 value being defined.  e.g. MASKED_LOAD etc.  */
+      if (gimple_has_lhs (stmt))
+	dest = gimple_get_lhs (stmt);
+      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	dest = gimple_arg (call, 0);
+      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
+	{
+	  /* Operands of conds are ones we can't move.  */
+	  fixed->add (gimple_cond_lhs (cond));
+	  fixed->add (gimple_cond_rhs (cond));
+	}
+
+      bool move = false;
+
+      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_vinfo)
+	{
+	   if (dump_enabled_p ())
+	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			      "early breaks not supported. Unknown"
+			      " statement: %G", stmt);
+	   return false;
+	}
+
+      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+      if (dr_ref)
+	{
+	   /* We currently only support statically allocated objects due to
+	      not having first-faulting loads support or peeling for alignment
+	      support.  Compute the size of the referenced object (it could be
+	      dynamically allocated).  */
+	   tree obj = DR_BASE_ADDRESS (dr_ref);
+	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   tree refop = TREE_OPERAND (obj, 0);
+	   tree refbase = get_base_address (refop);
+	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   if (DR_IS_READ (dr_ref))
+	     {
+		loads->safe_push (dest);
+		bases->safe_push (dr_ref);
+	     }
+	   else if (DR_IS_WRITE (dr_ref))
+	     {
+		for (auto dr : bases)
+		  if (same_data_refs_base_objects (dr, dr_ref))
+		    {
+		      if (dump_enabled_p ())
+			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					   vect_location,
+					   "early breaks only supported,"
+					   " overlapping loads and stores found"
+					   " before the break statement.\n");
+		      return false;
+		    }
+		/* Any writes starts a new chain. */
+		move = true;
+	     }
+	}
+
+      /* If a statement is live and escapes the loop through usage in the loop
+	 epilogue then we can't move it since we need to maintain its
+	 reachability through all exits.  */
+      bool skip = false;
+      if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	{
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+	    {
+	      basic_block bb = gimple_bb (USE_STMT (use_p));
+	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	      if (skip)
+		break;
+	    }
+	}
+
+      /* If we found the defining statement of a something that's part of the
+	 chain then expand the chain with the new SSA_VARs being used.  */
+      if (!skip && (chain->contains (dest) || move))
+	{
+	  move = true;
+	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
+	    {
+	      tree var = gimple_arg (stmt, x);
+	      if (TREE_CODE (var) == SSA_NAME)
+		{
+		  if (fixed->contains (dest))
+		    {
+		      move = false;
+		      fixed->add (var);
+		    }
+		  else
+		    chain->add (var);
+		}
+	      else
+		{
+		  use_operand_p use_p;
+		  ssa_op_iter iter;
+		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		    {
+		      tree op = USE_FROM_PTR (use_p);
+		      gcc_assert (TREE_CODE (op) == SSA_NAME);
+		      if (fixed->contains (dest))
+			{
+			  move = false;
+			  fixed->add (op);
+			}
+		      else
+			chain->add (op);
+		    }
+		}
+	    }
+
+	  if (dump_enabled_p ())
+	    {
+	      if (move)
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"found chain %G", stmt);
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"ignored chain %G, not single use", stmt);
+	    }
+	}
+
+      if (move)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "==> recording stmt %G", stmt);
+
+	  for (tree ref : loads)
+	    if (stmt_may_clobber_ref_p (stmt, ref, true))
+	      {
+	        if (dump_enabled_p ())
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "early breaks not supported as memory used"
+				   " may alias.\n");
+	        return false;
+	      }
+
+	  /* If we've moved a VDEF, extract the defining MEM and update
+	     usages of it.   */
+	  tree vdef;
+	  if ((vdef = gimple_vdef (stmt)))
+	    {
+	      /* This statement is to be moved.  */
+	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
+	      *reaching_vuse = gimple_vuse (stmt);
+	    }
+	}
+    }
+
+  gsi_prev (gstmt);
+
+  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
+				  reaching_vuse, gstmt))
+    return false;
+
+  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
+    {
+      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "marked statement for vUSE update: %G", stmt);
+    }
+
+  return true;
+}
+
+/* Funcion vect_analyze_early_break_dependences.
+
+   Examime all the data references in the loop and make sure that if we have
+   mulitple exits that we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place where to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  hash_set<tree> chain, fixed;
+  auto_vec<tree> loads;
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+  tree vuse = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
+      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+
+      /* Initiaze the vuse chain with the one at the early break.  */
+      if (!vuse)
+	vuse = gimple_vuse (c);
+
+      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
+				     &bases, &vuse, &gsi))
+	return opt_result::failure_at (stmt,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", stmt);
+
+      /* Save destination as we go, BB are visited in order and the last one
+	is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+    }
+
+  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
+  gcc_assert (dest_bb);
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  /* TODO: Remove? It's useful debug statement but may be too much.  */
+  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "updated use: %T, mem_ref: %G",
+			 vuse, g);
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorizer.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a2079c5234033c3d825ea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/*  When vectorizing early break statements instructions that happen before
+    the early break in the current BB need to be moved to after the early
+    break.  This function deals with that and assumes that any validity
+    checks has already been performed.
+
+    While moving the instructions if it encounters a VUSE or VDEF it then
+    corrects the VUSES as it moves the statements along.  GDEST is the location
+    in which to insert the new statements.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      /* Check to see if statement is still required for vect or has been
+	 elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+      update_stmt (stmt);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse to %T for stmt %G", vuse, p);
+      unlink_stmt_vdef (p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+       case vect_early_exit_def:
+	  dump_printf (MSG_NOTE, "early exit\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_early_exit_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize we need to peel
+     the loop for the break finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in order that they need
+     to be moved.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple*> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies

Comments

Richard Biener Nov. 7, 2023, 10:53 a.m. UTC | #1
On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> When performing early break vectorization we need to be sure that the vector
> operations are safe to perform.  A simple example is e.g.
> 
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i]*2 != x)
>      break;
>    vect_a[i] = x;
>  }
> 
> where the store to vect_b is not allowed to be executed unconditionally since
> if we exit through the early break it wouldn't have been done for the full VF
> iteration.
> 
> Effective the code motion determines:
>   - is it safe/possible to vectorize the function
>   - what updates to the VUSES should be performed if we do
>   - Which statements need to be moved
>   - Which statements can't be moved:
>     * values that are live must be reachable through all exits
>     * values that aren't single use and shared by the use/def chain of the cond
>   - The final insertion point of the instructions.  In the cases we have
>     multiple early exist statements this should be the one closest to the loop
>     latch itself.
> 
> After motion the loop above is:
> 
>  for (int i = 0; i < N; i++)
>  {
>    ... y = x + i;
>    if (vect_a[i]*2 != x)
>      break;
>    vect_b[i] = y;
>    vect_a[i] = x;
> 
>  }
> 
> The operation is split into two, during data ref analysis we determine
> validity of the operation and generate a worklist of actions to perform if we
> vectorize.
> 
> After peeling and just before statetement tranformation we replay this worklist
> which moves the statements and updates book keeping only in the main loop that's
> to be vectorized.  This includes updating of USES in exit blocks.
> 
> At the moment we don't support this for epilog nomasks since the additional
> vectorized epilog's stmt UIDs are not found.

As of UIDs note that UIDs are used for dominance checking in
vect_stmt_dominates_stmt_p and that at least is used during
transform when scheduling SLP.  Moving stmts around invalidates
this UID order (I don't see you "renumbering" UIDs).

More comments below.
 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
>
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
> 	(vect_analyze_early_break_dependences): New.
> 	(vect_analyze_data_ref_dependences): Use them.
> 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> 	early_breaks.
> 	(move_early_exit_stmts): New.
> 	(vect_transform_loop): use it/
> 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> 	early_break_vuses.
> 	(LOOP_VINFO_EARLY_BREAKS): New.
> 	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
>    return opt_result::success ();
>  }
>  
> +/* This function tries to validate whether an early break vectorization
> +   is possible for the current instruction sequence. Returns True i
> +   possible, otherwise False.
> +
> +   Requirements:
> +     - Any memory access must be to a fixed size buffer.
> +     - There must not be any loads and stores to the same object.
> +     - Multiple loads are allowed as long as they don't alias.
> +
> +   NOTE:
> +     This implemementation is very conservative. Any overlappig loads/stores
> +     that take place before the early break statement gets rejected aside from
> +     WAR dependencies.
> +
> +     i.e.:
> +
> +	a[i] = 8
> +	c = a[i]
> +	if (b[i])
> +	  ...
> +
> +	is not allowed, but
> +
> +	c = a[i]
> +	a[i] = 8
> +	if (b[i])
> +	  ...
> +
> +	is which is the common case.
> +
> +   Arguments:
> +     - LOOP_VINFO: loop information for the current loop.
> +     - CHAIN: Currently detected sequence of instructions that need to be moved
> +	      if we are to vectorize this early break.
> +     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable from
> +	      one or more cond conditions.  If this set overlaps with CHAIN then FIXED
> +	      takes precedence.  This deals with non-single use cases.
> +     - LOADS: List of all loads found during traversal.
> +     - BASES: List of all load data references found during traversal.
> +     - GSTMT: Current position to inspect for validity.  The sequence
> +	      will be moved upwards from this point.
> +     - REACHING_VUSE: The dominating VUSE found so far.  */
> +
> +static bool
> +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
> +			   hash_set<tree> *fixed, vec<tree> *loads,
> +			   vec<data_reference *> *bases, tree *reaching_vuse,
> +			   gimple_stmt_iterator *gstmt)
> +{
> +  if (gsi_end_p (*gstmt))
> +    return true;
> +
> +  gimple *stmt = gsi_stmt (*gstmt);
> +  /* ?? Do I need to move debug statements? not quite sure..  */

I think we reset them.

> +  if (gimple_has_ops (stmt)
> +      && !is_gimple_debug (stmt))
> +    {
> +      tree dest = NULL_TREE;
> +      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
> +	 use the LHS, if not, assume that the first argument of a call is the
> +	 value being defined.  e.g. MASKED_LOAD etc.  */
> +      if (gimple_has_lhs (stmt))
> +	dest = gimple_get_lhs (stmt);
> +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> +	dest = gimple_arg (call, 0);
> +      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
> +	{
> +	  /* Operands of conds are ones we can't move.  */
> +	  fixed->add (gimple_cond_lhs (cond));
> +	  fixed->add (gimple_cond_rhs (cond));
> +	}
> +
> +      bool move = false;


So this all looks a bit like spaghetti (sorry).  I think what
you want to do is perform this in two steps:

 a) mark (and check) the dependences of the early break conditions,
    aka populate 'fixed'
 b) walk stmts from the _last_ early break, verifying all earlier
    non-'fixed' stmts can be moved

for marking dependences you want to simply iterate over use
operands:

  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
    USE_FROM_PTR (use_p) then is a SSA name that's used by 'stmt',
    the SSA_NAME_DEF_STMT of it is the next stmt to visit.  Use
    a worklist with a visited set to gather all of the relevant
    stmts/defs

> +      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_vinfo)
> +	{
> +	   if (dump_enabled_p ())
> +	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			      "early breaks not supported. Unknown"
> +			      " statement: %G", stmt);
> +	   return false;
> +	}
> +
> +      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> +      if (dr_ref)
> +	{
> +	   /* We currently only support statically allocated objects due to
> +	      not having first-faulting loads support or peeling for alignment
> +	      support.  Compute the size of the referenced object (it could be
> +	      dynamically allocated).  */
> +	   tree obj = DR_BASE_ADDRESS (dr_ref);
> +	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> +	     {
> +	       if (dump_enabled_p ())
> +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				  "early breaks only supported on statically"
> +				  " allocated objects.\n");
> +	       return false;
> +	     }
> +
> +	   tree refop = TREE_OPERAND (obj, 0);
> +	   tree refbase = get_base_address (refop);
> +	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> +	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> +	     {
> +	       if (dump_enabled_p ())
> +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				  "early breaks only supported on statically"
> +				  " allocated objects.\n");
> +	       return false;
> +	     }

Note this doesn't ensure in-bound accesses:

int a[4];

void foo ()
{
  for (unsigned int i = 0; i < 32; ++i)
   {
     if (a[i] == 0)
       break;
     /* ... */
   }
}

you'd happily load a V8SImode vector from 'a'.  If the caller
ensures a[3] == 0 the code is fine but your transformed vector
code not.  You need to check that DR_BASE_ADDRESS + DR_OFFSET
+ DR_INIT + niter * DR_STEP is within the object instead.

> +
> +	   if (DR_IS_READ (dr_ref))
> +	     {
> +		loads->safe_push (dest);
> +		bases->safe_push (dr_ref);
> +	     }
> +	   else if (DR_IS_WRITE (dr_ref))
> +	     {
> +		for (auto dr : bases)
> +		  if (same_data_refs_base_objects (dr, dr_ref))
> +		    {

that looks quadratic to me.  So what's this actually?  You've
gathered all loads after this write and now you are checking
that all those loads do not alias the write?  But
same_data_refs_base_objects is only verifying that the
two refs are suitable for classical dependence analysis,
so it's not a conservative test here.  I think you may want to
use dr_may_alias_p instead?

I'm missing some overall idea of what you are doing, like what's
the actual transform and how do you validate its validity?

It looks like you only move stores?

> +		      if (dump_enabled_p ())
> +			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +					   vect_location,
> +					   "early breaks only supported,"
> +					   " overlapping loads and stores found"
> +					   " before the break statement.\n");
> +		      return false;
> +		    }
> +		/* Any writes starts a new chain. */
> +		move = true;
> +	     }
> +	}
> +
> +      /* If a statement is live and escapes the loop through usage in the loop
> +	 epilogue then we can't move it since we need to maintain its
> +	 reachability through all exits.  */
> +      bool skip = false;
> +      if (STMT_VINFO_LIVE_P (stmt_vinfo)
> +	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
> +	{
> +	  imm_use_iterator imm_iter;
> +	  use_operand_p use_p;
> +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> +	    {
> +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> +	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +	      if (skip)
> +		break;
> +	    }
> +	}
> +
> +      /* If we found the defining statement of a something that's part of the
> +	 chain then expand the chain with the new SSA_VARs being used.  */
> +      if (!skip && (chain->contains (dest) || move))
> +	{
> +	  move = true;
> +	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> +	    {
> +	      tree var = gimple_arg (stmt, x);
> +	      if (TREE_CODE (var) == SSA_NAME)
> +		{
> +		  if (fixed->contains (dest))
> +		    {
> +		      move = false;
> +		      fixed->add (var);
> +		    }
> +		  else
> +		    chain->add (var);
> +		}
> +	      else
> +		{
> +		  use_operand_p use_p;
> +		  ssa_op_iter iter;
> +		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> +		    {
> +		      tree op = USE_FROM_PTR (use_p);
> +		      gcc_assert (TREE_CODE (op) == SSA_NAME);
> +		      if (fixed->contains (dest))
> +			{
> +			  move = false;
> +			  fixed->add (op);
> +			}
> +		      else
> +			chain->add (op);
> +		    }
> +		}
> +	    }
> +
> +	  if (dump_enabled_p ())
> +	    {
> +	      if (move)
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				"found chain %G", stmt);
> +	      else
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				"ignored chain %G, not single use", stmt);
> +	    }
> +	}
> +
> +      if (move)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_NOTE, vect_location,
> +			     "==> recording stmt %G", stmt);
> +
> +	  for (tree ref : loads)
> +	    if (stmt_may_clobber_ref_p (stmt, ref, true))
> +	      {
> +	        if (dump_enabled_p ())
> +		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				   "early breaks not supported as memory used"
> +				   " may alias.\n");
> +	        return false;
> +	      }

Here you check aliasing again?!

I think it might be conceptually easier (and stronger) to instead
think of the 'fixed' set (and the gconds) to be moved earlier instead
of the stores to be sunk.

For example I fail to see how you check for, say

   for (..)
    {
      tem = a[i] / b[i];
      if (c[i]) break;
      d[i] = tem;
    }

where the division might trap.  For this the validation wouldn't
identify anything to move, right?

I'll note that doing the actual movement will be easier with SLP
and it would be a possibility to implement early break with just
SLP support - as we need to start discovery from the gconds
explicitly anyway there's no problem forcing a single-lane SLP
discovery there.

> +
> +	  /* If we've moved a VDEF, extract the defining MEM and update
> +	     usages of it.   */
> +	  tree vdef;
> +	  if ((vdef = gimple_vdef (stmt)))
> +	    {
> +	      /* This statement is to be moved.  */
> +	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
> +	      *reaching_vuse = gimple_vuse (stmt);
> +	    }
> +	}
> +    }
> +
> +  gsi_prev (gstmt);
> +
> +  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
> +				  reaching_vuse, gstmt))
> +    return false;

Please use a loop instead of recursion.  I suggest to do the loop at the 
single caller.

> +  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> +    {
> +      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "marked statement for vUSE update: %G", stmt);
> +    }
> +
> +  return true;
> +}
> +
> +/* Funcion vect_analyze_early_break_dependences.
> +
> +   Examime all the data references in the loop and make sure that if we have
> +   mulitple exits that we are able to safely move stores such that they become
> +   safe for vectorization.  The function also calculates the place where to move
> +   the instructions to and computes what the new vUSE chain should be.
> +
> +   This works in tandem with the CFG that will be produced by
> +   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
> +
> +static opt_result
> +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> +
> +  hash_set<tree> chain, fixed;
> +  auto_vec<tree> loads;
> +  auto_vec<data_reference *> bases;
> +  basic_block dest_bb = NULL;
> +  tree vuse = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "loop contains multiple exits, analyzing"
> +		     " statement dependencies.\n");
> +
> +  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> +	continue;
> +
> +      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);

isn't that 'c' already?

> +      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> +
> +      /* Initiaze the vuse chain with the one at the early break.  */
> +      if (!vuse)
> +	vuse = gimple_vuse (c);

gconds do not have virtual operands.

> +
> +      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
> +				     &bases, &vuse, &gsi))
> +	return opt_result::failure_at (stmt,
> +				       "can't safely apply code motion to "
> +				       "dependencies of %G to vectorize "
> +				       "the early exit.\n", stmt);
> +
> +      /* Save destination as we go, BB are visited in order and the last one
> +	is where statements should be moved to.  */
> +      if (!dest_bb)
> +	dest_bb = gimple_bb (c);
> +      else
> +	{
> +	  basic_block curr_bb = gimple_bb (c);
> +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> +	    dest_bb = curr_bb;
> +	}
> +    }
> +
> +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;

no edge is the fallthru edge out of a condition, so this always
selects EDGE_SUCC (dest_bb, 1) which cannot be correct (well,
guess you're lucky).  I think you instead want

  dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == 
dest_bb->loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 
1)->dest;

more nicely written, of course.

> +  gcc_assert (dest_bb);
> +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;

Sorting the vector of early breaks as we gather them might be nicer
than this - you'd then simply use the first or last.

> +
> +  /* TODO: Remove? It's useful debug statement but may be too much.  */
> +  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			 "updated use: %T, mem_ref: %G",
> +			 vuse, g);
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "recorded statements to be moved to BB %d\n",
> +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> +
> +  return opt_result::success ();
> +}
> +
>  /* Function vect_analyze_data_ref_dependences.
>  
>     Examine all the data references in the loop, and make sure there do not
> @@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
>  	  return res;
>        }
>  
> +  /* If we have early break statements in the loop, check to see if they
> +     are of a form we can vectorizer.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return vect_analyze_early_break_dependences (loop_vinfo);
> +
>    return opt_result::success ();
>  }
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a2079c5234033c3d825ea 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> +    early_breaks (false),
>      no_data_dependencies (false),
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
>    epilogue_vinfo->shared->save_datarefs ();
>  }
>  
> +/*  When vectorizing early break statements instructions that happen before
> +    the early break in the current BB need to be moved to after the early
> +    break.  This function deals with that and assumes that any validity
> +    checks has already been performed.
> +
> +    While moving the instructions if it encounters a VUSE or VDEF it then
> +    corrects the VUSES as it moves the statements along.  GDEST is the location
> +    in which to insert the new statements.  */
> +
> +static void
> +move_early_exit_stmts (loop_vec_info loop_vinfo)
> +{
> +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> +    return;
> +
> +  /* Move all stmts that need moving.  */
> +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);

I suppose dest_bb is the in-loop block following the last early
exit?  I suppose we do not support an "early" exit after the
main IV exit, right?  Instead we'd require loop rotation?

> +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> +
> +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
> +    {
> +      /* Check to see if statement is still required for vect or has been
> +	 elided.  */
> +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_info)
> +	continue;
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
> +
> +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> +      gsi_move_before (&stmt_gsi, &dest_gsi);
> +      gsi_prev (&dest_gsi);
> +      update_stmt (stmt);

You shouldn't need to update_stmt here I think.

> +    }
> +
> +  /* Update all the stmts with their new reaching VUSES.  */
> +  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
> +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "updating vuse to %T for stmt %G", vuse, p);
> +      unlink_stmt_vdef (p);

it's odd to first move the stmts and then propagate out their defs
(which you forget to release?)

> +      gimple_set_vuse (p, vuse);

and now every store gets the same vuse?  I'm quite sure you'll end
up with broken virtual SSA form here.

> +      update_stmt (p);
> +    }
> +}
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> @@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
>      }
>  
> +  /* Handle any code motion that we need to for early-break vectorization after
> +     we've done peeling but just before we start vectorizing.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    move_early_exit_stmts (loop_vinfo);
> +
>    /* FORNOW: the vectorizer supports only loops which body consist
>       of one basic block (header + empty latch). When the vectorizer will
>       support more involved loop forms, the order by which the BBs are
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
>  	case vect_first_order_recurrence:
>  	  dump_printf (MSG_NOTE, "first order recurrence\n");
>  	  break;
> +       case vect_early_exit_def:
> +	  dump_printf (MSG_NOTE, "early exit\n");
> +	  break;
>  	case vect_unknown_def_type:
>  	  dump_printf (MSG_NOTE, "unknown\n");
>  	  break;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -66,6 +66,7 @@ enum vect_def_type {
>    vect_double_reduction_def,
>    vect_nested_cycle,
>    vect_first_order_recurrence,
> +  vect_early_exit_def,
>    vect_unknown_def_type
>  };
>  
> @@ -888,6 +889,10 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* When the loop has early breaks that we can vectorize we need to peel
> +     the loop for the break finding loop.  */
> +  bool early_breaks;
> +
>    /* List of loop additional IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
> @@ -942,6 +947,20 @@ public:
>    /* The controlling loop IV for the scalar loop being vectorized.  This IV
>       controls the natural exits of the loop.  */
>    edge scalar_loop_iv_exit;
> +
> +  /* Used to store the list of statements needing to be moved if doing early
> +     break vectorization as they would violate the scalar loop semantics if
> +     vectorized in their current location.  These are stored in order that they need
> +     to be moved.  */
> +  auto_vec<gimple *> early_break_conflict;
> +
> +  /* The final basic block where to move statements to.  In the case of
> +     multiple exits this could be pretty far away.  */
> +  basic_block early_break_dest_bb;
> +
> +  /* Statements whose VUSES need updating if early break vectorization is to
> +     happen.  */
> +  auto_vec<gimple*> early_break_vuses;
>  } *loop_vec_info;
>  
>  /* Access Functions.  */
> @@ -996,6 +1015,10 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
>  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> 
> 
> 
> 
>
Tamar Christina Nov. 7, 2023, 11:34 a.m. UTC | #2
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, November 7, 2023 10:53 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 3/21]middle-end: Implement code motion and
> dependency analysis for early breaks
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > When performing early break vectorization we need to be sure that the
> > vector operations are safe to perform.  A simple example is e.g.
> >
> >  for (int i = 0; i < N; i++)
> >  {
> >    vect_b[i] = x + i;
> >    if (vect_a[i]*2 != x)
> >      break;
> >    vect_a[i] = x;
> >  }
> >
> > where the store to vect_b is not allowed to be executed
> > unconditionally since if we exit through the early break it wouldn't
> > have been done for the full VF iteration.
> >
> > Effective the code motion determines:
> >   - is it safe/possible to vectorize the function
> >   - what updates to the VUSES should be performed if we do
> >   - Which statements need to be moved
> >   - Which statements can't be moved:
> >     * values that are live must be reachable through all exits
> >     * values that aren't single use and shared by the use/def chain of the cond
> >   - The final insertion point of the instructions.  In the cases we have
> >     multiple early exist statements this should be the one closest to the loop
> >     latch itself.
> >
> > After motion the loop above is:
> >
> >  for (int i = 0; i < N; i++)
> >  {
> >    ... y = x + i;
> >    if (vect_a[i]*2 != x)
> >      break;
> >    vect_b[i] = y;
> >    vect_a[i] = x;
> >
> >  }
> >
> > The operation is split into two, during data ref analysis we determine
> > validity of the operation and generate a worklist of actions to
> > perform if we vectorize.
> >
> > After peeling and just before statetement tranformation we replay this
> > worklist which moves the statements and updates book keeping only in
> > the main loop that's to be vectorized.  This includes updating of USES in exit
> blocks.
> >
> > At the moment we don't support this for epilog nomasks since the
> > additional vectorized epilog's stmt UIDs are not found.
> 
> As of UIDs note that UIDs are used for dominance checking in
> vect_stmt_dominates_stmt_p and that at least is used during transform when
> scheduling SLP.  Moving stmts around invalidates this UID order (I don't see
> you "renumbering" UIDs).
> 

Just some responses to questions while I process the rest.

I see, yeah I didn't encounter it because I punted SLP support.  As you said for SLP
We indeed don't need this.

> More comments below.
> 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
> > 	(vect_analyze_early_break_dependences): New.
> > 	(vect_analyze_data_ref_dependences): Use them.
> > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > 	early_breaks.
> > 	(move_early_exit_stmts): New.
> > 	(vect_transform_loop): use it/
> > 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> > 	early_break_vuses.
> > 	(LOOP_VINFO_EARLY_BREAKS): New.
> > 	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> > 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> > 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index
> >
> d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c
> 32b9ce
> > 7be77f3e1d60 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct
> data_dependence_relation *ddr,
> >    return opt_result::success ();
> >  }
> >
> > +/* This function tries to validate whether an early break vectorization
> > +   is possible for the current instruction sequence. Returns True i
> > +   possible, otherwise False.
> > +
> > +   Requirements:
> > +     - Any memory access must be to a fixed size buffer.
> > +     - There must not be any loads and stores to the same object.
> > +     - Multiple loads are allowed as long as they don't alias.
> > +
> > +   NOTE:
> > +     This implemementation is very conservative. Any overlappig loads/stores
> > +     that take place before the early break statement gets rejected aside from
> > +     WAR dependencies.
> > +
> > +     i.e.:
> > +
> > +	a[i] = 8
> > +	c = a[i]
> > +	if (b[i])
> > +	  ...
> > +
> > +	is not allowed, but
> > +
> > +	c = a[i]
> > +	a[i] = 8
> > +	if (b[i])
> > +	  ...
> > +
> > +	is which is the common case.
> > +
> > +   Arguments:
> > +     - LOOP_VINFO: loop information for the current loop.
> > +     - CHAIN: Currently detected sequence of instructions that need to be
> moved
> > +	      if we are to vectorize this early break.
> > +     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are
> reachable from
> > +	      one or more cond conditions.  If this set overlaps with CHAIN then
> FIXED
> > +	      takes precedence.  This deals with non-single use cases.
> > +     - LOADS: List of all loads found during traversal.
> > +     - BASES: List of all load data references found during traversal.
> > +     - GSTMT: Current position to inspect for validity.  The sequence
> > +	      will be moved upwards from this point.
> > +     - REACHING_VUSE: The dominating VUSE found so far.  */
> > +
> > +static bool
> > +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree>
> *chain,
> > +			   hash_set<tree> *fixed, vec<tree> *loads,
> > +			   vec<data_reference *> *bases, tree *reaching_vuse,
> > +			   gimple_stmt_iterator *gstmt)
> > +{
> > +  if (gsi_end_p (*gstmt))
> > +    return true;
> > +
> > +  gimple *stmt = gsi_stmt (*gstmt);
> > +  /* ?? Do I need to move debug statements? not quite sure..  */
> 
> I think we reset them.
> 
> > +  if (gimple_has_ops (stmt)
> > +      && !is_gimple_debug (stmt))
> > +    {
> > +      tree dest = NULL_TREE;
> > +      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
> > +	 use the LHS, if not, assume that the first argument of a call is the
> > +	 value being defined.  e.g. MASKED_LOAD etc.  */
> > +      if (gimple_has_lhs (stmt))
> > +	dest = gimple_get_lhs (stmt);
> > +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > +	dest = gimple_arg (call, 0);
> > +      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
> > +	{
> > +	  /* Operands of conds are ones we can't move.  */
> > +	  fixed->add (gimple_cond_lhs (cond));
> > +	  fixed->add (gimple_cond_rhs (cond));
> > +	}
> > +
> > +      bool move = false;
> 
> 
> So this all looks a bit like spaghetti (sorry).  I think what you want to do is
> perform this in two steps:
> 
>  a) mark (and check) the dependences of the early break conditions,
>     aka populate 'fixed'
>  b) walk stmts from the _last_ early break, verifying all earlier
>     non-'fixed' stmts can be moved
> 
> for marking dependences you want to simply iterate over use
> operands:
> 
>   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
>     USE_FROM_PTR (use_p) then is a SSA name that's used by 'stmt',
>     the SSA_NAME_DEF_STMT of it is the next stmt to visit.  Use
>     a worklist with a visited set to gather all of the relevant
>     stmts/defs
> 
> > +      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > +      if (!stmt_vinfo)
> > +	{
> > +	   if (dump_enabled_p ())
> > +	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			      "early breaks not supported. Unknown"
> > +			      " statement: %G", stmt);
> > +	   return false;
> > +	}
> > +
> > +      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > +      if (dr_ref)
> > +	{
> > +	   /* We currently only support statically allocated objects due to
> > +	      not having first-faulting loads support or peeling for alignment
> > +	      support.  Compute the size of the referenced object (it could be
> > +	      dynamically allocated).  */
> > +	   tree obj = DR_BASE_ADDRESS (dr_ref);
> > +	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> > +	     {
> > +	       if (dump_enabled_p ())
> > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> vect_location,
> > +				  "early breaks only supported on statically"
> > +				  " allocated objects.\n");
> > +	       return false;
> > +	     }
> > +
> > +	   tree refop = TREE_OPERAND (obj, 0);
> > +	   tree refbase = get_base_address (refop);
> > +	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> > +	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> > +	     {
> > +	       if (dump_enabled_p ())
> > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> vect_location,
> > +				  "early breaks only supported on statically"
> > +				  " allocated objects.\n");
> > +	       return false;
> > +	     }
> 
> Note this doesn't ensure in-bound accesses:
> 
> int a[4];
> 
> void foo ()
> {
>   for (unsigned int i = 0; i < 32; ++i)
>    {
>      if (a[i] == 0)
>        break;
>      /* ... */
>    }
> }
> 
> you'd happily load a V8SImode vector from 'a'.  If the caller ensures a[3] == 0
> the code is fine but your transformed vector code not.  You need to check that
> DR_BASE_ADDRESS + DR_OFFSET
> + DR_INIT + niter * DR_STEP is within the object instead.
> 
> > +
> > +	   if (DR_IS_READ (dr_ref))
> > +	     {
> > +		loads->safe_push (dest);
> > +		bases->safe_push (dr_ref);
> > +	     }
> > +	   else if (DR_IS_WRITE (dr_ref))
> > +	     {
> > +		for (auto dr : bases)
> > +		  if (same_data_refs_base_objects (dr, dr_ref))
> > +		    {
> 
> that looks quadratic to me.  So what's this actually?  You've gathered all loads
> after this write and now you are checking that all those loads do not alias the
> write?  But same_data_refs_base_objects is only verifying that the two refs are
> suitable for classical dependence analysis, so it's not a conservative test here.  I
> think you may want to use dr_may_alias_p instead?
> 
> I'm missing some overall idea of what you are doing, like what's the actual
> transform and how do you validate its validity?
> 

So the basic idea is that we should move everything with a side effect to after all
the early exits.  I reasoned that most things with side effects would either block
vectorization entirely or are stores.  This is why it essentially just looks at stores
and the statements that create them.

> It looks like you only move stores?

Yeah, though an earlier version of the patch also moved, if possible the statements
creating the value for the stores.  And I think I'll have to go back to that version again.

The reason is that with this new BB layout and how we "layer" the BB for the early exits
and main exit it seems like sched1 no longer is able to schedule instructions over the EBBS.

This leads to us extending the live ranges for the statements creating the values and causing
reload to having to copy the values in some cases.

So 

x = a + I;
y[i] = x;
If (..) { }

and moving the store alone can end up making reload copy the value of x.   To fix this I should
probably move x as well.  This code is also checking if that's possible, since you can't move x if it's
used by something that can't be moved. Say, if the condition was `if (b[i] > x)`.

> 
> > +		      if (dump_enabled_p ())
> > +			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> > +					   vect_location,
> > +					   "early breaks only supported,"
> > +					   " overlapping loads and stores
> found"
> > +					   " before the break statement.\n");
> > +		      return false;
> > +		    }
> > +		/* Any writes starts a new chain. */
> > +		move = true;
> > +	     }
> > +	}
> > +
> > +      /* If a statement is live and escapes the loop through usage in the loop
> > +	 epilogue then we can't move it since we need to maintain its
> > +	 reachability through all exits.  */
> > +      bool skip = false;
> > +      if (STMT_VINFO_LIVE_P (stmt_vinfo)
> > +	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
> > +	{
> > +	  imm_use_iterator imm_iter;
> > +	  use_operand_p use_p;
> > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> > +	    {
> > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > +	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +	      if (skip)
> > +		break;
> > +	    }
> > +	}
> > +
> > +      /* If we found the defining statement of a something that's part of the
> > +	 chain then expand the chain with the new SSA_VARs being used.  */
> > +      if (!skip && (chain->contains (dest) || move))
> > +	{
> > +	  move = true;
> > +	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> > +	    {
> > +	      tree var = gimple_arg (stmt, x);
> > +	      if (TREE_CODE (var) == SSA_NAME)
> > +		{
> > +		  if (fixed->contains (dest))
> > +		    {
> > +		      move = false;
> > +		      fixed->add (var);
> > +		    }
> > +		  else
> > +		    chain->add (var);
> > +		}
> > +	      else
> > +		{
> > +		  use_operand_p use_p;
> > +		  ssa_op_iter iter;
> > +		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter,
> SSA_OP_USE)
> > +		    {
> > +		      tree op = USE_FROM_PTR (use_p);
> > +		      gcc_assert (TREE_CODE (op) == SSA_NAME);
> > +		      if (fixed->contains (dest))
> > +			{
> > +			  move = false;
> > +			  fixed->add (op);
> > +			}
> > +		      else
> > +			chain->add (op);
> > +		    }
> > +		}
> > +	    }
> > +
> > +	  if (dump_enabled_p ())
> > +	    {
> > +	      if (move)
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				"found chain %G", stmt);
> > +	      else
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				"ignored chain %G, not single use", stmt);
> > +	    }
> > +	}
> > +
> > +      if (move)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > +			     "==> recording stmt %G", stmt);
> > +
> > +	  for (tree ref : loads)
> > +	    if (stmt_may_clobber_ref_p (stmt, ref, true))
> > +	      {
> > +	        if (dump_enabled_p ())
> > +		  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> vect_location,
> > +				   "early breaks not supported as memory
> used"
> > +				   " may alias.\n");
> > +	        return false;
> > +	      }
> 
> Here you check aliasing again?!
> 
> I think it might be conceptually easier (and stronger) to instead think of the
> 'fixed' set (and the gconds) to be moved earlier instead of the stores to be
> sunk.
> 
> For example I fail to see how you check for, say
> 
>    for (..)
>     {
>       tem = a[i] / b[i];
>       if (c[i]) break;
>       d[i] = tem;
>     }
> 
> where the division might trap.  For this the validation wouldn't identify
> anything to move, right?
> 

Hmm yes I ignored it because I figured we wouldn't vectorize anyway with -ftrapping-math?
I guess I should call gimple_has_side_effects on the stmt but figured we wouldn't get here.

> I'll note that doing the actual movement will be easier with SLP and it would be
> a possibility to implement early break with just SLP support - as we need to
> start discovery from the gconds explicitly anyway there's no problem forcing a
> single-lane SLP discovery there.
> 

Possibly, but I think we'd still have problem with figuring out what to do with the live
range issue.  I guess long term the issue should be fixed in sched1?

> > +
> > +	  /* If we've moved a VDEF, extract the defining MEM and update
> > +	     usages of it.   */
> > +	  tree vdef;
> > +	  if ((vdef = gimple_vdef (stmt)))
> > +	    {
> > +	      /* This statement is to be moved.  */
> > +	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> (loop_vinfo).safe_push (stmt);
> > +	      *reaching_vuse = gimple_vuse (stmt);
> > +	    }
> > +	}
> > +    }
> > +
> > +  gsi_prev (gstmt);
> > +
> > +  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
> > +				  reaching_vuse, gstmt))
> > +    return false;
> 
> Please use a loop instead of recursion.  I suggest to do the loop at the single
> caller.
> 
> > +  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> > +    {
> > +      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
> > +      if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > +			   "marked statement for vUSE update: %G", stmt);
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Funcion vect_analyze_early_break_dependences.
> > +
> > +   Examime all the data references in the loop and make sure that if we have
> > +   mulitple exits that we are able to safely move stores such that they
> become
> > +   safe for vectorization.  The function also calculates the place where to
> move
> > +   the instructions to and computes what the new vUSE chain should be.
> > +
> > +   This works in tandem with the CFG that will be produced by
> > +   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
> > +
> > +static opt_result
> > +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo) {
> > +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> > +
> > +  hash_set<tree> chain, fixed;
> > +  auto_vec<tree> loads;
> > +  auto_vec<data_reference *> bases;
> > +  basic_block dest_bb = NULL;
> > +  tree vuse = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location,
> > +		     "loop contains multiple exits, analyzing"
> > +		     " statement dependencies.\n");
> > +
> > +  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> > +    {
> > +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> > +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> > +	continue;
> > +
> > +      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
> 
> isn't that 'c' already?
> 
> > +      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> > +
> > +      /* Initiaze the vuse chain with the one at the early break.  */
> > +      if (!vuse)
> > +	vuse = gimple_vuse (c);
> 
> gconds do not have virtual operands.
> 
> > +
> > +      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
> > +				     &bases, &vuse, &gsi))
> > +	return opt_result::failure_at (stmt,
> > +				       "can't safely apply code motion to "
> > +				       "dependencies of %G to vectorize "
> > +				       "the early exit.\n", stmt);
> > +
> > +      /* Save destination as we go, BB are visited in order and the last one
> > +	is where statements should be moved to.  */
> > +      if (!dest_bb)
> > +	dest_bb = gimple_bb (c);
> > +      else
> > +	{
> > +	  basic_block curr_bb = gimple_bb (c);
> > +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > +	    dest_bb = curr_bb;
> > +	}
> > +    }
> > +
> > +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
> 
> no edge is the fallthru edge out of a condition, so this always selects
> EDGE_SUCC (dest_bb, 1) which cannot be correct (well, guess you're lucky).  I
> think you instead want
> 
>   dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == dest_bb-
> >loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 1)-
> >dest;
> 
> more nicely written, of course.
> 
> > +  gcc_assert (dest_bb);
> > +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> 
> Sorting the vector of early breaks as we gather them might be nicer than this -
> you'd then simply use the first or last.
> 
> > +
> > +  /* TODO: Remove? It's useful debug statement but may be too much.
> > +*/
> > +  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> > +    {
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			 "updated use: %T, mem_ref: %G",
> > +			 vuse, g);
> > +    }
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location,
> > +		     "recorded statements to be moved to BB %d\n",
> > +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> > +
> > +  return opt_result::success ();
> > +}
> > +
> >  /* Function vect_analyze_data_ref_dependences.
> >
> >     Examine all the data references in the loop, and make sure there
> > do not @@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences
> (loop_vec_info loop_vinfo,
> >  	  return res;
> >        }
> >
> > +  /* If we have early break statements in the loop, check to see if they
> > +     are of a form we can vectorizer.  */  if
> > + (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    return vect_analyze_early_break_dependences (loop_vinfo);
> > +
> >    return opt_result::success ();
> >  }
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> >
> 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a
> 2079c523
> > 4033c3d825ea 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop
> *loop_in, vec_info_shared *shared)
> >      partial_load_store_bias (0),
> >      peeling_for_gaps (false),
> >      peeling_for_niter (false),
> > +    early_breaks (false),
> >      no_data_dependencies (false),
> >      has_mask_store (false),
> >      scalar_loop_scaling (profile_probability::uninitialized ()), @@
> > -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop
> *epilogue, tree advance)
> >    epilogue_vinfo->shared->save_datarefs ();  }
> >
> > +/*  When vectorizing early break statements instructions that happen
> before
> > +    the early break in the current BB need to be moved to after the early
> > +    break.  This function deals with that and assumes that any validity
> > +    checks has already been performed.
> > +
> > +    While moving the instructions if it encounters a VUSE or VDEF it then
> > +    corrects the VUSES as it moves the statements along.  GDEST is the
> location
> > +    in which to insert the new statements.  */
> > +
> > +static void
> > +move_early_exit_stmts (loop_vec_info loop_vinfo) {
> > +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> > +    return;
> > +
> > +  /* Move all stmts that need moving.  */  basic_block dest_bb =
> > + LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
> 
> I suppose dest_bb is the in-loop block following the last early exit?  I suppose
> we do not support an "early" exit after the main IV exit, right?  Instead we'd
> require loop rotation?

Indeed, this is also keeping in mind that when we add general control flow
we don't want to move it past the control flow inside the loop.  This would
extend the live ranges too much.

> 
> > +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> > +
> > +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> (loop_vinfo))
> > +    {
> > +      /* Check to see if statement is still required for vect or has been
> > +	 elided.  */
> > +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> > +      if (!stmt_info)
> > +	continue;
> > +
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G",
> stmt);
> > +
> > +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> > +      gsi_move_before (&stmt_gsi, &dest_gsi);
> > +      gsi_prev (&dest_gsi);
> > +      update_stmt (stmt);
> 
> You shouldn't need to update_stmt here I think.
> 
> > +    }
> > +
> > +  /* Update all the stmts with their new reaching VUSES.  */
> > +  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> > +(loop_vinfo).last ());
> > +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> > +    {
> > +      if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > +			   "updating vuse to %T for stmt %G", vuse, p);
> > +      unlink_stmt_vdef (p);
> 
> it's odd to first move the stmts and then propagate out their defs (which you
> forget to release?)
> 
> > +      gimple_set_vuse (p, vuse);
> 
> and now every store gets the same vuse?  I'm quite sure you'll end up with
> broken virtual SSA form here.
> 
No, not every store, but every load.   Since we've moved everything that can
introduce a new vDEF, then all the use of memory before the last early exit
must be using the same vUSE.  The loop never has to update stores because
they are moved in order.

Regards,
Tamar

> > +      update_stmt (p);
> > +    }
> > +}
> > +
> >  /* Function vect_transform_loop.
> >
> >     The analysis phase has determined that the loop is vectorizable.
> > @@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info
> loop_vinfo, gimple *loop_vectorized_call)
> >        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES
> (loop_vinfo));
> >      }
> >
> > +  /* Handle any code motion that we need to for early-break vectorization
> after
> > +     we've done peeling but just before we start vectorizing.  */  if
> > + (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    move_early_exit_stmts (loop_vinfo);
> > +
> >    /* FORNOW: the vectorizer supports only loops which body consist
> >       of one basic block (header + empty latch). When the vectorizer will
> >       support more involved loop forms, the order by which the BBs are
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> >
> 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a9
> 1cb8e74c0
> > 557e75d1ea2c 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info
> *vinfo, enum vect_def_type *dt,
> >  	case vect_first_order_recurrence:
> >  	  dump_printf (MSG_NOTE, "first order recurrence\n");
> >  	  break;
> > +       case vect_early_exit_def:
> > +	  dump_printf (MSG_NOTE, "early exit\n");
> > +	  break;
> >  	case vect_unknown_def_type:
> >  	  dump_printf (MSG_NOTE, "unknown\n");
> >  	  break;
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> >
> a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf7835
> 2e29dc9
> > 958746fb9c94 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -66,6 +66,7 @@ enum vect_def_type {
> >    vect_double_reduction_def,
> >    vect_nested_cycle,
> >    vect_first_order_recurrence,
> > +  vect_early_exit_def,
> >    vect_unknown_def_type
> >  };
> >
> > @@ -888,6 +889,10 @@ public:
> >       we need to peel off iterations at the end to form an epilogue loop.  */
> >    bool peeling_for_niter;
> >
> > +  /* When the loop has early breaks that we can vectorize we need to peel
> > +     the loop for the break finding loop.  */  bool early_breaks;
> > +
> >    /* List of loop additional IV conditionals found in the loop.  */
> >    auto_vec<gcond *> conds;
> >
> > @@ -942,6 +947,20 @@ public:
> >    /* The controlling loop IV for the scalar loop being vectorized.  This IV
> >       controls the natural exits of the loop.  */
> >    edge scalar_loop_iv_exit;
> > +
> > +  /* Used to store the list of statements needing to be moved if doing early
> > +     break vectorization as they would violate the scalar loop semantics if
> > +     vectorized in their current location.  These are stored in order that they
> need
> > +     to be moved.  */
> > +  auto_vec<gimple *> early_break_conflict;
> > +
> > +  /* The final basic block where to move statements to.  In the case of
> > +     multiple exits this could be pretty far away.  */  basic_block
> > + early_break_dest_bb;
> > +
> > +  /* Statements whose VUSES need updating if early break vectorization is
> to
> > +     happen.  */
> > +  auto_vec<gimple*> early_break_vuses;
> >  } *loop_vec_info;
> >
> >  /* Access Functions.  */
> > @@ -996,6 +1015,10 @@ public:
> >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)-
> >early_break_conflict
> > +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> >no_data_dependencies
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
Richard Biener Nov. 7, 2023, 2:23 p.m. UTC | #3
On Tue, 7 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, November 7, 2023 10:53 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: Re: [PATCH 3/21]middle-end: Implement code motion and
> > dependency analysis for early breaks
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > When performing early break vectorization we need to be sure that the
> > > vector operations are safe to perform.  A simple example is e.g.
> > >
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    vect_b[i] = x + i;
> > >    if (vect_a[i]*2 != x)
> > >      break;
> > >    vect_a[i] = x;
> > >  }
> > >
> > > where the store to vect_b is not allowed to be executed
> > > unconditionally since if we exit through the early break it wouldn't
> > > have been done for the full VF iteration.
> > >
> > > Effective the code motion determines:
> > >   - is it safe/possible to vectorize the function
> > >   - what updates to the VUSES should be performed if we do
> > >   - Which statements need to be moved
> > >   - Which statements can't be moved:
> > >     * values that are live must be reachable through all exits
> > >     * values that aren't single use and shared by the use/def chain of the cond
> > >   - The final insertion point of the instructions.  In the cases we have
> > >     multiple early exist statements this should be the one closest to the loop
> > >     latch itself.
> > >
> > > After motion the loop above is:
> > >
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    ... y = x + i;
> > >    if (vect_a[i]*2 != x)
> > >      break;
> > >    vect_b[i] = y;
> > >    vect_a[i] = x;
> > >
> > >  }
> > >
> > > The operation is split into two, during data ref analysis we determine
> > > validity of the operation and generate a worklist of actions to
> > > perform if we vectorize.
> > >
> > > After peeling and just before statetement tranformation we replay this
> > > worklist which moves the statements and updates book keeping only in
> > > the main loop that's to be vectorized.  This includes updating of USES in exit
> > blocks.
> > >
> > > At the moment we don't support this for epilog nomasks since the
> > > additional vectorized epilog's stmt UIDs are not found.
> > 
> > As of UIDs note that UIDs are used for dominance checking in
> > vect_stmt_dominates_stmt_p and that at least is used during transform when
> > scheduling SLP.  Moving stmts around invalidates this UID order (I don't see
> > you "renumbering" UIDs).
> > 
> 
> Just some responses to questions while I process the rest.
> 
> I see, yeah I didn't encounter it because I punted SLP support.  As you said for SLP
> We indeed don't need this.
> 
> > More comments below.
> > 
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
> > > 	(vect_analyze_early_break_dependences): New.
> > > 	(vect_analyze_data_ref_dependences): Use them.
> > > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > 	early_breaks.
> > > 	(move_early_exit_stmts): New.
> > > 	(vect_transform_loop): use it/
> > > 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> > > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > > 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> > > 	early_break_vuses.
> > > 	(LOOP_VINFO_EARLY_BREAKS): New.
> > > 	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> > > 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> > > 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > index
> > >
> > d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c
> > 32b9ce
> > > 7be77f3e1d60 100644
> > > --- a/gcc/tree-vect-data-refs.cc
> > > +++ b/gcc/tree-vect-data-refs.cc
> > > @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct
> > data_dependence_relation *ddr,
> > >    return opt_result::success ();
> > >  }
> > >
> > > +/* This function tries to validate whether an early break vectorization
> > > +   is possible for the current instruction sequence. Returns True i
> > > +   possible, otherwise False.
> > > +
> > > +   Requirements:
> > > +     - Any memory access must be to a fixed size buffer.
> > > +     - There must not be any loads and stores to the same object.
> > > +     - Multiple loads are allowed as long as they don't alias.
> > > +
> > > +   NOTE:
> > > +     This implemementation is very conservative. Any overlappig loads/stores
> > > +     that take place before the early break statement gets rejected aside from
> > > +     WAR dependencies.
> > > +
> > > +     i.e.:
> > > +
> > > +	a[i] = 8
> > > +	c = a[i]
> > > +	if (b[i])
> > > +	  ...
> > > +
> > > +	is not allowed, but
> > > +
> > > +	c = a[i]
> > > +	a[i] = 8
> > > +	if (b[i])
> > > +	  ...
> > > +
> > > +	is which is the common case.
> > > +
> > > +   Arguments:
> > > +     - LOOP_VINFO: loop information for the current loop.
> > > +     - CHAIN: Currently detected sequence of instructions that need to be
> > moved
> > > +	      if we are to vectorize this early break.
> > > +     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are
> > reachable from
> > > +	      one or more cond conditions.  If this set overlaps with CHAIN then
> > FIXED
> > > +	      takes precedence.  This deals with non-single use cases.
> > > +     - LOADS: List of all loads found during traversal.
> > > +     - BASES: List of all load data references found during traversal.
> > > +     - GSTMT: Current position to inspect for validity.  The sequence
> > > +	      will be moved upwards from this point.
> > > +     - REACHING_VUSE: The dominating VUSE found so far.  */
> > > +
> > > +static bool
> > > +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree>
> > *chain,
> > > +			   hash_set<tree> *fixed, vec<tree> *loads,
> > > +			   vec<data_reference *> *bases, tree *reaching_vuse,
> > > +			   gimple_stmt_iterator *gstmt)
> > > +{
> > > +  if (gsi_end_p (*gstmt))
> > > +    return true;
> > > +
> > > +  gimple *stmt = gsi_stmt (*gstmt);
> > > +  /* ?? Do I need to move debug statements? not quite sure..  */
> > 
> > I think we reset them.
> > 
> > > +  if (gimple_has_ops (stmt)
> > > +      && !is_gimple_debug (stmt))
> > > +    {
> > > +      tree dest = NULL_TREE;
> > > +      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
> > > +	 use the LHS, if not, assume that the first argument of a call is the
> > > +	 value being defined.  e.g. MASKED_LOAD etc.  */
> > > +      if (gimple_has_lhs (stmt))
> > > +	dest = gimple_get_lhs (stmt);
> > > +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > +	dest = gimple_arg (call, 0);
> > > +      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
> > > +	{
> > > +	  /* Operands of conds are ones we can't move.  */
> > > +	  fixed->add (gimple_cond_lhs (cond));
> > > +	  fixed->add (gimple_cond_rhs (cond));
> > > +	}
> > > +
> > > +      bool move = false;
> > 
> > 
> > So this all looks a bit like spaghetti (sorry).  I think what you want to do is
> > perform this in two steps:
> > 
> >  a) mark (and check) the dependences of the early break conditions,
> >     aka populate 'fixed'
> >  b) walk stmts from the _last_ early break, verifying all earlier
> >     non-'fixed' stmts can be moved
> > 
> > for marking dependences you want to simply iterate over use
> > operands:
> > 
> >   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> >     USE_FROM_PTR (use_p) then is a SSA name that's used by 'stmt',
> >     the SSA_NAME_DEF_STMT of it is the next stmt to visit.  Use
> >     a worklist with a visited set to gather all of the relevant
> >     stmts/defs
> > 
> > > +      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > +      if (!stmt_vinfo)
> > > +	{
> > > +	   if (dump_enabled_p ())
> > > +	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			      "early breaks not supported. Unknown"
> > > +			      " statement: %G", stmt);
> > > +	   return false;
> > > +	}
> > > +
> > > +      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > +      if (dr_ref)
> > > +	{
> > > +	   /* We currently only support statically allocated objects due to
> > > +	      not having first-faulting loads support or peeling for alignment
> > > +	      support.  Compute the size of the referenced object (it could be
> > > +	      dynamically allocated).  */
> > > +	   tree obj = DR_BASE_ADDRESS (dr_ref);
> > > +	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> > > +	     {
> > > +	       if (dump_enabled_p ())
> > > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> > vect_location,
> > > +				  "early breaks only supported on statically"
> > > +				  " allocated objects.\n");
> > > +	       return false;
> > > +	     }
> > > +
> > > +	   tree refop = TREE_OPERAND (obj, 0);
> > > +	   tree refbase = get_base_address (refop);
> > > +	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> > > +	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> > > +	     {
> > > +	       if (dump_enabled_p ())
> > > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> > vect_location,
> > > +				  "early breaks only supported on statically"
> > > +				  " allocated objects.\n");
> > > +	       return false;
> > > +	     }
> > 
> > Note this doesn't ensure in-bound accesses:
> > 
> > int a[4];
> > 
> > void foo ()
> > {
> >   for (unsigned int i = 0; i < 32; ++i)
> >    {
> >      if (a[i] == 0)
> >        break;
> >      /* ... */
> >    }
> > }
> > 
> > you'd happily load a V8SImode vector from 'a'.  If the caller ensures a[3] == 0
> > the code is fine but your transformed vector code not.  You need to check that
> > DR_BASE_ADDRESS + DR_OFFSET
> > + DR_INIT + niter * DR_STEP is within the object instead.
> > 
> > > +
> > > +	   if (DR_IS_READ (dr_ref))
> > > +	     {
> > > +		loads->safe_push (dest);
> > > +		bases->safe_push (dr_ref);
> > > +	     }
> > > +	   else if (DR_IS_WRITE (dr_ref))
> > > +	     {
> > > +		for (auto dr : bases)
> > > +		  if (same_data_refs_base_objects (dr, dr_ref))
> > > +		    {
> > 
> > that looks quadratic to me.  So what's this actually?  You've gathered all loads
> > after this write and now you are checking that all those loads do not alias the
> > write?  But same_data_refs_base_objects is only verifying that the two refs are
> > suitable for classical dependence analysis, so it's not a conservative test here.  I
> > think you may want to use dr_may_alias_p instead?

So while thinking about the code motion it's also that when you move
the stores you invalidate all previous data dependence analysis checks
that involved them since a read-after-write dependence might now
become a write-after-read one.  So _maybe_ this check wanted to ensures
we didn't derive any affine dependence distance/direction for the
two refs before as we can only (well, not 100% true..) do this for
same_data_refs_base_objects?  As said a few more comments would really
help here.

Below you're using stmt_may_clobber_ref_p, but that's only valid to
use if you transform a scalar sequence of stmts - what you are doing
is altering stmt ordering across iterations since vectorization with
a VF > 1 involves unrolling.  I think you need to re-formulate this
in terms of datat dependence analysis checks or use
dr_may_alias_p (..., loop_nest) instead of stmt_may_clobber_ref_p.

> > 
> > I'm missing some overall idea of what you are doing, like what's the actual
> > transform and how do you validate its validity?
> > 
> 
> So the basic idea is that we should move everything with a side effect to after all
> the early exits.  I reasoned that most things with side effects would either block
> vectorization entirely or are stores.  This is why it essentially just looks at stores
> and the statements that create them.
> 
> > It looks like you only move stores?
> 
> Yeah, though an earlier version of the patch also moved, if possible the statements
> creating the value for the stores.  And I think I'll have to go back to that version again.
> 
> The reason is that with this new BB layout and how we "layer" the BB for the early exits
> and main exit it seems like sched1 no longer is able to schedule instructions over the EBBS.
> 
> This leads to us extending the live ranges for the statements creating the values and causing
> reload to having to copy the values in some cases.
> 
> So 
> 
> x = a + I;
> y[i] = x;
> If (..) { }
> 
> and moving the store alone can end up making reload copy the value of x.   To fix this I should
> probably move x as well.  This code is also checking if that's possible, since you can't move x if it's
> used by something that can't be moved. Say, if the condition was `if (b[i] > x)`.

If you'd move the cbranch dependences up instead, simply by scheduling
their SLP instances first you would leave the order of the rest of the
stmts unperturbed.  The only adjustment would be needed to
vect_schedule_slp_node to make sure to schedule all other instances
stmts after the last cbranch.

> > 
> > > +		      if (dump_enabled_p ())
> > > +			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> > > +					   vect_location,
> > > +					   "early breaks only supported,"
> > > +					   " overlapping loads and stores
> > found"
> > > +					   " before the break statement.\n");
> > > +		      return false;
> > > +		    }
> > > +		/* Any writes starts a new chain. */
> > > +		move = true;
> > > +	     }
> > > +	}
> > > +
> > > +      /* If a statement is live and escapes the loop through usage in the loop
> > > +	 epilogue then we can't move it since we need to maintain its
> > > +	 reachability through all exits.  */
> > > +      bool skip = false;
> > > +      if (STMT_VINFO_LIVE_P (stmt_vinfo)
> > > +	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
> > > +	{
> > > +	  imm_use_iterator imm_iter;
> > > +	  use_operand_p use_p;
> > > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> > > +	    {
> > > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > > +	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > +	      if (skip)
> > > +		break;
> > > +	    }
> > > +	}
> > > +
> > > +      /* If we found the defining statement of a something that's part of the
> > > +	 chain then expand the chain with the new SSA_VARs being used.  */
> > > +      if (!skip && (chain->contains (dest) || move))
> > > +	{
> > > +	  move = true;
> > > +	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> > > +	    {
> > > +	      tree var = gimple_arg (stmt, x);
> > > +	      if (TREE_CODE (var) == SSA_NAME)
> > > +		{
> > > +		  if (fixed->contains (dest))
> > > +		    {
> > > +		      move = false;
> > > +		      fixed->add (var);
> > > +		    }
> > > +		  else
> > > +		    chain->add (var);
> > > +		}
> > > +	      else
> > > +		{
> > > +		  use_operand_p use_p;
> > > +		  ssa_op_iter iter;
> > > +		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter,
> > SSA_OP_USE)
> > > +		    {
> > > +		      tree op = USE_FROM_PTR (use_p);
> > > +		      gcc_assert (TREE_CODE (op) == SSA_NAME);
> > > +		      if (fixed->contains (dest))
> > > +			{
> > > +			  move = false;
> > > +			  fixed->add (op);
> > > +			}
> > > +		      else
> > > +			chain->add (op);
> > > +		    }
> > > +		}
> > > +	    }
> > > +
> > > +	  if (dump_enabled_p ())
> > > +	    {
> > > +	      if (move)
> > > +		dump_printf_loc (MSG_NOTE, vect_location,
> > > +				"found chain %G", stmt);
> > > +	      else
> > > +		dump_printf_loc (MSG_NOTE, vect_location,
> > > +				"ignored chain %G, not single use", stmt);
> > > +	    }
> > > +	}
> > > +
> > > +      if (move)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > +			     "==> recording stmt %G", stmt);
> > > +
> > > +	  for (tree ref : loads)
> > > +	    if (stmt_may_clobber_ref_p (stmt, ref, true))
> > > +	      {
> > > +	        if (dump_enabled_p ())
> > > +		  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> > vect_location,
> > > +				   "early breaks not supported as memory
> > used"
> > > +				   " may alias.\n");
> > > +	        return false;
> > > +	      }
> > 
> > Here you check aliasing again?!
> > 
> > I think it might be conceptually easier (and stronger) to instead think of the
> > 'fixed' set (and the gconds) to be moved earlier instead of the stores to be
> > sunk.
> > 
> > For example I fail to see how you check for, say
> > 
> >    for (..)
> >     {
> >       tem = a[i] / b[i];
> >       if (c[i]) break;
> >       d[i] = tem;
> >     }
> > 
> > where the division might trap.  For this the validation wouldn't identify
> > anything to move, right?
> > 
> 
> Hmm yes I ignored it because I figured we wouldn't vectorize anyway with -ftrapping-math?
> I guess I should call gimple_has_side_effects on the stmt but figured we wouldn't get here.
> 
> > I'll note that doing the actual movement will be easier with SLP and it would be
> > a possibility to implement early break with just SLP support - as we need to
> > start discovery from the gconds explicitly anyway there's no problem forcing a
> > single-lane SLP discovery there.
> > 
> 
> Possibly, but I think we'd still have problem with figuring out what to do with the live
> range issue.  I guess long term the issue should be fixed in sched1?
> 
> > > +
> > > +	  /* If we've moved a VDEF, extract the defining MEM and update
> > > +	     usages of it.   */
> > > +	  tree vdef;
> > > +	  if ((vdef = gimple_vdef (stmt)))
> > > +	    {
> > > +	      /* This statement is to be moved.  */
> > > +	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> > (loop_vinfo).safe_push (stmt);
> > > +	      *reaching_vuse = gimple_vuse (stmt);
> > > +	    }
> > > +	}
> > > +    }
> > > +
> > > +  gsi_prev (gstmt);
> > > +
> > > +  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
> > > +				  reaching_vuse, gstmt))
> > > +    return false;
> > 
> > Please use a loop instead of recursion.  I suggest to do the loop at the single
> > caller.
> > 
> > > +  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> > > +    {
> > > +      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
> > > +      if (dump_enabled_p ())
> > > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > > +			   "marked statement for vUSE update: %G", stmt);
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Funcion vect_analyze_early_break_dependences.
> > > +
> > > +   Examime all the data references in the loop and make sure that if we have
> > > +   mulitple exits that we are able to safely move stores such that they
> > become
> > > +   safe for vectorization.  The function also calculates the place where to
> > move
> > > +   the instructions to and computes what the new vUSE chain should be.
> > > +
> > > +   This works in tandem with the CFG that will be produced by
> > > +   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
> > > +
> > > +static opt_result
> > > +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo) {
> > > +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> > > +
> > > +  hash_set<tree> chain, fixed;
> > > +  auto_vec<tree> loads;
> > > +  auto_vec<data_reference *> bases;
> > > +  basic_block dest_bb = NULL;
> > > +  tree vuse = NULL;
> > > +
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location,
> > > +		     "loop contains multiple exits, analyzing"
> > > +		     " statement dependencies.\n");
> > > +
> > > +  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> > > +    {
> > > +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> > > +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> > > +	continue;
> > > +
> > > +      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
> > 
> > isn't that 'c' already?
> > 
> > > +      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> > > +
> > > +      /* Initiaze the vuse chain with the one at the early break.  */
> > > +      if (!vuse)
> > > +	vuse = gimple_vuse (c);
> > 
> > gconds do not have virtual operands.
> > 
> > > +
> > > +      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
> > > +				     &bases, &vuse, &gsi))
> > > +	return opt_result::failure_at (stmt,
> > > +				       "can't safely apply code motion to "
> > > +				       "dependencies of %G to vectorize "
> > > +				       "the early exit.\n", stmt);
> > > +
> > > +      /* Save destination as we go, BB are visited in order and the last one
> > > +	is where statements should be moved to.  */
> > > +      if (!dest_bb)
> > > +	dest_bb = gimple_bb (c);
> > > +      else
> > > +	{
> > > +	  basic_block curr_bb = gimple_bb (c);
> > > +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > > +	    dest_bb = curr_bb;
> > > +	}
> > > +    }
> > > +
> > > +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
> > 
> > no edge is the fallthru edge out of a condition, so this always selects
> > EDGE_SUCC (dest_bb, 1) which cannot be correct (well, guess you're lucky).  I
> > think you instead want
> > 
> >   dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == dest_bb-
> > >loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 1)-
> > >dest;
> > 
> > more nicely written, of course.
> > 
> > > +  gcc_assert (dest_bb);
> > > +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> > 
> > Sorting the vector of early breaks as we gather them might be nicer than this -
> > you'd then simply use the first or last.
> > 
> > > +
> > > +  /* TODO: Remove? It's useful debug statement but may be too much.
> > > +*/
> > > +  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			 "updated use: %T, mem_ref: %G",
> > > +			 vuse, g);
> > > +    }
> > > +
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location,
> > > +		     "recorded statements to be moved to BB %d\n",
> > > +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> > > +
> > > +  return opt_result::success ();
> > > +}
> > > +
> > >  /* Function vect_analyze_data_ref_dependences.
> > >
> > >     Examine all the data references in the loop, and make sure there
> > > do not @@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences
> > (loop_vec_info loop_vinfo,
> > >  	  return res;
> > >        }
> > >
> > > +  /* If we have early break statements in the loop, check to see if they
> > > +     are of a form we can vectorizer.  */  if
> > > + (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    return vect_analyze_early_break_dependences (loop_vinfo);
> > > +
> > >    return opt_result::success ();
> > >  }
> > >
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > >
> > 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a
> > 2079c523
> > > 4033c3d825ea 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop
> > *loop_in, vec_info_shared *shared)
> > >      partial_load_store_bias (0),
> > >      peeling_for_gaps (false),
> > >      peeling_for_niter (false),
> > > +    early_breaks (false),
> > >      no_data_dependencies (false),
> > >      has_mask_store (false),
> > >      scalar_loop_scaling (profile_probability::uninitialized ()), @@
> > > -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop
> > *epilogue, tree advance)
> > >    epilogue_vinfo->shared->save_datarefs ();  }
> > >
> > > +/*  When vectorizing early break statements instructions that happen
> > before
> > > +    the early break in the current BB need to be moved to after the early
> > > +    break.  This function deals with that and assumes that any validity
> > > +    checks has already been performed.
> > > +
> > > +    While moving the instructions if it encounters a VUSE or VDEF it then
> > > +    corrects the VUSES as it moves the statements along.  GDEST is the
> > location
> > > +    in which to insert the new statements.  */
> > > +
> > > +static void
> > > +move_early_exit_stmts (loop_vec_info loop_vinfo) {
> > > +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> > > +    return;
> > > +
> > > +  /* Move all stmts that need moving.  */  basic_block dest_bb =
> > > + LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
> > 
> > I suppose dest_bb is the in-loop block following the last early exit?  I suppose
> > we do not support an "early" exit after the main IV exit, right?  Instead we'd
> > require loop rotation?
> 
> Indeed, this is also keeping in mind that when we add general control flow
> we don't want to move it past the control flow inside the loop.  This would
> extend the live ranges too much.
> 
> > 
> > > +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> > > +
> > > +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> > (loop_vinfo))
> > > +    {
> > > +      /* Check to see if statement is still required for vect or has been
> > > +	 elided.  */
> > > +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> > > +      if (!stmt_info)
> > > +	continue;
> > > +
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G",
> > stmt);
> > > +
> > > +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> > > +      gsi_move_before (&stmt_gsi, &dest_gsi);
> > > +      gsi_prev (&dest_gsi);
> > > +      update_stmt (stmt);
> > 
> > You shouldn't need to update_stmt here I think.
> > 
> > > +    }
> > > +
> > > +  /* Update all the stmts with their new reaching VUSES.  */
> > > +  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> > > +(loop_vinfo).last ());
> > > +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > > +			   "updating vuse to %T for stmt %G", vuse, p);
> > > +      unlink_stmt_vdef (p);
> > 
> > it's odd to first move the stmts and then propagate out their defs (which you
> > forget to release?)
> > 
> > > +      gimple_set_vuse (p, vuse);
> > 
> > and now every store gets the same vuse?  I'm quite sure you'll end up with
> > broken virtual SSA form here.
> > 
> No, not every store, but every load.   Since we've moved everything that can
> introduce a new vDEF, then all the use of memory before the last early exit
> must be using the same vUSE.  The loop never has to update stores because
> they are moved in order.

Ah, I was confused about the unlink_stmt_vdef - if 
LOOP_VINFO_EARLY_BRK_VUSES are only loads then there are no VDEFs
involved and unlink_stmt_vdef does nothing.

Richard.

> Regards,
> Tamar
> 
> > > +      update_stmt (p);
> > > +    }
> > > +}
> > > +
> > >  /* Function vect_transform_loop.
> > >
> > >     The analysis phase has determined that the loop is vectorizable.
> > > @@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info
> > loop_vinfo, gimple *loop_vectorized_call)
> > >        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES
> > (loop_vinfo));
> > >      }
> > >
> > > +  /* Handle any code motion that we need to for early-break vectorization
> > after
> > > +     we've done peeling but just before we start vectorizing.  */  if
> > > + (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    move_early_exit_stmts (loop_vinfo);
> > > +
> > >    /* FORNOW: the vectorizer supports only loops which body consist
> > >       of one basic block (header + empty latch). When the vectorizer will
> > >       support more involved loop forms, the order by which the BBs are
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> > >
> > 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a9
> > 1cb8e74c0
> > > 557e75d1ea2c 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info
> > *vinfo, enum vect_def_type *dt,
> > >  	case vect_first_order_recurrence:
> > >  	  dump_printf (MSG_NOTE, "first order recurrence\n");
> > >  	  break;
> > > +       case vect_early_exit_def:
> > > +	  dump_printf (MSG_NOTE, "early exit\n");
> > > +	  break;
> > >  	case vect_unknown_def_type:
> > >  	  dump_printf (MSG_NOTE, "unknown\n");
> > >  	  break;
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > >
> > a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf7835
> > 2e29dc9
> > > 958746fb9c94 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -66,6 +66,7 @@ enum vect_def_type {
> > >    vect_double_reduction_def,
> > >    vect_nested_cycle,
> > >    vect_first_order_recurrence,
> > > +  vect_early_exit_def,
> > >    vect_unknown_def_type
> > >  };
> > >
> > > @@ -888,6 +889,10 @@ public:
> > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > >    bool peeling_for_niter;
> > >
> > > +  /* When the loop has early breaks that we can vectorize we need to peel
> > > +     the loop for the break finding loop.  */  bool early_breaks;
> > > +
> > >    /* List of loop additional IV conditionals found in the loop.  */
> > >    auto_vec<gcond *> conds;
> > >
> > > @@ -942,6 +947,20 @@ public:
> > >    /* The controlling loop IV for the scalar loop being vectorized.  This IV
> > >       controls the natural exits of the loop.  */
> > >    edge scalar_loop_iv_exit;
> > > +
> > > +  /* Used to store the list of statements needing to be moved if doing early
> > > +     break vectorization as they would violate the scalar loop semantics if
> > > +     vectorized in their current location.  These are stored in order that they
> > need
> > > +     to be moved.  */
> > > +  auto_vec<gimple *> early_break_conflict;
> > > +
> > > +  /* The final basic block where to move statements to.  In the case of
> > > +     multiple exits this could be pretty far away.  */  basic_block
> > > + early_break_dest_bb;
> > > +
> > > +  /* Statements whose VUSES need updating if early break vectorization is
> > to
> > > +     happen.  */
> > > +  auto_vec<gimple*> early_break_vuses;
> > >  } *loop_vec_info;
> > >
> > >  /* Access Functions.  */
> > > @@ -996,6 +1015,10 @@ public:
> > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)-
> > >early_break_conflict
> > > +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > > +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> > >no_data_dependencies
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
>
Tamar Christina Dec. 19, 2023, 10:11 a.m. UTC | #4
> > > > +      /* Save destination as we go, BB are visited in order and the last one
> > > > +	is where statements should be moved to.  */
> > > > +      if (!dest_bb)
> > > > +	dest_bb = gimple_bb (c);
> > > > +      else
> > > > +	{
> > > > +	  basic_block curr_bb = gimple_bb (c);
> > > > +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > > > +	    dest_bb = curr_bb;
> > > > +	}
> > > > +    }
> > > > +
> > > > +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
> > >
> > > no edge is the fallthru edge out of a condition, so this always selects
> > > EDGE_SUCC (dest_bb, 1) which cannot be correct (well, guess you're lucky).  I
> > > think you instead want
> > >
> > >   dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == dest_bb-
> > > >loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 1)-
> > > >dest;
> > >
> > > more nicely written, of course.
> > >
> > > > +  gcc_assert (dest_bb);
> > > > +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> > >
> > > Sorting the vector of early breaks as we gather them might be nicer than this -
> > > you'd then simply use the first or last.
> > >

I opted not to do the sorting since I don't really need a full order between the exits here
And only need to find the last one.   A sort would be more expensive than the linear
Check here.  But I also couldn't think of a good sort key since all you have is dominate yes/no.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
	(vect_analyze_data_ref_dependences): Use them.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	early_breaks.
	(move_early_exit_stmts): New.
	(vect_transform_loop): use it/
	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(class _loop_vec_info): Add early_breaks, early_break_conflict,
	early_break_vuses.
	(LOOP_VINFO_EARLY_BREAKS): New.
	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
	(LOOP_VINFO_EARLY_BRK_VUSES): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-early-break_57.c: New test.
	* gcc.dg/vect/vect-early-break_79.c: New test.
	* gcc.dg/vect/vect-early-break_80.c: New test.
	* gcc.dg/vect/vect-early-break_81.c: New test.
	* gcc.dg/vect/vect-early-break_83.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -5,6 +5,7 @@
 /* { dg-additional-options "-Ofast" } */
 
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
 
 void abort ();
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
new file mode 100644
index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#undef N
+#define N 32
+
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < 1024; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
new file mode 100644
index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+
+int x;
+__attribute__ ((noinline, noipa))
+void foo (int *a, int *b)
+{
+  int local_x = x;
+  for (int i = 0; i < 1024; ++i)
+    {
+      if (i + local_x == 13)
+        break;
+      a[i] = 2 * b[i];
+    }
+}
+
+int main ()
+{
+  int a[1024] = {0};
+  int b[1024] = {0};
+
+  for (int i = 0; i < 1024; i++)
+    b[i] = i;
+
+  x = -512;
+  foo (a, b);
+
+  if (a[524] != 1048)
+    abort ();
+
+  if (a[525] != 0)
+    abort ();
+
+  if (a[1023] != 0)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
new file mode 100644
index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n - 3; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
new file mode 100644
index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile complex double z = vect_b[i];
+   vect_b[i] = x + i + z;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..8e9e780e01fd349b30da1f0a762c0306ec257ff7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,377 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* Funcion vect_analyze_early_break_dependences.
+
+   Examime all the data references in the loop and make sure that if we have
+   mulitple exits that we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place where to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.
+
+   This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence. Returns True i
+   possible, otherwise False.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implemementation is very conservative. Any overlappig loads/stores
+     that take place before the early break statement gets rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is which is the common case.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  /* - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable
+	      from one or more cond conditions.  If this set overlaps with CHAIN
+	      then FIXED takes precedence.  This deals with non-single use
+	      cases.
+     - BASES: List of all load data references found during traversal.  */
+  hash_set<tree> chain, fixed;
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+
+  hash_set <gimple *> visited;
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  class loop *loop_nest = loop_outer (loop);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (c);
+
+      /* First determine the list of statements that we can't move because they
+	 are required for the early break vectorization itself.  */
+      auto_vec <gimple *> workset;
+      workset.safe_push (c);
+      do {
+	gimple *op = workset.pop ();
+	if (visited.add (op)
+	    || is_a <gphi *> (op)
+	    || is_gimple_debug (op))
+	  continue;
+
+	if (gimple_has_lhs (op))
+	  fixed.add (gimple_get_lhs (op));
+
+	stmt_vec_info def_info = loop_vinfo->lookup_stmt (op);
+	if (!def_info)
+	  continue;
+
+	gimple *def_stmt = STMT_VINFO_STMT (def_info);
+	FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
+	  {
+	    tree use = USE_FROM_PTR (use_p);
+	    if (TREE_CODE (use) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (use))
+	      continue;
+
+	    if (gimple *g = SSA_NAME_DEF_STMT (use))
+	      workset.safe_push (g);
+	  }
+      } while (!workset.is_empty ());
+
+      /* Now analyze all the remaining statements and try to determine which
+	 instructions are allowed/needed to be moved.  */
+      while (!gsi_end_p (gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  gsi_prev (&gsi);
+	  if (!gimple_has_ops (stmt)
+	      || is_gimple_debug (stmt))
+	    continue;
+
+	  tree dest = NULL_TREE;
+	  /* Try to find the SSA_NAME being defined.  For Statements with an LHS
+	     use the LHS, if not, assume that the first argument of a call is
+	     the value being defined.  e.g. MASKED_LOAD etc.  */
+	  if (gimple_has_lhs (stmt))
+	    dest = gimple_get_lhs (stmt);
+	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	    dest = gimple_arg (call, 0);
+
+	  bool move = chain.contains (dest);
+
+	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+	  if (!stmt_vinfo)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks not supported. Unknown"
+				 " statement: %G", stmt);
+	      return opt_result::failure_at (c,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", c);
+	    }
+
+	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+	  if (dr_ref)
+	    {
+	      /* We currently only support statically allocated objects due to
+		 not having first-faulting loads support or peeling for
+		 alignment support.  Compute the size of the referenced object
+		 (it could be dynamically allocated).  */
+	      tree obj = DR_BASE_ADDRESS (dr_ref);
+	      if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "early breaks only supported on statically"
+				     " allocated objects.\n");
+		  return opt_result::failure_at (c,
+				     "can't safely apply code motion to "
+				     "dependencies of %G to vectorize "
+				     "the early exit.\n", c);
+		}
+
+	      tree refop = TREE_OPERAND (obj, 0);
+	      tree refbase = get_base_address (refop);
+	      if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+		  || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+		{
+		  if (dump_enabled_p ())
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "early breaks only supported on"
+				       " statically allocated objects.\n");
+		  return opt_result::failure_at (c,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", c);
+		}
+
+	      /* Check if vector accesses to the object will be within
+		 bounds.  */
+	      tree stype = TREE_TYPE (DECL_SIZE (refbase));
+	      tree access = fold_build2 (PLUS_EXPR, stype, DR_OFFSET (dr_ref),
+					 DR_INIT (dr_ref));
+	      tree final_adj
+		= fold_build2 (MULT_EXPR, stype, LOOP_VINFO_NITERS (loop_vinfo),
+			       DR_STEP (dr_ref));
+
+	      /* must be a constant or assume loop will be versioned or niters
+		 bounded by VF so accesses are within range.  */
+	      if (TREE_CODE (access) == INTEGER_CST
+		  && TREE_CODE (final_adj) == INTEGER_CST)
+		{
+		  access = fold_build2 (PLUS_EXPR, stype, access, final_adj);
+		  wide_int size = wi::to_wide (DECL_SIZE (refbase));
+		  wide_int off = wi::to_wide (access);
+		  if (wi::ge_p (off, size, UNSIGNED))
+		    {
+		      if (dump_enabled_p ())
+			dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+					 "early breaks not supported:"
+					 " vectorization would read beyond size"
+					 " of object %T.\n", obj);
+		      return opt_result::failure_at (c,
+					 "can't safely apply code motion to "
+					 "dependencies of %G to vectorize "
+					 "the early exit.\n", c);
+		    }
+		}
+
+	      if (DR_IS_READ (dr_ref))
+		bases.safe_push (dr_ref);
+	      else if (DR_IS_WRITE (dr_ref))
+		{
+		  /* We are moving writes down in the CFG.  To be sure that this
+		     is valid after vectorization we have to check all the loads
+		     we are hoisting the stores past to see if any of them may
+		     alias or are the same object.
+
+		     Same objects will not be an issue because unless the store
+		     is marked volatile the value can be forwarded.  If the
+		     store is marked volatile we don't vectorize the loop
+		     anyway.
+
+		     That leaves the check for aliasing.  We don't really need
+		     to care about the stores aliasing with each other since the
+		     stores are moved in order so the effects are still observed
+		     correctly.  This leaves the check for WAR dependencies
+		     which we would be introducing here if the DR can alias.
+		     The check is quadratic in loads/stores but I have not found
+		     a better API to do this.  I believe all loads and stores
+		     must be checked.  We also must check them when we
+		     encountered the store, since we don't care about loads past
+		     the store.  */
+
+		  for (auto dr_read : bases)
+		    if (dr_may_alias_p (dr_read, dr_ref, loop_nest))
+		      {
+			if (dump_enabled_p ())
+			    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					     vect_location,
+					     "early breaks not supported: "
+					     "overlapping loads and stores "
+					     "found before the break "
+					     "statement.\n");
+
+			return opt_result::failure_at (stmt,
+			     "can't safely apply code motion to dependencies"
+			     " to vectorize the early exit. %G may alias with"
+			     " %G\n", stmt, dr_read->stmt);
+		      }
+
+		  /* Any writes starts a new chain. */
+		  move = true;
+		}
+	    }
+
+	  /* If a statement is live and escapes the loop through usage in the
+	     loop epilogue then we can't move it since we need to maintain its
+	     reachability through all exits.  */
+	  bool skip = false;
+	  if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	      && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	    {
+	      imm_use_iterator imm_iter;
+	      use_operand_p use_p;
+	      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+		{
+		  basic_block bb = gimple_bb (USE_STMT (use_p));
+		  skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+		  if (skip)
+		    break;
+		}
+	    }
+
+	  /* If we found the defining statement of a something that's part of
+	     the chain then expand the chain with the new SSA_VARs being
+	     used.  */
+	  if (!skip && move)
+	    {
+	      use_operand_p use_p;
+	      ssa_op_iter iter;
+	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		{
+		  tree op = USE_FROM_PTR (use_p);
+		  gcc_assert (TREE_CODE (op) == SSA_NAME);
+		  if (fixed.contains (dest))
+		    {
+		      move = false;
+		      fixed.add (op);
+		    }
+		  else
+		    chain.add (op);
+		}
+
+	      if (dump_enabled_p ())
+		{
+		  if (move)
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "found chain %G", stmt);
+		  else
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "ignored chain %G, not single use", stmt);
+		}
+	    }
+
+	  if (move)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "==> recording stmt %G", stmt);
+
+	      /* If we've moved a VDEF, extract the defining MEM and update
+		 usages of it.   */
+	      tree vdef;
+	      /* This statement is to be moved.  */
+	      if ((vdef = gimple_vdef (stmt)))
+		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (
+		    stmt);
+	    }
+
+	  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
+	    {
+	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "marked statement for vUSE update: %G", stmt);
+	    }
+	}
+
+      /* Save destination as we go, BB are visited in order and the last one
+	is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+
+      /* Mark the statement as a condition.  */
+      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
+    }
+
+  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
+  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
+  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
+  /* We don't allow outer -> inner loop transitions which should have been
+     trapped already during loop form analysis.  */
+  gcc_assert (dest_bb->loop_father == loop);
+
+  gcc_assert (dest_bb);
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
+    {
+      /* All uses shall be updated to that of the first load.  Entries are
+	 stored in reverse order.  */
+      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
+      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "will update use: %T, mem_ref: %G", vuse, g);
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +1028,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorizer.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb8d999ee6bfaff551ac06ac2f3aea5354914659..0a90d2860b8d037b72fd41d4240804aa390467ea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/*  When vectorizing early break statements instructions that happen before
+    the early break in the current BB need to be moved to after the early
+    break.  This function deals with that and assumes that any validity
+    checks has already been performed.
+
+    While moving the instructions if it encounters a VUSE or VDEF it then
+    corrects the VUSES as it moves the statements along.  GDEST is the location
+    in which to insert the new statements.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("move_early_exit_stmts");
+
+  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      /* Check to see if statement is still required for vect or has been
+	 elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse
+    = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse to %T for load %G", vuse, p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (dyn_cast <gcond *> (stmt))
     *relevant = vect_used_in_scope;
 
   /* changing memory.  */
@@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_condition_def:
+	  dump_printf (MSG_NOTE, "control flow\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index e4d7ab4567cef3c018b958f98eeff045d3477725..3c9478a3dc8750c71e0bf2a36a5b0815afc3fd94 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_condition_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize we need to peel
+     the loop for the break finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in order that they need
+     to be moved.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple*> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
Richard Biener Dec. 19, 2023, 2:05 p.m. UTC | #5
On Tue, 19 Dec 2023, Tamar Christina wrote:

> > > > > +      /* Save destination as we go, BB are visited in order and the last one
> > > > > +	is where statements should be moved to.  */
> > > > > +      if (!dest_bb)
> > > > > +	dest_bb = gimple_bb (c);
> > > > > +      else
> > > > > +	{
> > > > > +	  basic_block curr_bb = gimple_bb (c);
> > > > > +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > > > > +	    dest_bb = curr_bb;
> > > > > +	}
> > > > > +    }
> > > > > +
> > > > > +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
> > > >
> > > > no edge is the fallthru edge out of a condition, so this always selects
> > > > EDGE_SUCC (dest_bb, 1) which cannot be correct (well, guess you're lucky).  I
> > > > think you instead want
> > > >
> > > >   dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == dest_bb-
> > > > >loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 1)-
> > > > >dest;
> > > >
> > > > more nicely written, of course.
> > > >
> > > > > +  gcc_assert (dest_bb);
> > > > > +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> > > >
> > > > Sorting the vector of early breaks as we gather them might be nicer than this -
> > > > you'd then simply use the first or last.
> > > >
> 
> I opted not to do the sorting since I don't really need a full order between the exits here
> And only need to find the last one.   A sort would be more expensive than the linear
> Check here.  But I also couldn't think of a good sort key since all you have is dominate yes/no.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
> 	(vect_analyze_data_ref_dependences): Use them.
> 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> 	early_breaks.
> 	(move_early_exit_stmts): New.
> 	(vect_transform_loop): use it/
> 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> 	early_break_vuses.
> 	(LOOP_VINFO_EARLY_BREAKS): New.
> 	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-early-break_57.c: New test.
> 	* gcc.dg/vect/vect-early-break_79.c: New test.
> 	* gcc.dg/vect/vect-early-break_80.c: New test.
> 	* gcc.dg/vect/vect-early-break_81.c: New test.
> 	* gcc.dg/vect/vect-early-break_83.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> @@ -5,6 +5,7 @@
>  /* { dg-additional-options "-Ofast" } */
>  
>  /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>  
>  void abort ();
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#undef N
> +#define N 32
> +
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < 1024; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
> @@ -0,0 +1,43 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +extern void abort ();
> +
> +int x;
> +__attribute__ ((noinline, noipa))
> +void foo (int *a, int *b)
> +{
> +  int local_x = x;
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      if (i + local_x == 13)
> +        break;
> +      a[i] = 2 * b[i];
> +    }
> +}
> +
> +int main ()
> +{
> +  int a[1024] = {0};
> +  int b[1024] = {0};
> +
> +  for (int i = 0; i < 1024; i++)
> +    b[i] = i;
> +
> +  x = -512;
> +  foo (a, b);
> +
> +  if (a[524] != 1048)
> +    abort ();
> +
> +  if (a[525] != 0)
> +    abort ();
> +
> +  if (a[1023] != 0)
> +    abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> +void abort ();
> +
> +unsigned short sa[32];
> +unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned int ia[32];
> +unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +
> +int main2 (int n)
> +{
> +  int i;
> +  for (i = 0; i < n - 3; i++)
> +    {
> +      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> +        abort ();
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#include <complex.h>
> +
> +#define N 1024
> +complex double vect_a[N];
> +complex double vect_b[N];
> +  
> +complex double test4(complex double x)
> +{
> + complex double ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   volatile complex double z = vect_b[i];
> +   vect_b[i] = x + i + z;
> +   if (vect_a[i] == x)
> +     return i;
> +   vect_a[i] += x * vect_b[i];
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..8e9e780e01fd349b30da1f0a762c0306ec257ff7 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -613,6 +613,377 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
>    return opt_result::success ();
>  }
>  
> +/* Funcion vect_analyze_early_break_dependences.
> +
> +   Examime all the data references in the loop and make sure that if we have
> +   mulitple exits that we are able to safely move stores such that they become
> +   safe for vectorization.  The function also calculates the place where to move
> +   the instructions to and computes what the new vUSE chain should be.
> +
> +   This works in tandem with the CFG that will be produced by
> +   slpeel_tree_duplicate_loop_to_edge_cfg later on.
> +
> +   This function tries to validate whether an early break vectorization
> +   is possible for the current instruction sequence. Returns True i
> +   possible, otherwise False.
> +
> +   Requirements:
> +     - Any memory access must be to a fixed size buffer.
> +     - There must not be any loads and stores to the same object.
> +     - Multiple loads are allowed as long as they don't alias.
> +
> +   NOTE:
> +     This implemementation is very conservative. Any overlappig loads/stores
> +     that take place before the early break statement gets rejected aside from
> +     WAR dependencies.
> +
> +     i.e.:
> +
> +	a[i] = 8
> +	c = a[i]
> +	if (b[i])
> +	  ...
> +
> +	is not allowed, but
> +
> +	c = a[i]
> +	a[i] = 8
> +	if (b[i])
> +	  ...
> +
> +	is which is the common case.  */
> +
> +static opt_result
> +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> +
> +  /* - CHAIN: Currently detected sequence of instructions that need to be moved
> +	      if we are to vectorize this early break.
> +     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable
> +	      from one or more cond conditions.  If this set overlaps with CHAIN
> +	      then FIXED takes precedence.  This deals with non-single use
> +	      cases.
> +     - BASES: List of all load data references found during traversal.  */
> +  hash_set<tree> chain, fixed;
> +  auto_vec<data_reference *> bases;
> +  basic_block dest_bb = NULL;
> +
> +  hash_set <gimple *> visited;
> +  use_operand_p use_p;
> +  ssa_op_iter iter;
> +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  class loop *loop_nest = loop_outer (loop);
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "loop contains multiple exits, analyzing"
> +		     " statement dependencies.\n");
> +
> +  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> +	continue;
> +
> +      gimple_stmt_iterator gsi = gsi_for_stmt (c);
> +
> +      /* First determine the list of statements that we can't move because they
> +	 are required for the early break vectorization itself.  */
> +      auto_vec <gimple *> workset;
> +      workset.safe_push (c);
> +      do {
> +	gimple *op = workset.pop ();
> +	if (visited.add (op)
> +	    || is_a <gphi *> (op)
> +	    || is_gimple_debug (op))
> +	  continue;
> +
> +	if (gimple_has_lhs (op))
> +	  fixed.add (gimple_get_lhs (op));

so this adds the LHS of stmts not in the loop - wouldn't it be
easier to add the operand itself ... (X)

> +	stmt_vec_info def_info = loop_vinfo->lookup_stmt (op);
> +	if (!def_info)
> +	  continue;
> +
> +	gimple *def_stmt = STMT_VINFO_STMT (def_info);

that's actually 'op', no?

> +	FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
> +	  {
> +	    tree use = USE_FROM_PTR (use_p);
> +	    if (TREE_CODE (use) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (use))
> +	      continue;

(X) ... here?  Or if we only care about in-loop defs add that
after the !def_info check.

> +	    if (gimple *g = SSA_NAME_DEF_STMT (use))
> +	      workset.safe_push (g);
> +	  }
> +      } while (!workset.is_empty ());
> +
> +      /* Now analyze all the remaining statements and try to determine which
> +	 instructions are allowed/needed to be moved.  */
> +      while (!gsi_end_p (gsi))
> +	{
> +	  gimple *stmt = gsi_stmt (gsi);
> +	  gsi_prev (&gsi);
> +	  if (!gimple_has_ops (stmt)
> +	      || is_gimple_debug (stmt))
> +	    continue;
> +
> +	  tree dest = NULL_TREE;
> +	  /* Try to find the SSA_NAME being defined.  For Statements with an LHS
> +	     use the LHS, if not, assume that the first argument of a call is
> +	     the value being defined.  e.g. MASKED_LOAD etc.  */
> +	  if (gimple_has_lhs (stmt))
> +	    dest = gimple_get_lhs (stmt);
> +	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> +	    dest = gimple_arg (call, 0);

FOR_EACH_SSA_DEF_OPERAND (...)

(asms can have multiple defs)

> +	  bool move = chain.contains (dest);

move this down to where used first

> +
> +	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> +	  if (!stmt_vinfo)
> +	    {

I wonder when this hits?

> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "early breaks not supported. Unknown"
> +				 " statement: %G", stmt);
> +	      return opt_result::failure_at (c,
> +				       "can't safely apply code motion to "
> +				       "dependencies of %G to vectorize "
> +				       "the early exit.\n", c);
> +	    }
> +
> +	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> +	  if (dr_ref)
> +	    {
> +	      /* We currently only support statically allocated objects due to
> +		 not having first-faulting loads support or peeling for
> +		 alignment support.  Compute the size of the referenced object
> +		 (it could be dynamically allocated).  */
> +	      tree obj = DR_BASE_ADDRESS (dr_ref);
> +	      if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> +		{
> +		  if (dump_enabled_p ())
> +		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				     "early breaks only supported on statically"
> +				     " allocated objects.\n");
> +		  return opt_result::failure_at (c,
> +				     "can't safely apply code motion to "
> +				     "dependencies of %G to vectorize "
> +				     "the early exit.\n", c);
> +		}
> +
> +	      tree refop = TREE_OPERAND (obj, 0);
> +	      tree refbase = get_base_address (refop);
> +	      if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> +		  || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> +		{
> +		  if (dump_enabled_p ())
> +		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				       "early breaks only supported on"
> +				       " statically allocated objects.\n");
> +		  return opt_result::failure_at (c,
> +				       "can't safely apply code motion to "
> +				       "dependencies of %G to vectorize "
> +				       "the early exit.\n", c);
> +		}
> +
> +	      /* Check if vector accesses to the object will be within
> +		 bounds.  */
> +	      tree stype = TREE_TYPE (DECL_SIZE (refbase));
> +	      tree access = fold_build2 (PLUS_EXPR, stype, DR_OFFSET (dr_ref),
> +					 DR_INIT (dr_ref));
> +	      tree final_adj
> +		= fold_build2 (MULT_EXPR, stype, LOOP_VINFO_NITERS (loop_vinfo),
> +			       DR_STEP (dr_ref));
> +
> +	      /* must be a constant or assume loop will be versioned or niters
> +		 bounded by VF so accesses are within range.  */
> +	      if (TREE_CODE (access) == INTEGER_CST
> +		  && TREE_CODE (final_adj) == INTEGER_CST)
> +		{
> +		  access = fold_build2 (PLUS_EXPR, stype, access, final_adj);
> +		  wide_int size = wi::to_wide (DECL_SIZE (refbase));
> +		  wide_int off = wi::to_wide (access);
> +		  if (wi::ge_p (off, size, UNSIGNED))
> +		    {
> +		      if (dump_enabled_p ())
> +			dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +					 "early breaks not supported:"
> +					 " vectorization would read beyond size"
> +					 " of object %T.\n", obj);
> +		      return opt_result::failure_at (c,
> +					 "can't safely apply code motion to "
> +					 "dependencies of %G to vectorize "
> +					 "the early exit.\n", c);
> +		    }
> +		}

missing

       else
         return opt_result::failure_at (....);

because you couldn't prove the access is not out-of-bounds.

I think you want to do this totally different, looking at DR_REF
instead and use code like if-conversions ref_within_array_bound
(you possibly can use that literally here).


> +
> +	      if (DR_IS_READ (dr_ref))
> +		bases.safe_push (dr_ref);
> +	      else if (DR_IS_WRITE (dr_ref))
> +		{
> +		  /* We are moving writes down in the CFG.  To be sure that this
> +		     is valid after vectorization we have to check all the loads
> +		     we are hoisting the stores past to see if any of them may
> +		     alias or are the same object.
> +
> +		     Same objects will not be an issue because unless the store
> +		     is marked volatile the value can be forwarded.  If the
> +		     store is marked volatile we don't vectorize the loop
> +		     anyway.
> +
> +		     That leaves the check for aliasing.  We don't really need
> +		     to care about the stores aliasing with each other since the
> +		     stores are moved in order so the effects are still observed
> +		     correctly.  This leaves the check for WAR dependencies
> +		     which we would be introducing here if the DR can alias.
> +		     The check is quadratic in loads/stores but I have not found
> +		     a better API to do this.  I believe all loads and stores
> +		     must be checked.  We also must check them when we
> +		     encountered the store, since we don't care about loads past
> +		     the store.  */
> +
> +		  for (auto dr_read : bases)
> +		    if (dr_may_alias_p (dr_read, dr_ref, loop_nest))

I think you need to swap dr_read and dr_ref operands, since you
are walking stmts backwards and thus all reads from 'bases' are
after the write.

> +		      {
> +			if (dump_enabled_p ())
> +			    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +					     vect_location,
> +					     "early breaks not supported: "
> +					     "overlapping loads and stores "
> +					     "found before the break "
> +					     "statement.\n");
> +
> +			return opt_result::failure_at (stmt,
> +			     "can't safely apply code motion to dependencies"
> +			     " to vectorize the early exit. %G may alias with"
> +			     " %G\n", stmt, dr_read->stmt);
> +		      }
> +
> +		  /* Any writes starts a new chain. */
> +		  move = true;
> +		}
> +	    }
> +
> +	  /* If a statement is live and escapes the loop through usage in the
> +	     loop epilogue then we can't move it since we need to maintain its
> +	     reachability through all exits.  */
> +	  bool skip = false;
> +	  if (STMT_VINFO_LIVE_P (stmt_vinfo)
> +	      && !(dr_ref && DR_IS_WRITE (dr_ref)))

You should be able to assert this?

> +	    {
> +	      imm_use_iterator imm_iter;
> +	      use_operand_p use_p;
> +	      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> +		{
> +		  basic_block bb = gimple_bb (USE_STMT (use_p));
> +		  skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +		  if (skip)
> +		    break;
> +		}
> +	    }
> +
> +	  /* If we found the defining statement of a something that's part of
> +	     the chain then expand the chain with the new SSA_VARs being
> +	     used.  */
> +	  if (!skip && move)
> +	    {
> +	      use_operand_p use_p;
> +	      ssa_op_iter iter;
> +	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> +		{
> +		  tree op = USE_FROM_PTR (use_p);
> +		  gcc_assert (TREE_CODE (op) == SSA_NAME);
> +		  if (fixed.contains (dest))
> +		    {
> +		      move = false;
> +		      fixed.add (op);

This looks odd.  When the LHS (dest) of 'stmt' is fixed, any of its
operands should already be fixed.  And if you perform special handling
of this here with respect to 'chain' then this becomes dependent on
the order of processing of exits.

IIRC I suggested you first fully populate 'fixed' based on _all_
exits and then in a second loop produce 'chain'?

> +		    }
> +		  else
> +		    chain.add (op);
> +		}
> +
> +	      if (dump_enabled_p ())
> +		{
> +		  if (move)
> +		    dump_printf_loc (MSG_NOTE, vect_location,
> +				     "found chain %G", stmt);
> +		  else
> +		    dump_printf_loc (MSG_NOTE, vect_location,
> +				     "ignored chain %G, not single use", stmt);
> +		}
> +	    }
> +
> +	  if (move)
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "==> recording stmt %G", stmt);
> +
> +	      /* If we've moved a VDEF, extract the defining MEM and update
> +		 usages of it.   */
> +	      tree vdef;
> +	      /* This statement is to be moved.  */
> +	      if ((vdef = gimple_vdef (stmt)))
> +		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (
> +		    stmt);

I'm also unsure why you need 'chain' at all given you have the vector
of stores to be moved?

Thanks,
Richard.

> +	    }
> +
> +	  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> +	    {
> +	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "marked statement for vUSE update: %G", stmt);
> +	    }
> +	}
> +
> +      /* Save destination as we go, BB are visited in order and the last one
> +	is where statements should be moved to.  */
> +      if (!dest_bb)
> +	dest_bb = gimple_bb (c);
> +      else
> +	{
> +	  basic_block curr_bb = gimple_bb (c);
> +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> +	    dest_bb = curr_bb;
> +	}
> +
> +      /* Mark the statement as a condition.  */
> +      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
> +    }
> +
> +  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
> +  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
> +  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
> +  /* We don't allow outer -> inner loop transitions which should have been
> +     trapped already during loop form analysis.  */
> +  gcc_assert (dest_bb->loop_father == loop);
> +
> +  gcc_assert (dest_bb);
> +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> +
> +  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
> +    {
> +      /* All uses shall be updated to that of the first load.  Entries are
> +	 stored in reverse order.  */
> +      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
> +      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +	{
> +	  if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "will update use: %T, mem_ref: %G", vuse, g);
> +	}
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "recorded statements to be moved to BB %d\n",
> +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> +
> +  return opt_result::success ();
> +}
> +
>  /* Function vect_analyze_data_ref_dependences.
>  
>     Examine all the data references in the loop, and make sure there do not
> @@ -657,6 +1028,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
>  	  return res;
>        }
>  
> +  /* If we have early break statements in the loop, check to see if they
> +     are of a form we can vectorizer.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return vect_analyze_early_break_dependences (loop_vinfo);
> +
>    return opt_result::success ();
>  }
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index fb8d999ee6bfaff551ac06ac2f3aea5354914659..0a90d2860b8d037b72fd41d4240804aa390467ea 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> +    early_breaks (false),
>      no_data_dependencies (false),
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
>    epilogue_vinfo->shared->save_datarefs ();
>  }
>  
> +/*  When vectorizing early break statements instructions that happen before
> +    the early break in the current BB need to be moved to after the early
> +    break.  This function deals with that and assumes that any validity
> +    checks has already been performed.
> +
> +    While moving the instructions if it encounters a VUSE or VDEF it then
> +    corrects the VUSES as it moves the statements along.  GDEST is the location
> +    in which to insert the new statements.  */
> +
> +static void
> +move_early_exit_stmts (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("move_early_exit_stmts");
> +
> +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> +    return;
> +
> +  /* Move all stmts that need moving.  */
> +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
> +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> +
> +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
> +    {
> +      /* Check to see if statement is still required for vect or has been
> +	 elided.  */
> +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_info)
> +	continue;
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
> +
> +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> +      gsi_move_before (&stmt_gsi, &dest_gsi);
> +      gsi_prev (&dest_gsi);
> +    }
> +
> +  /* Update all the stmts with their new reaching VUSES.  */
> +  tree vuse
> +    = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
> +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "updating vuse to %T for load %G", vuse, p);
> +      gimple_set_vuse (p, vuse);
> +      update_stmt (p);
> +    }
> +}
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> @@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
>      }
>  
> +  /* Handle any code motion that we need to for early-break vectorization after
> +     we've done peeling but just before we start vectorizing.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    move_early_exit_stmts (loop_vinfo);
> +
>    /* FORNOW: the vectorizer supports only loops which body consist
>       of one basic block (header + empty latch). When the vectorizer will
>       support more involved loop forms, the order by which the BBs are
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>    *live_p = false;
>  
>    /* cond stmt other than loop exit cond.  */
> -  if (is_ctrl_stmt (stmt_info->stmt)
> -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  if (dyn_cast <gcond *> (stmt))
>      *relevant = vect_used_in_scope;
>  
>    /* changing memory.  */
> @@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
>  	case vect_first_order_recurrence:
>  	  dump_printf (MSG_NOTE, "first order recurrence\n");
>  	  break;
> +	case vect_condition_def:
> +	  dump_printf (MSG_NOTE, "control flow\n");
> +	  break;
>  	case vect_unknown_def_type:
>  	  dump_printf (MSG_NOTE, "unknown\n");
>  	  break;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index e4d7ab4567cef3c018b958f98eeff045d3477725..3c9478a3dc8750c71e0bf2a36a5b0815afc3fd94 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -66,6 +66,7 @@ enum vect_def_type {
>    vect_double_reduction_def,
>    vect_nested_cycle,
>    vect_first_order_recurrence,
> +  vect_condition_def,
>    vect_unknown_def_type
>  };
>  
> @@ -888,6 +889,10 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* When the loop has early breaks that we can vectorize we need to peel
> +     the loop for the break finding loop.  */
> +  bool early_breaks;
> +
>    /* List of loop additional IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
> @@ -942,6 +947,20 @@ public:
>    /* The controlling loop IV for the scalar loop being vectorized.  This IV
>       controls the natural exits of the loop.  */
>    edge scalar_loop_iv_exit;
> +
> +  /* Used to store the list of statements needing to be moved if doing early
> +     break vectorization as they would violate the scalar loop semantics if
> +     vectorized in their current location.  These are stored in order that they need
> +     to be moved.  */
> +  auto_vec<gimple *> early_break_conflict;
> +
> +  /* The final basic block where to move statements to.  In the case of
> +     multiple exits this could be pretty far away.  */
> +  basic_block early_break_dest_bb;
> +
> +  /* Statements whose VUSES need updating if early break vectorization is to
> +     happen.  */
> +  auto_vec<gimple*> early_break_vuses;
>  } *loop_vec_info;
>  
>  /* Access Functions.  */
> @@ -996,6 +1015,10 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
>  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
>
Tamar Christina Dec. 20, 2023, 10:51 a.m. UTC | #6
> > +	      /* If we've moved a VDEF, extract the defining MEM and update
> > +		 usages of it.   */
> > +	      tree vdef;
> > +	      /* This statement is to be moved.  */
> > +	      if ((vdef = gimple_vdef (stmt)))
> > +		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> (loop_vinfo).safe_push (
> > +		    stmt);
> 
> I'm also unsure why you need 'chain' at all given you have the vector
> of stores to be moved?
> 

Yeah, so originally I wanted to move statements other than stores.  While stores
are needed for correctness, the other statements would be so we didn't extend the
live range too much for intermediate values.

This proved difficult but eventually I got it to work, but as you saw it was meh code.
Instead I guess the better approach is to teach sched1 in GCC 15 to schedule across
branches in loops.

With that in mind, I changed it to move only stores.  Since stores never produce a
and are sinks, I don't really need fixed nor chain.

So here's a much cleaned up patch.

Bootstrapped Regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-if-conv.cc (ref_within_array_bound): Expose.
	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
	(vect_analyze_data_ref_dependences): Use them.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	early_breaks.
	(move_early_exit_stmts): New.
	(vect_transform_loop): use it/
	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(ref_within_array_bound): New.
	(class _loop_vec_info): Add early_breaks, early_break_conflict,
	early_break_vuses.
	(LOOP_VINFO_EARLY_BREAKS): New.
	(LOOP_VINFO_EARLY_BRK_STORES): New.
	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
	(LOOP_VINFO_EARLY_BRK_VUSES): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-early-break_57.c: Update.
	* gcc.dg/vect/vect-early-break_79.c: New test.
	* gcc.dg/vect/vect-early-break_80.c: New test.
	* gcc.dg/vect/vect-early-break_81.c: New test.
	* gcc.dg/vect/vect-early-break_83.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -5,6 +5,7 @@
 /* { dg-additional-options "-Ofast" } */
 
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
 
 void abort ();
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
new file mode 100644
index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#undef N
+#define N 32
+
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < 1024; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
new file mode 100644
index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+
+int x;
+__attribute__ ((noinline, noipa))
+void foo (int *a, int *b)
+{
+  int local_x = x;
+  for (int i = 0; i < 1024; ++i)
+    {
+      if (i + local_x == 13)
+        break;
+      a[i] = 2 * b[i];
+    }
+}
+
+int main ()
+{
+  int a[1024] = {0};
+  int b[1024] = {0};
+
+  for (int i = 0; i < 1024; i++)
+    b[i] = i;
+
+  x = -512;
+  foo (a, b);
+
+  if (a[524] != 1048)
+    abort ();
+
+  if (a[525] != 0)
+    abort ();
+
+  if (a[1023] != 0)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
new file mode 100644
index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n - 3; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
new file mode 100644
index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile complex double z = vect_b[i];
+   vect_b[i] = x + i + z;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 0bde281c2468d8e7f43afc4fe0f757e221ad5edb..a31e3d5161684878a79817d30a6955c8370444d8 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -844,7 +844,7 @@ idx_within_array_bound (tree ref, tree *idx, void *dta)
 
 /* Return TRUE if ref is a within bound array reference.  */
 
-static bool
+bool
 ref_within_array_bound (gimple *stmt, tree ref)
 {
   class loop *loop = loop_containing_stmt (stmt);
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..85ae75ff2eb12b4299e8b7b91d0cf16e4549d08e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,241 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* Funcion vect_analyze_early_break_dependences.
+
+   Examime all the data references in the loop and make sure that if we have
+   mulitple exits that we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place where to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.
+
+   This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence. Returns True i
+   possible, otherwise False.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implemementation is very conservative. Any overlappig loads/stores
+     that take place before the early break statement gets rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is which is the common case.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  /* List of all load data references found during traversal.  */
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+
+  hash_set <gimple *> visited;
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  class loop *loop_nest = loop_outer (loop);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (c);
+
+      /* Now analyze all the remaining statements and try to determine which
+	 instructions are allowed/needed to be moved.  */
+      while (!gsi_end_p (gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  gsi_prev (&gsi);
+	  if (!gimple_has_ops (stmt)
+	      || is_gimple_debug (stmt))
+	    continue;
+
+	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+	  if (!dr_ref)
+	    continue;
+
+	  /* We currently only support statically allocated objects due to
+	     not having first-faulting loads support or peeling for
+	     alignment support.  Compute the size of the referenced object
+	     (it could be dynamically allocated).  */
+	  tree obj = DR_BASE_ADDRESS (dr_ref);
+	  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks only supported on statically"
+				 " allocated objects.\n");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  tree refop = TREE_OPERAND (obj, 0);
+	  tree refbase = get_base_address (refop);
+	  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks only supported on"
+				 " statically allocated objects.\n");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  /* Check if vector accesses to the object will be within bounds.
+	     must be a constant or assume loop will be versioned or niters
+	     bounded by VF so accesses are within range.  */
+	  if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks not supported: vectorization "
+				 "would %s beyond size of obj.",
+				 DR_IS_READ (dr_ref) ? "read" : "write");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  if (DR_IS_READ (dr_ref))
+	    bases.safe_push (dr_ref);
+	  else if (DR_IS_WRITE (dr_ref))
+	    {
+	      /* We are moving writes down in the CFG.  To be sure that this
+		 is valid after vectorization we have to check all the loads
+		 we are sinking the stores past to see if any of them may
+		 alias or are the same object.
+
+		 Same objects will not be an issue because unless the store
+		 is marked volatile the value can be forwarded.  If the
+		 store is marked volatile we don't vectorize the loop
+		 anyway.
+
+		 That leaves the check for aliasing.  We don't really need
+		 to care about the stores aliasing with each other since the
+		 stores are moved in order so the effects are still observed
+		 correctly.  This leaves the check for WAR dependencies
+		 which we would be introducing here if the DR can alias.
+		 The check is quadratic in loads/stores but I have not found
+		 a better API to do this.  I believe all loads and stores
+		 must be checked.  We also must check them when we
+		 encountered the store, since we don't care about loads past
+		 the store.  */
+
+	      for (auto dr_read : bases)
+		if (dr_may_alias_p (dr_ref, dr_read, loop_nest))
+		  {
+		    if (dump_enabled_p ())
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+				       vect_location,
+				       "early breaks not supported: "
+				       "overlapping loads and stores "
+				       "found before the break "
+				       "statement.\n");
+
+		    return opt_result::failure_at (stmt,
+			     "can't safely apply code motion to dependencies"
+			     " to vectorize the early exit. %G may alias with"
+			     " %G\n", stmt, dr_read->stmt);
+		  }
+	    }
+
+	  if (gimple_vdef (stmt))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "==> recording stmt %G", stmt);
+
+	      LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).safe_push (stmt);
+	    }
+	  else if (gimple_vuse (stmt))
+	    {
+	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "marked statement for vUSE update: %G", stmt);
+	    }
+	}
+
+      /* Save destination as we go, BB are visited in order and the last one
+	is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+
+      /* Mark the statement as a condition.  */
+      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
+    }
+
+  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
+  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
+  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
+  /* We don't allow outer -> inner loop transitions which should have been
+     trapped already during loop form analysis.  */
+  gcc_assert (dest_bb->loop_father == loop);
+
+  gcc_assert (dest_bb);
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
+    {
+      /* All uses shall be updated to that of the first load.  Entries are
+	 stored in reverse order.  */
+      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
+      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "will update use: %T, mem_ref: %G", vuse, g);
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +892,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorizer.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb8d999ee6bfaff551ac06ac2f3aea5354914659..900826567fee36206c0711ea51495602a7a031a1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/*  When vectorizing early break statements instructions that happen before
+    the early break in the current BB need to be moved to after the early
+    break.  This function deals with that and assumes that any validity
+    checks has already been performed.
+
+    While moving the instructions if it encounters a VUSE or VDEF it then
+    corrects the VUSES as it moves the statements along.  GDEST is the location
+    in which to insert the new statements.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("move_early_exit_stmts");
+
+  if (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo))
+    {
+      /* Check to see if statement is still required for vect or has been
+	 elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse
+    = gimple_vuse (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse to %T for load %G", vuse, p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (dyn_cast <gcond *> (stmt))
     *relevant = vect_used_in_scope;
 
   /* changing memory.  */
@@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_condition_def:
+	  dump_printf (MSG_NOTE, "control flow\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index e4d7ab4567cef3c018b958f98eeff045d3477725..744cdc86c969a62574be488df4f9c222b68f7994 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_condition_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize we need to peel
+     the loop for the break finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of stores needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in order that they
+     need to be moved.  */
+  auto_vec<gimple *> early_break_stores;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple*> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_STORES(L)     (L)->early_break_stores
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
@@ -2298,6 +2321,9 @@ extern opt_result vect_get_vector_types_for_stmt (vec_info *,
 						  tree *, unsigned int = 0);
 extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
 
+/* In tree-if-conv.cc.  */
+extern bool ref_within_array_bound (gimple *, tree);
+
 /* In tree-vect-data-refs.cc.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
 extern enum dr_alignment_support vect_supportable_dr_alignment
Richard Biener Dec. 20, 2023, 12:24 p.m. UTC | #7
On Wed, 20 Dec 2023, Tamar Christina wrote:

> > > +	      /* If we've moved a VDEF, extract the defining MEM and update
> > > +		 usages of it.   */
> > > +	      tree vdef;
> > > +	      /* This statement is to be moved.  */
> > > +	      if ((vdef = gimple_vdef (stmt)))
> > > +		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> > (loop_vinfo).safe_push (
> > > +		    stmt);
> > 
> > I'm also unsure why you need 'chain' at all given you have the vector
> > of stores to be moved?
> > 
> 
> Yeah, so originally I wanted to move statements other than stores.  While stores
> are needed for correctness, the other statements would be so we didn't extend the
> live range too much for intermediate values.
> 
> This proved difficult but eventually I got it to work, but as you saw it was meh code.
> Instead I guess the better approach is to teach sched1 in GCC 15 to schedule across
> branches in loops.
> 
> With that in mind, I changed it to move only stores.  Since stores never produce a
> and are sinks, I don't really need fixed nor chain.
> 
> So here's a much cleaned up patch.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.
 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-if-conv.cc (ref_within_array_bound): Expose.
> 	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
> 	(vect_analyze_data_ref_dependences): Use them.
> 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> 	early_breaks.
> 	(move_early_exit_stmts): New.
> 	(vect_transform_loop): use it/
> 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> 	(ref_within_array_bound): New.
> 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> 	early_break_vuses.
> 	(LOOP_VINFO_EARLY_BREAKS): New.
> 	(LOOP_VINFO_EARLY_BRK_STORES): New.
> 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-early-break_57.c: Update.
> 	* gcc.dg/vect/vect-early-break_79.c: New test.
> 	* gcc.dg/vect/vect-early-break_80.c: New test.
> 	* gcc.dg/vect/vect-early-break_81.c: New test.
> 	* gcc.dg/vect/vect-early-break_83.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> @@ -5,6 +5,7 @@
>  /* { dg-additional-options "-Ofast" } */
>  
>  /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>  
>  void abort ();
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#undef N
> +#define N 32
> +
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < 1024; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
> @@ -0,0 +1,43 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +extern void abort ();
> +
> +int x;
> +__attribute__ ((noinline, noipa))
> +void foo (int *a, int *b)
> +{
> +  int local_x = x;
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      if (i + local_x == 13)
> +        break;
> +      a[i] = 2 * b[i];
> +    }
> +}
> +
> +int main ()
> +{
> +  int a[1024] = {0};
> +  int b[1024] = {0};
> +
> +  for (int i = 0; i < 1024; i++)
> +    b[i] = i;
> +
> +  x = -512;
> +  foo (a, b);
> +
> +  if (a[524] != 1048)
> +    abort ();
> +
> +  if (a[525] != 0)
> +    abort ();
> +
> +  if (a[1023] != 0)
> +    abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> +void abort ();
> +
> +unsigned short sa[32];
> +unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned int ia[32];
> +unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +
> +int main2 (int n)
> +{
> +  int i;
> +  for (i = 0; i < n - 3; i++)
> +    {
> +      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> +        abort ();
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#include <complex.h>
> +
> +#define N 1024
> +complex double vect_a[N];
> +complex double vect_b[N];
> +  
> +complex double test4(complex double x)
> +{
> + complex double ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   volatile complex double z = vect_b[i];
> +   vect_b[i] = x + i + z;
> +   if (vect_a[i] == x)
> +     return i;
> +   vect_a[i] += x * vect_b[i];
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 0bde281c2468d8e7f43afc4fe0f757e221ad5edb..a31e3d5161684878a79817d30a6955c8370444d8 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -844,7 +844,7 @@ idx_within_array_bound (tree ref, tree *idx, void *dta)
>  
>  /* Return TRUE if ref is a within bound array reference.  */
>  
> -static bool
> +bool
>  ref_within_array_bound (gimple *stmt, tree ref)
>  {
>    class loop *loop = loop_containing_stmt (stmt);
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..85ae75ff2eb12b4299e8b7b91d0cf16e4549d08e 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -613,6 +613,241 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
>    return opt_result::success ();
>  }
>  
> +/* Funcion vect_analyze_early_break_dependences.
> +
> +   Examime all the data references in the loop and make sure that if we have
> +   mulitple exits that we are able to safely move stores such that they become
> +   safe for vectorization.  The function also calculates the place where to move
> +   the instructions to and computes what the new vUSE chain should be.
> +
> +   This works in tandem with the CFG that will be produced by
> +   slpeel_tree_duplicate_loop_to_edge_cfg later on.
> +
> +   This function tries to validate whether an early break vectorization
> +   is possible for the current instruction sequence. Returns True i
> +   possible, otherwise False.
> +
> +   Requirements:
> +     - Any memory access must be to a fixed size buffer.
> +     - There must not be any loads and stores to the same object.
> +     - Multiple loads are allowed as long as they don't alias.
> +
> +   NOTE:
> +     This implemementation is very conservative. Any overlappig loads/stores
> +     that take place before the early break statement gets rejected aside from
> +     WAR dependencies.
> +
> +     i.e.:
> +
> +	a[i] = 8
> +	c = a[i]
> +	if (b[i])
> +	  ...
> +
> +	is not allowed, but
> +
> +	c = a[i]
> +	a[i] = 8
> +	if (b[i])
> +	  ...
> +
> +	is which is the common case.  */
> +
> +static opt_result
> +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> +
> +  /* List of all load data references found during traversal.  */
> +  auto_vec<data_reference *> bases;
> +  basic_block dest_bb = NULL;
> +
> +  hash_set <gimple *> visited;
> +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  class loop *loop_nest = loop_outer (loop);
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "loop contains multiple exits, analyzing"
> +		     " statement dependencies.\n");
> +
> +  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> +	continue;
> +
> +      gimple_stmt_iterator gsi = gsi_for_stmt (c);
> +
> +      /* Now analyze all the remaining statements and try to determine which
> +	 instructions are allowed/needed to be moved.  */
> +      while (!gsi_end_p (gsi))
> +	{
> +	  gimple *stmt = gsi_stmt (gsi);
> +	  gsi_prev (&gsi);
> +	  if (!gimple_has_ops (stmt)
> +	      || is_gimple_debug (stmt))
> +	    continue;
> +
> +	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> +	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> +	  if (!dr_ref)
> +	    continue;
> +
> +	  /* We currently only support statically allocated objects due to
> +	     not having first-faulting loads support or peeling for
> +	     alignment support.  Compute the size of the referenced object
> +	     (it could be dynamically allocated).  */
> +	  tree obj = DR_BASE_ADDRESS (dr_ref);
> +	  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "early breaks only supported on statically"
> +				 " allocated objects.\n");
> +	      return opt_result::failure_at (c,
> +				 "can't safely apply code motion to "
> +				 "dependencies of %G to vectorize "
> +				 "the early exit.\n", c);
> +	    }
> +
> +	  tree refop = TREE_OPERAND (obj, 0);
> +	  tree refbase = get_base_address (refop);
> +	  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> +	      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "early breaks only supported on"
> +				 " statically allocated objects.\n");
> +	      return opt_result::failure_at (c,
> +				 "can't safely apply code motion to "
> +				 "dependencies of %G to vectorize "
> +				 "the early exit.\n", c);
> +	    }
> +
> +	  /* Check if vector accesses to the object will be within bounds.
> +	     must be a constant or assume loop will be versioned or niters
> +	     bounded by VF so accesses are within range.  */
> +	  if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "early breaks not supported: vectorization "
> +				 "would %s beyond size of obj.",
> +				 DR_IS_READ (dr_ref) ? "read" : "write");
> +	      return opt_result::failure_at (c,
> +				 "can't safely apply code motion to "
> +				 "dependencies of %G to vectorize "
> +				 "the early exit.\n", c);
> +	    }
> +
> +	  if (DR_IS_READ (dr_ref))
> +	    bases.safe_push (dr_ref);
> +	  else if (DR_IS_WRITE (dr_ref))
> +	    {
> +	      /* We are moving writes down in the CFG.  To be sure that this
> +		 is valid after vectorization we have to check all the loads
> +		 we are sinking the stores past to see if any of them may
> +		 alias or are the same object.
> +
> +		 Same objects will not be an issue because unless the store
> +		 is marked volatile the value can be forwarded.  If the
> +		 store is marked volatile we don't vectorize the loop
> +		 anyway.
> +
> +		 That leaves the check for aliasing.  We don't really need
> +		 to care about the stores aliasing with each other since the
> +		 stores are moved in order so the effects are still observed
> +		 correctly.  This leaves the check for WAR dependencies
> +		 which we would be introducing here if the DR can alias.
> +		 The check is quadratic in loads/stores but I have not found
> +		 a better API to do this.  I believe all loads and stores
> +		 must be checked.  We also must check them when we
> +		 encountered the store, since we don't care about loads past
> +		 the store.  */
> +
> +	      for (auto dr_read : bases)
> +		if (dr_may_alias_p (dr_ref, dr_read, loop_nest))
> +		  {
> +		    if (dump_enabled_p ())
> +		      dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +				       vect_location,
> +				       "early breaks not supported: "
> +				       "overlapping loads and stores "
> +				       "found before the break "
> +				       "statement.\n");
> +
> +		    return opt_result::failure_at (stmt,
> +			     "can't safely apply code motion to dependencies"
> +			     " to vectorize the early exit. %G may alias with"
> +			     " %G\n", stmt, dr_read->stmt);
> +		  }
> +	    }
> +
> +	  if (gimple_vdef (stmt))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "==> recording stmt %G", stmt);
> +
> +	      LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).safe_push (stmt);
> +	    }
> +	  else if (gimple_vuse (stmt))
> +	    {
> +	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "marked statement for vUSE update: %G", stmt);
> +	    }
> +	}
> +
> +      /* Save destination as we go, BB are visited in order and the last one
> +	is where statements should be moved to.  */
> +      if (!dest_bb)
> +	dest_bb = gimple_bb (c);
> +      else
> +	{
> +	  basic_block curr_bb = gimple_bb (c);
> +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> +	    dest_bb = curr_bb;
> +	}
> +
> +      /* Mark the statement as a condition.  */
> +      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
> +    }
> +
> +  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
> +  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
> +  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
> +  /* We don't allow outer -> inner loop transitions which should have been
> +     trapped already during loop form analysis.  */
> +  gcc_assert (dest_bb->loop_father == loop);
> +
> +  gcc_assert (dest_bb);
> +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> +
> +  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
> +    {
> +      /* All uses shall be updated to that of the first load.  Entries are
> +	 stored in reverse order.  */
> +      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
> +      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +	{
> +	  if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "will update use: %T, mem_ref: %G", vuse, g);
> +	}
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "recorded statements to be moved to BB %d\n",
> +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> +
> +  return opt_result::success ();
> +}
> +
>  /* Function vect_analyze_data_ref_dependences.
>  
>     Examine all the data references in the loop, and make sure there do not
> @@ -657,6 +892,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
>  	  return res;
>        }
>  
> +  /* If we have early break statements in the loop, check to see if they
> +     are of a form we can vectorizer.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return vect_analyze_early_break_dependences (loop_vinfo);
> +
>    return opt_result::success ();
>  }
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index fb8d999ee6bfaff551ac06ac2f3aea5354914659..900826567fee36206c0711ea51495602a7a031a1 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> +    early_breaks (false),
>      no_data_dependencies (false),
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
>    epilogue_vinfo->shared->save_datarefs ();
>  }
>  
> +/*  When vectorizing early break statements instructions that happen before
> +    the early break in the current BB need to be moved to after the early
> +    break.  This function deals with that and assumes that any validity
> +    checks has already been performed.
> +
> +    While moving the instructions if it encounters a VUSE or VDEF it then
> +    corrects the VUSES as it moves the statements along.  GDEST is the location
> +    in which to insert the new statements.  */
> +
> +static void
> +move_early_exit_stmts (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("move_early_exit_stmts");
> +
> +  if (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).is_empty ())
> +    return;
> +
> +  /* Move all stmts that need moving.  */
> +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
> +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> +
> +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo))
> +    {
> +      /* Check to see if statement is still required for vect or has been
> +	 elided.  */
> +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_info)
> +	continue;
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
> +
> +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> +      gsi_move_before (&stmt_gsi, &dest_gsi);
> +      gsi_prev (&dest_gsi);
> +    }
> +
> +  /* Update all the stmts with their new reaching VUSES.  */
> +  tree vuse
> +    = gimple_vuse (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).last ());
> +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "updating vuse to %T for load %G", vuse, p);
> +      gimple_set_vuse (p, vuse);
> +      update_stmt (p);
> +    }
> +}
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> @@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
>      }
>  
> +  /* Handle any code motion that we need to for early-break vectorization after
> +     we've done peeling but just before we start vectorizing.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    move_early_exit_stmts (loop_vinfo);
> +
>    /* FORNOW: the vectorizer supports only loops which body consist
>       of one basic block (header + empty latch). When the vectorizer will
>       support more involved loop forms, the order by which the BBs are
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>    *live_p = false;
>  
>    /* cond stmt other than loop exit cond.  */
> -  if (is_ctrl_stmt (stmt_info->stmt)
> -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  if (dyn_cast <gcond *> (stmt))
>      *relevant = vect_used_in_scope;
>  
>    /* changing memory.  */
> @@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
>  	case vect_first_order_recurrence:
>  	  dump_printf (MSG_NOTE, "first order recurrence\n");
>  	  break;
> +	case vect_condition_def:
> +	  dump_printf (MSG_NOTE, "control flow\n");
> +	  break;
>  	case vect_unknown_def_type:
>  	  dump_printf (MSG_NOTE, "unknown\n");
>  	  break;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index e4d7ab4567cef3c018b958f98eeff045d3477725..744cdc86c969a62574be488df4f9c222b68f7994 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -66,6 +66,7 @@ enum vect_def_type {
>    vect_double_reduction_def,
>    vect_nested_cycle,
>    vect_first_order_recurrence,
> +  vect_condition_def,
>    vect_unknown_def_type
>  };
>  
> @@ -888,6 +889,10 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* When the loop has early breaks that we can vectorize we need to peel
> +     the loop for the break finding loop.  */
> +  bool early_breaks;
> +
>    /* List of loop additional IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
> @@ -942,6 +947,20 @@ public:
>    /* The controlling loop IV for the scalar loop being vectorized.  This IV
>       controls the natural exits of the loop.  */
>    edge scalar_loop_iv_exit;
> +
> +  /* Used to store the list of stores needing to be moved if doing early
> +     break vectorization as they would violate the scalar loop semantics if
> +     vectorized in their current location.  These are stored in order that they
> +     need to be moved.  */
> +  auto_vec<gimple *> early_break_stores;
> +
> +  /* The final basic block where to move statements to.  In the case of
> +     multiple exits this could be pretty far away.  */
> +  basic_block early_break_dest_bb;
> +
> +  /* Statements whose VUSES need updating if early break vectorization is to
> +     happen.  */
> +  auto_vec<gimple*> early_break_vuses;
>  } *loop_vec_info;
>  
>  /* Access Functions.  */
> @@ -996,6 +1015,10 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> +#define LOOP_VINFO_EARLY_BRK_STORES(L)     (L)->early_break_stores
> +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
>  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> @@ -2298,6 +2321,9 @@ extern opt_result vect_get_vector_types_for_stmt (vec_info *,
>  						  tree *, unsigned int = 0);
>  extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
>  
> +/* In tree-if-conv.cc.  */
> +extern bool ref_within_array_bound (gimple *, tree);
> +
>  /* In tree-vect-data-refs.cc.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
>  extern enum dr_alignment_support vect_supportable_dr_alignment
>
diff mbox series

Patch

--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,332 @@  vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence. Returns True i
+   possible, otherwise False.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implemementation is very conservative. Any overlappig loads/stores
+     that take place before the early break statement gets rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is which is the common case.
+
+   Arguments:
+     - LOOP_VINFO: loop information for the current loop.
+     - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable from
+	      one or more cond conditions.  If this set overlaps with CHAIN then FIXED
+	      takes precedence.  This deals with non-single use cases.
+     - LOADS: List of all loads found during traversal.
+     - BASES: List of all load data references found during traversal.
+     - GSTMT: Current position to inspect for validity.  The sequence
+	      will be moved upwards from this point.
+     - REACHING_VUSE: The dominating VUSE found so far.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
+			   hash_set<tree> *fixed, vec<tree> *loads,
+			   vec<data_reference *> *bases, tree *reaching_vuse,
+			   gimple_stmt_iterator *gstmt)
+{
+  if (gsi_end_p (*gstmt))
+    return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  /* ?? Do I need to move debug statements? not quite sure..  */
+  if (gimple_has_ops (stmt)
+      && !is_gimple_debug (stmt))
+    {
+      tree dest = NULL_TREE;
+      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
+	 use the LHS, if not, assume that the first argument of a call is the
+	 value being defined.  e.g. MASKED_LOAD etc.  */
+      if (gimple_has_lhs (stmt))
+	dest = gimple_get_lhs (stmt);
+      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	dest = gimple_arg (call, 0);
+      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
+	{
+	  /* Operands of conds are ones we can't move.  */
+	  fixed->add (gimple_cond_lhs (cond));
+	  fixed->add (gimple_cond_rhs (cond));
+	}
+
+      bool move = false;
+
+      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_vinfo)
+	{
+	   if (dump_enabled_p ())
+	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			      "early breaks not supported. Unknown"
+			      " statement: %G", stmt);
+	   return false;
+	}
+
+      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+      if (dr_ref)
+	{
+	   /* We currently only support statically allocated objects due to
+	      not having first-faulting loads support or peeling for alignment
+	      support.  Compute the size of the referenced object (it could be
+	      dynamically allocated).  */
+	   tree obj = DR_BASE_ADDRESS (dr_ref);
+	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   tree refop = TREE_OPERAND (obj, 0);
+	   tree refbase = get_base_address (refop);
+	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   if (DR_IS_READ (dr_ref))
+	     {
+		loads->safe_push (dest);
+		bases->safe_push (dr_ref);
+	     }
+	   else if (DR_IS_WRITE (dr_ref))
+	     {
+		for (auto dr : bases)
+		  if (same_data_refs_base_objects (dr, dr_ref))
+		    {
+		      if (dump_enabled_p ())
+			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					   vect_location,
+					   "early breaks only supported,"
+					   " overlapping loads and stores found"
+					   " before the break statement.\n");
+		      return false;
+		    }
+		/* Any writes starts a new chain. */
+		move = true;
+	     }
+	}
+
+      /* If a statement is live and escapes the loop through usage in the loop
+	 epilogue then we can't move it since we need to maintain its
+	 reachability through all exits.  */
+      bool skip = false;
+      if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	{
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+	    {
+	      basic_block bb = gimple_bb (USE_STMT (use_p));
+	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	      if (skip)
+		break;
+	    }
+	}
+
+      /* If we found the defining statement of a something that's part of the
+	 chain then expand the chain with the new SSA_VARs being used.  */
+      if (!skip && (chain->contains (dest) || move))
+	{
+	  move = true;
+	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
+	    {
+	      tree var = gimple_arg (stmt, x);
+	      if (TREE_CODE (var) == SSA_NAME)
+		{
+		  if (fixed->contains (dest))
+		    {
+		      move = false;
+		      fixed->add (var);
+		    }
+		  else
+		    chain->add (var);
+		}
+	      else
+		{
+		  use_operand_p use_p;
+		  ssa_op_iter iter;
+		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		    {
+		      tree op = USE_FROM_PTR (use_p);
+		      gcc_assert (TREE_CODE (op) == SSA_NAME);
+		      if (fixed->contains (dest))
+			{
+			  move = false;
+			  fixed->add (op);
+			}
+		      else
+			chain->add (op);
+		    }
+		}
+	    }
+
+	  if (dump_enabled_p ())
+	    {
+	      if (move)
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"found chain %G", stmt);
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"ignored chain %G, not single use", stmt);
+	    }
+	}
+
+      if (move)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "==> recording stmt %G", stmt);
+
+	  for (tree ref : loads)
+	    if (stmt_may_clobber_ref_p (stmt, ref, true))
+	      {
+	        if (dump_enabled_p ())
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "early breaks not supported as memory used"
+				   " may alias.\n");
+	        return false;
+	      }
+
+	  /* If we've moved a VDEF, extract the defining MEM and update
+	     usages of it.   */
+	  tree vdef;
+	  if ((vdef = gimple_vdef (stmt)))
+	    {
+	      /* This statement is to be moved.  */
+	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
+	      *reaching_vuse = gimple_vuse (stmt);
+	    }
+	}
+    }
+
+  gsi_prev (gstmt);
+
+  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
+				  reaching_vuse, gstmt))
+    return false;
+
+  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
+    {
+      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "marked statement for vUSE update: %G", stmt);
+    }
+
+  return true;
+}
+
+/* Funcion vect_analyze_early_break_dependences.
+
+   Examime all the data references in the loop and make sure that if we have
+   mulitple exits that we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place where to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  hash_set<tree> chain, fixed;
+  auto_vec<tree> loads;
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+  tree vuse = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
+      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+
+      /* Initiaze the vuse chain with the one at the early break.  */
+      if (!vuse)
+	vuse = gimple_vuse (c);
+
+      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
+				     &bases, &vuse, &gsi))
+	return opt_result::failure_at (stmt,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", stmt);
+
+      /* Save destination as we go, BB are visited in order and the last one
+	is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+    }
+
+  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
+  gcc_assert (dest_bb);
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  /* TODO: Remove? It's useful debug statement but may be too much.  */
+  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "updated use: %T, mem_ref: %G",
+			 vuse, g);
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +983,11 @@  vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorizer.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a2079c5234033c3d825ea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@  _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11392,6 +11393,55 @@  update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/*  When vectorizing early break statements instructions that happen before
+    the early break in the current BB need to be moved to after the early
+    break.  This function deals with that and assumes that any validity
+    checks has already been performed.
+
+    While moving the instructions if it encounters a VUSE or VDEF it then
+    corrects the VUSES as it moves the statements along.  GDEST is the location
+    in which to insert the new statements.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      /* Check to see if statement is still required for vect or has been
+	 elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+      update_stmt (stmt);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse to %T for stmt %G", vuse, p);
+      unlink_stmt_vdef (p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11541,6 +11591,11 @@  vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13511,6 +13511,9 @@  vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+       case vect_early_exit_def:
+	  dump_printf (MSG_NOTE, "early exit\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@  enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_early_exit_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@  public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize we need to peel
+     the loop for the break finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@  public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in order that they need
+     to be moved.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple*> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@  public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies