
[RFC,vect] PR 91246: Prototype for vectorization of loops with breaks

Message ID aca8e145-1522-514b-451b-8c3a85dbe329@arm.com
State New
Series [RFC,vect] PR 91246: Prototype for vectorization of loops with breaks

Commit Message

Andre Vieira March 12, 2020, 9:15 a.m. UTC
Hi all,

This is a prototype I have put together to look at the feasibility and 
profitability of vectorizing loops with breaks as suggested in PR 91246. 
I am posting this here for comments as I am curious what your opinions 
are on my approach.  I don't expect much attention to this in stage 4, 
but I wanted to get it out now before I forget about it.

The idea is that during ifcvt it checks whether it can safely transform 
the break away, version the loop, and in the "to vectorize" version 
replace the break with what is hopefully a reducible comparison and 
MIN_EXPR.  Currently I have limited the cases it can transform to loops 
that meet all of the following conditions:
- it is an inner loop with a single break point,
- the break condition is an EQ_EXPR or NE_EXPR of integers,
- the entire loop is vectorized, i.e. it has no scalar epilogue,
- the loop has no writes to memory and no variable escapes the loop,
- all memory accesses are valid (i.e. within bounds) even if the loop 
doesn't break early.

To check the last condition I copied and modified the checks used to 
warn for out-of-bounds accesses.  I made these checks more 
conservative: we want to reject the transformation if we can't 
guarantee the accesses are within bounds for any possible range.  I 
have noticed these range checks aren't very good yet; I am hoping that 
as project ranger evolves this will get easier.  However, I am also 
worried that if the checks start to take control flow into 
consideration and "understand" the break, they will use it to reason 
that with the break in place the range of the loop induction variables 
is limited.  That would obviously make them unusable for checking 
whether removing the break will cause out-of-bounds accesses.  Ideally 
the new ranger would allow us to "ignore" certain expressions in the 
range calculation.  All this was with targets in mind that do not 
support masking; for targets with the ability to mask vector 
operations we could do without such analysis, as we may be able to 
prevent memory accesses from taking place after the break condition 
has been met.

In trying to improve the performance of this transformation I made a 
small change in the vectorizer: when we are dealing with such loops 
with breaks, the ifcvt pass stores the variable holding the break 
condition in the loop struct under 'early_break_cond'.  The vectorizer 
can then use this variable to check every iteration whether we have 
hit the break condition, preventing us from continuing through the 
loop when we know we can stop early.  This benefits cases where we 
would break early; on the other hand, if we break late, we have now 
introduced extra checks within the loop...  However, there is no way 
for us to know in advance.  One small improvement would be to use the 
known iteration count to decide whether or not to add these checks; 
one could argue that if we know the number of iterations to be small 
there is little point in adding them.  Thoughts welcome...

Another issue I encountered was that the early loop unroller often 
gets in the way.  For this reason I tried to teach it not to unroll 
inner loops from which we can remove breaks.  Unfortunately the early 
loop unroller runs before some other simplification transformations 
and loop canonicalisation, and I have seen cases where those 
transformations were crucial in making our memory access analysis 
accept loops.  I suggest disabling the early loop unroller to check 
whether this pass is able to handle certain loops, i.e. pass 
'-fdisable-tree-cunrolli'; you can even go as far as using 
-fdisable-tree-cunrolli=<function_name> for more accurate 
benchmarking/analysis.

I made some changes to the testcase on the PR ticket so that the 
compiler has enough information to determine all memory accesses are 
within bounds; see the testcase below.  This is one of those cases 
that requires '-fdisable-tree-cunrolli'.

#include <stdbool.h>
#define SIZE 8
int s, sum;

int *g_board;
int *g_moves;

static void f(int *moves, int *board, int cnt, int lastmove, int ko)
{
     for (int i = 0; i < cnt; i++) {
         if (moves[i] == ko) continue;

         bool found = false;
         for (int j = 0; j < SIZE; j++) {
             if ((lastmove + board[j]) == moves[j]) {
                 found = true;
                 break;
             }
         }

         if (!found) {
             s *= 20;
         }

         if (s >= 40) {
             sum += s;
         }
     }
}

void foo(int cnt, int lastmove, int ko)
{
   int board[SIZE];
   int moves[SIZE];
   for (int i = 0; i < SIZE; ++i)
     {
       board[i] = g_board[i];
       moves[i] = g_moves[i];
     }

   f (moves, board, cnt, lastmove, ko);
}


I have not found this transformation to have a big impact on performance 
so far.

Kind Regards,
Andre

Comments

Li, Pan2 via Gcc-patches March 12, 2020, 9:50 a.m. UTC | #1
On Thu, Mar 12, 2020 at 10:15 AM Andre Vieira (lists)
<andre.simoesdiasvieira@arm.com> wrote:
>
> Hi all,
>
> This is a prototype I have put together to look at the feasibility and
> profitability of vectorizing loops with breaks as suggested in PR 91246.
> I am posting this here for comments as I am curious what your opinions
> are on my approach.  I don't expect much attention to this in stage 4,
> but I wanted to get it out now before I forget about it.
>
> The idea is that during ifcvt it checks whether it can safely transform
> the break away, version the loop, and in the "to vectorize" version
> replace the break with what is hopefully a reducible comparison and
> MIN_EXPR.  Currently I have limited the cases it can transform to loops
> that meet all of the following conditions:
> - it is an inner loop with a single break point,
> - the break condition is an EQ_EXPR or NE_EXPR of integers,
> - the entire loop is vectorized, i.e. it has no scalar epilogue,
> - the loop has no writes to memory and no variable escapes the loop,
> - all memory accesses are valid (i.e. within bounds) even if the loop
> doesn't break early.
>
> To check the last condition I copied and modified the checks used to
> warn for out-of-bounds accesses.  I made these checks more
> conservative: we want to reject the transformation if we can't
> guarantee the accesses are within bounds for any possible range.  I
> have noticed these range checks aren't very good yet; I am hoping that
> as project ranger evolves this will get easier.  However, I am also
> worried that if the checks start to take control flow into
> consideration and "understand" the break, they will use it to reason
> that with the break in place the range of the loop induction variables
> is limited.  That would obviously make them unusable for checking
> whether removing the break will cause out-of-bounds accesses.  Ideally
> the new ranger would allow us to "ignore" certain expressions in the
> range calculation.  All this was with targets in mind that do not
> support masking; for targets with the ability to mask vector
> operations we could do without such analysis, as we may be able to
> prevent memory accesses from taking place after the break condition
> has been met.

Not looking at the patch but just throwing in thoughts.

I think at some point we need to stop relying on a separate if-conversion
pass since that requires a GIMPLE representation for the scalar
if-converted loop, which is really difficult if you start to apply
predication to more than just loads and stores.

That is, I'd like to see us experimenting with doing the if-conversion
"on the fly" during vectorization.  That probably means doing what
if-conversion does but represent the result in some non-IL data
structure - since we're going to do all-SLP it'll be the SLP graph,
maybe simply attach a vector of predicates to each node.
The benefit is that we'll eventually be able to if-convert for
non-loop vectorization as well.

Possibly easiest is of course to try to do away with if-conversion as it
is now and only have the existing feature-set moved to the vectorizer.
I'd probably really start with copying the if-conversion analysis code
and simply skip its "code-generation", then handle multiple BBs
in all of the vectorizer code and see how analysis and code generation
would have to be adjusted.  Given the general direction I'd focus
on full-SLP testcases but in reality I cannot promise I'll finish
the only-SLP transition in time for GCC 11 (I hope to have a good
chunk ready for the Cauldron though).

> In trying to improve the performance of this transformation I made a
> small change in the vectorizer: when we are dealing with such loops
> with breaks, the ifcvt pass stores the variable holding the break
> condition in the loop struct under 'early_break_cond'.  The vectorizer
> can then use this variable to check every iteration whether we have
> hit the break condition, preventing us from continuing through the
> loop when we know we can stop early.  This benefits cases where we
> would break early; on the other hand, if we break late, we have now
> introduced extra checks within the loop...  However, there is no way
> for us to know in advance.  One small improvement would be to use the
> known iteration count to decide whether or not to add these checks;
> one could argue that if we know the number of iterations to be small
> there is little point in adding them.  Thoughts welcome...
>
> Another issue I encountered was that the early loop unroller often
> gets in the way.  For this reason I tried to teach it not to unroll
> inner loops from which we can remove breaks.  Unfortunately the early
> loop unroller runs before some other simplification transformations
> and loop canonicalisation, and I have seen cases where those
> transformations were crucial in making our memory access analysis
> accept loops.  I suggest disabling the early loop unroller to check
> whether this pass is able to handle certain loops, i.e. pass
> '-fdisable-tree-cunrolli'; you can even go as far as using
> -fdisable-tree-cunrolli=<function_name> for more accurate
> benchmarking/analysis.
>
> I made some changes to the testcase on the PR ticket such that the
> compiler has enough information to determine all memory accesses are
> within bounds, see testcase below. This is one of those cases that
> requires '-fdisable-tree-cunrolli'.
>
> #include <stdbool.h>
> #define SIZE 8
> int s, sum;
>
> int *g_board;
> int *g_moves;
>
> static void f(int *moves, int *board, int cnt, int lastmove, int ko)
> {
>      for (int i = 0; i < cnt; i++) {
>          if (moves[i] == ko) continue;
>
>          bool found = false;
>          for (int j = 0; j < SIZE; j++) {
>              if ((lastmove + board[j]) == moves[j]) {
>                  found = true;
>                  break;
>              }
>          }
>
>          if (!found) {
>              s *= 20;
>          }
>
>          if (s >= 40) {
>              sum += s;
>          }
>      }
> }
>
> void foo(int cnt, int lastmove, int ko)
> {
>    int board[SIZE];
>    int moves[SIZE];
>    for (int i = 0; i < SIZE; ++i)
>      {
>        board[i] = g_board[i];
>        moves[i] = g_moves[i];
>      }
>
>    f (moves, board, cnt, lastmove, ko);
> }
>
>
> I have not found this transformation to have a big impact on performance
> so far.
>
> Kind Regards,
> Andre

Patch

diff --git a/gcc/cfganal.h b/gcc/cfganal.h
index 849e537eddbf8e551b5af26a48a4468155bb7635..cccba9f9628b7aab2febea98cb9cece11c99900c 100644
--- a/gcc/cfganal.h
+++ b/gcc/cfganal.h
@@ -81,6 +81,8 @@  extern void bitmap_union_of_preds (sbitmap, sbitmap *, basic_block);
 extern basic_block * single_pred_before_succ_order (void);
 extern edge single_incoming_edge_ignoring_loop_edges (basic_block, bool);
 extern edge single_pred_edge_ignoring_loop_edges (basic_block, bool);
+extern bool can_remove_breaks (class loop *, unsigned int, basic_block *,
+			       edge *, vec<edge> *);
 
 
 #endif /* GCC_CFGANAL_H */
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 11378cadd41c8721e16e6169f8a881971e9f6654..54f83e70bfd2f28167f0f875d700d755e80ff49b 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -268,6 +268,11 @@  public:
      the basic-block from being collected but its index can still be
      reused.  */
   basic_block former_header;
+
+  /* Used for loops that used to contain breaks.  After removal of breaks this
+     tree is set to a variable that is true if during the current iteration
+     we would have exited the loop due to the break condition.  */
+  tree early_break_cond;
 };
 
 /* Set if the loop is known to be infinite.  */
@@ -876,4 +881,17 @@  gcov_type_to_wide_int (gcov_type val)
 
   return widest_int::from_array (a, 2);
 }
+
+/* Function bb_in_loop_p
+
+   Used as predicate for dfs order traversal of the loop bbs.  */
+
+inline bool
+bb_in_loop_p (const_basic_block bb, const void *data)
+{
+  const class loop *const loop = (const class loop *)data;
+  if (flow_bb_inside_loop_p (loop, bb))
+    return true;
+  return false;
+}
 #endif /* GCC_CFGLOOP_H */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 8d24c18d5de9a982733bfd5a16d4e87cc4ebef1b..08e15d2528465427b2b728e1c4a6c7df5061a360 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -99,6 +99,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-walk.h"
 #include "gimplify-me.h"
 #include "tree-cfg.h"
 #include "tree-into-ssa.h"
@@ -2727,7 +2728,8 @@  combine_blocks (class loop *loop)
    consistent after the condition is folded in the vectorizer.  */
 
 static class loop *
-version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
+version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds,
+				bool save_preds = true)
 {
   basic_block cond_bb;
   tree cond = make_ssa_name (boolean_type_node);
@@ -2741,11 +2743,15 @@  version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 				  integer_zero_node);
   gimple_call_set_lhs (g, cond);
 
-  /* Save BB->aux around loop_version as that uses the same field.  */
-  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
-  void **saved_preds = XALLOCAVEC (void *, save_length);
-  for (unsigned i = 0; i < save_length; i++)
-    saved_preds[i] = ifc_bbs[i]->aux;
+  void **saved_preds;
+  if (save_preds)
+    {
+      /* Save BB->aux around loop_version as that uses the same field.  */
+      save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+      saved_preds = XALLOCAVEC (void *, save_length);
+      for (unsigned i = 0; i < save_length; i++)
+	saved_preds[i] = ifc_bbs[i]->aux;
+    }
 
   initialize_original_copy_tables ();
   /* At this point we invalidate porfile confistency until IFN_LOOP_VECTORIZED
@@ -2757,8 +2763,11 @@  version_loop_for_if_conversion (class loop *loop, vec<gimple *> *preds)
 			   profile_probability::always (), true);
   free_original_copy_tables ();
 
-  for (unsigned i = 0; i < save_length; i++)
-    ifc_bbs[i]->aux = saved_preds[i];
+  if (save_preds)
+    {
+      for (unsigned i = 0; i < save_length; i++)
+	ifc_bbs[i]->aux = saved_preds[i];
+    }
 
   if (new_loop == NULL)
     return NULL;
@@ -2813,6 +2822,404 @@  versionable_outer_loop_p (class loop *loop)
   return true;
 }
 
+bool
+mem_ref_maybe_out_of_bounds (tree ref)
+{
+  tree arg = TREE_OPERAND (ref, 0);
+  /* The constant and variable offset of the reference.  */
+  tree cstoff = TREE_OPERAND (ref, 1);
+  tree varoff = NULL_TREE;
+
+  const offset_int maxobjsize = tree_to_shwi (max_object_size ());
+
+  /* The array or string constant bounds in bytes.  Initially set
+     to [-MAXOBJSIZE - 1, MAXOBJSIZE]  until a tighter bound is
+     determined.  */
+  offset_int arrbounds[2] = { -maxobjsize - 1, maxobjsize };
+
+  offset_int ioff = wi::to_offset (fold_convert (ptrdiff_type_node, cstoff));
+
+  /* The range of the byte offset into the reference.  */
+  offset_int offrange[2] = { 0, 0 };
+
+  /* Determine the offsets and increment OFFRANGE for the bounds of each.
+     The loop computes the range of the final offset for expressions such
+     as (A + i0 + ... + iN)[CSTOFF] where i0 through iN are SSA_NAMEs in
+     some range.  */
+  const unsigned limit = 10;
+  for (unsigned n = 0; TREE_CODE (arg) == SSA_NAME && n < limit; ++n)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (arg);
+      if (!is_gimple_assign (def))
+	break;
+
+      tree_code code = gimple_assign_rhs_code (def);
+      if (code == POINTER_PLUS_EXPR)
+	{
+	  arg = gimple_assign_rhs1 (def);
+	  varoff = gimple_assign_rhs2 (def);
+	}
+      else if (code == ASSERT_EXPR)
+	{
+	  arg = TREE_OPERAND (gimple_assign_rhs1 (def), 0);
+	  continue;
+	}
+      else
+	return true;
+
+      /* VAROFF should always be a SSA_NAME here (and not even
+	 INTEGER_CST) but there's no point in taking chances.  */
+      if (TREE_CODE (varoff) != SSA_NAME)
+	return true;
+
+      wide_int min_wi, max_wi;
+
+      enum value_range_kind range_type
+	= determine_value_range (varoff, &min_wi, &max_wi);
+
+      if (range_type != VR_RANGE)
+	return true;
+
+      offset_int min
+	= wi::to_offset (wide_int_to_tree (ptrdiff_type_node, min_wi));
+      offset_int max
+	= wi::to_offset (wide_int_to_tree (ptrdiff_type_node, max_wi));
+      if (min < max)
+	{
+	  offrange[0] += min;
+	  offrange[1] += max;
+	}
+      else
+	{
+	  /* When MIN >= MAX, the offset is effectively in a union
+	     of two ranges: [-MAXOBJSIZE -1, MAX] and [MIN, MAXOBJSIZE].
+	     Since there is no way to represent such a range across
+	     additions, conservatively add [-MAXOBJSIZE -1, MAXOBJSIZE]
+	     to OFFRANGE.  */
+	  offrange[0] += arrbounds[0];
+	  offrange[1] += arrbounds[1];
+	}
+
+      if (offrange[0] < arrbounds[0])
+	offrange[0] = arrbounds[0];
+
+      if (offrange[1] > arrbounds[1])
+	offrange[1] = arrbounds[1];
+    }
+
+  if (TREE_CODE (arg) == ADDR_EXPR)
+    {
+      arg = TREE_OPERAND (arg, 0);
+      if (TREE_CODE (arg) != STRING_CST
+	  && TREE_CODE (arg) != PARM_DECL
+	  && TREE_CODE (arg) != VAR_DECL)
+	return true;
+    }
+  else if (TREE_CODE (arg) == SSA_NAME)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (arg);
+      tree pointer = TREE_TYPE (arg);
+      if (def != NULL && gimple_code (def) == GIMPLE_NOP
+	  && POINTER_TYPE_P (pointer)
+	  && RECORD_OR_UNION_TYPE_P (TREE_TYPE (pointer))
+	  && TYPE_CXX_ODR_P (TREE_TYPE (pointer)))
+	/* We are dealing with a global object of sorts, always OK to access
+	   it.  */
+	return false;
+      else
+	return true;
+    }
+  else
+    return true;
+
+  /* The type of the object being referred to.  It can be an array,
+     string literal, or a non-array type when the MEM_REF represents
+     a reference/subscript via a pointer to an object that is not
+     an element of an array.  Incomplete types are excluded as well
+     because their size is not known.  */
+  tree reftype = TREE_TYPE (arg);
+  if (POINTER_TYPE_P (reftype)
+      || !COMPLETE_TYPE_P (reftype)
+      || TREE_CODE (TYPE_SIZE_UNIT (reftype)) != INTEGER_CST)
+    return true;
+
+  /* Except in declared objects, references to trailing array members
+     of structs and union objects are excluded because MEM_REF doesn't
+     make it possible to identify the member where the reference
+     originated.  */
+  if (RECORD_OR_UNION_TYPE_P (reftype)
+      && (!VAR_P (arg)
+	  || (DECL_EXTERNAL (arg) && array_at_struct_end_p (ref))))
+    return true;
+
+  arrbounds[0] = 0;
+
+  offset_int eltsize;
+  if (TREE_CODE (reftype) == ARRAY_TYPE)
+    {
+      eltsize = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (reftype)));
+      if (tree dom = TYPE_DOMAIN (reftype))
+	{
+	  tree bnds[] = { TYPE_MIN_VALUE (dom), TYPE_MAX_VALUE (dom) };
+	  if (TREE_CODE (arg) == COMPONENT_REF)
+	    {
+	      offset_int size = maxobjsize;
+	      if (tree fldsize = component_ref_size (arg))
+		size = wi::to_offset (fldsize);
+	      arrbounds[1] = wi::lrshift (size, wi::floor_log2 (eltsize));
+	    }
+	  else if (array_at_struct_end_p (arg) || !bnds[0] || !bnds[1])
+	    arrbounds[1] = wi::lrshift (maxobjsize, wi::floor_log2 (eltsize));
+	  else
+	    arrbounds[1] = (wi::to_offset (bnds[1]) - wi::to_offset (bnds[0])) * eltsize;
+	}
+      else
+	arrbounds[1] = wi::lrshift (maxobjsize, wi::floor_log2 (eltsize));
+
+      if (TREE_CODE (ref) == MEM_REF)
+	{
+	  /* For MEM_REF determine a tighter bound of the non-array
+	     element type.  */
+	  tree eltype = TREE_TYPE (reftype);
+	  while (TREE_CODE (eltype) == ARRAY_TYPE)
+	    eltype = TREE_TYPE (eltype);
+	  eltsize = wi::to_offset (TYPE_SIZE_UNIT (eltype));
+	}
+    }
+  else
+    {
+      eltsize = 1;
+      tree size = TYPE_SIZE_UNIT (reftype);
+      if (VAR_P (arg))
+	if (tree initsize = DECL_SIZE_UNIT (arg))
+	  if (tree_int_cst_lt (size, initsize))
+	    size = initsize;
+
+      arrbounds[1] = wi::to_offset (size);
+    }
+
+  offrange[0] += ioff;
+  offrange[1] += ioff;
+
+  if (arrbounds[0] == arrbounds[1]
+      || offrange[0] >= arrbounds[1]
+      || offrange[1] < arrbounds[0])
+    return true;
+
+  return !(offrange[0] >= arrbounds[0] && offrange[1] <= arrbounds[1]);
+}
+
+bool
+array_ref_maybe_out_of_bounds (tree ref)
+{
+  tree low_sub = TREE_OPERAND (ref, 1);
+  tree up_sub = low_sub;
+  tree up_bound = array_ref_up_bound (ref);
+
+  tree up_bound_p1;
+
+  if (!up_bound
+      || TREE_CODE (up_bound) != INTEGER_CST)
+    return true;
+
+
+  up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
+				 build_int_cst (TREE_TYPE (up_bound), 1));
+
+  tree low_bound = array_ref_low_bound (ref);
+
+  /* Empty array.  */
+  if (tree_int_cst_equal (low_bound, up_bound_p1))
+    return true;
+
+
+  enum value_range_kind range_type = VR_UNDEFINED;
+  if (TREE_CODE (low_sub) == SSA_NAME)
+    {
+      wide_int min_wi, max_wi;
+      range_type = determine_value_range (low_sub, &min_wi, &max_wi);
+
+      if (range_type != VR_UNDEFINED && range_type != VR_VARYING)
+	{
+	  tree min = wide_int_to_tree (TREE_TYPE (low_sub), min_wi);
+	  tree max = wide_int_to_tree (TREE_TYPE (low_sub), max_wi);
+	  low_sub = range_type == VR_RANGE ? max : min;
+	  up_sub = range_type == VR_RANGE ? min : max;
+	}
+    }
+
+  if (range_type == VR_ANTI_RANGE)
+    {
+      if (up_bound
+	  && TREE_CODE (up_sub) == INTEGER_CST
+	  && tree_int_cst_le (up_bound, up_sub)
+	  && TREE_CODE (low_sub) == INTEGER_CST
+	  && tree_int_cst_le (low_sub, low_bound))
+	return true;
+    }
+  else if (up_bound
+       && TREE_CODE (up_sub) == INTEGER_CST
+       && !tree_int_cst_le (up_sub, up_bound))
+    return true;
+  else if (TREE_CODE (low_sub) == INTEGER_CST
+	   && tree_int_cst_lt (low_sub, low_bound))
+    return true;
+
+  return false;
+}
+
+static tree
+check_array_bounds (tree *tp, int *walk_subtree, void *data)
+{
+  tree t = *tp;
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  bool *maybe_out_of_bounds = (bool*) wi->info;
+
+  if (*maybe_out_of_bounds)
+    {
+      *walk_subtree = FALSE;
+      return NULL_TREE;
+    }
+
+  *walk_subtree = TRUE;
+
+  if (TREE_CODE (t) == ARRAY_REF)
+    *maybe_out_of_bounds = array_ref_maybe_out_of_bounds (t);
+  else if (TREE_CODE (t) == MEM_REF)
+    *maybe_out_of_bounds = mem_ref_maybe_out_of_bounds (t);
+
+  return NULL_TREE;
+}
+
+
+bool
+can_remove_breaks (class loop *loop, unsigned int nbbs, basic_block *bbs,
+		   edge *natural_loop_exit_p, vec<edge> *breaks)
+{
+  if (loop->num_nodes <= 2 || loop->inner || single_exit (loop))
+    return false;
+
+  for (unsigned int i = 0; i < nbbs; ++i)
+    {
+      basic_block bb = bbs[i];
+
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  struct walk_stmt_info wi;
+
+	  switch (gimple_code (stmt))
+	    {
+	    case GIMPLE_ASSIGN:
+	      if (TREE_CODE (gimple_assign_lhs (stmt)) == MEM_REF)
+		return false;
+	      break;
+	      /* Do not remove breaks in loops with calls or inline
+		 assembly.  */
+	    case GIMPLE_ASM:
+	    case GIMPLE_CALL:
+	      return false;
+	    default:
+	      break;
+	    }
+
+	  if (is_gimple_debug (stmt))
+	    continue;
+
+	  bool maybe_out_of_bounds = false;
+	  memset (&wi, 0, sizeof (wi));
+	  wi.info = &maybe_out_of_bounds;
+	  walk_gimple_op (stmt, check_array_bounds, &wi);
+	  if (maybe_out_of_bounds)
+	    return false;
+	}
+      }
+
+  /* TODO: enable cases with if (cond) { break; } else { something else; }  */
+  if (!single_pred_p (loop->latch))
+    return false;
+
+  /* Find the early exit and the condition for it.  */
+  basic_block bb_with_latch_cond = single_pred (loop->latch);
+  if (!bb_with_latch_cond)
+    return false;
+
+  gimple *latch_condition = gsi_stmt (gsi_last_bb (bb_with_latch_cond));
+
+  if (gimple_code (latch_condition) != GIMPLE_COND)
+    return false;
+
+  /* The latch should only have one predecessor.  */
+  edge latch_pred_e = single_pred_edge (loop->latch);
+  if (latch_pred_e == NULL)
+    return false;
+
+  /* Look for the natural loop exit, this should be the non-latch successor edge
+     of the latch's predecessor.  */
+  edge e;
+  edge_iterator ei;
+  edge natural_loop_exit = NULL;
+  FOR_EACH_EDGE (e, ei, latch_pred_e->src->succs)
+    {
+      if (e != latch_pred_e)
+	natural_loop_exit = e;
+    }
+
+  if (natural_loop_exit == NULL)
+    return false;
+
+  vec<edge> exits = get_loop_exit_edges (loop);
+  for(unsigned int i = 0; i < exits.length (); ++i)
+    {
+      if (exits[i] != natural_loop_exit)
+	{
+	  breaks->safe_push (exits[i]);
+	  /* TODO: Support breaks with same destination as natural exit.
+	     Shouldn't be too difficult, I believe adding an extra basic block
+	     with a conditional jump would work.  */
+	  if (exits[i]->dest == natural_loop_exit->dest)
+	    return false;
+
+	  gimple *break_stmt = gsi_stmt (gsi_last_bb (exits[0]->src));
+	  tree_code code = gimple_cond_code (break_stmt);
+	  tree lhs = gimple_cond_lhs (break_stmt);
+	  bool use_min_for_eq = ((code == EQ_EXPR || code == NE_EXPR)
+				 && TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE);
+	  /* Only support cases that we can transform into a MIN expr.  */
+	  if (!use_min_for_eq)
+	    return false;
+	}
+
+      /*  For a loop that is in loop closed SSA form any values escaping the
+	  loop must do so via a PHI-node in an exit BB.  Use this property to
+	  verify that there is no such value escaping the loop.  */
+      for (gphi_iterator si = gsi_start_phis (exits[i]->dest); !gsi_end_p (si);
+	   gsi_next (&si))
+	{
+	  gphi *phi = si.phi ();
+	  for (unsigned int j = 0; j < gimple_phi_num_args (phi); ++j)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, j);
+	      if (TREE_CODE (arg) == SSA_NAME
+		  && SSA_NAME_DEF_STMT (arg)->bb != NULL
+		  && flow_bb_inside_loop_p (loop, SSA_NAME_DEF_STMT (arg)->bb))
+		return false;
+	    }
+	}
+    }
+
+  /* TODO: Support multiple break stmts.  */
+  if (breaks->length () > 1)
+    return false;
+
+  if (natural_loop_exit_p)
+    *natural_loop_exit_p = natural_loop_exit;
+
+  need_to_predicate = true;
+  return true;
+}
+
 /* Performs splitting of critical edges.  Skip splitting and return false
    if LOOP will not be converted because:
 
@@ -2834,7 +3241,7 @@  ifcvt_split_critical_edges (class loop *loop, bool aggressive_if_conv)
   auto_vec<edge> critical_edges;
 
   /* Loop is not well formed.  */
-  if (num <= 2 || loop->inner || !single_exit (loop))
+  if (num > 2 || loop->inner || !single_exit (loop))
     return false;
 
   body = get_loop_body (loop);
@@ -3004,6 +3411,207 @@  ifcvt_local_dce (class loop *loop)
     }
 }
 
+static unsigned int
+simplify_loop_control_flow (class loop *loop, edge natural_loop_exit,
+			    vec<edge> breaks)
+{
+  unsigned int todo = 0;
+
+  /* Add PHI-node to keep track of new break condition.  */
+  basic_block break_src = breaks[0]->src;
+
+  gimple *break_stmt = gsi_stmt (gsi_last_bb (breaks[0]->src));
+  gcc_assert (gimple_code (break_stmt) == GIMPLE_COND);
+  bool negate = breaks[0]->flags & EDGE_FALSE_VALUE;
+
+  tree_code code = gimple_cond_code (break_stmt);
+  tree lhs = gimple_cond_lhs (break_stmt);
+  tree rhs = gimple_cond_rhs (break_stmt);
+
+  bool use_min_for_eq = ((code == EQ_EXPR || code == NE_EXPR)
+			 && TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE);
+
+  tree cond_var = make_temp_ssa_name (use_min_for_eq
+				      ? unsigned_type_for (TREE_TYPE (lhs))
+				      : boolean_type_node, NULL, "break_cond");
+
+  gphi *break_phi = create_phi_node (cond_var, loop->header);
+  add_phi_arg (break_phi, use_min_for_eq
+			  ? build_one_cst (TREE_TYPE (cond_var))
+			  : build_zero_cst (boolean_type_node),
+	       loop_preheader_edge (loop), UNKNOWN_LOCATION);
+
+  /* Construct break condition variable from original break stmt.  */
+  tree phi_arg;
+
+  if (use_min_for_eq)
+    {
+      phi_arg = fold_build2 (MINUS_EXPR, TREE_TYPE (lhs), lhs, rhs);
+      phi_arg = fold_build1 (NOP_EXPR, TREE_TYPE (cond_var), phi_arg);
+      if (code == NE_EXPR)
+	phi_arg = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (cond_var), phi_arg);
+      phi_arg = fold_build2 (MIN_EXPR, TREE_TYPE (cond_var), phi_arg,
+			     gimple_phi_result (break_phi));
+    }
+  else
+    {
+      phi_arg = build2 (gimple_cond_code (break_stmt), boolean_type_node,
+			 gimple_cond_lhs (break_stmt),
+			 gimple_cond_rhs (break_stmt));
+      if (negate)
+	phi_arg = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, phi_arg);
+
+      phi_arg = fold_build3 (COND_EXPR, boolean_type_node, phi_arg,
+			     build_one_cst (boolean_type_node),
+			     gimple_phi_result (break_phi));
+
+    }
+
+  /* Save break target's BB and remove break edge.  */
+  gimple_stmt_iterator gsi = gsi_last_bb (break_src);
+  gsi_remove (&gsi, true);
+  basic_block break_target = breaks[0]->dest;
+  auto_vec<std::pair<gphi*, tree> >  to_copy_phis;
+
+  for (gphi_iterator itr = gsi_start_phis (breaks[0]->dest);
+      !gsi_end_p(itr); gsi_next (&itr))
+    {
+      gphi *phi = itr.phi ();
+      tree var = gimple_phi_arg_def (phi, breaks[0]->dest_idx);
+      to_copy_phis.safe_push (std::make_pair (phi, var));
+    }
+
+  remove_edge (breaks[0]);
+
+  /* Make sure the other edge is now a FALLTHRU edge.  */
+  edge fallthru_e = single_succ_edge (break_src);
+  gcc_assert (fallthru_e);
+  fallthru_e->flags &= ~(negate ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE);
+  fallthru_e->flags |= EDGE_FALLTHRU;
+
+  /* Update break condition.  */
+  gimple_seq stmts;
+  phi_arg = force_gimple_operand (phi_arg, &stmts, true, NULL);
+  gsi = gsi_last_bb (break_src);
+  gsi_insert_seq_after (&gsi, stmts, GSI_NEW_STMT);
+
+  /* If the basic block where the break condition is computed does not dominate
+     the natural loop exit block, then we must add a PHI-node to this block to
+     ensure correct def-use.  */
+  if (!dominated_by_p (CDI_DOMINATORS, natural_loop_exit->src, break_src))
+    {
+      edge e;
+      edge_iterator ei;
+      cond_var = make_temp_ssa_name (TREE_TYPE (phi_arg), NULL, "break_cond");
+      gphi *new_phi = create_phi_node (cond_var, natural_loop_exit->src);
+
+      FOR_EACH_EDGE (e, ei, natural_loop_exit->src->preds)
+	{
+	  if (dominated_by_p (CDI_DOMINATORS, e->src, break_src))
+	    add_phi_arg (new_phi, phi_arg, e, UNKNOWN_LOCATION);
+	  else
+	    add_phi_arg (new_phi, gimple_phi_result (break_phi), e,
+			 UNKNOWN_LOCATION);
+	}
+
+      phi_arg = gimple_phi_result (new_phi);
+    }
+
+  /* Add updated condition to PHI-node.  */
+  add_phi_arg (break_phi, phi_arg, loop_latch_edge (loop), UNKNOWN_LOCATION);
+
+  loop->early_break_cond = phi_arg;
+
+  /* Construct new block with the new break condition.  */
+  basic_block new_bb = split_edge (natural_loop_exit);
+
+  /* Put the result in a single-argument PHI-node to make sure the
+     loop_closed_ssa_def check doesn't fail.  */
+  cond_var = make_temp_ssa_name (TREE_TYPE (phi_arg), NULL, "break_cond");
+  break_phi = create_phi_node (cond_var, new_bb);
+  add_phi_arg (break_phi, phi_arg, single_pred_edge (new_bb),
+	       UNKNOWN_LOCATION);
+
+  if (use_min_for_eq && ((code == EQ_EXPR) ^ negate))
+    {
+      cond_var = fold_build2 (EQ_EXPR, boolean_type_node, cond_var,
+			      build_zero_cst (TREE_TYPE (cond_var)));
+    }
+
+  gimple *cond_stmt = gimple_build_cond_from_tree (cond_var, NULL_TREE,
+						   NULL_TREE);
+  gsi = gsi_last_bb (new_bb);
+  gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
+
+  /* Fix up edges of new block.  */
+  edge false_e = single_succ_edge (new_bb);
+  edge true_e = NULL;
+
+  /* TODO: set probabilities! */
+  if (false_e->dest == break_target)
+    {
+      true_e = false_e;
+      false_e = make_edge (new_bb, natural_loop_exit->dest, EDGE_FALSE_VALUE);
+      true_e->flags &= ~EDGE_FALLTHRU;
+      true_e->flags |= EDGE_TRUE_VALUE;
+    }
+  else
+    {
+      false_e->flags &= ~EDGE_FALLTHRU;
+      false_e->flags |= EDGE_FALSE_VALUE;
+      true_e = make_edge (new_bb, break_target, EDGE_TRUE_VALUE);
+    }
+
+  std::pair<gphi*, tree> *el;
+  unsigned int ix;
+  FOR_EACH_VEC_ELT (to_copy_phis, ix, el)
+    {
+      add_phi_arg (el->first, el->second, true_e, UNKNOWN_LOCATION);
+      if (virtual_operand_p (el->second))
+	{
+	  mark_virtual_operand_for_renaming (el->second);
+	  todo |= TODO_update_ssa_only_virtuals;
+	}
+    }
+
+  return todo;
+}
+
+static bool
+version_loop (class loop *loop, class loop **rloop, vec<gimple *> *preds,
+	      bool save_preds)
+{
+  class loop *vloop
+    = (versionable_outer_loop_p (loop_outer (loop))
+       ? loop_outer (loop) : loop);
+  class loop *nloop = version_loop_for_if_conversion (vloop, preds, save_preds);
+  if (nloop == NULL)
+    return false;
+  if (vloop != loop)
+    {
+      /* If versionable_outer_loop_p decided to version the
+	 outer loop, version also the inner loop of the non-vectorized
+	 loop copy.  So we transform:
+	  loop1
+	    loop2
+	 into:
+	  if (LOOP_VECTORIZED (1, 3))
+	    {
+	      loop1
+		loop2
+	    }
+	  else
+	    loop3 (copy of loop1)
+	      if (LOOP_VECTORIZED (4, 5))
+		loop4 (copy of loop2)
+	      else
+		loop5 (copy of loop4)  */
+      gcc_assert (nloop->inner && nloop->inner->next == NULL);
+      *rloop = nloop->inner;
+    }
+  return true;
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -3015,6 +3623,8 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
   bool aggressive_if_conv;
   class loop *rloop;
   bitmap exit_bbs;
+  auto_vec<edge> breaks;
+  edge natural_loop_exit;
 
  again:
   rloop = NULL;
@@ -3033,14 +3643,20 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv))
+  basic_block *bbs = XCNEWVEC (basic_block, loop->num_nodes);
+
+  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
+					  loop->num_nodes, loop);
+
+  if (!ifcvt_split_critical_edges (loop, aggressive_if_conv)
+      && !can_remove_breaks (loop, nbbs, bbs, &natural_loop_exit, &breaks))
     goto cleanup;
 
-  if (!if_convertible_loop_p (loop)
+  if (!(if_convertible_loop_p (loop) || !breaks.is_empty ())
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
 
-  if ((need_to_predicate || any_complicated_phi)
+  if ((need_to_predicate || any_complicated_phi || !breaks.is_empty ())
       && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
 	  || loop->dont_vectorize))
     goto cleanup;
@@ -3049,39 +3665,18 @@  tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      specified -ftree-loop-if-convert or unless versioning is required.
      Either version this loop, or if the pattern is right for outer-loop
      vectorization, version the outer loop.  In the latter case we will
-     still if-convert the original inner loop.  */
-  if (need_to_predicate
-      || any_complicated_phi
-      || flag_tree_loop_if_convert != 1)
-    {
-      class loop *vloop
-	= (versionable_outer_loop_p (loop_outer (loop))
-	   ? loop_outer (loop) : loop);
-      class loop *nloop = version_loop_for_if_conversion (vloop, preds);
-      if (nloop == NULL)
-	goto cleanup;
-      if (vloop != loop)
-	{
-	  /* If versionable_outer_loop_p decided to version the
-	     outer loop, version also the inner loop of the non-vectorized
-	     loop copy.  So we transform:
-	      loop1
-		loop2
-	     into:
-	      if (LOOP_VECTORIZED (1, 3))
-		{
-		  loop1
-		    loop2
-		}
-	      else
-		loop3 (copy of loop1)
-		  if (LOOP_VECTORIZED (4, 5))
-		    loop4 (copy of loop2)
-		  else
-		    loop5 (copy of loop4)  */
-	  gcc_assert (nloop->inner && nloop->inner->next == NULL);
-	  rloop = nloop->inner;
-	}
+     still if-convert the original inner loop.  Only do this, though, if we
+     are certain we will run the vectorizer.  */
+  if (flag_tree_loop_if_convert != 1
+      && !version_loop (loop, &rloop, preds, breaks.is_empty ()))
+    goto cleanup;
+
+  if (!breaks.is_empty ())
+    {
+      todo |= simplify_loop_control_flow (loop, natural_loop_exit, breaks);
+      breaks.block_remove (0, breaks.length ());
+      todo |= TODO_cleanup_cfg;
+      goto cleanup;
     }
 
   /* Now all statements are if-convertible.  Combine all the basic
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8ab6ab3330c5f7302ffddd7fc47c7b20fbed77fc..0449493654f9d0bf1cc35d7586bd7968f09b3b48 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -57,6 +57,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-into-ssa.h"
 #include "cfgloop.h"
+#include "cfganal.h"
 #include "tree-chrec.h"
 #include "tree-scalar-evolution.h"
 #include "tree-inline.h"
@@ -1340,6 +1341,14 @@  tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer,
   class loop *inner;
   enum unroll_level ul;
   unsigned num = number_of_loops (cfun);
+  basic_block *bbs = XCNEWVEC (basic_block, loop->num_nodes);
+  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
+					  loop->num_nodes, loop);
+  auto_vec<edge> breaks;
+  edge loop_exit;
+
+  if (can_remove_breaks (loop, nbbs, bbs, &loop_exit, &breaks))
+    return false;
 
   /* Process inner loops first.  Don't walk loops added by the recursive
      calls because SSA form is not up-to-date.  They can be handled in the
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 0ee1ab45c07d548d8938ae75f5c655b504b74f1e..ffefa7cb9cd301dbe773cb0818f581b3d03976c0 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -854,8 +854,26 @@  vect_set_loop_condition_unmasked (class loop *loop, tree niters,
   limit = force_gimple_operand_gsi (&loop_cond_gsi, limit, true, NULL_TREE,
 				     true, GSI_SAME_STMT);
 
-  cond_stmt = gimple_build_cond (code, indx_after_incr, limit, NULL_TREE,
-				 NULL_TREE);
+  if (loop->early_break_cond)
+    {
+      bool exit_on_true = exit_edge->flags & EDGE_TRUE_VALUE;
+      /* EARLY_BREAK_COND is 0 if we can break.  */
+      tree early_break
+	= fold_build2 (exit_on_true ? EQ_EXPR : NE_EXPR, boolean_type_node,
+		       loop->early_break_cond,
+		       build_zero_cst (TREE_TYPE (loop->early_break_cond)));
+      tree cond = build2 (code, boolean_type_node, indx_after_incr, limit);
+      cond = fold_build2 (exit_on_true ? TRUTH_OR_EXPR : TRUTH_AND_EXPR,
+			  boolean_type_node, cond, early_break);
+      cond = force_gimple_operand_gsi (&loop_cond_gsi, cond, true, NULL_TREE,
+				       true, GSI_SAME_STMT);
+      cond_stmt = gimple_build_cond (NE_EXPR, cond,
+				     build_zero_cst (boolean_type_node),
+				     NULL_TREE, NULL_TREE);
+    }
+  else
+    cond_stmt = gimple_build_cond (code, indx_after_incr, limit, NULL_TREE,
+				   NULL_TREE);
 
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
@@ -1043,9 +1061,10 @@  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   bbs[0] = preheader;
   new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
 
-  exit = single_exit (scalar_loop);
+  edge scalar_exit = single_exit (scalar_loop);
+
   copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
-	    &exit, 1, &new_exit, NULL,
+	    &scalar_exit, 1, &new_exit, NULL,
 	    at_exit ? loop->latch : e->src, true);
   exit = single_exit (loop);
   basic_block new_preheader = new_bbs[0];
@@ -1059,8 +1078,7 @@  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
 	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
 	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
 	 header) to have current_def set, so copy them over.  */
-      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
-						exit);
+      slpeel_duplicate_current_defs_from_edges (scalar_exit, exit);
       slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
 							   0),
 						EDGE_SUCC (loop->latch, 0));
@@ -2133,7 +2151,7 @  slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
      for correct vectorization of live stmts.  */
   if (loop == first)
     {
-      basic_block orig_exit = single_exit (second)->dest;
+      basic_block orig_exit = find_natural_loop_exit (second)->dest;
       for (gsi_orig = gsi_start_phis (orig_exit);
 	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
 	{
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 53fccb715efef2521e499e5f764025dc80355f3d..e95a8dd5e28ad2c40bbb18adea8de27df2c858f4 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -769,19 +769,6 @@  vect_get_loop_niters (class loop *loop, tree *assumptions,
   return cond;
 }
 
-/* Function bb_in_loop_p
-
-   Used as predicate for dfs order traversal of the loop bbs.  */
-
-static bool
-bb_in_loop_p (const_basic_block bb, const void *data)
-{
-  const class loop *const loop = (const class loop *)data;
-  if (flow_bb_inside_loop_p (loop, bb))
-    return true;
-  return false;
-}
-
 
 /* Create and initialize a new loop_vec_info struct for LOOP_IN, as well as
    stmt_vec_info structs for all the stmts in LOOP_IN.  */
@@ -2450,7 +2437,8 @@  vect_joust_loop_vinfos (loop_vec_info new_loop_vinfo,
    for it.  The different analyses will record information in the
    loop_vec_info struct.  */
 opt_loop_vec_info
-vect_analyze_loop (class loop *loop, vec_info_shared *shared)
+vect_analyze_loop (class loop *loop, vec_info_shared *shared,
+		   bool loop_with_breaks)
 {
   auto_vector_modes vector_modes;
 
@@ -2562,6 +2550,33 @@  vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 	    mode_i += 1;
 	  }
 
+      if (res && loop_with_breaks)
+	{
+	  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+	    res = opt_result::failure_at (vect_location,
+					  "cannot vectorize loop with break"
+					  " statements if the number of"
+					  " iterations is not constant.\n");
+	  else if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo),
+				LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
+	    res = opt_result::failure_at (vect_location,
+					  "cannot vectorize loop with break"
+					  " statements if the number of"
+					  " iterations is not a multiple of"
+					  " VF.\n");
+	  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+		   || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+		   || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+	    res = opt_result::failure_at (vect_location,
+					  "cannot vectorize loop with break"
+					  " statements if peeling is"
+					  " required.\n");
+	  if (res)
+	    {
+	      gcc_assert (LOOP_VINFO_INT_NITERS (loop_vinfo) > 0);
+	    }
+	}
+
       if (res)
 	{
 	  LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
@@ -8487,6 +8502,31 @@  update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+edge
+find_natural_loop_exit (class loop *loop)
+{
+  edge natural_exit = single_exit (loop);
+  if (!natural_exit)
+    {
+      /* Look for the natural loop exit; this should be the non-latch
+	 successor edge of the latch's predecessor.  */
+      edge e;
+      edge_iterator ei;
+      edge latch_pred_e = single_pred_edge (loop->latch);
+      gcc_assert (latch_pred_e);
+
+      FOR_EACH_EDGE (e, ei, latch_pred_e->src->succs)
+	{
+	  if (e != latch_pred_e)
+	    natural_exit = e;
+	}
+      gcc_assert (natural_exit);
+    }
+
+  return natural_exit;
+}
+
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -8558,7 +8598,7 @@  vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      loop closed PHI nodes on the exit.  */
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
     {
-      e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
+      e = find_natural_loop_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
       if (! single_pred_p (e->dest))
 	{
 	  split_loop_exit_edge (e, true);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f7becb34ab41c645e5e76065377d78f2af39a09a..9964644a640cdaae8630810ced7528d334e90ac7 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1801,6 +1801,7 @@  extern tree vect_create_addr_base_for_vector_ref (stmt_vec_info, gimple_seq *,
 
 /* In tree-vect-loop.c.  */
 extern widest_int vect_iv_limit_for_full_masking (loop_vec_info loop_vinfo);
+extern edge find_natural_loop_exit (class loop *);
 /* Used in tree-vect-loop-manip.c */
 extern void determine_peel_for_niter (loop_vec_info);
 /* Used in gimple-loop-interchange.c and tree-parloops.c.  */
@@ -1808,7 +1809,8 @  extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree,
 				  enum tree_code);
 extern bool needs_fold_left_reduction_p (tree, tree_code);
 /* Drive for loop analysis stage.  */
-extern opt_loop_vec_info vect_analyze_loop (class loop *, vec_info_shared *);
+extern opt_loop_vec_info vect_analyze_loop (class loop *, vec_info_shared *,
+					    bool);
 extern tree vect_build_loop_niters (loop_vec_info, bool * = NULL);
 extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *,
 					 tree *, bool);
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 8f9444d58a3248c028a192625c22ce1650fc9027..a475953f76b7bdeced07943913cfde2380d8424b 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -873,6 +873,7 @@  try_vectorize_loop_1 (hash_table<simduid_to_vf> *&simduid_to_vf_htab,
   vec_info_shared shared;
   auto_purge_vect_location sentinel;
   vect_location = find_loop_location (loop);
+  bool loop_with_breaks = false;
 
   if (LOCATION_LOCUS (vect_location.get_location_t ()) != UNKNOWN_LOCATION
       && dump_enabled_p ())
@@ -888,8 +889,14 @@  try_vectorize_loop_1 (hash_table<simduid_to_vf> *&simduid_to_vf_htab,
     loop_vinfo = opt_loop_vec_info::success (vinfo);
   else
     {
+      if (loop_vectorized_call)
+	{
+	  tree arg = gimple_call_arg (loop_vectorized_call, 1);
+	  loop_with_breaks
+	    = !single_exit (get_loop (cfun, tree_to_shwi (arg)));
+	}
       /* Try to analyze the loop, retaining an opt_problem if dump_enabled_p.  */
-      loop_vinfo = vect_analyze_loop (loop, &shared);
+      loop_vinfo = vect_analyze_loop (loop, &shared, loop_with_breaks);
       loop->aux = loop_vinfo;
     }