diff mbox

[PR66846] Mark inner loop for fixup in parloops

Message ID 55B8BB44.2090703@mentor.com
State New
Headers show

Commit Message

Tom de Vries July 29, 2015, 11:38 a.m. UTC
On 28/07/15 12:11, Richard Biener wrote:
> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 20/07/15 15:04, Tom de Vries wrote:
>>>
>>> On 16/07/15 12:15, Richard Biener wrote:
>>>>
>>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>>> <Tom_deVries@mentor.com> wrote:
>>>>>
>>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I.
>>>>>>>
>>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>>> matching
>>>>>>> loops in the loop state of the child function, f.i.in
>>>>>>> expand_omp_for_generic:
>>>>>>> ...
>>>>>>>          struct loop *outer_loop;
>>>>>>>          if (seq_loop)
>>>>>>>            outer_loop = l0_bb->loop_father;
>>>>>>>          else
>>>>>>>            {
>>>>>>>              outer_loop = alloc_loop ();
>>>>>>>              outer_loop->header = l0_bb;
>>>>>>>              outer_loop->latch = l2_bb;
>>>>>>>              add_loop (outer_loop, l0_bb->loop_father);
>>>>>>>            }
>>>>>>>
>>>>>>>          if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>>            {
>>>>>>>              struct loop *loop = alloc_loop ();
>>>>>>>              loop->header = l1_bb;
>>>>>>>              /* The loop may have multiple latches.  */
>>>>>>>              add_loop (loop, outer_loop);
>>>>>>>            }
>>>>>>> ...
>>>>>>>
>>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>>> fixup, in
>>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>>> ...
>>>>>>>          /* When the OMP expansion process cannot guarantee an
>>>>>>> up-to-date
>>>>>>>             loop tree arrange for the child function to fixup
>>>>>>> loops.  */
>>>>>>>          if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>>            child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>>> ...
>>>>>>>
>>>>>>> and expand_omp_for:
>>>>>>> ...
>>>>>>>      else
>>>>>>>        /* If there isn't a continue then this is a degerate case where
>>>>>>>           the introduction of abnormal edges during lowering will
>>>>>>> prevent
>>>>>>>           original loops from being detected.  Fix that up.  */
>>>>>>>        loops_state_set (LOOPS_NEED_FIXUP);
>>>>>>> ...
>>>>>>>
>>>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>>>> with
>>>>>>> the new child function is pass_fixup_cfg.
>>>>>>>
>>>>>>> The new child function contains a function call to
>>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>>> subsequently
>>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>>
>>>>>>>
>>>>>>> II.
>>>>>>>
>>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>>> processing
>>>>>>> of the child function, either the loop structure is ok, or marked for
>>>>>>> fixup.
>>>>>>>
>>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>>> loop is
>>>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>>>> during
>>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>>> loop (the
>>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>>> verification failure:
>>>>>>> ...
>>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>>> ...
>>>>>>>
>>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>>> testcase
>>>>>>> on
>>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>>> different
>>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>>> function
>>>>>>> is processed by. ]
>>>>>>>
>>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>>> child
>>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>>> since it
>>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>>> postpones
>>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>>> can
>>>>>>> succeed.
>>>>>>>
>>>>>>>
>>>
>>> <SNIP>
>>>
>>>> Can we fix the root-cause of the issue instead?  That is, build a
>>>> valid loop
>>>> structure in the first place?
>>>>
>>>
>>> This patch manages to keep the loop structure, that is, to not cancel
>>> the loop tree in parloops, and guarantee a valid loop structure at the
>>> end of parloops.
>>>
>>> The transformation to insert the omp_for invalidates the loop state
>>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>>> we drop those in parloops.
>>>
>>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>>> the omp_for, and keep it.
>>>
>>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>>
>>
>> This updated patch tries a more minimal approach.
>>
>> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
>> exit instead.
>>
>> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
>> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for trunk?
>
> I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
> that back on in execute_expand_omp).  The parloops code lacks comments and
> the /* Prepare cfg.  */ part looks twisty to me - but I don't see why
> it should be difficult
> to preserve simple latches as well - is this just because we insert
> the GIMPLE_OMP_CONTINUE
> in it?
>

It's not difficult to do. It's just that the omp-for that is generated 
via the normal route (omp-annotated source code) doesn't have this 
simple latch, and parloops just mimics that cfg shape.

Attached updated patch preserves simple latches in parloops. We need 
some minor adjustments in omp-expand to handle that.

> If execute_expand_omp is not performed in a loop pipeline where things
> expect simple latches
> (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
> at all?  If somebody
> needs it it will just request simple latches.
>

I've looked into this. The (first?) pass that gives me trouble is 
pass_iv_optimize. It doesn't request simple latches, but it does need them.

> So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
> LOOPS_HAVE_SIMPLE_LATCHES
> unset in execute_expand_omp.
>
> Ok with that change.
>

OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops 
(if bootstrap and reg-test succeeds)?

Thanks,
- Tom

Comments

Richard Biener July 29, 2015, 12:03 p.m. UTC | #1
On Wed, Jul 29, 2015 at 1:38 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 28/07/15 12:11, Richard Biener wrote:
>>
>> On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>>
>>> On 20/07/15 15:04, Tom de Vries wrote:
>>>>
>>>>
>>>> On 16/07/15 12:15, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>>>> <Tom_deVries@mentor.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries
>>>>>>> <Tom_deVries@mentor.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I.
>>>>>>>>
>>>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>>>> matching
>>>>>>>> loops in the loop state of the child function, f.i.in
>>>>>>>>
>>>>>>>> expand_omp_for_generic:
>>>>>>>> ...
>>>>>>>>          struct loop *outer_loop;
>>>>>>>>          if (seq_loop)
>>>>>>>>            outer_loop = l0_bb->loop_father;
>>>>>>>>          else
>>>>>>>>            {
>>>>>>>>              outer_loop = alloc_loop ();
>>>>>>>>              outer_loop->header = l0_bb;
>>>>>>>>              outer_loop->latch = l2_bb;
>>>>>>>>              add_loop (outer_loop, l0_bb->loop_father);
>>>>>>>>            }
>>>>>>>>
>>>>>>>>          if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>>>            {
>>>>>>>>              struct loop *loop = alloc_loop ();
>>>>>>>>              loop->header = l1_bb;
>>>>>>>>              /* The loop may have multiple latches.  */
>>>>>>>>              add_loop (loop, outer_loop);
>>>>>>>>            }
>>>>>>>> ...
>>>>>>>>
>>>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>>>> fixup, in
>>>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>>>> ...
>>>>>>>>          /* When the OMP expansion process cannot guarantee an
>>>>>>>> up-to-date
>>>>>>>>             loop tree arrange for the child function to fixup
>>>>>>>> loops.  */
>>>>>>>>          if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>>>            child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>>>> ...
>>>>>>>>
>>>>>>>> and expand_omp_for:
>>>>>>>> ...
>>>>>>>>      else
>>>>>>>>        /* If there isn't a continue then this is a degerate case
>>>>>>>> where
>>>>>>>>           the introduction of abnormal edges during lowering will
>>>>>>>> prevent
>>>>>>>>           original loops from being detected.  Fix that up.  */
>>>>>>>>        loops_state_set (LOOPS_NEED_FIXUP);
>>>>>>>> ...
>>>>>>>>
>>>>>>>> However, loops are fixed up anyway, because the first pass we
>>>>>>>> execute
>>>>>>>> with
>>>>>>>> the new child function is pass_fixup_cfg.
>>>>>>>>
>>>>>>>> The new child function contains a function call to
>>>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>>>> subsequently
>>>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>>>
>>>>>>>>
>>>>>>>> II.
>>>>>>>>
>>>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>>>> processing
>>>>>>>> of the child function, either the loop structure is ok, or marked
>>>>>>>> for
>>>>>>>> fixup.
>>>>>>>>
>>>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>>>> loop is
>>>>>>>> being parallelized, both the outer and inner loop are cancelled.
>>>>>>>> Then
>>>>>>>> during
>>>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>>>> loop (the
>>>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>>>> verification failure:
>>>>>>>> ...
>>>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>>>> ...
>>>>>>>>
>>>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>>>> testcase
>>>>>>>> on
>>>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>>>> different
>>>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>>>> function
>>>>>>>> is processed by. ]
>>>>>>>>
>>>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>>>> child
>>>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>>>> since it
>>>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>>>> postpones
>>>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>>>> can
>>>>>>>> succeed.
>>>>>>>>
>>>>>>>>
>>>>
>>>> <SNIP>
>>>>
>>>>> Can we fix the root-cause of the issue instead?  That is, build a
>>>>> valid loop
>>>>> structure in the first place?
>>>>>
>>>>
>>>> This patch manages to keep the loop structure, that is, to not cancel
>>>> the loop tree in parloops, and guarantee a valid loop structure at the
>>>> end of parloops.
>>>>
>>>> The transformation to insert the omp_for invalidates the loop state
>>>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>>>> we drop those in parloops.
>>>>
>>>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>>>> the omp_for, and keep it.
>>>>
>>>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>>>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>>>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>>>
>>>
>>> This updated patch tries a more minimal approach.
>>>
>>> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the
>>> new
>>> exit instead.
>>>
>>> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
>>> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of
>>> pass_expand_omp_ssa.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>>
>> I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp.
>> turning
>> that back on in execute_expand_omp).  The parloops code lacks comments and
>> the /* Prepare cfg.  */ part looks twisty to me - but I don't see why
>> it should be difficult
>> to preserve simple latches as well - is this just because we insert
>> the GIMPLE_OMP_CONTINUE
>> in it?
>>
>
> It's not difficult to do. It's just that the omp-for that is generated via
> the normal route (omp-annotated source code) doesn't have this simple latch,
> and parloops just mimics that cfg shape.
>
> Attached updated patch preserves simple latches in parloops. We need some
> minor adjustments in omp-expand to handle that.
>
>> If execute_expand_omp is not performed in a loop pipeline where things
>> expect simple latches
>> (well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
>> at all?  If somebody
>> needs it it will just request simple latches.
>>
>
> I've looked into this. The (first?) pass that gives me trouble is
> pass_iv_optimize. It doesn't request simple latches, but it does need them.

Well, pass_iv_optimize is inside the loop pass which ensures we have
LOOPS_NORMAL.
Passes in it rely on that.

>> So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
>> LOOPS_HAVE_SIMPLE_LATCHES
>> unset in execute_expand_omp.
>>
>> Ok with that change.
>>
>
> OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops (if
> bootstrap and reg-test succeeds)?

Yes.

Thanks,
Richard.

> Thanks,
> - Tom
>
diff mbox

Patch

Don't cancel loop tree in parloops

2015-07-28  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/66846
	* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
	verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
	(expand_omp_target) [ENABLE_CHECKING]: Same.
	(execute_expand_omp) [ENABLE_CHECKING]: Call verify_loop_structure for
	cfun if !LOOPS_NEED_FIXUP.
	(expand_omp_for_static_nochunk): Handle simple latch bb.  Handle case
	that omp_for already has its own loop struct.
	* tree-parloops.c (create_phi_for_local_result)
	(create_call_for_reduction): Handle simple latch bb.
	(create_parallel_loop): Add simple latch bb to preserve
	LOOPS_HAVE_SIMPLE_LATCHES.  Record new exit.  Handle simple latch bb.
	(gen_parallel_loop): Remove call to cancel_loop_tree.
	(parallelize_loops): Skip loops that are inner loops of parallelized
	loops.
	(pass_parallelize_loops::execute) [ENABLE_CHECKING]: Call
	verify_loop_structure.
---
 gcc/omp-low.c       | 32 ++++++++++++++++++++++++++++++--
 gcc/tree-parloops.c | 49 ++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 66 insertions(+), 15 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..0f5c0f1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@  expand_omp_taskreg (struct omp_region *region)
 	}
       if (gimple_in_ssa_p (cfun))
 	update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -6535,7 +6539,8 @@  expand_omp_for_static_nochunk (struct omp_region *region,
   body_bb = single_succ (seq_start_bb);
   if (!broken_loop)
     {
-      gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb);
+      gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb
+		  || single_succ (BRANCH_EDGE (cont_bb)->dest) == body_bb);
       gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
     }
   exit_bb = region->exit;
@@ -6818,6 +6823,11 @@  expand_omp_for_static_nochunk (struct omp_region *region,
   if (!broken_loop)
     {
       ep = find_edge (cont_bb, body_bb);
+      if (ep == NULL)
+	{
+	  ep = BRANCH_EDGE (cont_bb);
+	  gcc_assert (single_succ (ep->dest) == body_bb);
+	}
       if (gimple_omp_for_combined_p (fd->for_stmt))
 	{
 	  remove_edge (ep);
@@ -6843,9 +6853,19 @@  expand_omp_for_static_nochunk (struct omp_region *region,
   set_immediate_dominator (CDI_DOMINATORS, fin_bb,
 			   recompute_dominator (CDI_DOMINATORS, fin_bb));
 
+  struct loop *loop = body_bb->loop_father;
+  if (loop != entry_bb->loop_father)
+    {
+      gcc_assert (loop->header == body_bb);
+      gcc_assert (broken_loop
+		  || loop->latch == region->cont
+		  || single_pred (loop->latch) == region->cont);
+      return;
+    }
+
   if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
     {
-      struct loop *loop = alloc_loop ();
+      loop = alloc_loop ();
       loop->header = body_bb;
       if (collapse_bb == NULL)
 	loop->latch = cont_bb;
@@ -8984,6 +9004,10 @@  expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -9492,6 +9516,10 @@  execute_expand_omp (void)
 
   expand_omp (root_omp_region);
 
+#ifdef ENABLE_CHECKING
+  if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+    verify_loop_structure ();
+#endif
   cleanup_tree_cfg ();
 
   free_omp_regions ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index b06265c..d017479 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1032,21 +1032,22 @@  create_phi_for_local_result (reduction_info **slot, struct loop *loop)
   struct reduction_info *const reduc = *slot;
   edge e;
   gphi *new_phi;
-  basic_block store_bb;
+  basic_block store_bb, continue_bb;
   tree local_res;
   source_location locus;
 
   /* STORE_BB is the block where the phi
      should be stored.  It is the destination of the loop exit.
      (Find the fallthru edge from GIMPLE_OMP_CONTINUE).  */
-  store_bb = FALLTHRU_EDGE (loop->latch)->dest;
+  continue_bb = single_pred (loop->latch);
+  store_bb = FALLTHRU_EDGE (continue_bb)->dest;
 
   /* STORE_BB has two predecessors.  One coming from  the loop
      (the reduction's result is computed at the loop),
      and another coming from a block preceding the loop,
      when no iterations
      are executed (the initial value should be taken).  */
-  if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (loop->latch))
+  if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (continue_bb))
     e = EDGE_PRED (store_bb, 1);
   else
     e = EDGE_PRED (store_bb, 0);
@@ -1055,7 +1056,7 @@  create_phi_for_local_result (reduction_info **slot, struct loop *loop)
   locus = gimple_location (reduc->reduc_stmt);
   new_phi = create_phi_node (local_res, store_bb);
   add_phi_arg (new_phi, reduc->init, e, locus);
-  add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus);
+  add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (continue_bb), locus);
   reduc->new_phi = new_phi;
 
   return 1;
@@ -1134,7 +1135,8 @@  create_call_for_reduction (struct loop *loop,
 {
   reduction_list->traverse <struct loop *, create_phi_for_local_result> (loop);
   /* Find the fallthru edge from GIMPLE_OMP_CONTINUE.  */
-  ld_st_data->load_bb = FALLTHRU_EDGE (loop->latch)->dest;
+  basic_block continue_bb = single_pred (loop->latch);
+  ld_st_data->load_bb = FALLTHRU_EDGE (continue_bb)->dest;
   reduction_list
     ->traverse <struct clsn_data *, create_call_for_reduction_1> (ld_st_data);
 }
@@ -1981,7 +1983,7 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 		      tree new_data, unsigned n_threads, location_t loc)
 {
   gimple_stmt_iterator gsi;
-  basic_block bb, paral_bb, for_bb, ex_bb;
+  basic_block bb, paral_bb, for_bb, ex_bb, continue_bb;
   tree t, param;
   gomp_parallel *omp_par_stmt;
   gimple omp_return_stmt1, omp_return_stmt2;
@@ -2052,8 +2054,12 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   gcc_assert (exit == single_dom_exit (loop));
 
   guard = make_edge (for_bb, ex_bb, 0);
-  single_succ_edge (loop->latch)->flags = 0;
-  end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+  /* Split the latch edge, so LOOPS_HAVE_SIMPLE_LATCHES is still valid.  */
+  loop->latch = split_edge (single_succ_edge (loop->latch));
+  single_pred_edge (loop->latch)->flags = 0;
+  end = make_edge (single_pred (loop->latch), ex_bb, EDGE_FALLTHRU);
+  rescan_loop_exit (end, true, false);
+
   for (gphi_iterator gpi = gsi_start_phis (ex_bb);
        !gsi_end_p (gpi); gsi_next (&gpi))
     {
@@ -2102,7 +2108,8 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   SSA_NAME_DEF_STMT (initvar) = for_stmt;
 
   /* Emit GIMPLE_OMP_CONTINUE.  */
-  gsi = gsi_last_bb (loop->latch);
+  continue_bb = single_pred (loop->latch);
+  gsi = gsi_last_bb (continue_bb);
   omp_cont_stmt = gimple_build_omp_continue (cvar_next, cvar);
   gimple_set_location (omp_cont_stmt, loc);
   gsi_insert_after (&gsi, omp_cont_stmt, GSI_NEW_STMT);
@@ -2298,10 +2305,6 @@  gen_parallel_loop (struct loop *loop,
 
   scev_reset ();
 
-  /* Cancel the loop (it is simpler to do it here rather than to teach the
-     expander to do it).  */
-  cancel_loop_tree (loop);
-
   /* Free loop bound estimations that could contain references to
      removed statements.  */
   FOR_EACH_LOOP (loop, 0)
@@ -2587,6 +2590,7 @@  parallelize_loops (void)
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
   struct loop *loop;
+  struct loop *skip_loop = NULL;
   struct tree_niter_desc niter_desc;
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
@@ -2604,6 +2608,19 @@  parallelize_loops (void)
 
   FOR_EACH_LOOP (loop, 0)
     {
+      if (loop == skip_loop)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Skipping loop %d as inner loop of parallelized loop\n",
+		     loop->num);
+
+	  skip_loop = loop->inner;
+	  continue;
+	}
+      else
+	skip_loop = NULL;
+
       reduction_list.empty ();
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
@@ -2663,6 +2680,7 @@  parallelize_loops (void)
 	continue;
 
       changed = true;
+      skip_loop = loop->inner;
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
 	if (loop->inner)
@@ -2729,6 +2747,11 @@  pass_parallelize_loops::execute (function *fun)
   if (parallelize_loops ())
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
+
+#ifdef ENABLE_CHECKING
+      verify_loop_structure ();
+#endif
+
       return TODO_update_ssa;
     }
 
-- 
1.9.1