diff mbox

[PR66846] Mark inner loop for fixup in parloops

Message ID 55B20F2D.3060403@mentor.com
State New
Headers show

Commit Message

Tom de Vries July 24, 2015, 10:10 a.m. UTC
On 20/07/15 15:04, Tom de Vries wrote:
> On 16/07/15 12:15, Richard Biener wrote:
>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>> <Tom_deVries@mentor.com> wrote:
>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>
>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I.
>>>>>
>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>> matching
>>>>> loops in the loop state of the child function, f.i.in
>>>>> expand_omp_for_generic:
>>>>> ...
>>>>>         struct loop *outer_loop;
>>>>>         if (seq_loop)
>>>>>           outer_loop = l0_bb->loop_father;
>>>>>         else
>>>>>           {
>>>>>             outer_loop = alloc_loop ();
>>>>>             outer_loop->header = l0_bb;
>>>>>             outer_loop->latch = l2_bb;
>>>>>             add_loop (outer_loop, l0_bb->loop_father);
>>>>>           }
>>>>>
>>>>>         if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>           {
>>>>>             struct loop *loop = alloc_loop ();
>>>>>             loop->header = l1_bb;
>>>>>             /* The loop may have multiple latches.  */
>>>>>             add_loop (loop, outer_loop);
>>>>>           }
>>>>> ...
>>>>>
>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>> fixup, in
>>>>> expand_omp_taskreg and expand_omp_target:
>>>>> ...
>>>>>         /* When the OMP expansion process cannot guarantee an
>>>>> up-to-date
>>>>>            loop tree arrange for the child function to fixup
>>>>> loops.  */
>>>>>         if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>           child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>> ...
>>>>>
>>>>> and expand_omp_for:
>>>>> ...
>>>>>     else
>>>>>       /* If there isn't a continue then this is a degerate case where
>>>>>          the introduction of abnormal edges during lowering will
>>>>> prevent
>>>>>          original loops from being detected.  Fix that up.  */
>>>>>       loops_state_set (LOOPS_NEED_FIXUP);
>>>>> ...
>>>>>
>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>> with
>>>>> the new child function is pass_fixup_cfg.
>>>>>
>>>>> The new child function contains a function call to
>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>> subsequently
>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>
>>>>>
>>>>> II.
>>>>>
>>>>> This patch adds a verification that at the end of the omp-expand
>>>>> processing
>>>>> of the child function, either the loop structure is ok, or marked for
>>>>> fixup.
>>>>>
>>>>> This verfication triggered a failure in parloops. When an outer
>>>>> loop is
>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>> during
>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>> loop (the
>>>>> one that is transformed), but not for the inner, which causes the
>>>>> verification failure:
>>>>> ...
>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>> ...
>>>>>
>>>>> [ I ran into this verification failure with an openacc kernels
>>>>> testcase
>>>>> on
>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>> different
>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>> function
>>>>> is processed by. ]
>>>>>
>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>> child
>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>> since it
>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>> postpones
>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>> can
>>>>> succeed.
>>>>>
>>>>>
>
> <SNIP>
>
>> Can we fix the root-cause of the issue instead?  That is, build a
>> valid loop
>> structure in the first place?
>>
>
> This patch manages to keep the loop structure, that is, to not cancel
> the loop tree in parloops, and guarantee a valid loop structure at the
> end of parloops.
>
> The transformation to insert the omp_for invalidates the loop state
> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
> we drop those in parloops.
>
> In expand_omp_for_static_nochunk, we detect the existing loop struct of
> the omp_for, and keep it.
>
> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
> LOOPS_HAVE_SIMPLE_LATCHES back.
>

This updated patch tries a more minimal approach.

Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the 
new exit instead.

And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we 
just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

Comments

Richard Biener July 28, 2015, 10:11 a.m. UTC | #1
On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 20/07/15 15:04, Tom de Vries wrote:
>>
>> On 16/07/15 12:15, Richard Biener wrote:
>>>
>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>> <Tom_deVries@mentor.com> wrote:
>>>>
>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I.
>>>>>>
>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>> matching
>>>>>> loops in the loop state of the child function, f.i.in
>>>>>> expand_omp_for_generic:
>>>>>> ...
>>>>>>         struct loop *outer_loop;
>>>>>>         if (seq_loop)
>>>>>>           outer_loop = l0_bb->loop_father;
>>>>>>         else
>>>>>>           {
>>>>>>             outer_loop = alloc_loop ();
>>>>>>             outer_loop->header = l0_bb;
>>>>>>             outer_loop->latch = l2_bb;
>>>>>>             add_loop (outer_loop, l0_bb->loop_father);
>>>>>>           }
>>>>>>
>>>>>>         if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>>           {
>>>>>>             struct loop *loop = alloc_loop ();
>>>>>>             loop->header = l1_bb;
>>>>>>             /* The loop may have multiple latches.  */
>>>>>>             add_loop (loop, outer_loop);
>>>>>>           }
>>>>>> ...
>>>>>>
>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>> fixup, in
>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>> ...
>>>>>>         /* When the OMP expansion process cannot guarantee an
>>>>>> up-to-date
>>>>>>            loop tree arrange for the child function to fixup
>>>>>> loops.  */
>>>>>>         if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>>           child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>> ...
>>>>>>
>>>>>> and expand_omp_for:
>>>>>> ...
>>>>>>     else
>>>>>>       /* If there isn't a continue then this is a degerate case where
>>>>>>          the introduction of abnormal edges during lowering will
>>>>>> prevent
>>>>>>          original loops from being detected.  Fix that up.  */
>>>>>>       loops_state_set (LOOPS_NEED_FIXUP);
>>>>>> ...
>>>>>>
>>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>>> with
>>>>>> the new child function is pass_fixup_cfg.
>>>>>>
>>>>>> The new child function contains a function call to
>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>> subsequently
>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>
>>>>>>
>>>>>> II.
>>>>>>
>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>> processing
>>>>>> of the child function, either the loop structure is ok, or marked for
>>>>>> fixup.
>>>>>>
>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>> loop is
>>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>>> during
>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>> loop (the
>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>> verification failure:
>>>>>> ...
>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>> ...
>>>>>>
>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>> testcase
>>>>>> on
>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>> different
>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>> function
>>>>>> is processed by. ]
>>>>>>
>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>> child
>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>> since it
>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>> postpones
>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>> can
>>>>>> succeed.
>>>>>>
>>>>>>
>>
>> <SNIP>
>>
>>> Can we fix the root-cause of the issue instead?  That is, build a
>>> valid loop
>>> structure in the first place?
>>>
>>
>> This patch manages to keep the loop structure, that is, to not cancel
>> the loop tree in parloops, and guarantee a valid loop structure at the
>> end of parloops.
>>
>> The transformation to insert the omp_for invalidates the loop state
>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>> we drop those in parloops.
>>
>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>> the omp_for, and keep it.
>>
>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>
>
> This updated patch tries a more minimal approach.
>
> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
> exit instead.
>
> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?

I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
that back on in execute_expand_omp).  The parloops code lacks comments and
the /* Prepare cfg.  */ part looks twisty to me - but I don't see why
it should be difficult
to preserve simple latches as well - is this just because we insert
the GIMPLE_OMP_CONTINUE
in it?

If execute_expand_omp is not performed in a loop pipeline where things
expect simple latches
(well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
at all?  If somebody
needs it it will just request simple latches.

So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
LOOPS_HAVE_SIMPLE_LATCHES
unset in execute_expand_omp.

Ok with that change.

Thanks,
Richard.

> Thanks,
> - Tom
>
diff mbox

Patch

Don't cancel loop tree in parloops

2015-07-20  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/66846
	* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
	verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
	(expand_omp_target) [ENABLE_CHECKING]: Same.
	(execute_expand_omp): Reinstate LOOPS_HAVE_SIMPLE_LATCHES if in ssa.
	[ENABLE_CHECKING]: Call verify_loop_structure for cfun if
	!LOOPS_NEED_FIXUP.
	(expand_omp_for_static_nochunk): Handle case that omp_for already has
	its own loop struct.
	* tree-parloops.c (create_parallel_loop): Add comment about breaking
	LOOPS_HAVE_SIMPLE_LATCHES.  Record new exit.
	(gen_parallel_loop): Remove call to cancel_loop_tree.
	(parallelize_loops): Skip loops that are inner loops of parallelized
	loops.
	(pass_parallelize_loops::execute): Clear LOOPS_HAVE_SIMPLE_LATCHES on
	loop state.
	[ENABLE_CHECKING]: Call verify_loop_structure.
---
 gcc/omp-low.c       | 27 ++++++++++++++++++++++++++-
 gcc/tree-parloops.c | 32 ++++++++++++++++++++++++++++----
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..ce83abf 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@  expand_omp_taskreg (struct omp_region *region)
 	}
       if (gimple_in_ssa_p (cfun))
 	update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -6843,9 +6847,17 @@  expand_omp_for_static_nochunk (struct omp_region *region,
   set_immediate_dominator (CDI_DOMINATORS, fin_bb,
 			   recompute_dominator (CDI_DOMINATORS, fin_bb));
 
+  struct loop *loop = body_bb->loop_father;
+  if (loop != entry_bb->loop_father)
+    {
+      gcc_assert (loop->header == body_bb);
+      gcc_assert (broken_loop || loop->latch == region->cont);
+      return;
+    }
+
   if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
     {
-      struct loop *loop = alloc_loop ();
+      loop = alloc_loop ();
       loop->header = body_bb;
       if (collapse_bb == NULL)
 	loop->latch = cont_bb;
@@ -8984,6 +8996,10 @@  expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+#ifdef ENABLE_CHECKING
+      if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+	verify_loop_structure ();
+#endif
       pop_cfun ();
     }
 
@@ -9492,6 +9508,15 @@  execute_expand_omp (void)
 
   expand_omp (root_omp_region);
 
+  /* We dropped this property in parloops because of the omp_for.  Now that the
+     omp_for has been expanded, reinstate it.  */
+  if (gimple_in_ssa_p (cfun))
+    loops_state_set (LOOPS_HAVE_SIMPLE_LATCHES);
+
+#ifdef ENABLE_CHECKING
+  if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+    verify_loop_structure ();
+#endif
   cleanup_tree_cfg ();
 
   free_omp_regions ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index ec41834..1899052 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2045,7 +2045,14 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 
   guard = make_edge (for_bb, ex_bb, 0);
   single_succ_edge (loop->latch)->flags = 0;
+
+  /* After creating this edge, the latch has two successors, so
+     LOOPS_HAVE_SIMPLE_LATCHES is no longer valid.  We'll update the loop state
+     as such at the end of the pass, since it's not needed earlier, and doing it
+     earlier will invalidate info for loops we still need to process.  */
   end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+  rescan_loop_exit (end, true, false);
+
   for (gphi_iterator gpi = gsi_start_phis (ex_bb);
        !gsi_end_p (gpi); gsi_next (&gpi))
     {
@@ -2282,10 +2289,6 @@  gen_parallel_loop (struct loop *loop,
 
   scev_reset ();
 
-  /* Cancel the loop (it is simpler to do it here rather than to teach the
-     expander to do it).  */
-  cancel_loop_tree (loop);
-
   /* Free loop bound estimations that could contain references to
      removed statements.  */
   FOR_EACH_LOOP (loop, 0)
@@ -2521,6 +2524,7 @@  parallelize_loops (void)
   unsigned n_threads = flag_tree_parallelize_loops;
   bool changed = false;
   struct loop *loop;
+  struct loop *skip_loop = NULL;
   struct tree_niter_desc niter_desc;
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
@@ -2538,6 +2542,19 @@  parallelize_loops (void)
 
   FOR_EACH_LOOP (loop, 0)
     {
+      if (loop == skip_loop)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Skipping loop %d as inner loop of parallelized loop\n",
+		     loop->num);
+
+	  skip_loop = loop->inner;
+	  continue;
+	}
+      else
+	skip_loop = NULL;
+
       reduction_list.empty ();
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
@@ -2597,6 +2614,7 @@  parallelize_loops (void)
 	continue;
 
       changed = true;
+      skip_loop = loop->inner;
       if (dump_file && (dump_flags & TDF_DETAILS))
       {
 	if (loop->inner)
@@ -2663,6 +2681,12 @@  pass_parallelize_loops::execute (function *fun)
   if (parallelize_loops ())
     {
       fun->curr_properties &= ~(PROP_gimple_eomp);
+
+      loops_state_clear (LOOPS_HAVE_SIMPLE_LATCHES);
+#ifdef ENABLE_CHECKING
+      verify_loop_structure ();
+#endif
+
       return TODO_update_ssa;
     }
 
-- 
1.9.1