diff mbox

Run pass_expand_omp_ssa after pass_paralellize_loops

Message ID 54635DA8.6030603@mentor.com
State New
Headers show

Commit Message

Tom de Vries Nov. 12, 2014, 1:16 p.m. UTC
[ moved from gcc@ to gcc-patches@ ]
[ subject was: Re: [gomp4] openacc kernels directive support ]
On 30-09-14 15:37, Tom de Vries wrote:
>> I would be happily accepting splitting the current autopar pass
>> that way, that is, do
>>
>> NEXT_PASS (pass_parallelize_loops)
>> PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
>> NEXT_PASS (pass_expand_omp)
>> POP_INSERT_PASSES ()
>>
>> and make the analysis code handle lowered OMP form.
>>
>
> To explore that, I created a tentative patch on top of the gomp-4_0-branch,
> which allows a non-bootstrap build and a gcc dg.exp run, so at least a couple of
> parloops test-cases. I can put this through bootstrap and reg-test if you
> confirm this patch is what you want.
>
> I'm not sure though OACC and autopar can share the actual function split-off.
> autopar is run rather late, later than the lto-stream point, while we need the
> split-off done before that for oacc. I'm also not sure what the point would be
> to have lowered OMP form in all those passes in between, I'd think you want to
> omp-expand it asap.

Richard,

This patch implements your proposal. It uses pass_expand_omp after 
pass_parallelize_loops to expand the omp constructs inserted by 
pass_parallelize_loops.

Note: the patch doesn't remove omp_expand_local, since I'm still using that in 
my oacc kernels directive patch series.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

Comments

David Malcolm Nov. 12, 2014, 2:06 p.m. UTC | #1
On Wed, 2014-11-12 at 14:16 +0100, Tom de Vries wrote:

I can't speak to the rest of the patch, but one readability nitpick:

> diff --git a/gcc/passes.def b/gcc/passes.def
> index 2305d67..dd91718 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -241,6 +241,9 @@ along with GCC; see the file COPYING3.  If not see
>           POP_INSERT_PASSES ()
>           NEXT_PASS (pass_iv_canon);
>           NEXT_PASS (pass_parallelize_loops);
> +         PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
> +         NEXT_PASS (pass_expand_omp_ssa);
> +         POP_INSERT_PASSES ()

Please can you add an indentation level within the PUSH/POP pair, giving
something like this?

+         PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
+             NEXT_PASS (pass_expand_omp_ssa);
+         POP_INSERT_PASSES ()
 
(the rest of the file is using 4 spaces worth of indentation for each
nesting level, tabifying multiples of 8)

Thanks
Dave
Richard Biener Nov. 12, 2014, 2:17 p.m. UTC | #2
On Wed, 12 Nov 2014, Tom de Vries wrote:

> [ moved from gcc@ to gcc-patches@ ]
> [ subject was: Re: [gomp4] openacc kernels directive support ]
> On 30-09-14 15:37, Tom de Vries wrote:
> > > I would be happily accepting splitting the current autopar pass
> > > that way, that is, do
> > > 
> > > NEXT_PASS (pass_parallelize_loops)
> > > PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
> > > NEXT_PASS (pass_expand_omp)
> > > POP_INSERT_PASSES ()
> > > 
> > > and make the analysis code handle lowered OMP form.
> > > 
> > 
> > To explore that, I created a tentative patch on top of the gomp-4_0-branch,
> > which allows a non-bootstrap build and a gcc dg.exp run, so at least a
> > couple of
> > parloops test-cases. I can put this through bootstrap and reg-test if you
> > confirm this patch is what you want.
> > 
> > I'm not sure though OACC and autopar can share the actual function
> > split-off.
> > autopar is run rather late, later than the lto-stream point, while we need
> > the
> > split-off done before that for oacc. I'm also not sure what the point would
> > be
> > to have lowered OMP form in all those passes in between, I'd think you want
> > to
> > omp-expand it asap.
> 
> Richard,
> 
> This patch implements your proposal. It uses pass_expand_omp after
> pass_parallelize_loops to expand the omp constructs inserted by
> pass_parallelize_loops.
> 
> Note: the patch doesn't remove omp_expand_local, since I'm still using that in
> my oacc kernels directive patch series.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?

Hmm, we have used properties to communicate this kind of lowering
need in the past.  So I would prefer you introduce

#define PROP_gimple_eomp (1 << 13)      /* no OpenMP directives */

provide that property by the omp expansion pass, clear it from
parloops and gate the omp expand pass if the property is already set.

Look at how PROP_gimple_lcx is handled.

Thanks,
Richard.

> Thanks,
> - Tom
>
diff mbox

Patch

2014-11-12  Tom de Vries  <tom@codesourcery.com>

	* function.h (struct function): Add omp_expand_needed field.
	* omp-low.c (pass_data pass_data_expand_omp_ssa): New pass_data.
	(class pass_expand_omp_ssa): New pass.
	(make_pass_expand_omp_ssa): New function.
	* passes.def (pass_parallelize_loops): Use PUSH_INSERT_PASSES_WITHIN
	instead of NEXT_PASS.
	(pass_expand_omp_ssa): Add after pass_parallelize_loops.
	* tree-parloops.c (gen_parallel_loop): Remove call to omp_expand_local.
	(pass_parallelize_loops::execute): Don't do cleanups TODO_cleanup_cfg
	and TODO_rebuild_alias yet.  Add TODO_update_ssa.  Set
	cfun->omp_expand_needed.
	* tree-pass.h (make_pass_expand_omp_ssa): Declare.
---
 gcc/function.h      |  3 +++
 gcc/omp-low.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 gcc/passes.def      |  3 +++
 gcc/tree-parloops.c | 18 +++++++-----------
 gcc/tree-pass.h     |  1 +
 5 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index 3a6305c..1afce69 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -667,6 +667,9 @@  struct GTY(()) function {
 
   /* Set when the tail call has been identified.  */
   unsigned int tail_call_marked : 1;
+
+  /* Set when an omp_expand is needed.  */
+  unsigned int omp_expand_needed : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5210de1..b748ee1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -8832,6 +8832,48 @@  make_pass_expand_omp (gcc::context *ctxt)
 {
   return new pass_expand_omp (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_expand_omp_ssa =
+{
+  GIMPLE_PASS, /* type */
+  "ompexpssa", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg | PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
+};
+
+class pass_expand_omp_ssa : public gimple_opt_pass
+{
+public:
+  pass_expand_omp_ssa (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_expand_omp_ssa, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return (bool)cfun->omp_expand_needed; }
+
+  virtual unsigned int execute (function *)
+    {
+      unsigned res = execute_expand_omp ();
+      cfun->omp_expand_needed = 0;
+      return res;
+    }
+
+}; // class pass_expand_omp_ssa
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_expand_omp_ssa (gcc::context *ctxt)
+{
+  return new pass_expand_omp_ssa (ctxt);
+}
 
 /* Routines to lower OpenMP directives into OMP-GIMPLE.  */
 
diff --git a/gcc/passes.def b/gcc/passes.def
index 2305d67..dd91718 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -241,6 +241,9 @@  along with GCC; see the file COPYING3.  If not see
 	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_iv_canon);
 	  NEXT_PASS (pass_parallelize_loops);
+	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
+	  NEXT_PASS (pass_expand_omp_ssa);
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_if_conversion);
 	  /* pass_vectorize must immediately follow pass_if_conversion.
 	     Please do not add any other passes in between.  */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 09b3f16..c5e3041 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1753,7 +1753,6 @@  gen_parallel_loop (struct loop *loop,
   tree many_iterations_cond, type, nit;
   tree arg_struct, new_arg_struct;
   gimple_seq stmts;
-  basic_block parallel_head;
   edge entry, exit;
   struct clsn_data clsn_data;
   unsigned prob;
@@ -1891,8 +1890,8 @@  gen_parallel_loop (struct loop *loop,
   cond_stmt = last_stmt (loop->header);
   if (cond_stmt)
     loc = gimple_location (cond_stmt);
-  parallel_head = create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
-					new_arg_struct, n_threads, loc);
+  create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
+			new_arg_struct, n_threads, loc);
   if (reduction_list->elements () > 0)
     create_call_for_reduction (loop, reduction_list, &clsn_data);
 
@@ -1906,13 +1905,6 @@  gen_parallel_loop (struct loop *loop,
      removed statements.  */
   FOR_EACH_LOOP (loop, 0)
     free_numbers_of_iterations_estimates_loop (loop);
-
-  /* Expand the parallel constructs.  We do it directly here instead of running
-     a separate expand_omp pass, since it is more efficient, and less likely to
-     cause troubles with further analyses not being able to deal with the
-     OMP trees.  */
-
-  omp_expand_local (parallel_head);
 }
 
 /* Returns true when LOOP contains vector phi nodes.  */
@@ -2284,7 +2276,11 @@  pass_parallelize_loops::execute (function *fun)
     return 0;
 
   if (parallelize_loops ())
-    return TODO_cleanup_cfg | TODO_rebuild_alias;
+    {
+      fun->omp_expand_needed = 1;
+      return TODO_update_ssa;
+    }
+
   return 0;
 }
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a3efdd8..d68979f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -399,6 +399,7 @@  extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
-- 
1.9.1