diff mbox

Expand oacc kernels after pass_fre

Message ID 557076A5.7050207@mentor.com
State New
Headers show

Commit Message

Tom de Vries June 4, 2015, 4:02 p.m. UTC
On 22/04/15 09:36, Richard Biener wrote:
> On Tue, 21 Apr 2015, Thomas Schwinge wrote:
>
>> Hi!
>>
>> On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 24-11-14 11:56, Tom de Vries wrote:
>>>> On 15-11-14 18:19, Tom de Vries wrote:
>>>>> On 15-11-14 13:14, Tom de Vries wrote:
>>>>>> I'm submitting a patch series with initial support for the oacc kernels
>>>>>> directive.
>>>>>>
>>>>>> The patch series uses pass_parallelize_loops to implement parallelization of
>>>>>> loops in the oacc kernels region.
>>>>>>
>>>>>> The patch series consists of these 8 patches:
>>>>>> ...
>>>>>>       1  Expand oacc kernels after pass_build_ealias
>>>>>>       2  Add pass_oacc_kernels
>>>>>>       3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>>>>>       4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>>>>>       5  Add pass_loop_im to pass_oacc_kernels
>>>>>>       6  Add pass_ccp to pass_oacc_kernels
>>>>>>       7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>>>>>       8  Do simple omp lowering for no address taken var
>>>>>> ...
>>>>>
>>>>> This patch moves omp expansion of the oacc kernels directive to after
>>>>> pass_build_ealias.
>>>>>
>>>>> The rationale is that in order to use pass_parallelize_loops for analysis and
>>>>> transformation of an oacc kernels region, we postpone omp expansion of that
>>>>> region until the earliest point in the pass list where enough information is
>>>>> availabe to run pass_parallelize_loops, in other words, after pass_build_ealias.
>>>>>
>>>>> The patch postpones expansion in expand_omp, and ensures expansion by adding
>>>>> pass_expand_omp_ssa:
>>>>> - after pass_build_ealias, and
>>>>> - after pass_all_early_optimizations for the case we're not optimizing.
>>>>>
>>>>> In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
>>>>> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
>>>>> lowered omp code, to handle it conservatively.
>>>>>
>>>>> The patch contains changes in expand_omp_target to deal with ssa-code, similar
>>>>> to what is already present in expand_omp_taskreg.
>>>>>
>>>>> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
>>>>> static for oacc kernels. It does this to get some references to .omp_data_sizes
>>>>> and .omp_data_kinds in the ssa code.  Without these references, the definitions
>>>>> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
>>>>> enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
>>>>> kludge for this purpose ].
>>>>>
>>>>> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
>>>>> original function of which the definition has been removed (as in moved to the
>>>>> split off function). TODO_remove_unused_locals takes care of some of them, but
>>>>> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
>>>>> dangling SSA_NAMEs and releases them.
>>>>>
>>>>
>>>> Reposting with small update: I've replaced the use of the rather generic
>>>> gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.
>>>>
>>>> Bootstrapped and reg-tested in the same way as before.
>>>>
>>>
>>> I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
>>>
>>> This allows fre to unify references to the same omp variable before entering
>>> pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
>>>
>>> F.i. this reduction fragment:
>>> ...
>>>     # VUSE <.MEM_8>
>>>     # PT = { D.2282 }
>>>     _67 = .omp_data_i_59->sumD.2270;
>>>     # VUSE <.MEM_8>
>>>     _68 = *_67;
>>>
>>>     _70 = _66 + _68;
>>>
>>>     # VUSE <.MEM_8>
>>>     # PT = { D.2282 }
>>>     _69 = .omp_data_i_59->sumD.2270;
>>>     # .MEM_71 = VDEF <.MEM_8>
>>>     *_69 = _70;
>>> ...
>>>
>>> is transformed by fre into:
>>> ...
>>>     # VUSE <.MEM_8>
>>>     # PT = { D.2282 }
>>>     _67 = .omp_data_i_59->sumD.2270;
>>>     # VUSE <.MEM_8>
>>>     _68 = *_67;
>>>
>>>     _70 = _66 + _68;
>>>
>>>     # .MEM_71 = VDEF <.MEM_8>
>>>     *_67 = _70;
>>> ...
>>>
>>> In order for pass_fre to respect the kernels region boundaries, I've added a
>>> change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively.
>>>
>>> Bootstrapped and reg-tested as before.
>>>
>>> OK for trunk?
>>
>> Committed to gomp-4_0-branch in r222279:
>>
>> commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
>> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Tue Apr 21 19:37:19 2015 +0000
>>
>>      Expand oacc kernels after pass_fre
>>
>>      	gcc/
>>      	* omp-low.c: Include gimple-pretty-print.h.
>>      	(release_first_vuse_in_edge_dest): New function.
>>      	(expand_omp_target): When not in ssa, don't split off oacc kernels
>>      	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
>>      	expanssion, and add GOACC_kernels_internal call.
>>      	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
>>      	into GOACC_kernels call.  Handle ssa-code.
>>      	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
>>      	properties_provided field.
>>      	(pass_expand_omp::execute): Set PROP_gimple_eomp in
>>      	cfun->curr_properties tentatively.
>>      	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
>>      	todo_flags_finish field.
>>      	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
>>      	execute_expand_omp.
>>      	(gimple_stmt_ssa_operand_references_var_p)
>>      	(gimple_stmt_omp_data_i_init_p): New function.
>>      	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
>>      	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
>>      	pass_expand_omp_ssa after pass_all_early_optimizations.
>>      	* tree-ssa-ccp.c: Include omp-low.h.
>>      	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
>>      	conservatively.
>>      	* tree-ssa-forwprop.c: Include omp-low.h.
>>      	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
>>      	* tree-ssa-sccvn.c: Include omp-low.h.
>>      	(visit_use): Handle .omp_data_i init conservatively.
>>      	* cgraph.c (cgraph_node::release_body): Don't release offloadable
>>      	functions.
>>
>>      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4
>> ---
>>   gcc/ChangeLog.gomp      |   30 +++++++
>>   gcc/cgraph.c            |    9 ++
>>   gcc/omp-low.c           |  214 ++++++++++++++++++++++++++++++++++++++++++++---
>>   gcc/omp-low.h           |    1 +
>>   gcc/passes.def          |    2 +
>>   gcc/tree-ssa-ccp.c      |    6 ++
>>   gcc/tree-ssa-forwprop.c |    4 +-
>>   gcc/tree-ssa-sccvn.c    |    4 +-
>>   8 files changed, 257 insertions(+), 13 deletions(-)
>>
>> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
>> index 7885189..1f86160 100644
>> --- gcc/ChangeLog.gomp
>> +++ gcc/ChangeLog.gomp
>> @@ -1,5 +1,35 @@
>>   2015-04-21  Tom de Vries  <tom@codesourcery.com>
>>
>> +	* omp-low.c: Include gimple-pretty-print.h.
>> +	(release_first_vuse_in_edge_dest): New function.
>> +	(expand_omp_target): When not in ssa, don't split off oacc kernels
>> +	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
>> +	expanssion, and add GOACC_kernels_internal call.
>> +	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
>> +	into GOACC_kernels call.  Handle ssa-code.
>> +	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
>> +	properties_provided field.
>> +	(pass_expand_omp::execute): Set PROP_gimple_eomp in
>> +	cfun->curr_properties tentatively.
>> +	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
>> +	todo_flags_finish field.
>> +	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
>> +	execute_expand_omp.
>> +	(gimple_stmt_ssa_operand_references_var_p)
>> +	(gimple_stmt_omp_data_i_init_p): New function.
>> +	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
>> +	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
>> +	pass_expand_omp_ssa after pass_all_early_optimizations.
>> +	* tree-ssa-ccp.c: Include omp-low.h.
>> +	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
>> +	conservatively.
>> +	* tree-ssa-forwprop.c: Include omp-low.h.
>> +	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
>> +	* tree-ssa-sccvn.c: Include omp-low.h.
>> +	(visit_use): Handle .omp_data_i init conservatively.
>> +	* cgraph.c (cgraph_node::release_body): Don't release offloadable
>> +	functions.
>> +
>>   	* builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
>>   	(ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add
>>   	DEF_ATTR_TREE_LIST.
>> diff --git gcc/cgraph.c gcc/cgraph.c
>> index e099856..c608d7e 100644
>> --- gcc/cgraph.c
>> +++ gcc/cgraph.c
>> @@ -1706,6 +1706,15 @@ release_function_body (tree decl)
>>   void
>>   cgraph_node::release_body (bool keep_arguments)
>>   {
>> +  /* The omp-expansion of the oacc kernels directive is post-poned till after
>> +     all_small_ipa_passes.  That means pass_ipa_free_lang_data, which tries to
>> +     release the body of the offload function, is run before omp_expand_target
>> +     can process the oacc kernels directive,  and omp_expand_target would crash
>> +     trying to access the body.  This snippet works around this problem.
>> +     FIXME: This should probably be fixed in a different way.  */
>> +  if (offloadable)
>> +    return;
>> +
>>     ipa_transforms_to_apply.release ();
>>     if (!used_as_abstract_origin && symtab->state != PARSING)
>>       {
>> diff --git gcc/omp-low.c gcc/omp-low.c
>> index 4134f3d..16d9a5e 100644
>> --- gcc/omp-low.c
>> +++ gcc/omp-low.c
>> @@ -108,6 +108,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "context.h"
>>   #include "lto-section-names.h"
>>   #include "gomp-constants.h"
>> +#include "gimple-pretty-print.h"
>>
>>
>>   /* Lowering of OMP parallel and workshare constructs proceeds in two
>> @@ -5353,6 +5354,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
>>       }
>>   }
>>
>> +static void
>> +release_first_vuse_in_edge_dest (edge e)
>
> All functions need a comment with documentation.
>

Fixed and committed (ommitting patch as trivial).

>> +{
>> +  gimple_stmt_iterator i;
>> +  basic_block bb = e->dest;
>> +
>> +  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
>> +    {
>> +      gimple phi = gsi_stmt (i);
>> +      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
>> +
>> +      if (!virtual_operand_p (arg))
>> +	continue;
>> +
>> +      mark_virtual_operand_for_renaming (arg);
>> +      return;
>> +    }
>> +
>> +  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
>> +    {
>> +      gimple stmt = gsi_stmt (i);
>> +      if (gimple_vuse (stmt) == NULL_TREE)
>> +	continue;
>> +
>> +      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
>> +      return;
>> +    }
>> +}
>> +
>>   /* Expand the OpenMP parallel or task directive starting at REGION.  */
>>
>>   static void
>> @@ -8770,8 +8800,11 @@ expand_omp_target (struct omp_region *region)
>>     gimple stmt;
>>     edge e;
>>     bool offloaded, data_region;
>> +  bool do_emit_library_call = true;
>> +  bool do_splitoff = true;
>>
>>     entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
>> +
>>     new_bb = region->entry;
>>
>>     offloaded = is_gimple_omp_offloaded (entry_stmt);
>> @@ -8804,12 +8837,48 @@ expand_omp_target (struct omp_region *region)
>>     /* Supported by expand_omp_taskreg, but not here.  */
>>     if (child_cfun != NULL)
>>       gcc_checking_assert (!child_cfun->cfg);
>> -  gcc_checking_assert (!gimple_in_ssa_p (cfun));
>>
>>     entry_bb = region->entry;
>>     exit_bb = region->exit;
>>
>> -  if (offloaded)
>> +  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
>> +    {
>> +      if (!gimple_in_ssa_p (cfun))
>> +	{
>> +	  /* We need to do analysis and optimizations on the kernels region
>> +	     before splitoff.  Since that's hard to do on low gimple, we
>> +	     postpone the splitoff until we're in SSA.
>> +	     However, we do the emit of the corresponding function call already,
>> +	     in order to keep the arguments of the call alive until the
>> +	     splitoff.
>> +	     Since at this point the function that is called is empty, we can
>> +	     model the function as BUILT_IN_GOACC_KERNELS_INTERNAL, which marks
>> +	     some of it's function arguments as non-escaping, so it acts less
>> +	     as an optimization barrier.  */
>> +	  do_splitoff = false;
>> +	  cfun->curr_properties &= ~PROP_gimple_eomp;
>> +	}
>> +      else
>> +	{
>> +	  /* Don't emit the library call.  We've already done that.  */
>> +	  do_emit_library_call = false;
>> +	  /* Transform BUILT_IN_GOACC_KERNELS_INTERNAL into
>> +	     BUILT_IN_GOACC_KERNELS_INTERNAL.  Now that the function body will be
>> +	     split off, we can no longer regard the omp_data_array reference as
>> +	     non-escaping.  */
>> +	  gsi = gsi_last_bb (entry_bb);
>> +	  gsi_prev (&gsi);
>> +	  gcall *call = as_a <gcall *> (gsi_stmt (gsi));
>> +	  gcc_assert (gimple_call_builtin_p (call, BUILT_IN_GOACC_KERNELS_INTERNAL));
>> +	  tree fndecl = builtin_decl_explicit (BUILT_IN_GOACC_KERNELS);
>> +	  gimple_call_set_fndecl (call, fndecl);
>> +	  gimple_call_set_fntype (call, TREE_TYPE (fndecl));
>> +	  gimple_call_reset_alias_info (call);
>> +	}
>> +    }
>> +
>> +  if (offloaded
>> +      && do_splitoff)
>>       {
>>         unsigned srcidx, dstidx, num;
>>
>> @@ -8831,7 +8900,7 @@ expand_omp_target (struct omp_region *region)
>>   	{
>>   	  basic_block entry_succ_bb = single_succ (entry_bb);
>>   	  gimple_stmt_iterator gsi;
>> -	  tree arg;
>> +	  tree arg, narg;
>>   	  gimple tgtcopy_stmt = NULL;
>>   	  tree sender = TREE_VEC_ELT (data_arg, 0);
>>
>> @@ -8861,8 +8930,27 @@ expand_omp_target (struct omp_region *region)
>>   	  gcc_assert (tgtcopy_stmt != NULL);
>>   	  arg = DECL_ARGUMENTS (child_fn);
>>
>> -	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
>> -	  gsi_remove (&gsi, true);
>> +	  if (!gimple_in_ssa_p (cfun))
>> +	    {
>> +	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
>> +	      gsi_remove (&gsi, true);
>> +	    }
>> +	  else
>> +	    {
>> +	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
>> +			  == arg);
>> +
>> +	      /* If we are in ssa form, we must load the value from the default
>> +		 definition of the argument.  That should not be defined now,
>> +		 since the argument is not used uninitialized.  */
>> +	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
>> +	      narg = make_ssa_name (arg, gimple_build_nop ());
>> +	      set_ssa_default_def (cfun, arg, narg);
>> +	      /* ?? Is setting the subcode really necessary ??  */
>> +	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
>> +	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
>> +	      update_stmt (tgtcopy_stmt);
>> +	    }
>>   	}
>>
>>         /* Declare local variables needed in CHILD_CFUN.  */
>> @@ -8905,11 +8993,23 @@ expand_omp_target (struct omp_region *region)
>>   	  stmt = gimple_build_return (NULL);
>>   	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
>>   	  gsi_remove (&gsi, true);
>> +
>> +	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
>> +	     which is about to be split off.  Mark the vdef for renaming.  */
>> +	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
>>   	}
>>
>>         /* Move the offloading region into CHILD_CFUN.  */
>>
>> -      block = gimple_block (entry_stmt);
>> +      if (gimple_in_ssa_p (cfun))
>> +	{
>> +	  init_tree_ssa (child_cfun);
>> +	  init_ssa_operands (child_cfun);
>> +	  child_cfun->gimple_df->in_ssa_p = true;
>> +	  block = NULL_TREE;
>> +	}
>> +      else
>> +	block = gimple_block (entry_stmt);
>>
>>         new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
>>         if (exit_bb)
>> @@ -8969,9 +9069,18 @@ expand_omp_target (struct omp_region *region)
>>   	  if (changed)
>>   	    cleanup_tree_cfg ();
>>   	}
>> +      if (gimple_in_ssa_p (cfun))
>> +	update_ssa (TODO_update_ssa);
>>         pop_cfun ();
>>       }
>>
>> +  if (!do_emit_library_call)
>> +    {
>> +      if (gimple_in_ssa_p (cfun))
>> +	update_ssa (TODO_update_ssa_only_virtuals);
>> +      return;
>> +    }
>> +
>>     /* Emit a library call to launch the offloading region, or do data
>>        transfers.  */
>>     tree t1, t2, t3, t4, device, cond, c, clauses;
>> @@ -8993,7 +9102,7 @@ expand_omp_target (struct omp_region *region)
>>         start_ix = BUILT_IN_GOACC_PARALLEL;
>>         break;
>>       case GF_OMP_TARGET_KIND_OACC_KERNELS:
>> -      start_ix = BUILT_IN_GOACC_KERNELS;
>> +      start_ix = BUILT_IN_GOACC_KERNELS_INTERNAL;
>>         break;
>>       case GF_OMP_TARGET_KIND_OACC_DATA:
>>         start_ix = BUILT_IN_GOACC_DATA_START;
>> @@ -9128,6 +9237,7 @@ expand_omp_target (struct omp_region *region)
>>       case BUILT_IN_GOACC_DATA_START:
>>       case BUILT_IN_GOACC_ENTER_EXIT_DATA:
>>       case BUILT_IN_GOACC_KERNELS:
>> +    case BUILT_IN_GOACC_KERNELS_INTERNAL:
>>       case BUILT_IN_GOACC_PARALLEL:
>>       case BUILT_IN_GOACC_UPDATE:
>>         break;
>> @@ -9146,6 +9256,7 @@ expand_omp_target (struct omp_region *region)
>>       case BUILT_IN_GOMP_TARGET_UPDATE:
>>         break;
>>       case BUILT_IN_GOACC_KERNELS:
>> +    case BUILT_IN_GOACC_KERNELS_INTERNAL:
>>       case BUILT_IN_GOACC_PARALLEL:
>>         {
>>   	tree t_num_gangs, t_num_workers, t_vector_length;
>> @@ -9249,6 +9360,8 @@ expand_omp_target (struct omp_region *region)
>>         gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
>>         gsi_remove (&gsi, true);
>>       }
>> +  if (gimple_in_ssa_p (cfun))
>> +    update_ssa (TODO_update_ssa_only_virtuals);
>>   }
>>
>>
>> @@ -9503,7 +9616,7 @@ const pass_data pass_data_expand_omp =
>>     OPTGROUP_NONE, /* optinfo_flags */
>>     TV_NONE, /* tv_id */
>>     PROP_gimple_any, /* properties_required */
>> -  PROP_gimple_eomp, /* properties_provided */
>> +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
>>     0, /* properties_destroyed */
>>     0, /* todo_flags_start */
>>     0, /* todo_flags_finish */
>> @@ -9517,12 +9630,14 @@ public:
>>     {}
>>
>>     /* opt_pass methods: */
>> -  virtual unsigned int execute (function *)
>> +  virtual unsigned int execute (function *fun)
>>       {
>>         bool gate = ((flag_cilkplus != 0 || flag_openacc != 0 || flag_openmp != 0
>>   		    || flag_openmp_simd != 0)
>>   		   && !seen_error ());
>>
>> +      fun->curr_properties |= PROP_gimple_eomp;
>> +
>>         /* This pass always runs, to provide PROP_gimple_eomp.
>>   	 But often, there is nothing to do.  */
>>         if (!gate)
>> @@ -9553,7 +9668,8 @@ const pass_data pass_data_expand_omp_ssa =
>>     PROP_gimple_eomp, /* properties_provided */
>>     0, /* properties_destroyed */
>>     0, /* todo_flags_start */
>> -  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
>> +  TODO_cleanup_cfg | TODO_rebuild_alias
>> +  | TODO_remove_unused_locals, /* todo_flags_finish */
>>   };
>>
>>   class pass_expand_omp_ssa : public gimple_opt_pass
>> @@ -9568,7 +9684,48 @@ public:
>>       {
>>         return !(fun->curr_properties & PROP_gimple_eomp);
>>       }
>> -  virtual unsigned int execute (function *) { return execute_expand_omp (); }
>> +  virtual unsigned int execute (function *)
>
> Please move this out of the class body.
>

Fixed and committed (ommitting patch as trivial).

>> +    {
>> +      unsigned res = execute_expand_omp ();
>> +
>> +      /* After running pass_expand_omp_ssa to expand the oacc kernels
>> +	 directive, we are left in the original function with anonymous
>> +	 SSA_NAMEs, with a defining statement that has been deleted.  This
>> +	 pass finds those SSA_NAMEs and releases them.
>> +	 TODO: Either fix this elsewhere, or make the fix unnecessary.  */
>> +      unsigned int i;
>> +      for (i = 1; i < num_ssa_names; ++i)
>> +	{
>> +	  tree name = ssa_name (i);
>> +	  if (name == NULL_TREE)
>> +	    continue;
>> +
>> +	  gimple stmt = SSA_NAME_DEF_STMT (name);
>> +	  bool found = false;
>> +
>> +	  ssa_op_iter op_iter;
>> +	  def_operand_p def_p;
>> +	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
>> +	    {
>> +	      tree def = DEF_FROM_PTR (def_p);
>> +	      if (def == name)
>> +		{
>> +		  found = true;
>> +		  break;
>> +		}
>> +	    }
>> +
>> +	  if (!found)
>> +	    {
>> +	      if (dump_file)
>> +		fprintf (dump_file, "Released dangling ssa name %u\n", i);
>> +	      release_ssa_name (name);
>> +	    }
>> +	}
>> +
>> +      return res;
>> +    }
>> +  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
>>
>>   }; // class pass_expand_omp_ssa
>>
>> @@ -13728,4 +13885,39 @@ omp_finish_file (void)
>>       }
>>   }
>>
>> +static bool
>> +gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
>> +					  unsigned int nr_varnames,
>> +					  unsigned int flags)
>
> Missing comment.
>
>> +{
>> +  tree use;
>> +  ssa_op_iter iter;
>> +  const char *s;
>> +
>> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
>> +    {
>> +      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
>> +	continue;
>> +      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
>> +
>> +      unsigned int i;
>> +      for (i = 0; i < nr_varnames; ++i)
>> +	if (strcmp (varnames[i], s) == 0)
>> +	  return true;
>
> Eh?  This surely is crap - you can't ever have semantics depend on
> identifiers.
>
>> +    }
>> +
>> +  return false;
>> +}
>> +
>> +/* Return true if STMT is .omp_data_i init.  */
>> +
>> +bool
>> +gimple_stmt_omp_data_i_init_p (gimple stmt)
>> +{
>> +  const char *varnames[] = { ".omp_data_i" };
>> +  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
>> +  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
>> +						   SSA_OP_DEF);
>
> So no - this isn't possible this way and I suspect it's not reliable
> anyway.
>

Rewritten gimple_stmt_omp_data_i_init_p to not use identifier names, 
attached and committed.

>> +}
>> +
>>   #include "gt-omp-low.h"
>> diff --git gcc/omp-low.h gcc/omp-low.h
>> index 8a4052e..3d30c3b 100644
>> --- gcc/omp-low.h
>> +++ gcc/omp-low.h
>> @@ -28,6 +28,7 @@ extern void free_omp_regions (void);
>>   extern tree omp_reduction_init (tree, tree);
>>   extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
>>   extern void omp_finish_file (void);
>> +extern bool gimple_stmt_omp_data_i_init_p (gimple);
>>
>>   extern GTY(()) vec<tree, va_gc> *offload_funcs;
>>   extern GTY(()) vec<tree, va_gc> *offload_vars;
>> diff --git gcc/passes.def gcc/passes.def
>> index 2bc5dcd..db0dd18 100644
>> --- gcc/passes.def
>> +++ gcc/passes.def
>> @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
>>   	     execute TODO_rebuild_alias at this point.  */
>>   	  NEXT_PASS (pass_build_ealias);
>>   	  NEXT_PASS (pass_fre);
>> +	  NEXT_PASS (pass_expand_omp_ssa);
>>   	  NEXT_PASS (pass_merge_phi);
>>   	  NEXT_PASS (pass_cd_dce);
>>   	  NEXT_PASS (pass_early_ipa_sra);
>> @@ -99,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>>   	      late.  */
>>   	  NEXT_PASS (pass_split_functions);
>>         POP_INSERT_PASSES ()
>> +      NEXT_PASS (pass_expand_omp_ssa);
>>         NEXT_PASS (pass_release_ssa_names);
>>         NEXT_PASS (pass_rebuild_cgraph_edges);
>>         NEXT_PASS (pass_inline_parameters);
>> diff --git gcc/tree-ssa-ccp.c gcc/tree-ssa-ccp.c
>> index d45a3ff..46fe1c7 100644
>> --- gcc/tree-ssa-ccp.c
>> +++ gcc/tree-ssa-ccp.c
>> @@ -172,6 +172,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "wide-int-print.h"
>>   #include "builtins.h"
>>   #include "tree-chkp.h"
>> +#include "omp-low.h"
>>
>>
>>   /* Possible lattice values.  */
>> @@ -796,6 +797,9 @@ surely_varying_stmt_p (gimple stmt)
>>         && gimple_code (stmt) != GIMPLE_CALL)
>>       return true;
>>
>> +  if (gimple_stmt_omp_data_i_init_p (stmt))
>> +    return true;
>> +
>
> No.
>
>>     return false;
>>   }
>>
>> @@ -2329,6 +2333,8 @@ ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
>>     switch (gimple_code (stmt))
>>       {
>>         case GIMPLE_ASSIGN:
>> +	if (gimple_stmt_omp_data_i_init_p (stmt))
>> +	  break;
>>           /* If the statement is an assignment that produces a single
>>              output value, evaluate its RHS to see if the lattice value of
>>              its output has changed.  */
>> diff --git gcc/tree-ssa-forwprop.c gcc/tree-ssa-forwprop.c
>> index d8db20a..554a5a5 100644
>> --- gcc/tree-ssa-forwprop.c
>> +++ gcc/tree-ssa-forwprop.c
>> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "tree-cfgcleanup.h"
>>   #include "tree-into-ssa.h"
>>   #include "cfganal.h"
>> +#include "omp-low.h"
>>
>>   /* This pass propagates the RHS of assignment statements into use
>>      sites of the LHS of the assignment.  It's basically a specialized
>> @@ -2155,7 +2156,8 @@ pass_forwprop::execute (function *fun)
>>   	  tree lhs, rhs;
>>   	  enum tree_code code;
>>
>> -	  if (!is_gimple_assign (stmt))
>> +	  if (!is_gimple_assign (stmt)
>> +	      || gimple_stmt_omp_data_i_init_p (stmt))
>
> No.
>
>>   	    {
>>   	      gsi_next (&gsi);
>>   	      continue;
>> diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
>> index e417a15..449a615 100644
>> --- gcc/tree-ssa-sccvn.c
>> +++ gcc/tree-ssa-sccvn.c
>> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "ipa-ref.h"
>>   #include "plugin-api.h"
>>   #include "cgraph.h"
>> +#include "omp-low.h"
>>
>>   /* This algorithm is based on the SCC algorithm presented by Keith
>>      Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
>> @@ -3542,7 +3543,8 @@ visit_use (tree use)
>>       {
>>         if (gimple_code (stmt) == GIMPLE_PHI)
>>   	changed = visit_phi (stmt);
>> -      else if (gimple_has_volatile_ops (stmt))
>> +      else if (gimple_has_volatile_ops (stmt)
>> +	       || gimple_stmt_omp_data_i_init_p (stmt))
>
> No.
>
> What is the intent of these changes?
>

These are changes to handle the kernels region conservatively, in order 
to not undo the omp-lowering before getting to the oacc-parloops pass.

Thanks,
- Tom

Comments

Richard Biener June 8, 2015, 7:25 a.m. UTC | #1
On Thu, 4 Jun 2015, Tom de Vries wrote:

> > >   	    {
> > >   	      gsi_next (&gsi);
> > >   	      continue;
> > > diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
> > > index e417a15..449a615 100644
> > > --- gcc/tree-ssa-sccvn.c
> > > +++ gcc/tree-ssa-sccvn.c
> > > @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
> > >   #include "ipa-ref.h"
> > >   #include "plugin-api.h"
> > >   #include "cgraph.h"
> > > +#include "omp-low.h"
> > > 
> > >   /* This algorithm is based on the SCC algorithm presented by Keith
> > >      Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
> > > @@ -3542,7 +3543,8 @@ visit_use (tree use)
> > >       {
> > >         if (gimple_code (stmt) == GIMPLE_PHI)
> > >   	changed = visit_phi (stmt);
> > > -      else if (gimple_has_volatile_ops (stmt))
> > > +      else if (gimple_has_volatile_ops (stmt)
> > > +	       || gimple_stmt_omp_data_i_init_p (stmt))
> > 
> > No.
> > 
> > What is the intent of these changes?
> > 
> 
> These are changes to handle the kernels region conservatively, in order to not
> undo the omp-lowering before getting to the oacc-parloops pass.

Still it feels too much like the MPX mistake (maintainance cost and
compile-time cost).  How can any pass "undo" omp-lowering?

Richard.
Tom de Vries June 19, 2015, 7:56 a.m. UTC | #2
On 08/06/15 09:25, Richard Biener wrote:
> On Thu, 4 Jun 2015, Tom de Vries wrote:
>
>>>>    	    {
>>>>    	      gsi_next (&gsi);
>>>>    	      continue;
>>>> diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
>>>> index e417a15..449a615 100644
>>>> --- gcc/tree-ssa-sccvn.c
>>>> +++ gcc/tree-ssa-sccvn.c
>>>> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>    #include "ipa-ref.h"
>>>>    #include "plugin-api.h"
>>>>    #include "cgraph.h"
>>>> +#include "omp-low.h"
>>>>
>>>>    /* This algorithm is based on the SCC algorithm presented by Keith
>>>>       Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
>>>> @@ -3542,7 +3543,8 @@ visit_use (tree use)
>>>>        {
>>>>          if (gimple_code (stmt) == GIMPLE_PHI)
>>>>    	changed = visit_phi (stmt);
>>>> -      else if (gimple_has_volatile_ops (stmt))
>>>> +      else if (gimple_has_volatile_ops (stmt)
>>>> +	       || gimple_stmt_omp_data_i_init_p (stmt))
>>>
>>> No.
>>>
>>> What is the intent of these changes?
>>>
>>
>> These are changes to handle the kernels region conservatively, in order to not
>> undo the omp-lowering before getting to the oacc-parloops pass.
>
> Still it feels too much like the MPX mistake (maintainance cost and
> compile-time cost).  How can any pass "undo" omp-lowering?
>

I'm talking about the rewriting of the variables in terms of 
.omp_data_i. Passes like copy_prop and fre undo this rewriting, and 
propagate the variables from outside the kernels region back into the 
kernels region, eliminating .omp_data_i.

Thanks,
- Tom
diff mbox

Patch

Rewrite gimple_stmt_omp_data_i_init_p

2015-06-03  Tom de Vries  <tom@codesourcery.com>

	* omp-low.c (gimple_stmt_ssa_operand_references_var_p): Remove function.
	(gimple_stmt_omp_data_i_init_p): Rewrite without
	gimple_stmt_ssa_operand_references_var_p.
---
 gcc/omp-low.c | 59 ++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f847d5c..0b31992 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15368,39 +15368,40 @@  omp_finish_file (void)
     }
 }
 
-static bool
-gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
-					  unsigned int nr_varnames,
-					  unsigned int flags)
-{
-  tree use;
-  ssa_op_iter iter;
-  const char *s;
-
-  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
-    {
-      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
-	continue;
-      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
-
-      unsigned int i;
-      for (i = 0; i < nr_varnames; ++i)
-	if (strcmp (varnames[i], s) == 0)
-	  return true;
-    }
-
-  return false;
-}
-
-/* Return true if STMT is .omp_data_i init.  */
+/* Return true if STMT is copy assignment .omp_data_i = &.omp_data_arr.  */
 
 bool
 gimple_stmt_omp_data_i_init_p (gimple stmt)
 {
-  const char *varnames[] = { ".omp_data_i" };
-  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
-  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
-						   SSA_OP_DEF);
+  /* Extract obj from stmt 'a = &obj.  */
+  if (!gimple_assign_cast_p (stmt)
+      && !gimple_assign_single_p (stmt))
+    return false;
+  tree rhs = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (rhs) != ADDR_EXPR)
+    return false;
+  tree obj = TREE_OPERAND (rhs, 0);
+
+  /* Check that the last statement in the preceding bb is an oacc kernels
+     stmt.  */
+  basic_block bb = gimple_bb (stmt);
+  if (!single_pred_p (bb))
+    return false;
+  gimple last = last_stmt (single_pred (bb));
+  if (last == NULL
+      || gimple_code (last) != GIMPLE_OMP_TARGET)
+    return false;
+  gomp_target *kernels = as_a <gomp_target *> (last);
+  if (gimple_omp_target_kind (kernels)
+      != GF_OMP_TARGET_KIND_OACC_KERNELS)
+    return false;
+
+  /* Get omp_data_arr from the oacc kernels stmt.  */
+  tree data_arg = gimple_omp_target_data_arg (kernels);
+  tree omp_data_arr = TREE_VEC_ELT (data_arg, 0);
+
+  /* If obj is omp_data_arr, we've found the .omp_data_i init statement.  */
+  return operand_equal_p (obj, omp_data_arr, 0);
 }
 
 namespace {
-- 
1.9.1