
[hsa,5/10] OpenMP lowering/expansion changes (gridification)

Message ID 20151207112243.GF24234@virgil.suse.cz
State New

Commit Message

Martin Jambor Dec. 7, 2015, 11:22 a.m. UTC
Hi,

the patch in this email contains the changes to make our OpenMP
lowering and expansion machinery produce GPU kernels for a certain
limited class of loops.  The plan is to make that class quite a bit
bigger, but only the following is ready for submission now.

Basically, whenever the compiler configured for HSAIL generation
encounters the following pattern:

  #pragma omp target
  #pragma omp teams thread_limit(workgroup_size) // thread_limit is optional
  #pragma omp distribute parallel for firstprivate(n,j) private(i) other_sharing_clauses()
    for (i = j + 1; i < n; i += 3)
      some_loop_body


it creates a copy of the entire target body and expands it slightly
differently for concurrent execution on a GPU.  Note that both teams
and distribute constructs are mandatory.  Moreover, currently the
distribute has to be in a combined statement with the inner for
construct.  And there are quite a few other restrictions which I hope
to alleviate over the next year; most notably, reductions and the collapse
clause currently prevent gridification (see the new function
target_follows_gridifiable_pattern to find out exactly what the
restrictions are).
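
For illustration (this example is not part of the patch), a target region
like the following would currently not be gridified because of the
reduction clause and would instead be expanded in the usual way:

  #pragma omp target
  #pragma omp teams
  #pragma omp distribute parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
      sum += a[i];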

The first phase of the "gridification" process is run before the OMP
"scanning" phase.  We look for the pattern above, and if we encounter
one, we copy its entire body into a new gimple statement
GIMPLE_OMP_GPUKERNEL.  Within it, we mark the teams, distribute and
parallel constructs with a new flag "kernel_phony."  This flag makes
the OMP lowering phase process their sharing clauses as usual, but the
statements representing the constructs themselves are removed at
lowering time (and thus are never expanded).  The resulting wasteful
repackaging of data is nicely cleaned up by our optimizers even at -O1.

At expansion time, we identify gomp_target statements with a kernel
and expand the kernel into a special function, with the loop
represented by the GPU grid rather than by control flow.  Afterwards,
the normal body of the target is expanded as usual.  Finally, we take
the grid dimensions stored by the first phase into new fields of the
target statement, store them in a structure and pass them as a
device-specific argument to GOMP_target_ext.
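
To give an idea of the shape of that structure, the following is a rough C
equivalent of the launch-attributes record built by
create_kernel_launch_attr_types (a sketch for illustration only, not the
generated tree types themselves):

  #include <stdint.h>

  /* Mirrors the __gomp_kernel_launch_attributes record built in omp-low.c.  */
  struct gomp_kernel_launch_attributes_sketch
  {
    uint32_t ndim;           /* Number of grid dimensions actually used.  */
    uint32_t grid_size[3];   /* Global grid size in each dimension.  */
    uint32_t group_size[3];  /* Work-group size in each dimension.  */
  };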

The patch thus also implements the compiler part of device-specific
target arguments as discussed on the mailing list and on IRC.

Originally, when I started with the above pattern matching, I did not
allow any other gimple statements in between the respective omp
constructs.  That however proved to be too restrictive, for two
reasons.  First, statements in the pre-bodies of both the distribute and
for loops need to be accounted for when calculating the kernel grid size
(which is done before the target statement itself), and second, Fortran
parameter dereferences happily result in interleaving statements even
when there were none in the user source code.

Therefore, I allow register-type stores to local non-addressable
variables in pre-bodies and also in between the OMP constructs.  All
of them are copied in front of the target statement and either used
for grid size calculation or removed as useless by later
optimizations.
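
For completeness, the test for such statements has roughly the following
shape (a minimal sketch assuming the usual omp-low.c includes; the patch's
actual reg_assignment_to_local_var_p may differ in details):

  /* Return true if STMT is an assignment of a register-type value to a
     local, non-addressable variable, i.e. the only kind of statement
     tolerated in between the OMP constructs of a gridifiable target.  */

  static bool
  reg_assignment_to_local_var_p (gimple *stmt)
  {
    gassign *assign = dyn_cast <gassign *> (stmt);
    if (!assign)
      return false;
    tree lhs = gimple_assign_lhs (assign);
    return (TREE_CODE (lhs) == VAR_DECL
            && !is_global_var (lhs)
            && !TREE_ADDRESSABLE (lhs)
            && is_gimple_reg_type (TREE_TYPE (lhs)));
  }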

I hope that I have eventually managed to write the gridification in a
way that interferes very little with the rest of the OMP pipeline and
re-implements only the bare necessary minimum of functionality that is
already there.  Any feedback is of course still very welcome.

Thanks,

Martin


2015-12-04  Martin Jambor  <mjambor@suse.cz>

	* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
	* fortran/types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
	* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GPUKERNEL.
	* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
	GF_OMP_FOR_KIND_KERNEL_BODY.
	(dump_gimple_omp_block): Also handle GIMPLE_OMP_GPUKERNEL.
	(pp_gimple_stmt_1): Likewise.
	* gimple-walk.c (walk_gimple_stmt): Likewise.
	* gimple.c (gimple_build_omp_gpukernel): New function.
	(gimple_copy): Also handle GIMPLE_OMP_GPUKERNEL.
	* gimple.def (GIMPLE_OMP_TEAMS): Moved into its own layout.
	(GIMPLE_OMP_GPUKERNEL): New.
	* gimple.h (gf_mask): Added GF_OMP_FOR_KIND_KERNEL_BODY.
	(gomp_for): New field kernel_phony.
	(gimple_statement_omp_parallel_layout): Likewise.
	(gimple_statement_omp_single_layout): Updated comments.
	(gomp_teams): New field kernel_phony.
	(gimple_build_omp_gpukernel): Declare.
	(gimple_has_substatements): Also handle GIMPLE_OMP_GPUKERNEL.
	(gimple_omp_for_kernel_phony): New.
	(gimple_omp_for_set_kernel_phony): Likewise.
	(gimple_omp_parallel_kernel_phony): Likewise.
	(gimple_omp_parallel_set_kernel_phony): Likewise.
	(gimple_omp_teams_kernel_phony): Likewise.
	(gimple_omp_teams_set_kernel_phony): Likewise.
	(CASE_GIMPLE_OMP): Also handle GIMPLE_OMP_GPUKERNEL.
	* gsstruct.def (GSS_OMP_TEAMS_LAYOUT): New.
	* omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New.
	(BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise.
	(BUILT_IN_GOMP_TARGET): Updated type.
	* omp-low.c: Include symbol-summary.h, hsa.h and params.h.
	(adjust_for_condition): New function.
	(get_omp_for_step_from_incr): Likewise.
	(extract_omp_for_data): Moved parts to adjust_for_condition and
	get_omp_for_step_from_incr.
	(build_outer_var_ref): Handle GIMPLE_OMP_GPUKERNEL.
	(fixup_child_record_type): Bail out if receiver_decl is NULL.
	(scan_sharing_clauses): Handle OMP_CLAUSE__GRIDDIM_.
	(scan_omp_parallel): Do not create child functions for phony
	constructs.
	(check_omp_nesting_restrictions): Handle GIMPLE_OMP_GPUKERNEL.
	(scan_omp_1_op): Add checking assert that we are not remapping to
	ERROR_MARK.  Also handle GIMPLE_OMP_GPUKERNEL.
	(region_needs_kernel_p): New function.
	(expand_parallel_call): Register appropriate parallel child
	functions as HSA kernels.
	(kernel_dim_array_type, kernel_lattrs_dimnum_decl): New variables.
	(kernel_lattrs_grid_decl, kernel_lattrs_group_decl): Likewise.
	(kernel_launch_attributes_type): Likewise.
	(create_kernel_launch_attr_types): New function.
	(insert_store_range_dim): Likewise.
	(get_kernel_launch_attributes): Likewise.
	(get_target_argument_identifier_1): Likewise.
	(get_target_argument_identifier): Likewise.
	(get_target_argument_value): Likewise.
	(get_target_arguments): Likewise.
	(expand_omp_target): Call get_target_arguments instead of looking
	up the number of teams and the thread limit.
	(expand_omp_for_kernel): New function.
	(arg_decl_map): New type.
	(remap_kernel_arg_accesses): New function.
	(expand_target_kernel_body): New function.
	(expand_omp): Call it.
	(lower_omp_for): Do not emit phony constructs.
	(lower_omp_taskreg): Do not emit phony constructs but create for them
	a temporary variable receiver_decl.
	(lower_omp_teams): Do not emit phony constructs.
	(lower_omp_gpukernel): New function.
	(lower_omp_1): Call it.
	(reg_assignment_to_local_var_p): New function.
	(seq_only_contains_local_assignments): Likewise.
	(find_single_omp_among_assignments_1): Likewise.
	(find_single_omp_among_assignments): Likewise.
	(find_ungridifiable_statement): Likewise.
	(target_follows_gridifiable_pattern): Likewise.
	(remap_prebody_decls): Likewise.
	(copy_leading_local_assignments): Likewise.
	(process_kernel_body_copy): Likewise.
	(attempt_target_gridification): Likewise.
	(create_target_gpukernel_stmt): Likewise.
	(create_target_gpukernels): Likewise.
	(execute_lower_omp): Call create_target_gpukernels.
	(make_gimple_omp_edges): Handle GIMPLE_OMP_GPUKERNEL.
	* tree-core.h (omp_clause_code): Added OMP_CLAUSE__GRIDDIM_.
	(tree_omp_clause): Added union field dimension.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__GRIDDIM_.
	* tree.c (omp_clause_num_ops): Added number of arguments of
	OMP_CLAUSE__GRIDDIM_.
	(omp_clause_code_name): Added name of OMP_CLAUSE__GRIDDIM_.
	(walk_tree_1): Handle OMP_CLAUSE__GRIDDIM_.
	* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): New.
	(OMP_CLAUSE_SET_GRIDDIM_DIMENSION): Likewise.
	(OMP_CLAUSE_GRIDDIM_SIZE): Likewise.
	(OMP_CLAUSE_GRIDDIM_GROUP): Likewise.

Comments

Jakub Jelinek Dec. 9, 2015, 1:19 p.m. UTC | #1
On Mon, Dec 07, 2015 at 12:22:43PM +0100, Martin Jambor wrote:
> it creates a copy of the entire target body and expands it slightly
> differently for concurrent execution on a GPU.  Note that both teams
> and distribute constructs are mandatory.  Moreover, currently the
> distribute has to be in a combined statement with the inner for
> construct.  And there are quite a few other restrictions which I hope

The standard calls those composite constructs, and I bet for gridification
you want that restriction always; without a composite distribute parallel for
there are two different unrelated loops.

> 	* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
> 	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
> 	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
> 	* fortran/types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
> 	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
> 	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.

Fortran has its own ChangeLog file.

> @@ -556,9 +558,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
>  		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
>  		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
>  
> -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
> -		      BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> -		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
> +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
> +		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> +		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)

There shouldn't be an empty line in between this DEF_FUNCTION_TYPE_9 and the
previous one.

> @@ -221,9 +223,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
>  		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
>  		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
>  
> -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
> +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
>  		      BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> -		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
> +		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
>  
>  DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
>  		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,

Ditto.

> --- a/gcc/gimple.def
> +++ b/gcc/gimple.def
> @@ -369,13 +369,17 @@ DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", GSS_OMP_PARALLEL_LAYOUT)
>  /* GIMPLE_OMP_TEAMS <BODY, CLAUSES> represents #pragma omp teams
>     BODY is the sequence of statements inside the single section.
>     CLAUSES is an OMP_CLAUSE chain holding the associated clauses.  */
> -DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_SINGLE_LAYOUT)
> +DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_TEAMS_LAYOUT)

Why?

> +/* GIMPLE_OMP_GPUKERNEL <BODY> represents a parallel loop lowered for execution
> +   on a GPU.  It is an artificial statement created by omp lowering.  */
> +DEFGSCODE(GIMPLE_OMP_GPUKERNEL, "gimple_omp_gpukernel", GSS_OMP)

Why do you call it GPUKERNEL or KERNEL_BODY when you really mean gridified
body and gridified loop?  I mean, what is GPU-specific about it?  PTX is
unlikely to use that.  And kernel is a wide term.

> @@ -622,8 +623,14 @@ struct GTY((tag("GSS_OMP_FOR")))
>    /* [ WORD 11 ]
>       Pre-body evaluated before the loop body begins.  */
>    gimple_seq pre_body;
> +
> +  /* [ WORD 12 ]
> +     If set, this statement is part of a gridified kernel, its clauses need to
> +     be scanned and lowered but the statement should be discarded after
> +     lowering.  */
> +  bool kernel_phony;

Ugh no, flags should go into GF_OMP_*.
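
(For reference, expressed as a GF_OMP_* subcode bit the accessor would look
roughly like this; the flag name and bit value below are illustrative and
not taken from gimple.h:)

  enum gf_mask_addition
  {
    GF_OMP_FOR_KERNEL_PHONY = 1 << 5
  };

  static inline bool
  gimple_omp_for_kernel_phony (const gomp_for *omp_for_stmt)
  {
    return (gimple_omp_subcode (omp_for_stmt) & GF_OMP_FOR_KERNEL_PHONY) != 0;
  }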

> @@ -643,6 +660,12 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT")))
>    /* [ WORD 10 ]
>       Shared data argument.  */
>    tree data_arg;
> +
> +  /* [ WORD 11 ] */
> +  /* If set, this statement is part of a gridified kernel, its clauses need to
> +     be scanned and lowered but the statement should be discarded after
> +     lowering.  */
> +  bool kernel_phony;
>  };

Likewise.

As for omp-low.c changes, the file is already large enough that it would be
nice if it is easy to find out what routines are for gridification purposes
only, use some special prefix (grid_*, ompgrid_*, ...) for all such
functions?

> @@ -1761,6 +1786,8 @@ fixup_child_record_type (omp_context *ctx)
>  {
>    tree f, type = ctx->record_type;
>  
> +  if (!ctx->receiver_decl)
> +    return;

So when is receiver_decl NULL?

> @@ -2113,6 +2140,14 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  	    }
>  	  break;
>  
> +	case OMP_CLAUSE__GRIDDIM_:
> +	  if (ctx->outer)
> +	    {
> +	      scan_omp_op (&OMP_CLAUSE_GRIDDIM_SIZE (c), ctx->outer);
> +	      scan_omp_op (&OMP_CLAUSE_GRIDDIM_GROUP (c), ctx->outer);

These should be OMP_CLAUSE__GRIDDIM__{SIZE,GROUP}.  See
OMP_CLAUSE__SIMDUID__DECL for another similar macro.

> @@ -6252,6 +6302,37 @@ gimple_build_cond_empty (tree cond)
>    return gimple_build_cond (pred_code, lhs, rhs, NULL_TREE, NULL_TREE);
>  }
>  
> +/* Return true if a parallel REGION is within a declare target function or
> +   within a target region and is not a part of a gridified kernel.  */
> +
> +static bool
> +region_needs_kernel_p (struct omp_region *region)
> +{
> +  bool indirect = false;
> +  for (region = region->outer; region; region = region->outer)
> +    {
> +      if (region->type == GIMPLE_OMP_PARALLEL)
> +	indirect = true;
> +      else if (region->type == GIMPLE_OMP_TARGET)
> +	{
> +	  gomp_target *tgt_stmt;
> +	  tgt_stmt = as_a <gomp_target *> (last_stmt (region->entry));

	  gomp_target *tgt_stmt
	    = as_a <gomp_target *> (last_stmt (region->entry));
?

> +static GTY(()) tree kernel_dim_array_type;
> +static GTY(()) tree kernel_lattrs_dimnum_decl;
> +static GTY(()) tree kernel_lattrs_grid_decl;
> +static GTY(()) tree kernel_lattrs_group_decl;
> +static GTY(()) tree kernel_launch_attributes_type;

Turn this at least into either a struct or array of trees, so that it is not
5 separate GC roots?
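
(For illustration, one possible shape of such a consolidation; the names
below simply mirror the variables quoted above and are not from the patch:)

  /* Keep all gridification launch-attribute trees under a single GC root.  */
  enum grid_launch_tree
  {
    GRID_DIM_ARRAY_TYPE,
    GRID_LATTRS_DIMNUM_DECL,
    GRID_LATTRS_GRID_DECL,
    GRID_LATTRS_GROUP_DECL,
    GRID_LAUNCH_ATTRIBUTES_TYPE,
    GRID_LAUNCH_TREE_MAX
  };

  static GTY(()) tree grid_launch_trees[GRID_LAUNCH_TREE_MAX];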

> +  tree dim_arr_index_type;
> +  dim_arr_index_type = build_index_type (build_int_cst (integer_type_node, 2));

See above for formatting; even if you don't have the declaration
one line above it, putting = in the 5th column of the next line will often
be beneficial for the formatting:

> +  kernel_dim_array_type = build_array_type (uint32_type_node,
> +					    dim_arr_index_type);
> +
> +  kernel_launch_attributes_type = make_node (RECORD_TYPE);
> +  kernel_lattrs_dimnum_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL,
> +				       get_identifier ("ndim"),
> +				       uint32_type_node);
> +  DECL_CHAIN (kernel_lattrs_dimnum_decl) = NULL_TREE;
> +
> +  kernel_lattrs_grid_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL,
> +				     get_identifier ("grid_size"),
> +				     kernel_dim_array_type);
> +  DECL_CHAIN (kernel_lattrs_grid_decl) = kernel_lattrs_dimnum_decl;
> +  kernel_lattrs_group_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL,
> +				     get_identifier ("group_size"),
> +				     kernel_dim_array_type);
> +  DECL_CHAIN (kernel_lattrs_group_decl) = kernel_lattrs_grid_decl;
> +  finish_builtin_struct (kernel_launch_attributes_type,
> +			 "__gomp_kernel_launch_attributes",
> +			 kernel_lattrs_group_decl, NULL_TREE);

> +static tree
> +get_target_arguments (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt)
> +{
> +  auto_vec <tree, 4> args;
> +  tree clauses = gimple_omp_target_clauses (tgt_stmt);
> +  tree t, c = find_omp_clause (clauses, OMP_CLAUSE_NUM_TEAMS);
> +  if (c)
> +    t = OMP_CLAUSE_NUM_TEAMS_EXPR (c);
> +  else
> +    t = integer_minus_one_node;
> +  t = get_target_argument_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL,
> +				 GOMP_TARGET_ARG_NUM_TEAMS, t);
> +  args.quick_push (t);

This is what I've talked about in the review of another patch.  num_teams
is an int; for 32-bit targets, trying to encode it into 16 bits is not going
to work.
> +
> +  c = find_omp_clause (clauses, OMP_CLAUSE_THREAD_LIMIT);
> +  if (c)
> +    t = OMP_CLAUSE_THREAD_LIMIT_EXPR (c);
> +  else
> +    t = integer_minus_one_node;
> +  t = get_target_argument_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL,
> +				 GOMP_TARGET_ARG_THREAD_LIMIT, t);

Ditto.

> @@ -14872,6 +15392,14 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>    par_olist = NULL;
>    par_ilist = NULL;
>    par_rlist = NULL;
> +  bool phony_construct = is_a <gomp_parallel *> (stmt)
> +    && gimple_omp_parallel_kernel_phony (as_a <gomp_parallel *> (stmt));

I'm not a big fan of the is_a mess.  gimple_code (stmt) == GIMPLE_OMP_PARALLEL
is what is used elsewhere.

> +  if (phony_construct && ctx->record_type)
> +    {
> +      gcc_checking_assert (!ctx->receiver_decl);
> +      ctx->receiver_decl = create_tmp_var
> +	(build_reference_type (ctx->record_type), ".omp_rec");

Formatting.
> @@ -400,7 +401,8 @@ const char * const omp_clause_code_name[] =
>    "num_gangs",
>    "num_workers",
>    "vector_length",
> -  "tile"
> +  "tile",
> +  "griddim"

The clause is "_griddim_".

	Jakub
Thomas Schwinge Dec. 9, 2015, 4:24 p.m. UTC | #2
Hi!

I've been meaning to suggest this for some time already:

On Wed, 9 Dec 2015 14:19:30 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> As for omp-low.c changes, the file is already large enough that it would be
> nice if it is easy to find out what routines are for gridification purposes
> only, use some special prefix (grid_*, ompgrid_*, ...) for all such
> functions?

In addition to that, how about we split up gcc/omp-low.c into several
files?  Would it make sense (I have not yet looked in detail) to do so
along the borders of the several passes defined therein?  Or, can you
tell already that there would be too many cross-references between the
several files to make this infeasible?

I'd suggest to do this shortly before GCC 6 is released, so that
backports from trunk to gcc-6-branch will be easy.  (I assume we don't
have to care for gcc-5-branch backports too much any longer.)

    $ wc -l gcc/*.c | sort -r -n | head
      879881 total
       25770 gcc/dwarf2out.c
       19834 gcc/omp-low.c
       14419 gcc/fold-const.c
       14357 gcc/combine.c
       14003 gcc/tree.c
       11622 gcc/expr.c
       11610 gcc/gimplify.c
       10417 gcc/tree-vrp.c
       10328 gcc/var-tracking.c


Grüße
 Thomas
Bernd Schmidt Dec. 9, 2015, 5:23 p.m. UTC | #3
On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
>
> In addition to that, how about we split up gcc/omp-low.c into several
> files?  Would it make sense (I have not yet looked in detail) to do so
> along the borders of the several passes defined therein?  Or, can you
> tell already that there would be too many cross-references between the
> several files to make this infeasible?

It would be nice to get rid of all the code duplication in that file. 
That alone could reduce the size by quite a bit, and hopefully make it 
easier to read.

I suspect a split along the ompexp/omplow boundary would be quite easy 
to achieve.

> I'd suggest to do this shortly before GCC 6 is released, so that
> backports from trunk to gcc-6-branch will be easy.  (I assume we don't
> have to care for gcc-5-branch backports too much any longer.)

I'll declare myself agnostic as to whether such a change is appropriate 
for gcc-6 at this stage. I guess it kind of depends on the specifics.


Bernd
Jakub Jelinek Dec. 10, 2015, 8:08 a.m. UTC | #4
On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> >
> >In addition to that, how about we split up gcc/omp-low.c into several
> >files?  Would it make sense (I have not yet looked in detail) to do so
> >along the borders of the several passes defined therein?  Or, can you
> >tell already that there would be too many cross-references between the
> >several files to make this infeasible?
> 
> It would be nice to get rid of all the code duplication in that file. That
> alone could reduce the size by quite a bit, and hopefully make it easier to
> read.

What exact code duplication do you mean?

> I suspect a split along the ompexp/omplow boundary would be quite easy to
> achieve.

Yeah, that might be the possible splitting boundary (have omp-low.c,
omp-exp.c).

> >I'd suggest to do this shortly before GCC 6 is released, so that
> >backports from trunk to gcc-6-branch will be easy.  (I assume we don't
> >have to care for gcc-5-branch backports too much any longer.)
> 
> I'll declare myself agnostic as to whether such a change is appropriate for
> gcc-6 at this stage. I guess it kind of depends on the specifics.

Certainly.  On one side I'd say it is too late now in stage3, on the other
side when would be better time to do that, during stage1 people will have
more likely out of the tree branches with more changes (I'm aware we even
now have the HSA, OpenMP -> PTX and OpenACC branches).

So, if somebody wants to try that, we can see if the result would be
appropriate.

	Jakub
Bernd Schmidt Dec. 10, 2015, 11:26 a.m. UTC | #5
On 12/10/2015 09:08 AM, Jakub Jelinek wrote:
> On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
>> On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
>>>
>>> In addition to that, how about we split up gcc/omp-low.c into several
>>> files?  Would it make sense (I have not yet looked in detail) to do so
>>> along the borders of the several passes defined therein?  Or, can you
>>> tell already that there would be too many cross-references between the
>>> several files to make this infeasible?
>>
>> It would be nice to get rid of all the code duplication in that file. That
>> alone could reduce the size by quite a bit, and hopefully make it easier to
>> read.
>
> What exact code duplication do you mean?

Functions that are near-identical with slight differences, or which have 
large sections of identical code. scan_omp_task vs scan_omp_parallel, or 
the various expand_omp_for functions are examples.


Bernd
Jakub Jelinek Dec. 10, 2015, 11:34 a.m. UTC | #6
On Thu, Dec 10, 2015 at 12:26:10PM +0100, Bernd Schmidt wrote:
> On 12/10/2015 09:08 AM, Jakub Jelinek wrote:
> >On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> >>On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> >>>
> >>>In addition to that, how about we split up gcc/omp-low.c into several
> >>>files?  Would it make sense (I have not yet looked in detail) to do so
> >>>along the borders of the several passes defined therein?  Or, can you
> >>>tell already that there would be too many cross-references between the
> >>>several files to make this infeasible?
> >>
> >>It would be nice to get rid of all the code duplication in that file. That
> >>alone could reduce the size by quite a bit, and hopefully make it easier to
> >>read.
> >
> >What exact code duplication do you mean?
> 
> Functions that are near-identical with slight differences, or which have
> large sections of identical code. scan_omp_task vs scan_omp_parallel, or the

Even these two (quite short) functions have a huge number of differences,
so I'm not 100% sure it would be more readable if we had just one
scan_omp_taskreg that handled both.

> various expand_omp_for functions are examples.

I'm aware of some duplication in expand_omp_for_* functions, and some of the
obvious duplications were already moved to helper functions.  But in these
cases the number of differences is even significantly bigger too, so having
just one function that would handle all the different schedules would be far
less readable.  Perhaps we can add some small helpers to handle some little
pieces that repeat between the functions.

	Jakub
Nathan Sidwell Dec. 15, 2015, 6:28 p.m. UTC | #7
On 12/10/15 06:34, Jakub Jelinek wrote:

> I'm aware of some duplication in expand_omp_for_* functions, and some of the
> obvious duplications were already moved to helper functions.  But in these
> cases the number of differences is even significantly bigger too, so having
> just one function that would handle all the different schedules would be far
> less readable.  Perhaps we can add some small helpers to handle some little
> pieces that repeat between the functions.

I agree.  For instance, earlier openacc's loop expansion piggybacked onto
the two omp loop expanders.  I found it much cleaner to create a separate
openacc loop expander.  There's so much stuff to juggle in each case that
combining all the variants into one function can lead to cognitive overload.

nathan
Thomas Schwinge April 8, 2016, 9:36 a.m. UTC | #8
Hi!

On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > >
> > >In addition to that, how about we split up gcc/omp-low.c into several
> > >files?  Would it make sense (I have not yet looked in detail) to do so
> > >along the borders of the several passes defined therein?  Or, can you
> > >tell already that there would be too many cross-references between the
> > >several files to make this infeasible?
> > 
> > It would be nice to get rid of all the code duplication in that file. That
> > alone could reduce the size by quite a bit, and hopefully make it easier to
> > read.
> 
> What exact code duplication do you mean?

(Has been discussed in the following.)  At this point, I do not intend to
work on any kinds of cleanup, but rather just the "mechanical" changes:

> > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > achieve.
> 
> Yeah, that might be the possible splitting boundary (have omp-low.c,
> omp-exp.c).

Right.  And possibly some kind of omp-simd.c, and omp-checking.c, and so
on, if feasible.  (I have not yet looked in detail.)

> > >I'd suggest to do this shortly before GCC 6 is released, so that
> > >backports from trunk to gcc-6-branch will be easy.  (I assume we don't
> > >have to care for gcc-5-branch backports too much any longer.)
> > 
> > I'll declare myself agnostic as to whether such a change is appropriate for
> > gcc-6 at this stage. I guess it kind of depends on the specifics.
> 
> Certainly.  On one side I'd say it is too late now in stage3, on the other
> side when would be better time to do that, during stage1 people will have
> more likely out of the tree branches with more changes (I'm aware we even
> now have the HSA, OpenMP -> PTX and OpenACC branches).
> 
> So, if somebody wants to try that, we can see if the result would be
> appropriate.

So, has time now come to execute this task?  (To remind: the idea
explicitly has been to do this late, shortly before the gcc-6-branch gets
created, to make it easy in the following months to apply patches to both
trunk and gcc-6-branch.)


Grüße
 Thomas
Martin Jambor April 8, 2016, 10:46 a.m. UTC | #9
Hi,

On Fri, Apr 08, 2016 at 11:36:03AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > >
> > > >In addition to that, how about we split up gcc/omp-low.c into several
> > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > >along the borders of the several passes defined therein?  Or, can you
> > > >tell already that there would be too many cross-references between the
> > > >several files to make this infeasible?
> > > 
> > > It would be nice to get rid of all the code duplication in that file. That
> > > alone could reduce the size by quite a bit, and hopefully make it easier to
> > > read.
> > 
> > What exact code duplication do you mean?
> 
> (Has been discussed in the following.)  At this point, I do not intend to
> work on any kinds of cleanup, but rather just the "mechanical" changes:
> 
> > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > achieve.
> > 
> > Yeah, that might be the possible splitting boundary (have omp-low.c,
> > omp-exp.c).
> 
> Right.  And possibly some kind of omp-simd.c, and omp-checking.c, and so
> on, if feasible.  (I have not yet looked in detail.)
> 
> > > >I'd suggest to do this shortly before GCC 6 is released, so that
> > > >backports from trunk to gcc-6-branch will be easy.  (I assume we don't
> > > >have to care for gcc-5-branch backports too much any longer.)
> > > 
> > > I'll declare myself agnostic as to whether such a change is appropriate for
> > > gcc-6 at this stage. I guess it kind of depends on the specifics.
> > 
> > Certainly.  On one side I'd say it is too late now in stage3, on the other
> > side when would be better time to do that, during stage1 people will have
> > more likely out of the tree branches with more changes (I'm aware we even
> > now have the HSA, OpenMP -> PTX and OpenACC branches).
> > 
> > So, if somebody wants to try that, we can see if the result would be
> > appropriate.
> 
> So, has time now come to execute this task?  (To remind: the idea
> explicitly has been to do this late, shortly before the gcc-6-branch gets
> created, to make it easy in the following months to apply patches to both
> trunk and gcc-6-branch.)
> 

Unless someone is quicker, I can give it a go next Thursday (not any
sooner, unfortunately).  I would do a division into omp-low.c and
omp-exp.c, and possibly an omp.c for simple stuff not fitting anywhere
else, and perhaps even a separate omp-gridify.c.  Someone else would
have to put stuff into an omp-simd.c, I'm afraid.  But we can go
about this incrementally.

Thanks,

Martin
Jakub Jelinek April 8, 2016, 11:07 a.m. UTC | #10
On Fri, Apr 08, 2016 at 11:36:03AM +0200, Thomas Schwinge wrote:
> > Certainly.  On one side I'd say it is too late now in stage3, on the other
> > side when would be better time to do that, during stage1 people will have
> > more likely out of the tree branches with more changes (I'm aware we even
> > now have the HSA, OpenMP -> PTX and OpenACC branches).
> > 
> > So, if somebody wants to try that, we can see if the result would be
> > appropriate.
> 
> So, has time now come to execute this task?  (To remind: the idea
> explicitly has been to do this late, shortly before the gcc-6-branch gets
> created, to make it easy in the following months to apply patches to both
> trunk and gcc-6-branch.)

Only if you are able to do it quickly; branching is approaching very fast,
we have just the last couple of P1s left and are already below 100 P1-P3s.
I hope we can branch next week and release in 2 weeks.

	Jakub
Thomas Schwinge April 13, 2016, 4:01 p.m. UTC | #11
Hi!

On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > >how about we split up gcc/omp-low.c into several
> > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > >along the borders of the several passes defined therein?

> > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > achieve.

That was indeed the first one that I tackled, omp-expand.c (spelled out
"expand" instead of "exp" to avoid confusion as "exp" might also be short
for "expression"; OK?), and an omp-offload.c also fell out of that (with
more content to be moved there, I suspect).

We could split up omp-offload.c even further, but I don't know if that's
really feasible.  Currently in there: offload tables stuff, OpenACC loops
stuff and pass_oacc_device_lower, pass_omp_target_link; separated by ^L
in this one file.

> And possibly some kind of omp-simd.c, and omp-checking.c, and so
> on, if feasible.  (I have not yet looked in detail.)

Not yet looked into these.

Stuff that does not relate to OMP lowering I did not move out of
omp-low.c (into a new omp.c or omp-misc.c, for example) so far, but
instead just left it all in omp-low.c.  We'll see how far we get.

One thing I noticed is that there sometimes is more than one suitable
place to put stuff: omp-low.c and omp-expand.c categorize by compiler
passes, and omp-offload.c -- at least in part -- is about the orthogonal
"offloading" category.  For example, see the OMPTODO "struct oacc_loop
and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how that goes.

> > > >I'd suggest to do this shortly before GCC 6 is released, [...]

Here is a first variant of such a patch.  I will continue to maintain
this, and intend to send (incremental?) patches on top of that one, but
intend to eventually commit all changes as one big commit, to avoid too
much "source code churn" (as much as that's possible...).

Some more comments, to help review:

The attached 0001-Split-up-gcc-omp-low.c.patch.xz is a Git "--color
--word-diff --ignore-space-change" patch, purely meant for manual review;
I'm intentionally ;-) not attaching a "patch-applyable" patch at this
point, to minimize the risk of other people starting to work on this in
parallel with my ongoing changes, which no doubt would result in a ton of
patch merge conflicts.  Yet, I'm of course very open to any kind of
suggestions; please submit these as a "verbal patch".  I will of course
submit a patch in any other format that you'd like for review.

This already survived "light" C/C++/Fortran
--enable-offload-targets=nvptx-none,x86_64-intelmicemul-linux-gnu,hsa
testing (no HSA execution testing, though), and also survived a "big"
bootstrap build.

As I don't know how this is usually done: is it appropriate to remove
"Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
contributing a ton of other stuff since omp-low.c has been created), or
does this line stay in omp-low.c, or do I even duplicate it into the new
files?

I tried not to re-order stuff when moving.  But: we may actually want to
reorder stuff, to put it into a more sensible order.  Any suggestions?

All lines with "//OMP" tags in them will eventually be removed; these are
just to help review (hence the --word-diff patch), and to solicit
comments, in the case of "//OMPTODO".  Some of the OMPTODOs are for
myself (clean up #include directives), but for the others, I'd like to
hear any comments that you have.

I guess you can just ignore any "//OMPCUT" tags (and I'll remove them at
one point, and clean up the whitespace).  (In the new files) these mean
that in the file where the surrounding stuff is from, there has been
other stuff that either remained in the original file (omp-low.c), or has
been moved to other files.

In omp-low.c and omp-low.h, a "//OMPLOWH" tag means that this line has
been moved to omp-low.h, and likewise: "//OMPEXP" to omp-expand.c,
"//OMPEXPH" to omp-expand.h, "//OMPOFF" to omp-offload.c, and "//OMPOFFH"
to omp-offload.h.

I had to export a small number of functions (see the prototypes not moved
but added to the header files).

Because it's also used in omp-expand.c, I moved the one-line static
inline is_reference function from omp-low.c to omp-low.h, and renamed it
to omp_is_reference because of the very generic name.  Similar functions
stay in omp-low.c however, so they're no longer defined next to each
other.  OK, or does this need a different solution?
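
(For context, the function in question is a one-liner of roughly this
shape; sketched from memory, not quoted from the patch:)

  /* Return true if DECL is a variable that the language privatizes by
     reference (for example a Fortran dummy argument).  */

  static inline bool
  omp_is_reference (tree decl)
  {
    return lang_hooks.decls.omp_privatize_by_reference (decl);
  }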


Grüße
 Thomas
Bernd Schmidt April 13, 2016, 5:38 p.m. UTC | #12
On 04/13/2016 06:01 PM, Thomas Schwinge wrote:

> The attached 0001-Split-up-gcc-omp-low.c.patch.xz is a Git "--color
> --word-diff --ignore-space-change" patch, purely meant for manual review;
> I'm intentionally ;-) not attaching a "patch-applyable" patch at this
> point, to minimize the risk of other people starting to work on this in
> parallel with my ongoing changes, which no doubt would result in a ton of
> patch merge conflicts.  Yet, I'm of course very open to any kind of
> suggestions; please submit these as a "verbal patch".  I will of course
> submit a patch in any other format that you'd like for review.

I have no idea how to read this patch. I can't even properly show it 
with "less" because it seems to contain color escape sequences. The 
word-diff format (I assume that's what it is) is also unfamiliar and not 
immediately readable.

Best way to present this might be to do
diff -du old-omp-low.c <each of the split files>.


Bernd
Thomas Schwinge April 13, 2016, 5:56 p.m. UTC | #13
Hi Bernd!

On Wed, 13 Apr 2016 19:38:31 +0200, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 04/13/2016 06:01 PM, Thomas Schwinge wrote:
> 
> > The attached 0001-Split-up-gcc-omp-low.c.patch.xz is a Git "--color
> > --word-diff --ignore-space-change" patch, purely meant for manual review;

> I have no idea how to read this patch. I can't even properly show it 
> with "less" because it seems to contain color escape sequences.

Yes, that was intentional.  At least for me, colored output is much faster
to scan (for unmodified/added/deleted) than output only marked up
with +/-.  The -R option for less (I have it specified in $LESS) is
responsible for displaying it properly.

> The 
> word-diff format (I assume that's what it is) is also unfamiliar and not 
> immediately readable.

I also wasn't used to it until a few years ago; it's another thing
I find tremendously useful at times.

> Best way to present this might be to do
> diff -du old-omp-low.c <each of the split files>.

OK, I found that Git's "-C5%" option produces something very similar to that;
0001-Split-up-gcc-omp-low.c-plain.patch.xz attached.


Grüße
 Thomas
Bernd Schmidt April 13, 2016, 6:20 p.m. UTC | #14
On 04/13/2016 07:56 PM, Thomas Schwinge wrote:

>> Best way to present this might be to do
>> diff -du old-omp-low.c <each of the split files>.
>
> OK, I found that Git's "-C5%" option produces something very similar to that;
> 0001-Split-up-gcc-omp-low.c-plain.patch.xz attached.

That looks much better. However, the //OMPWHATEVER comments are not 
really all that helpful. I think the next step would be to clear out all 
this stuff including the OMPCUT markers, and also to start with an 
initial patch that includes everything that actually modifies code 
rather than moving it (omp_is_reference seems to be the major thing here).


Bernd
Thomas Schwinge April 14, 2016, 4:01 p.m. UTC | #15
Hi!

On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > >how about we split up gcc/omp-low.c into several
> > > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > > >along the borders of the several passes defined therein?
> 
> > > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > > achieve.

> > And possibly some kind of omp-simd.c, and omp-checking.c, and so
> > on, if feasible.
> 
> Not yet looked into these.

..., and here they are: word-diff patches creating omp-diagnostics.c and
omp-simd.c, 0001-new-file-gcc-omp-diagnostics.c.patch.xz and
0002-new-file-gcc-omp-simd.c.patch.xz.  The former contains the
"diagnose_omp_blocks" pass and the latter the "simdclone" pass, with the
respective supporting code.  I will certainly submit line-diff patches if
we agree that this is sound -- these two may actually be good candidates
to do first, individually, and do that now, because they're completely
self-contained.  Makes sense?

Should possibly rename omp-simd.c to omp-simd-clone.c to make it clear
that's the only thing it does, the "simdclone" pass?

Should we maybe rename omp-diagnostics.c to omp-structured-blocks.c, and
really only have it contain that "diagnose_omp_blocks" pass, or should we
move other diagnostic stuff like check_omp_nesting_restrictions into that
file, too?

One //OMPTODO:

    --- gcc/omp-low.h
    +++ gcc/omp-low.h
    @@ -95,0 +96,11 @@ extern gimple *build_omp_barrier (tree);
    +//OMPTODO: moved from omp-low.c.  Renamed from WALK_SUBSTMTS to OMP_WALK_SUBSTMTS because of the very generic name.  Used in omp-low.c and omp-diagnostics.c.  Alternatively, WALK_SUBSTMTS should perhaps simply be duplicated in the two files?
    +#define OMP_WALK_SUBSTMTS  \
    +    case GIMPLE_BIND: \
    +    case GIMPLE_TRY: \
    +    case GIMPLE_CATCH: \
    +    case GIMPLE_EH_FILTER: \
    +    case GIMPLE_TRANSACTION: \
    +      /* The sub-statements for these should be walked.  */ \
    +      *handled_ops_p = false; \
    +      break;

Instead of that, duplicate WALK_SUBSTMTS into both omp-low.c and
omp-diagnostics.c?


Grüße
 Thomas
Thomas Schwinge April 14, 2016, 8:27 p.m. UTC | #16
Hi!

On Thu, 14 Apr 2016 18:01:13 +0200, I wrote:
> "simdclone" pass, with the
> respective supporting code.  I will certainly submit line-diff patches if
> we agree that this is sound -- these two may actually be good candidates
> to do first, individually, and do that now, because they're completely
> self-contained.  Makes sense?

;-) Made enough sense to me, so that I prepared the attached patch.  I'm
also attaching a "-C" variant that I created using Git's -C5% option, and
which shows how the new file gcc/omp-simd-clone.c can be created from the
original gcc/omp-low.c.  (Is that useful for review, or are you manually
doing something like that anyway?)

> Should possibly rename omp-simd.c to omp-simd-clone.c to make it clear
> that's the only thing it does, the "simdclone" pass?

I did that.

I manually determined a reduced #include list for the new file
gcc/omp-simd-clone.c.  Hope that's alright.

OK to commit once bootstrap testing succeeded?

commit 8f33dc59ad24a995694d42ee9013e0853426e190
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Thu Apr 14 21:56:31 2016 +0200

    Split out OMP constructs' SIMD clone supporting code
    
    	gcc/
    	* omp-low.c (simd_clone_struct_alloc, simd_clone_struct_copy)
    	(simd_clone_vector_of_formal_parm_types)
    	(simd_clone_clauses_extract, simd_clone_compute_base_data_type)
    	(simd_clone_mangle, simd_clone_create)
    	(simd_clone_adjust_return_type, create_tmp_simd_array)
    	(simd_clone_adjust_argument_types, simd_clone_init_simd_arrays)
    	(struct modify_stmt_info, ipa_simd_modify_stmt_ops)
    	(ipa_simd_modify_function_body, simd_clone_linear_addend)
    	(simd_clone_adjust, expand_simd_clones, ipa_omp_simd_clone)
    	(pass_data_omp_simd_clone, class pass_omp_simd_clone)
    	(pass_omp_simd_clone::gate, make_pass_omp_simd_clone): Move
    	into...
    	* omp-simd-clone.c: ... this new file.
    	(simd_clone_vector_of_formal_parm_types): Make it static.
    	* Makefile.in (OBJS): Add omp-simd-clone.o.

 gcc/Makefile.in      |    1 +
 gcc/omp-low.c        | 1606 ------------------------------------------------
 gcc/omp-simd-clone.c | 1654 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1655 insertions(+), 1606 deletions(-)


Grüße
 Thomas
Thomas Schwinge April 15, 2016, 6:25 a.m. UTC | #17
Hi!

On Thu, 14 Apr 2016 22:27:40 +0200, I wrote:
> On Thu, 14 Apr 2016 18:01:13 +0200, I wrote:
> > "simdclone" pass, with the
> > respective supporting code.  I will certainly submit line-diff patches if
> > we agree that this is sound -- these two may actually be good candidates
> > to do first, individually, and do that now, because they're completely
> > self-contained.  Makes sense?
> 
> ;-) Made enough sense to me, so that I prepared the attached patch.  I'm
> also attaching a "-C" variant that I created using Git's -C5% option, and
> which shows how the new file gcc/omp-simd-clone.c can be created from the
> original gcc/omp-low.c.  (Is that useful for review, or are you manually
> doing something like that anyway?)
> 
> > Should possibly rename omp-simd.c to omp-simd-clone.c to make it clear
> > that's the only thing it does, the "simdclone" pass?
> 
> I did that.
> 
> I manually determined a reduced #include list for the new file
> gcc/omp-simd-clone.c.  Hope that's alright.
> 
> OK to commit once bootstrap testing succeeded?

Bootstrap/testing look good.  OK to commit?

> commit 8f33dc59ad24a995694d42ee9013e0853426e190
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Thu Apr 14 21:56:31 2016 +0200
> 
>     Split out OMP constructs' SIMD clone supporting code
>     
>     	gcc/
>     	* omp-low.c (simd_clone_struct_alloc, simd_clone_struct_copy)
>     	(simd_clone_vector_of_formal_parm_types)
>     	(simd_clone_clauses_extract, simd_clone_compute_base_data_type)
>     	(simd_clone_mangle, simd_clone_create)
>     	(simd_clone_adjust_return_type, create_tmp_simd_array)
>     	(simd_clone_adjust_argument_types, simd_clone_init_simd_arrays)
>     	(struct modify_stmt_info, ipa_simd_modify_stmt_ops)
>     	(ipa_simd_modify_function_body, simd_clone_linear_addend)
>     	(simd_clone_adjust, expand_simd_clones, ipa_omp_simd_clone)
>     	(pass_data_omp_simd_clone, class pass_omp_simd_clone)
>     	(pass_omp_simd_clone::gate, make_pass_omp_simd_clone): Move
>     	into...
>     	* omp-simd-clone.c: ... this new file.
>     	(simd_clone_vector_of_formal_parm_types): Make it static.
>     	* Makefile.in (OBJS): Add omp-simd-clone.o.
> 
>  gcc/Makefile.in      |    1 +
>  gcc/omp-low.c        | 1606 ------------------------------------------------
>  gcc/omp-simd-clone.c | 1654 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1655 insertions(+), 1606 deletions(-)


Grüße
 Thomas
Jakub Jelinek April 15, 2016, 11:15 a.m. UTC | #18
On Thu, Apr 14, 2016 at 10:27:40PM +0200, Thomas Schwinge wrote:
> On Thu, 14 Apr 2016 18:01:13 +0200, I wrote:
> > "simdclone" pass, with the
> > respective supporting code.  I will certainly submit line-diff patches if
> > we agree that this is sound -- these two may actually be good candidates
> > to do first, individually, and do that now, because they're completely
> > self-contained.  Makes sense?
> 
> ;-) Made enough sense to me, so that I prepared the attached patch.  I'm
> also attaching a "-C" variant that I created using Git's -C5% option, and
> which shows how the new file gcc/omp-simd-clone.c can be created from the
> original gcc/omp-low.c.  (Is that useful for review, or are you manually
> doing something like that anyway?)
> 
> > Should possibly rename omp-simd.c to omp-simd-clone.c to make it clear
> > that's the only thing it does, the "simdclone" pass?
> 
> I did that.
> 
> I manually determined a reduced #include list for the new file
> gcc/omp-simd-clone.c.  Hope that's alright.
> 
> OK to commit once bootstrap testing succeeded?

Ok if you manage to do so before the (hopefully imminent) branching.

> commit 8f33dc59ad24a995694d42ee9013e0853426e190
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Thu Apr 14 21:56:31 2016 +0200
> 
>     Split out OMP constructs' SIMD clone supporting code
>     
>     	gcc/
>     	* omp-low.c (simd_clone_struct_alloc, simd_clone_struct_copy)
>     	(simd_clone_vector_of_formal_parm_types)
>     	(simd_clone_clauses_extract, simd_clone_compute_base_data_type)
>     	(simd_clone_mangle, simd_clone_create)
>     	(simd_clone_adjust_return_type, create_tmp_simd_array)
>     	(simd_clone_adjust_argument_types, simd_clone_init_simd_arrays)
>     	(struct modify_stmt_info, ipa_simd_modify_stmt_ops)
>     	(ipa_simd_modify_function_body, simd_clone_linear_addend)
>     	(simd_clone_adjust, expand_simd_clones, ipa_omp_simd_clone)
>     	(pass_data_omp_simd_clone, class pass_omp_simd_clone)
>     	(pass_omp_simd_clone::gate, make_pass_omp_simd_clone): Move
>     	into...
>     	* omp-simd-clone.c: ... this new file.
>     	(simd_clone_vector_of_formal_parm_types): Make it static.
>     	* Makefile.in (OBJS): Add omp-simd-clone.o.
> 
>  gcc/Makefile.in      |    1 +
>  gcc/omp-low.c        | 1606 ------------------------------------------------
>  gcc/omp-simd-clone.c | 1654 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1655 insertions(+), 1606 deletions(-)

	Jakub
Thomas Schwinge April 15, 2016, 11:53 a.m. UTC | #19
Hi!

On Fri, 15 Apr 2016 13:15:07 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Apr 14, 2016 at 10:27:40PM +0200, Thomas Schwinge wrote:
> > On Thu, 14 Apr 2016 18:01:13 +0200, I wrote:
> > > "simdclone" pass, with the
> > > respective supporting code.  I will certainly submit line-diff patches if
> > > we agree that this is sound -- these two may actually be good candidates
> > > to do first, individually, and do that now, because they're completely
> > > self-contained.  Makes sense?

> > OK to commit once bootstrap testing succeeded?
> 
> Ok

Committed without changes in r235017.

> if you manage to do so before the (hopefully imminent) branching.

For all the other splitting patches that I have posted/proposed, the idea
then is to commit these onto both gcc-6-branch and trunk?


Grüße
 Thomas
Jakub Jelinek April 15, 2016, 11:57 a.m. UTC | #20
On Fri, Apr 15, 2016 at 01:53:14PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 15 Apr 2016 13:15:07 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Thu, Apr 14, 2016 at 10:27:40PM +0200, Thomas Schwinge wrote:
> > > On Thu, 14 Apr 2016 18:01:13 +0200, I wrote:
> > > > "simdclone" pass, with the
> > > > respective supporting code.  I will certainly submit line-diff patches if
> > > > we agree that this is sound -- these two may actually be good candidates
> > > > to do first, individually, and do that now, because they're completely
> > > > self-contained.  Makes sense?
> 
> > > OK to commit once bootstrap testing succeeded?
> > 
> > Ok
> 
> Committed without changes in r235017.
> 
> > if you manage to do so before the (hopefully imminent) branching.
> 
> For all the other splitting patches that I have posted/proposed, the idea
> then is to commit these onto both gcc-6-branch and trunk?

If we branch today, then yes, though the gcc-6-branch commits would need to
wait until after 6.1 is released.

	Jakub
Thomas Schwinge April 15, 2016, 12:11 p.m. UTC | #21
Hi!

On Fri, 15 Apr 2016 13:57:05 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Apr 15, 2016 at 01:53:14PM +0200, Thomas Schwinge wrote:
> > For all the other splitting patches that I have posted/proposed, the idea
> > then is to commit these onto both gcc-6-branch and trunk?
> 
> If we branch today, then yes, though the gcc-6-branch commits would need to
> wait until after 6.1 is released.

Uh.  I fear these patches will be a bit of a pain to maintain if there's
then on-going development in trunk, touching the gcc/omp-low.c file?  I
thought we had agreed to get these cleanup changes into trunk in time for
the first GCC 6 release, to exactly avoid this kind of pain (that is,
divergence of the gcc/omp-low.c files in trunk and gcc-6-branch)?  :-( I
have not been aware that you intend to create the gcc-6-branch today, and
neither have I been aware that once the branch's created, the other
cleanup changes would have to wait until after the GCC 6.1 release.
There is no way I can get all this done today, or on the weekend.


Grüße
 Thomas
Jakub Jelinek April 15, 2016, 12:15 p.m. UTC | #22
On Fri, Apr 15, 2016 at 02:11:45PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 15 Apr 2016 13:57:05 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Fri, Apr 15, 2016 at 01:53:14PM +0200, Thomas Schwinge wrote:
> > > For all the other splitting patches that I have posted/proposed, the idea
> > > then is to commit these onto both gcc-6-branch and trunk?
> > 
> > If we branch today, then yes, though the gcc-6-branch commits would need to
> > wait until after 6.1 is released.
> 
> Uh.  I fear these patches will be a bit of a pain to maintain if there's
> then on-going development in trunk, touching the gcc/omp-low.c file?  I
> thought we had agreed to get these cleanup changes into trunk in time for
> the first GCC 6 release, to exactly avoid this kind of pain (that is,
> divergence of the gcc/omp-low.c files in trunk and gcc-6-branch)?  :-( I
> have not been aware that you intend to create the gcc-6-branch today, and
> neither have I been aware that once the branch's created, the other
> cleanup changes would have to wait until after the GCC 6.1 release.
> There is no way I can get all this done today, or on the weekend.

Then postpone the rest shortly before 6.1 is released.
We shouldn't have big changes to trunk before 6.1 is released anyway,
otherwise we won't be able to test on the trunk fixes intended also for 6.1.

	Jakub
Thomas Schwinge April 15, 2016, 2:33 p.m. UTC | #23
Hi!

On Fri, 15 Apr 2016 14:15:42 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Apr 15, 2016 at 02:11:45PM +0200, Thomas Schwinge wrote:
> > On Fri, 15 Apr 2016 13:57:05 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > > On Fri, Apr 15, 2016 at 01:53:14PM +0200, Thomas Schwinge wrote:
> > > > For all the other splitting patches that I have posted/proposed, the idea
> > > > then is to commit these onto both gcc-6-branch and trunk?
> > > 
> > > If we branch today, then yes, though the gcc-6-branch commits would need to
> > > wait until after 6.1 is released.
> > 
> > Uh.  I fear these patches will be a bit of a pain to maintain if there's
> > then on-going development in trunk, touching the gcc/omp-low.c file?  I
> > thought we had agreed to get these cleanup changes into trunk in time for
> > the first GCC 6 release, to exactly avoid this kind of pain (that is,
> > divergence of the gcc/omp-low.c files in trunk and gcc-6-branch)?  :-( I
> > have not been aware that you intend to create the gcc-6-branch today, and
> > neither have I been aware that once the branch's created, the other
> > cleanup changes would have to wait until after the GCC 6.1 release.
> > There is no way I can get all this done today, or on the weekend.
> 
> Then postpone the rest shortly before 6.1 is released.
> We shouldn't have big changes to trunk before 6.1 is released anyway,
> otherwise we won't be able to test on the trunk fixes intended also for 6.1.

Hmm.  What about the further patches that I already have in the queue for
submission, just waiting for final testing/polishing?  Are you asking me
to rescind these now, and then have to reproduce the work later on (when
exactly?), or will it still be fine to get these approved and committed
in the near future (next days)?  Obviously, I'll strongly prefer the
latter, as I already have done most of the work, and as that's what I
understand to be the modus operandi we had agreed on.


Grüße
 Thomas
Thomas Schwinge May 3, 2016, 9:34 a.m. UTC | #24
Hi!

On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > >how about we split up gcc/omp-low.c into several
> > > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > > >along the borders of the several passes defined therein?
> 
> > > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > > achieve.
> 
> That was indeed the first one that I tackled, omp-expand.c (spelled out
> "expand" instead of "exp" to avoid confusion as "exp" might also be short
> for "expression"; OK?) [...]

That's the one I'd suggest to pursue next, now that GCC 6.1 has been
released.  How would you like me to submit the patch for review?  (It's
huge, obviously.)

A few high-level comments, and questions that remain to be answered:

> Stuff that does not relate to OMP lowering I did not move out of
> omp-low.c (into a new omp.c or omp-misc.c, for example) so far, but
> instead just left it all in omp-low.c.  We'll see how far we get.
> 
> One thing I noticed is that there sometimes is more than one suitable
> place to put stuff: omp-low.c and omp-expand.c categorize by compiler
> passes, and omp-offload.c -- at least in part -- [would be] about the orthogonal
> "offloading" category.  For example, see the OMPTODO "struct oacc_loop
> and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how that goes.

> Some more comments, to help review:

> As I don't know how this is usually done: is it appropriate to remove
> "Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
> his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
> contributing a ton of other stuff since omp-low.c has been created), or
> does this line stay in omp-low.c, or do I even duplicate it into the new
> files?
> 
> I tried not to re-order stuff when moving.  But: we may actually want to
> reorder stuff, to put it into a more sensible order.  Any suggestions?

> I had to export a small number of functions (see the prototypes not moved
> but added to the header files).
> 
> Because it's also used in omp-expand.c, I moved the one-line static
> inline is_reference function from omp-low.c to omp-low.h, and renamed it
> to omp_is_reference because of the very generic name.  Similar functions
> stay in omp-low.c however, so they're no longer defined next to each
> other.  OK, or does this need a different solution?


Grüße
 Thomas
Thomas Schwinge May 11, 2016, 1:44 p.m. UTC | #25
Hi!

Ping.

On Tue, 03 May 2016 11:34:39 +0200, I wrote:
> On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> > On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > > >how about we split up gcc/omp-low.c into several
> > > > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > > > >along the borders of the several passes defined therein?
> > 
> > > > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > > > achieve.
> > 
> > That was indeed the first one that I tackled, omp-expand.c (spelled out
> > "expand" instead of "exp" to avoid confusion as "exp" might also be short
> > for "expression"; OK?) [...]
> 
> That's the one I'd suggest to pursue next, now that GCC 6.1 has been
> released.  How would you like me to submit the patch for review?  (It's
> huge, obviously.)
> 
> A few high-level comments, and questions that remain to be answered:
> 
> > Stuff that does not relate to OMP lowering, I did not move stuff out of
> > omp-low.c (into a new omp.c, or omp-misc.c, for example) so far, but
> > instead just left all that in omp-low.c.  We'll see how far we get.
> > 
> > One thing I noticed is that there sometimes is more than one suitable
> > place to put stuff: omp-low.c and omp-expand.c categorize by compiler
> > passes, and omp-offload.c -- at least in part -- [would be] about the orthogonal
> > "offloading" category.  For example, see the OMPTODO "struct oacc_loop
> > and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how that goes.
> 
> > Some more comments, to help review:
> 
> > As I don't know how this is usually done: is it appropriate to remove
> > "Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
> > his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
> > contributing a ton of other stuff since omp-low.c has been created), or
> > does this line stay in omp-low.c, or do I even duplicate it into the new
> > files?
> > 
> > I tried not to re-order stuff when moving.  But: we may actually want to
> > reorder stuff, to put it into a more sensible order.  Any suggestions?
> 
> > I had to export a small number of functions (see the prototypes not moved
> > but added to the header files).
> > 
> > Because it's also used in omp-expand.c, I moved the one-line static
> > inline is_reference function from omp-low.c to omp-low.h, and renamed it
> > to omp_is_reference because of the very generic name.  Similar functions
> > stay in omp-low.c however, so they're no longer defined next to each
> > other.  OK, or does this need a different solution?


Grüße
 Thomas
Thomas Schwinge May 18, 2016, 11:42 a.m. UTC | #26
Hi!

Ping.

On Wed, 11 May 2016 15:44:14 +0200, I wrote:
> Ping.
> 
> On Tue, 03 May 2016 11:34:39 +0200, I wrote:
> > On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> > > On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > > > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > > > >how about we split up gcc/omp-low.c into several
> > > > > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > > > > >along the borders of the several passes defined therein?
> > > 
> > > > > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > > > > achieve.
> > > 
> > > That was indeed the first one that I tackled, omp-expand.c (spelled out
> > > "expand" instead of "exp" to avoid confusion as "exp" might also be short
> > > for "expression"; OK?) [...]
> > 
> > That's the one I'd suggest to pursue next, now that GCC 6.1 has been
> > released.  How would you like me to submit the patch for review?  (It's
> > huge, obviously.)
> > 
> > A few high-level comments, and questions that remain to be answered:
> > 
> > > Stuff that does not relate to OMP lowering, I did not move stuff out of
> > > omp-low.c (into a new omp.c, or omp-misc.c, for example) so far, but
> > > instead just left all that in omp-low.c.  We'll see how far we get.
> > > 
> > > One thing I noticed is that there sometimes is more than one suitable
> > > place to put stuff: omp-low.c and omp-expand.c categorize by compiler
> > > passes, and omp-offload.c -- at least in part -- [would be] about the orthogonal
> > > "offloading" category.  For example, see the OMPTODO "struct oacc_loop
> > > and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how that goes.
> > 
> > > Some more comments, to help review:
> > 
> > > As I don't know how this is usually done: is it appropriate to remove
> > > "Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
> > > his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
> > > contributing a ton of other stuff since omp-low.c has been created), or
> > > does this line stay in omp-low.c, or do I even duplicate it into the new
> > > files?
> > > 
> > > I tried not to re-order stuff when moving.  But: we may actually want to
> > > reorder stuff, to put it into a more sensible order.  Any suggestions?
> > 
> > > I had to export a small number of functions (see the prototypes not moved
> > > but added to the header files).
> > > 
> > > Because it's also used in omp-expand.c, I moved the one-line static
> > > inline is_reference function from omp-low.c to omp-low.h, and renamed it
> > > to omp_is_reference because of the very generic name.  Similar functions
> > > stay in omp-low.c however, so they're no longer defined next to each
> > > other.  OK, or does this need a different solution?


Grüße
 Thomas
Thomas Schwinge May 25, 2016, 6:03 a.m. UTC | #27
Hi!

Ping.

Given that we conceptually agreed about this task, but apparently nobody
is now interested in reviewing my proposed changes (and tells me how
they'd like me to submit the patch for review), should I maybe just
execute the steps?

On Wed, 18 May 2016 13:42:37 +0200, Thomas Schwinge <thomas@codesourcery.com> wrote:
> Ping.
> 
> On Wed, 11 May 2016 15:44:14 +0200, I wrote:
> > Ping.
> > 
> > On Tue, 03 May 2016 11:34:39 +0200, I wrote:
> > > On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> > > > On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > > > > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > > > > >how about we split up gcc/omp-low.c into several
> > > > > > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > > > > > >along the borders of the several passes defined therein?
> > > > 
> > > > > > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > > > > > achieve.
> > > > 
> > > > That was indeed the first one that I tackled, omp-expand.c (spelled out
> > > > "expand" instead of "exp" to avoid confusion as "exp" might also be short
> > > > for "expression"; OK?) [...]
> > > 
> > > That's the one I'd suggest to pursue next, now that GCC 6.1 has been
> > > released.  How would you like me to submit the patch for review?  (It's
> > > huge, obviously.)
> > > 
> > > A few high-level comments, and questions that remain to be answered:
> > > 
> > > > Stuff that does not relate to OMP lowering, I did not move stuff out of
> > > > omp-low.c (into a new omp.c, or omp-misc.c, for example) so far, but
> > > > instead just left all that in omp-low.c.  We'll see how far we get.
> > > > 
> > > > One thing I noticed is that there sometimes is more than one suitable
> > > > place to put stuff: omp-low.c and omp-expand.c categorize by compiler
> > > > passes, and omp-offload.c -- at least in part -- [would be] about the orthogonal
> > > > "offloading" category.  For example, see the OMPTODO "struct oacc_loop
> > > > and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how that goes.
> > > 
> > > > Some more comments, to help review:
> > > 
> > > > As I don't know how this is usually done: is it appropriate to remove
> > > > "Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
> > > > his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
> > > > contributing a ton of other stuff since omp-low.c has been created), or
> > > > does this line stay in omp-low.c, or do I even duplicate it into the new
> > > > files?
> > > > 
> > > > I tried not to re-order stuff when moving.  But: we may actually want to
> > > > reorder stuff, to put it into a more sensible order.  Any suggestions?
> > > 
> > > > I had to export a small number of functions (see the prototypes not moved
> > > > but added to the header files).
> > > > 
> > > > Because it's also used in omp-expand.c, I moved the one-line static
> > > > inline is_reference function from omp-low.c to omp-low.h, and renamed it
> > > > to omp_is_reference because of the very generic name.  Similar functions
> > > > stay in omp-low.c however, so they're no longer defined next to each
> > > > other.  OK, or does this need a different solution?


Grüße
 Thomas
Martin Jambor May 25, 2016, 11:15 a.m. UTC | #28
Hi,

On Wed, May 25, 2016 at 08:03:41AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> Ping.
> 
> Given that we conceptually agreed about this task, but apparently nobody
> is now interested in reviewing my proposed changes (and tells me how
> they'd like me to submit the patch for review), should I maybe just
> execute the steps?

I would suggest that you re-post the patch in a new thread (and in an
ordinary, non-fancy format); the details are now buried in this big
thread and might easily be forgotten.

I would also strongly suggest that you post the Changelog in the email
body, even if you compress and attach the patch itself.  Frankly, if
the changes are just mechanical movements, only the Changelog is what
I'd like to see and perhaps comment on.

Martin

> 
> On Wed, 18 May 2016 13:42:37 +0200, Thomas Schwinge <thomas@codesourcery.com> wrote:
> > Ping.
> > 
> > On Wed, 11 May 2016 15:44:14 +0200, I wrote:
> > > Ping.
> > > 
> > > On Tue, 03 May 2016 11:34:39 +0200, I wrote:
> > > > On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> > > > > On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > > > > > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > > > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > > > > > >how about we split up gcc/omp-low.c into several
> > > > > > > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > > > > > > >along the borders of the several passes defined therein?
> > > > > 
> > > > > > > > I suspect a split along the ompexp/omplow boundary would be quite easy to
> > > > > > > > achieve.
> > > > > 
> > > > > That was indeed the first one that I tackled, omp-expand.c (spelled out
> > > > > "expand" instead of "exp" to avoid confusion as "exp" might also be short
> > > > > for "expression"; OK?) [...]
> > > > 
> > > > That's the one I'd suggest to pursue next, now that GCC 6.1 has been
> > > > released.  How would you like me to submit the patch for review?  (It's
> > > > huge, obviously.)
> > > > 
> > > > A few high-level comments, and questions that remain to be answered:
> > > > 
> > > > > Stuff that does not relate to OMP lowering, I did not move stuff out of
> > > > > omp-low.c (into a new omp.c, or omp-misc.c, for example) so far, but
> > > > > instead just left all that in omp-low.c.  We'll see how far we get.
> > > > > 
> > > > > One thing I noticed is that there sometimes is more than one suitable
> > > > > place to put stuff: omp-low.c and omp-expand.c categorize by compiler
> > > > > passes, and omp-offload.c -- at least in part -- [would be] about the orthogonal
> > > > > "offloading" category.  For example, see the OMPTODO "struct oacc_loop
> > > > > and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how that goes.
> > > > 
> > > > > Some more comments, to help review:
> > > > 
> > > > > As I don't know how this is usually done: is it appropriate to remove
> > > > > "Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
> > > > > his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
> > > > > contributing a ton of other stuff since omp-low.c has been created), or
> > > > > does this line stay in omp-low.c, or do I even duplicate it into the new
> > > > > files?
> > > > > 
> > > > > I tried not to re-order stuff when moving.  But: we may actually want to
> > > > > reorder stuff, to put it into a more sensible order.  Any suggestions?
> > > > 
> > > > > I had to export a small number of functions (see the prototypes not moved
> > > > > but added to the header files).
> > > > > 
> > > > > Because it's also used in omp-expand.c, I moved the one-line static
> > > > > inline is_reference function from omp-low.c to omp-low.h, and renamed it
> > > > > to omp_is_reference because of the very generic name.  Similar functions
> > > > > stay in omp-low.c however, so they're no longer defined next to each
> > > > > other.  OK, or does this need a different solution?
> 
> 
> Grüße
>  Thomas
diff mbox

Patch

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c68fb19..8dcf3a6 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -478,6 +478,8 @@  DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
 DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_ULLPTR_ULLPTR_ULLPTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_PTR_ULONGLONG,
 		     BT_PTR_ULONGLONG)
+DEF_FUNCTION_TYPE_4 (BT_FN_VOID_UINT_PTR_INT_PTR, BT_VOID, BT_UINT, BT_PTR,
+		     BT_INT, BT_PTR)
 
 DEF_FUNCTION_TYPE_5 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VALIST_ARG,
 		     BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING,
@@ -556,9 +558,9 @@  DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 
-DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
-		      BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
-		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
+		     BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+		     BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
 		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index a37e856..283eaf4 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -159,6 +159,8 @@  DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
 DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_ULLPTR_ULLPTR_ULLPTR,
 		     BT_BOOL, BT_UINT, BT_PTR_ULONGLONG, BT_PTR_ULONGLONG,
 		     BT_PTR_ULONGLONG)
+DEF_FUNCTION_TYPE_4 (BT_FN_VOID_UINT_PTR_INT_PTR, BT_VOID, BT_UINT, BT_PTR,
+		     BT_INT, BT_PTR)
 
 DEF_FUNCTION_TYPE_5 (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
 		     BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT,
@@ -221,9 +223,9 @@  DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 		     BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 		     BT_BOOL, BT_UINT, BT_PTR, BT_INT)
 
-DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		      BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
-		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
+		      BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
 		      BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/gimple-low.c b/gcc/gimple-low.c
index 4994918..d2a6a80 100644
--- a/gcc/gimple-low.c
+++ b/gcc/gimple-low.c
@@ -358,6 +358,7 @@  lower_stmt (gimple_stmt_iterator *gsi, struct lower_data *data)
     case GIMPLE_OMP_TASK:
     case GIMPLE_OMP_TARGET:
     case GIMPLE_OMP_TEAMS:
+    case GIMPLE_OMP_GPUKERNEL:
       data->cannot_fallthru = false;
       lower_omp_directive (gsi, data);
       data->cannot_fallthru = false;
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 7764201..8f5e889 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1187,6 +1187,9 @@  dump_gimple_omp_for (pretty_printer *buffer, gomp_for *gs, int spc, int flags)
 	case GF_OMP_FOR_KIND_CILKSIMD:
 	  pp_string (buffer, "#pragma simd");
 	  break;
+	case GF_OMP_FOR_KIND_KERNEL_BODY:
+	  pp_string (buffer, "#pragma omp for kernel");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -1491,6 +1494,9 @@  dump_gimple_omp_block (pretty_printer *buffer, gimple *gs, int spc, int flags)
 	case GIMPLE_OMP_SECTION:
 	  pp_string (buffer, "#pragma omp section");
 	  break;
+	case GIMPLE_OMP_GPUKERNEL:
+	  pp_string (buffer, "#pragma omp gpukernel");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -2273,6 +2279,7 @@  pp_gimple_stmt_1 (pretty_printer *buffer, gimple *gs, int spc, int flags)
     case GIMPLE_OMP_MASTER:
     case GIMPLE_OMP_TASKGROUP:
     case GIMPLE_OMP_SECTION:
+    case GIMPLE_OMP_GPUKERNEL:
       dump_gimple_omp_block (buffer, gs, spc, flags);
       break;
 
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index 850cf57..695592d 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -644,6 +644,7 @@  walk_gimple_stmt (gimple_stmt_iterator *gsi, walk_stmt_fn callback_stmt,
     case GIMPLE_OMP_SINGLE:
     case GIMPLE_OMP_TARGET:
     case GIMPLE_OMP_TEAMS:
+    case GIMPLE_OMP_GPUKERNEL:
       ret = walk_gimple_seq_mod (gimple_omp_body_ptr (stmt), callback_stmt,
 			     callback_op, wi);
       if (ret)
diff --git a/gcc/gimple.c b/gcc/gimple.c
index bf552a7..cae42e9 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -954,6 +954,19 @@  gimple_build_omp_master (gimple_seq body)
   return p;
 }
 
+/* Build a GIMPLE_OMP_GPUKERNEL statement.
+
+   BODY is the sequence of statements to be executed by the kernel.  */
+
+gimple *
+gimple_build_omp_gpukernel (gimple_seq body)
+{
+  gimple *p = gimple_alloc (GIMPLE_OMP_GPUKERNEL, 0);
+  if (body)
+    gimple_omp_set_body (p, body);
+
+  return p;
+}
 
 /* Build a GIMPLE_OMP_TASKGROUP statement.
 
@@ -1805,6 +1817,7 @@  gimple_copy (gimple *stmt)
 	case GIMPLE_OMP_SECTION:
 	case GIMPLE_OMP_MASTER:
 	case GIMPLE_OMP_TASKGROUP:
+	case GIMPLE_OMP_GPUKERNEL:
 	copy_omp_body:
 	  new_seq = gimple_seq_copy (gimple_omp_body (stmt));
 	  gimple_omp_set_body (copy, new_seq);
diff --git a/gcc/gimple.def b/gcc/gimple.def
index d3ca402..30f0111 100644
--- a/gcc/gimple.def
+++ b/gcc/gimple.def
@@ -369,13 +369,17 @@  DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", GSS_OMP_PARALLEL_LAYOUT)
 /* GIMPLE_OMP_TEAMS <BODY, CLAUSES> represents #pragma omp teams
    BODY is the sequence of statements inside the single section.
    CLAUSES is an OMP_CLAUSE chain holding the associated clauses.  */
-DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_SINGLE_LAYOUT)
+DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_TEAMS_LAYOUT)
 
 /* GIMPLE_OMP_ORDERED <BODY, CLAUSES> represents #pragma omp ordered.
    BODY is the sequence of statements to execute in the ordered section.
    CLAUSES is an OMP_CLAUSE chain holding the associated clauses.  */
 DEFGSCODE(GIMPLE_OMP_ORDERED, "gimple_omp_ordered", GSS_OMP_SINGLE_LAYOUT)
 
+/* GIMPLE_OMP_GPUKERNEL <BODY> represents a parallel loop lowered for execution
+   on a GPU.  It is an artificial statement created by omp lowering.  */
+DEFGSCODE(GIMPLE_OMP_GPUKERNEL, "gimple_omp_gpukernel", GSS_OMP)
+
 /* GIMPLE_PREDICT <PREDICT, OUTCOME> specifies a hint for branch prediction.
 
    PREDICT is one of the predictors from predict.def.
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 0b04804..7b212b2 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -153,6 +153,7 @@  enum gf_mask {
     GF_OMP_FOR_KIND_TASKLOOP	= 2,
     GF_OMP_FOR_KIND_CILKFOR     = 3,
     GF_OMP_FOR_KIND_OACC_LOOP	= 4,
+    GF_OMP_FOR_KIND_KERNEL_BODY = 5,
     /* Flag for SIMD variants of OMP_FOR kinds.  */
     GF_OMP_FOR_SIMD		= 1 << 3,
     GF_OMP_FOR_KIND_SIMD	= GF_OMP_FOR_SIMD | 0,
@@ -622,8 +623,14 @@  struct GTY((tag("GSS_OMP_FOR")))
   /* [ WORD 11 ]
      Pre-body evaluated before the loop body begins.  */
   gimple_seq pre_body;
+
+  /* [ WORD 12 ]
+     If set, this statement is part of a gridified kernel; its clauses need to
+     be scanned and lowered, but the statement should be discarded after
+     lowering.  */
+  bool kernel_phony;
 };
 
 
 /* GIMPLE_OMP_PARALLEL, GIMPLE_OMP_TARGET, GIMPLE_OMP_TASK */
 
@@ -643,6 +660,12 @@  struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT")))
   /* [ WORD 10 ]
      Shared data argument.  */
   tree data_arg;
+
+  /* [ WORD 11 ] */
+  /* If set, this statement is part of a gridified kernel; its clauses need to
+     be scanned and lowered, but the statement should be discarded after
+     lowering.  */
+  bool kernel_phony;
 };
 
 /* GIMPLE_OMP_PARALLEL or GIMPLE_TASK */
@@ -725,14 +748,14 @@  struct GTY((tag("GSS_OMP_CONTINUE")))
   tree control_use;
 };
 
-/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_TEAMS, GIMPLE_OMP_ORDERED */
+/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_ORDERED */
 
 struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
   gimple_statement_omp_single_layout : public gimple_statement_omp
 {
   /* [ WORD 1-7 ] : base class */
 
-  /* [ WORD 7 ]  */
+  /* [ WORD 8 ]  */
   tree clauses;
 };
 
@@ -743,11 +766,18 @@  struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
          stmt->code == GIMPLE_OMP_SINGLE.  */
 };
 
-struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
+/* GIMPLE_OMP_TEAMS */
+
+struct GTY((tag("GSS_OMP_TEAMS_LAYOUT")))
   gomp_teams : public gimple_statement_omp_single_layout
 {
-    /* No extra fields; adds invariant:
-         stmt->code == GIMPLE_OMP_TEAMS.  */
+  /* [ WORD 1-8 ] : base class */
+
+  /* [ WORD 9 ]
+     If set, this statement is part of a gridified kernel; its clauses need to
+     be scanned and lowered, but the statement should be discarded after
+     lowering.  */
+  bool kernel_phony;
 };
 
 struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
@@ -1451,6 +1481,7 @@  gomp_task *gimple_build_omp_task (gimple_seq, tree, tree, tree, tree,
 				       tree, tree);
 gimple *gimple_build_omp_section (gimple_seq);
 gimple *gimple_build_omp_master (gimple_seq);
+gimple *gimple_build_omp_gpukernel (gimple_seq);
 gimple *gimple_build_omp_taskgroup (gimple_seq);
 gomp_continue *gimple_build_omp_continue (tree, tree);
 gomp_ordered *gimple_build_omp_ordered (gimple_seq, tree);
@@ -1711,6 +1742,7 @@  gimple_has_substatements (gimple *g)
     case GIMPLE_OMP_CRITICAL:
     case GIMPLE_WITH_CLEANUP_EXPR:
     case GIMPLE_TRANSACTION:
+    case GIMPLE_OMP_GPUKERNEL:
       return true;
 
     default:
@@ -5076,6 +5108,21 @@  gimple_omp_for_set_pre_body (gimple *gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
+/* Return the kernel_phony flag of an OMP_FOR statement.  */
+
+static inline bool
+gimple_omp_for_kernel_phony (const gomp_for *omp_for)
+{
+  return omp_for->kernel_phony;
+}
+
+/* Set kernel_phony flag of OMP_FOR to VALUE.  */
+
+static inline void
+gimple_omp_for_set_kernel_phony (gomp_for *omp_for, bool value)
+{
+  omp_for->kernel_phony = value;
+}
 
 /* Return the clauses associated with OMP_PARALLEL GS.  */
 
@@ -5162,6 +5209,22 @@  gimple_omp_parallel_set_data_arg (gomp_parallel *omp_parallel_stmt,
   omp_parallel_stmt->data_arg = data_arg;
 }
 
+/* Return the kernel_phony flag of OMP_PARALLEL_STMT.  */
+
+static inline bool
+gimple_omp_parallel_kernel_phony (const gomp_parallel *omp_parallel_stmt)
+{
+  return omp_parallel_stmt->kernel_phony;
+}
+
+/* Set kernel_phony flag of OMP_PARALLEL_STMT to VALUE.  */
+
+static inline void
+gimple_omp_parallel_set_kernel_phony (gomp_parallel *omp_parallel_stmt,
+				      bool value)
+{
+  omp_parallel_stmt->kernel_phony = value;
+}
 
 /* Return the clauses associated with OMP_TASK GS.  */
 
@@ -5635,6 +5697,21 @@  gimple_omp_teams_set_clauses (gomp_teams *omp_teams_stmt, tree clauses)
   omp_teams_stmt->clauses = clauses;
 }
 
+/* Return the kernel_phony flag of an OMP_TEAMS_STMT.  */
+
+static inline bool
+gimple_omp_teams_kernel_phony (const gomp_teams *omp_teams_stmt)
+{
+  return omp_teams_stmt->kernel_phony;
+}
+
+/* Set kernel_phony flag of an OMP_TEAMS_STMT to VALUE.  */
+
+static inline void
+gimple_omp_teams_set_kernel_phony (gomp_teams *omp_teams_stmt, bool value)
+{
+  omp_teams_stmt->kernel_phony = value;
+}
 
 /* Return the clauses associated with OMP_SECTIONS GS.  */
 
@@ -5964,7 +6041,8 @@  gimple_return_set_retbnd (gimple *gs, tree retval)
     case GIMPLE_OMP_RETURN:			\
     case GIMPLE_OMP_ATOMIC_LOAD:		\
     case GIMPLE_OMP_ATOMIC_STORE:		\
-    case GIMPLE_OMP_CONTINUE
+    case GIMPLE_OMP_CONTINUE:			\
+    case GIMPLE_OMP_GPUKERNEL
 
 static inline bool
 is_gimple_omp (const gimple *stmt)
diff --git a/gcc/gsstruct.def b/gcc/gsstruct.def
index d84e098..9d6b0ef 100644
--- a/gcc/gsstruct.def
+++ b/gcc/gsstruct.def
@@ -47,6 +47,7 @@  DEFGSSTRUCT(GSS_OMP_PARALLEL_LAYOUT, gimple_statement_omp_parallel_layout, false
 DEFGSSTRUCT(GSS_OMP_TASK, gomp_task, false)
 DEFGSSTRUCT(GSS_OMP_SECTIONS, gomp_sections, false)
 DEFGSSTRUCT(GSS_OMP_SINGLE_LAYOUT, gimple_statement_omp_single_layout, false)
+DEFGSSTRUCT(GSS_OMP_TEAMS_LAYOUT, gomp_teams, false)
 DEFGSSTRUCT(GSS_OMP_CONTINUE, gomp_continue, false)
 DEFGSSTRUCT(GSS_OMP_ATOMIC_LOAD, gomp_atomic_load, false)
 DEFGSSTRUCT(GSS_OMP_ATOMIC_STORE_LAYOUT, gomp_atomic_store, false)
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index d540dab..b9054ef 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -338,8 +338,13 @@  DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, "GOMP_single_copy_start",
 		  BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, "GOMP_single_copy_end",
 		  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_REGISTER, "GOMP_offload_register_ver",
+		  BT_FN_VOID_UINT_PTR_INT_PTR, ATTR_NOTHROW_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_OFFLOAD_UNREGISTER,
+		  "GOMP_offload_unregister_ver",
+		  BT_FN_VOID_UINT_PTR_INT_PTR, ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target_ext",
-		  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
+		  BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
 		  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data_ext",
 		  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f17a828..971e173 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -80,6 +80,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "lto-section-names.h"
 #include "gomp-constants.h"
 #include "gimple-pretty-print.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+#include "params.h"
 
 /* Lowering of OMP parallel and workshare constructs proceeds in two
    phases.  The first phase scans the function looking for OMP statements
@@ -450,6 +453,63 @@  is_combined_parallel (struct omp_region *region)
   return region->is_combined_parallel;
 }
 
+/* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or
+   GT_EXPR.  */
+
+static void
+adjust_for_condition (location_t loc, enum tree_code *cond_code, tree *n2)
+{
+  switch (*cond_code)
+    {
+    case LT_EXPR:
+    case GT_EXPR:
+    case NE_EXPR:
+      break;
+    case LE_EXPR:
+      if (POINTER_TYPE_P (TREE_TYPE (*n2)))
+	*n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, 1);
+      else
+	*n2 = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (*n2), *n2,
+			       build_int_cst (TREE_TYPE (*n2), 1));
+      *cond_code = LT_EXPR;
+      break;
+    case GE_EXPR:
+      if (POINTER_TYPE_P (TREE_TYPE (*n2)))
+	*n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, -1);
+      else
+	*n2 = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (*n2), *n2,
+			       build_int_cst (TREE_TYPE (*n2), 1));
+      *cond_code = GT_EXPR;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the looping step from INCR, extracted from the step of a gimple omp
+   for statement.  */
+
+static tree
+get_omp_for_step_from_incr (location_t loc, tree incr)
+{
+  tree step;
+  switch (TREE_CODE (incr))
+    {
+    case PLUS_EXPR:
+      step = TREE_OPERAND (incr, 1);
+      break;
+    case POINTER_PLUS_EXPR:
+      step = fold_convert (ssizetype, TREE_OPERAND (incr, 1));
+      break;
+    case MINUS_EXPR:
+      step = TREE_OPERAND (incr, 1);
+      step = fold_build1_loc (loc, NEGATE_EXPR, TREE_TYPE (step), step);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  return step;
+}
 
 /* Extract the header elements of parallel loop FOR_STMT and store
    them into *FD.  */
@@ -579,58 +639,14 @@  extract_omp_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
 
       loop->cond_code = gimple_omp_for_cond (for_stmt, i);
       loop->n2 = gimple_omp_for_final (for_stmt, i);
-      switch (loop->cond_code)
-	{
-	case LT_EXPR:
-	case GT_EXPR:
-	  break;
-	case NE_EXPR:
-	  gcc_assert (gimple_omp_for_kind (for_stmt)
-		      == GF_OMP_FOR_KIND_CILKSIMD
-		      || (gimple_omp_for_kind (for_stmt)
-			  == GF_OMP_FOR_KIND_CILKFOR));
-	  break;
-	case LE_EXPR:
-	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
-	    loop->n2 = fold_build_pointer_plus_hwi_loc (loc, loop->n2, 1);
-	  else
-	    loop->n2 = fold_build2_loc (loc,
-				    PLUS_EXPR, TREE_TYPE (loop->n2), loop->n2,
-				    build_int_cst (TREE_TYPE (loop->n2), 1));
-	  loop->cond_code = LT_EXPR;
-	  break;
-	case GE_EXPR:
-	  if (POINTER_TYPE_P (TREE_TYPE (loop->n2)))
-	    loop->n2 = fold_build_pointer_plus_hwi_loc (loc, loop->n2, -1);
-	  else
-	    loop->n2 = fold_build2_loc (loc,
-				    MINUS_EXPR, TREE_TYPE (loop->n2), loop->n2,
-				    build_int_cst (TREE_TYPE (loop->n2), 1));
-	  loop->cond_code = GT_EXPR;
-	  break;
-	default:
-	  gcc_unreachable ();
-	}
+      gcc_assert (loop->cond_code != NE_EXPR
+		  || gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_CILKSIMD
+		  || gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_CILKFOR);
+      adjust_for_condition (loc, &loop->cond_code, &loop->n2);
 
       t = gimple_omp_for_incr (for_stmt, i);
       gcc_assert (TREE_OPERAND (t, 0) == var);
-      switch (TREE_CODE (t))
-	{
-	case PLUS_EXPR:
-	  loop->step = TREE_OPERAND (t, 1);
-	  break;
-	case POINTER_PLUS_EXPR:
-	  loop->step = fold_convert (ssizetype, TREE_OPERAND (t, 1));
-	  break;
-	case MINUS_EXPR:
-	  loop->step = TREE_OPERAND (t, 1);
-	  loop->step = fold_build1_loc (loc,
-				    NEGATE_EXPR, TREE_TYPE (loop->step),
-				    loop->step);
-	  break;
-	default:
-	  gcc_unreachable ();
-	}
+      loop->step = get_omp_for_step_from_incr (loc, t);
 
       if (simd
 	  || (fd->sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
@@ -1321,7 +1337,16 @@  build_outer_var_ref (tree var, omp_context *ctx, bool lastprivate = false)
 	}
     }
   else if (ctx->outer)
-    x = lookup_decl (var, ctx->outer);
+    {
+      omp_context *outer = ctx->outer;
+      if (gimple_code (outer->stmt) == GIMPLE_OMP_GPUKERNEL)
+	{
+	  outer = outer->outer;
+	  gcc_assert (outer
+		      && gimple_code (outer->stmt) != GIMPLE_OMP_GPUKERNEL);
+	}
+      x = lookup_decl (var, outer);
+    }
   else if (is_reference (var))
     /* This can happen with orphaned constructs.  If var is reference, it is
        possible it is shared and as such valid.  */
@@ -1761,6 +1786,8 @@  fixup_child_record_type (omp_context *ctx)
 {
   tree f, type = ctx->record_type;
 
+  if (!ctx->receiver_decl)
+    return;
   /* ??? It isn't sufficient to just call remap_type here, because
      variably_modified_type_p doesn't work the way we expect for
      record types.  Testing each field for whether it needs remapping
@@ -2113,6 +2140,14 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	    }
 	  break;
 
+	case OMP_CLAUSE__GRIDDIM_:
+	  if (ctx->outer)
+	    {
+	      scan_omp_op (&OMP_CLAUSE_GRIDDIM_SIZE (c), ctx->outer);
+	      scan_omp_op (&OMP_CLAUSE_GRIDDIM_GROUP (c), ctx->outer);
+	    }
+	  break;
+
 	case OMP_CLAUSE_NOWAIT:
 	case OMP_CLAUSE_ORDERED:
 	case OMP_CLAUSE_COLLAPSE:
@@ -2309,6 +2344,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE__GRIDDIM_:
 	  break;
 
 	case OMP_CLAUSE_DEVICE_RESIDENT:
@@ -2631,8 +2667,11 @@  scan_omp_parallel (gimple_stmt_iterator *gsi, omp_context *outer_ctx)
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
   TYPE_ARTIFICIAL (ctx->record_type) = 1;
-  create_omp_child_function (ctx, false);
-  gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
+  if (!gimple_omp_parallel_kernel_phony (stmt))
+    {
+      create_omp_child_function (ctx, false);
+      gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
+    }
 
   scan_sharing_clauses (gimple_omp_parallel_clauses (stmt), ctx);
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
@@ -3102,6 +3142,11 @@  check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 {
   tree c;
 
+  if (ctx && gimple_code (ctx->stmt) == GIMPLE_OMP_GPUKERNEL)
+    /* GPUKERNEL is an artificial construct; nesting rules will be checked in
+       the original copy of its contents.  */
+    return true;
+
   /* No nesting of non-OpenACC STMT (that is, an OpenMP one, or a GOMP builtin)
      inside an OpenACC CTX.  */
   if (!(is_gimple_omp (stmt)
@@ -3686,7 +3731,11 @@  scan_omp_1_op (tree *tp, int *walk_subtrees, void *data)
     case LABEL_DECL:
     case RESULT_DECL:
       if (ctx)
-	*tp = remap_decl (t, &ctx->cb);
+	{
+	  tree repl = remap_decl (t, &ctx->cb);
+	  gcc_checking_assert (TREE_CODE (repl) != ERROR_MARK);
+	  *tp = repl;
+	}
       break;
 
     default:
@@ -3820,6 +3869,7 @@  scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
     case GIMPLE_OMP_TASKGROUP:
     case GIMPLE_OMP_ORDERED:
     case GIMPLE_OMP_CRITICAL:
+    case GIMPLE_OMP_GPUKERNEL:
       ctx = new_omp_context (stmt, ctx);
       scan_omp (gimple_omp_body_ptr (stmt), ctx);
       break;
@@ -6252,6 +6302,37 @@  gimple_build_cond_empty (tree cond)
   return gimple_build_cond (pred_code, lhs, rhs, NULL_TREE, NULL_TREE);
 }
 
+/* Return true if a parallel REGION is within a declare target function or
+   within a target region and is not a part of a gridified kernel.  */
+
+static bool
+region_needs_kernel_p (struct omp_region *region)
+{
+  bool indirect = false;
+  for (region = region->outer; region; region = region->outer)
+    {
+      if (region->type == GIMPLE_OMP_PARALLEL)
+	indirect = true;
+      else if (region->type == GIMPLE_OMP_TARGET)
+	{
+	  gomp_target *tgt_stmt;
+	  tgt_stmt = as_a <gomp_target *> (last_stmt (region->entry));
+
+	  if (find_omp_clause (gimple_omp_target_clauses (tgt_stmt),
+			       OMP_CLAUSE__GRIDDIM_))
+	    return indirect;
+	  else
+	    return true;
+	}
+    }
+
+  if (lookup_attribute ("omp declare target",
+			DECL_ATTRIBUTES (current_function_decl)))
+    return true;
+
+  return false;
+}
+
 static void expand_omp_build_assign (gimple_stmt_iterator *, tree, tree,
 				     bool = false);
 
@@ -6421,7 +6502,8 @@  expand_parallel_call (struct omp_region *region, basic_block bb,
     t1 = null_pointer_node;
   else
     t1 = build_fold_addr_expr (t);
-  t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
+  tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
+  t2 = build_fold_addr_expr (child_fndecl);
 
   vec_alloc (args, 4 + vec_safe_length (ws_args));
   args->quick_push (t2);
@@ -6436,6 +6518,13 @@  expand_parallel_call (struct omp_region *region, basic_block bb,
 
   force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
 			    false, GSI_CONTINUE_LINKING);
+
+  if (hsa_gen_requested_p ()
+      && region_needs_kernel_p (region))
+    {
+      cgraph_node *child_cnode = cgraph_node::get (child_fndecl);
+      hsa_register_kernel (child_cnode);
+    }
 }
 
 /* Insert a function call whose name is FUNC_NAME with the information from
@@ -12475,6 +12564,202 @@  mark_loops_in_oacc_kernels_region (basic_block region_entry,
     loop->in_oacc_kernels_region = true;
 }
 
+/* Types used to pass grid and workgroup sizes to kernel invocation.  */
+
+static GTY(()) tree kernel_dim_array_type;
+static GTY(()) tree kernel_lattrs_dimnum_decl;
+static GTY(()) tree kernel_lattrs_grid_decl;
+static GTY(()) tree kernel_lattrs_group_decl;
+static GTY(()) tree kernel_launch_attributes_type;
+
+/* Create types used to pass kernel launch attributes to target.  */
+
+static void
+create_kernel_launch_attr_types (void)
+{
+  if (kernel_launch_attributes_type)
+    return;
+
+  tree dim_arr_index_type;
+  dim_arr_index_type = build_index_type (build_int_cst (integer_type_node, 2));
+  kernel_dim_array_type = build_array_type (uint32_type_node,
+					    dim_arr_index_type);
+
+  kernel_launch_attributes_type = make_node (RECORD_TYPE);
+  kernel_lattrs_dimnum_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+				       get_identifier ("ndim"),
+				       uint32_type_node);
+  DECL_CHAIN (kernel_lattrs_dimnum_decl) = NULL_TREE;
+
+  kernel_lattrs_grid_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+				     get_identifier ("grid_size"),
+				     kernel_dim_array_type);
+  DECL_CHAIN (kernel_lattrs_grid_decl) = kernel_lattrs_dimnum_decl;
+  kernel_lattrs_group_decl = build_decl (BUILTINS_LOCATION, FIELD_DECL,
+				     get_identifier ("group_size"),
+				     kernel_dim_array_type);
+  DECL_CHAIN (kernel_lattrs_group_decl) = kernel_lattrs_grid_decl;
+  finish_builtin_struct (kernel_launch_attributes_type,
+			 "__gomp_kernel_launch_attributes",
+			 kernel_lattrs_group_decl, NULL_TREE);
+}
+
+/* Insert before the current statement in GSI a store of VALUE to INDEX of
+   array (of type kernel_dim_array_type) FLD_DECL of RANGE_VAR.  VALUE must be
+   of type uint32_type_node.  */
+
+static void
+insert_store_range_dim (gimple_stmt_iterator *gsi, tree range_var,
+			tree fld_decl, int index, tree value)
+{
+  tree ref = build4 (ARRAY_REF, uint32_type_node,
+		     build3 (COMPONENT_REF, kernel_dim_array_type,
+			     range_var, fld_decl, NULL_TREE),
+		     build_int_cst (integer_type_node, index),
+		     NULL_TREE, NULL_TREE);
+  gsi_insert_before (gsi, gimple_build_assign (ref, value), GSI_SAME_STMT);
+}
+
+/* Return a tree representation of a pointer to a structure with grid and
+   work-group size information.  Statements filling that information will be
+   inserted before GSI; TGT_STMT is the target statement which has the
+   necessary information in it.  */
+
+static tree
+get_kernel_launch_attributes (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt)
+{
+  create_kernel_launch_attr_types ();
+  tree u32_one = build_one_cst (uint32_type_node);
+  tree lattrs = create_tmp_var (kernel_launch_attributes_type,
+				"__kernel_launch_attrs");
+
+  unsigned max_dim = 0;
+  for (tree clause = gimple_omp_target_clauses (tgt_stmt);
+       clause;
+       clause = OMP_CLAUSE_CHAIN (clause))
+    {
+      if (OMP_CLAUSE_CODE (clause) != OMP_CLAUSE__GRIDDIM_)
+	continue;
+
+      unsigned dim = OMP_CLAUSE_GRIDDIM_DIMENSION (clause);
+      max_dim = MAX (dim, max_dim);
+
+      insert_store_range_dim (gsi, lattrs, kernel_lattrs_grid_decl, dim,
+			      OMP_CLAUSE_GRIDDIM_SIZE (clause));
+      insert_store_range_dim (gsi, lattrs, kernel_lattrs_group_decl, dim,
+			      OMP_CLAUSE_GRIDDIM_GROUP (clause));
+    }
+
+  tree dimref = build3 (COMPONENT_REF, uint32_type_node,
+			lattrs, kernel_lattrs_dimnum_decl, NULL_TREE);
+  /* At this moment we cannot gridify a loop with a collapse clause.  */
+  /* TODO: Adjust when we support bigger collapse.  */
+  gcc_assert (max_dim == 0);
+  gsi_insert_before (gsi, gimple_build_assign (dimref, u32_one), GSI_SAME_STMT);
+  TREE_ADDRESSABLE (lattrs) = 1;
+  return build_fold_addr_expr (lattrs);
+}
+
+/* Build target argument identifier from the DEVICE identifier, value
+   identifier ID and whether the element also has a SUBSEQUENT_PARAM.  */
+
+static tree
+get_target_argument_identifier_1 (int device, bool subseqent_param, int id)
+{
+  tree t = build_int_cst (integer_type_node, device);
+  if (subseqent_param)
+    t = fold_build2 (BIT_IOR_EXPR, integer_type_node, t,
+		     build_int_cst (integer_type_node,
+				    GOMP_TARGET_ARG_SUBSEQUENT_PARAM));
+  t = fold_build2 (BIT_IOR_EXPR, integer_type_node, t,
+		   build_int_cst (integer_type_node, id));
+  return t;
+}
+
+/* Like above, but return it in a type that can be directly stored as an
+   element of the argument array.  */
+
+static tree
+get_target_argument_identifier (int device, bool subseqent_param, int id)
+{
+  tree t = get_target_argument_identifier_1 (device, subseqent_param, id);
+  return fold_convert (ptr_type_node, t);
+}
+
+/* Return a target argument consisting of DEVICE identifier, value identifier
+   ID, and the actual VALUE.  */
+
+static tree
+get_target_argument_value (gimple_stmt_iterator *gsi, int device, int id,
+			   tree value)
+{
+  tree t = fold_build2 (LSHIFT_EXPR, integer_type_node,
+			fold_convert (integer_type_node, value),
+			build_int_cst (unsigned_type_node,
+				       GOMP_TARGET_ARG_VALUE_SHIFT));
+  t = fold_build2 (BIT_IOR_EXPR, integer_type_node, t,
+		   get_target_argument_identifier_1 (device, false, id));
+  t = fold_convert (ptr_type_node, t);
+  return force_gimple_operand_gsi (gsi, t, true, NULL, true, GSI_SAME_STMT);
+}
+
+/* Create an array of arguments that is then passed to GOMP_target.   */
+
+static tree
+get_target_arguments (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt)
+{
+  auto_vec <tree, 4> args;
+  tree clauses = gimple_omp_target_clauses (tgt_stmt);
+  tree t, c = find_omp_clause (clauses, OMP_CLAUSE_NUM_TEAMS);
+  if (c)
+    t = OMP_CLAUSE_NUM_TEAMS_EXPR (c);
+  else
+    t = integer_minus_one_node;
+  t = get_target_argument_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL,
+				 GOMP_TARGET_ARG_NUM_TEAMS, t);
+  args.quick_push (t);
+
+  c = find_omp_clause (clauses, OMP_CLAUSE_THREAD_LIMIT);
+  if (c)
+    t = OMP_CLAUSE_THREAD_LIMIT_EXPR (c);
+  else
+    t = integer_minus_one_node;
+  t = get_target_argument_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL,
+				 GOMP_TARGET_ARG_THREAD_LIMIT, t);
+  args.quick_push (t);
+
+  /* Add HSA-specific grid sizes, if available.  */
+  if (find_omp_clause (gimple_omp_target_clauses (tgt_stmt),
+		       OMP_CLAUSE__GRIDDIM_))
+    {
+      t = get_target_argument_identifier (GOMP_DEVICE_HSA, true,
+					  GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES);
+      args.quick_push (t);
+      args.quick_push (get_kernel_launch_attributes (gsi, tgt_stmt));
+    }
+
+  /* Produce more, perhaps device specific, arguments here.  */
+
+  tree argarray = create_tmp_var (build_array_type_nelts (ptr_type_node,
+							  args.length () + 1),
+				  ".omp_target_args");
+  for (unsigned i = 0; i < args.length (); i++)
+    {
+      tree ref = build4 (ARRAY_REF, ptr_type_node, argarray,
+			 build_int_cst (integer_type_node, i),
+			 NULL_TREE, NULL_TREE);
+      gsi_insert_before (gsi, gimple_build_assign (ref, args[i]),
+			 GSI_SAME_STMT);
+    }
+  tree ref = build4 (ARRAY_REF, ptr_type_node, argarray,
+		     build_int_cst (integer_type_node, args.length ()),
+		     NULL_TREE, NULL_TREE);
+  gsi_insert_before (gsi, gimple_build_assign (ref, null_pointer_node),
+		     GSI_SAME_STMT);
+  TREE_ADDRESSABLE (argarray) = 1;
+  return build_fold_addr_expr (argarray);
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -12887,30 +13172,7 @@  expand_omp_target (struct omp_region *region)
 	depend = build_int_cst (ptr_type_node, 0);
       args.quick_push (depend);
       if (start_ix == BUILT_IN_GOMP_TARGET)
-	{
-	  c = find_omp_clause (clauses, OMP_CLAUSE_NUM_TEAMS);
-	  if (c)
-	    {
-	      t = fold_convert (integer_type_node,
-				OMP_CLAUSE_NUM_TEAMS_EXPR (c));
-	      t = force_gimple_operand_gsi (&gsi, t, true, NULL,
-					    true, GSI_SAME_STMT);
-	    }
-	  else
-	    t = integer_minus_one_node;
-	  args.quick_push (t);
-	  c = find_omp_clause (clauses, OMP_CLAUSE_THREAD_LIMIT);
-	  if (c)
-	    {
-	      t = fold_convert (integer_type_node,
-				OMP_CLAUSE_THREAD_LIMIT_EXPR (c));
-	      t = force_gimple_operand_gsi (&gsi, t, true, NULL,
-					    true, GSI_SAME_STMT);
-	    }
-	  else
-	    t = integer_minus_one_node;
-	  args.quick_push (t);
-	}
+	args.quick_push (get_target_arguments (&gsi, entry_stmt));
       break;
     case BUILT_IN_GOACC_PARALLEL:
       {
@@ -13014,6 +13276,257 @@  expand_omp_target (struct omp_region *region)
     }
 }
 
+/* Expand the KFOR loop as a GPGPU kernel, i.e. as a body only, with the
+   iteration variable derived from the thread number.  */
+
+static void
+expand_omp_for_kernel (struct omp_region *kfor)
+{
+  tree t, threadid;
+  tree type, itype;
+  gimple_stmt_iterator gsi;
+  tree n1, step;
+  struct omp_for_data fd;
+
+  gomp_for *for_stmt = as_a <gomp_for *> (last_stmt (kfor->entry));
+  gcc_checking_assert (gimple_omp_for_kind (for_stmt)
+		       == GF_OMP_FOR_KIND_KERNEL_BODY);
+  basic_block body_bb = FALLTHRU_EDGE (kfor->entry)->dest;
+
+  gcc_assert (gimple_omp_for_collapse (for_stmt) == 1);
+  gcc_assert (kfor->cont);
+  extract_omp_for_data (for_stmt, &fd, NULL);
+
+  itype = type = TREE_TYPE (fd.loop.v);
+  if (POINTER_TYPE_P (type))
+    itype = signed_type_for (type);
+
+  gsi = gsi_start_bb (body_bb);
+
+  n1 = fd.loop.n1;
+  step = fd.loop.step;
+  n1 = force_gimple_operand_gsi (&gsi, fold_convert (type, n1),
+				 true, NULL_TREE, true, GSI_SAME_STMT);
+  step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
+				   true, NULL_TREE, true, GSI_SAME_STMT);
+  threadid = build_call_expr (builtin_decl_explicit
+			      (BUILT_IN_OMP_GET_THREAD_NUM), 0);
+  threadid = fold_convert (itype, threadid);
+  threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
+				       true, GSI_SAME_STMT);
+
+  tree startvar = fd.loop.v;
+  t = fold_build2 (MULT_EXPR, itype, threadid, step);
+  if (POINTER_TYPE_P (type))
+    t = fold_build_pointer_plus (n1, t);
+  else
+    t = fold_build2 (PLUS_EXPR, type, t, n1);
+  t = fold_convert (type, t);
+  t = force_gimple_operand_gsi (&gsi, t,
+				DECL_P (startvar)
+				&& TREE_ADDRESSABLE (startvar),
+				NULL_TREE, true, GSI_SAME_STMT);
+  gassign *assign_stmt = gimple_build_assign (startvar, t);
+  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+
+  /* Remove the omp for statement.  */
+  gsi = gsi_last_bb (kfor->entry);
+  gsi_remove (&gsi, true);
+
+  /* Remove the GIMPLE_OMP_CONTINUE statement.  */
+  gsi = gsi_last_bb (kfor->cont);
+  gcc_assert (!gsi_end_p (gsi)
+	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_CONTINUE);
+  gsi_remove (&gsi, true);
+
+  /* Replace the GIMPLE_OMP_RETURN with a real return.  */
+  gsi = gsi_last_bb (kfor->exit);
+  gcc_assert (!gsi_end_p (gsi)
+	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
+  gsi_remove (&gsi, true);
+
+  /* Fixup the much simpler CFG.  */
+  remove_edge (find_edge (kfor->cont, body_bb));
+
+  if (kfor->cont != body_bb)
+    set_immediate_dominator (CDI_DOMINATORS, kfor->cont, body_bb);
+  set_immediate_dominator (CDI_DOMINATORS, kfor->exit, kfor->cont);
+}
+
+/* Structure passed to remap_kernel_arg_accesses so that it can remap
+   argument_decls.  */
+
+struct arg_decl_map
+{
+  tree old_arg;
+  tree new_arg;
+};
+
+/* Invoked through walk_gimple_op, will remap all PARM_DECLs to the ones
+   pertaining to the kernel function.  */
+
+static tree
+remap_kernel_arg_accesses (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  struct arg_decl_map *adm = (struct arg_decl_map *) wi->info;
+  tree t = *tp;
+
+  if (t == adm->old_arg)
+    *tp = adm->new_arg;
+  *walk_subtrees = !TYPE_P (t) && !DECL_P (t);
+  return NULL_TREE;
+}
+
+static void expand_omp (struct omp_region *region);
+
+/* If TARGET region contains a kernel-body for loop, remove its region from the
+   TARGET and expand it in GPGPU kernel fashion.  */
+
+static void
+expand_target_kernel_body (struct omp_region *target)
+{
+  if (!hsa_gen_requested_p ())
+    return;
+
+  gomp_target *tgt_stmt = as_a <gomp_target *> (last_stmt (target->entry));
+  struct omp_region **pp;
+
+  for (pp = &target->inner; *pp; pp = &(*pp)->next)
+    if ((*pp)->type == GIMPLE_OMP_GPUKERNEL)
+      break;
+
+  struct omp_region *gpukernel = *pp;
+
+  tree orig_child_fndecl = gimple_omp_target_child_fn (tgt_stmt);
+  if (!gpukernel)
+    {
+      /* HSA cannot handle OACC stuff.  */
+      if (gimple_omp_target_kind (tgt_stmt) != GF_OMP_TARGET_KIND_REGION)
+	return;
+      gcc_checking_assert (orig_child_fndecl);
+      gcc_assert (!find_omp_clause (gimple_omp_target_clauses (tgt_stmt),
+				    OMP_CLAUSE__GRIDDIM_));
+      cgraph_node *n = cgraph_node::get (orig_child_fndecl);
+
+      hsa_register_kernel (n);
+      return;
+    }
+
+  gcc_assert (find_omp_clause (gimple_omp_target_clauses (tgt_stmt),
+			       OMP_CLAUSE__GRIDDIM_));
+  tree inside_block = gimple_block (first_stmt (single_succ (gpukernel->entry)));
+  *pp = gpukernel->next;
+  for (pp = &gpukernel->inner; *pp; pp = &(*pp)->next)
+    if ((*pp)->type == GIMPLE_OMP_FOR)
+      break;
+
+  struct omp_region *kfor = *pp;
+  gcc_assert (kfor);
+  gcc_assert (gimple_omp_for_kind (last_stmt ((kfor)->entry))
+	      == GF_OMP_FOR_KIND_KERNEL_BODY);
+  *pp = kfor->next;
+  if (kfor->inner)
+    expand_omp (kfor->inner);
+  if (gpukernel->inner)
+    expand_omp (gpukernel->inner);
+
+  tree kern_fndecl = copy_node (orig_child_fndecl);
+  DECL_NAME (kern_fndecl) = clone_function_name (kern_fndecl, "kernel");
+  SET_DECL_ASSEMBLER_NAME (kern_fndecl, DECL_NAME (kern_fndecl));
+  tree tgtblock = gimple_block (tgt_stmt);
+  tree fniniblock = make_node (BLOCK);
+  BLOCK_ABSTRACT_ORIGIN (fniniblock) = tgtblock;
+  BLOCK_SOURCE_LOCATION (fniniblock) = BLOCK_SOURCE_LOCATION (tgtblock);
+  BLOCK_SOURCE_END_LOCATION (fniniblock) = BLOCK_SOURCE_END_LOCATION (tgtblock);
+  DECL_INITIAL (kern_fndecl) = fniniblock;
+  push_struct_function (kern_fndecl);
+  cfun->function_end_locus = gimple_location (tgt_stmt);
+  pop_cfun ();
+
+  tree old_parm_decl = DECL_ARGUMENTS (kern_fndecl);
+  gcc_assert (!DECL_CHAIN (old_parm_decl));
+  tree new_parm_decl = copy_node (DECL_ARGUMENTS (kern_fndecl));
+  DECL_CONTEXT (new_parm_decl) = kern_fndecl;
+  DECL_ARGUMENTS (kern_fndecl) = new_parm_decl;
+  struct function *kern_cfun = DECL_STRUCT_FUNCTION (kern_fndecl);
+  kern_cfun->curr_properties = cfun->curr_properties;
+
+  remove_edge (BRANCH_EDGE (kfor->entry));
+  expand_omp_for_kernel (kfor);
+
+  /* Remove the omp for statement.  */
+  gimple_stmt_iterator gsi = gsi_last_bb (gpukernel->entry);
+  gsi_remove (&gsi, true);
+  /* Replace the GIMPLE_OMP_RETURN at the end of the kernel region with a real
+     return.  */
+  gsi = gsi_last_bb (gpukernel->exit);
+  gcc_assert (!gsi_end_p (gsi)
+	      && gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
+  gimple *ret_stmt = gimple_build_return (NULL);
+  gsi_insert_after (&gsi, ret_stmt, GSI_SAME_STMT);
+  gsi_remove (&gsi, true);
+
+  /* Statements in the first BB in the target construct have been produced by
+     target lowering and must be copied inside the GPUKERNEL, with the two
+     exceptions of the first OMP statement and the OMP_DATA assignment
+     statement.  */
+  gsi = gsi_start_bb (single_succ (gpukernel->entry));
+  tree data_arg = gimple_omp_target_data_arg (tgt_stmt);
+  tree sender = data_arg ? TREE_VEC_ELT (data_arg, 0) : NULL;
+  for (gimple_stmt_iterator tsi = gsi_start_bb (single_succ (target->entry));
+       !gsi_end_p (tsi); gsi_next (&tsi))
+    {
+      gimple *stmt = gsi_stmt (tsi);
+      if (is_gimple_omp (stmt))
+	break;
+      if (sender
+	  && is_gimple_assign (stmt)
+	  && TREE_CODE (gimple_assign_rhs1 (stmt)) == ADDR_EXPR
+	  && TREE_OPERAND (gimple_assign_rhs1 (stmt), 0) == sender)
+	continue;
+      gimple *copy = gimple_copy (stmt);
+      gsi_insert_before (&gsi, copy, GSI_SAME_STMT);
+      gimple_set_block (copy, fniniblock);
+    }
+
+  move_sese_region_to_fn (kern_cfun, single_succ (gpukernel->entry),
+			  gpukernel->exit, inside_block);
+
+  cgraph_node *kcn = cgraph_node::get_create (kern_fndecl);
+  kcn->mark_force_output ();
+  cgraph_node *orig_child = cgraph_node::get (orig_child_fndecl);
+
+  hsa_register_kernel (kcn, orig_child);
+
+  cgraph_node::add_new_function (kern_fndecl, true);
+  push_cfun (kern_cfun);
+  cgraph_edge::rebuild_edges ();
+
+  /* Re-map any mention of the PARM_DECL of the original function to the
+     PARM_DECL of the new one.
+
+     TODO: It would be great if lowering produced references into the GPU
+     kernel decl straight away and we did not have to do this.  */
+  struct arg_decl_map adm;
+  adm.old_arg = old_parm_decl;
+  adm.new_arg = new_parm_decl;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, kern_cfun)
+    {
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  struct walk_stmt_info wi;
+	  memset (&wi, 0, sizeof (wi));
+	  wi.info = &adm;
+	  walk_gimple_op (stmt, remap_kernel_arg_accesses, &wi);
+	}
+    }
+  pop_cfun ();
+
+  return;
+}
 
 /* Expand the parallel region tree rooted at REGION.  Expansion
    proceeds in depth-first order.  Innermost regions are expanded
@@ -13034,6 +13547,8 @@  expand_omp (struct omp_region *region)
        	 region.  */
       if (region->type == GIMPLE_OMP_PARALLEL)
 	determine_parallel_type (region);
+      else if (region->type == GIMPLE_OMP_TARGET)
+	expand_target_kernel_body (region);
 
       if (region->type == GIMPLE_OMP_FOR
 	  && gimple_omp_for_combined_p (last_stmt (region->entry)))
@@ -14411,11 +14926,13 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						ctx);
 	}
 
-  gimple_seq_add_stmt (&body, stmt);
+  if (!gimple_omp_for_kernel_phony (stmt))
+    gimple_seq_add_stmt (&body, stmt);
   gimple_seq_add_seq (&body, gimple_omp_body (stmt));
 
-  gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
-							 fd.loop.v));
+  if (!gimple_omp_for_kernel_phony (stmt))
+    gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
+							   fd.loop.v));
 
   /* After the loop, add exit clauses.  */
   lower_reduction_clauses (gimple_omp_for_clauses (stmt), &body, ctx);
@@ -14427,10 +14944,13 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   body = maybe_catch_exception (body);
 
-  /* Region exit marker goes at the end of the loop body.  */
-  gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
-  maybe_add_implicit_barrier_cancel (ctx, &body);
-
+  if (!gimple_omp_for_kernel_phony (stmt))
+    {
+      /* Region exit marker goes at the end of the loop body.  */
+      gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
+      maybe_add_implicit_barrier_cancel (ctx, &body);
+    }
+
   /* Add OpenACC joining and reduction markers just after the loop.  */
   if (oacc_tail)
     gimple_seq_add_seq (&body, oacc_tail);
@@ -14872,6 +15392,14 @@  lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   par_olist = NULL;
   par_ilist = NULL;
   par_rlist = NULL;
+  bool phony_construct = is_a <gomp_parallel *> (stmt)
+    && gimple_omp_parallel_kernel_phony (as_a <gomp_parallel *> (stmt));
+  if (phony_construct && ctx->record_type)
+    {
+      gcc_checking_assert (!ctx->receiver_decl);
+      ctx->receiver_decl = create_tmp_var
+	(build_reference_type (ctx->record_type), ".omp_rec");
+    }
   lower_rec_input_clauses (clauses, &par_ilist, &par_olist, ctx, NULL);
   lower_omp (&par_body, ctx);
   if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
@@ -14930,13 +15458,19 @@  lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     gimple_seq_add_stmt (&new_body,
 			 gimple_build_omp_continue (integer_zero_node,
 						    integer_zero_node));
-  gimple_seq_add_stmt (&new_body, gimple_build_omp_return (false));
-  gimple_omp_set_body (stmt, new_body);
+  if (!phony_construct)
+    {
+      gimple_seq_add_stmt (&new_body, gimple_build_omp_return (false));
+      gimple_omp_set_body (stmt, new_body);
+    }
 
   bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind));
   gsi_replace (gsi_p, dep_bind ? dep_bind : bind, true);
   gimple_bind_add_seq (bind, ilist);
-  gimple_bind_add_stmt (bind, stmt);
+  if (!phony_construct)
+    gimple_bind_add_stmt (bind, stmt);
+  else
+    gimple_bind_add_seq (bind, new_body);
   gimple_bind_add_seq (bind, olist);
 
   pop_gimplify_context (NULL);
@@ -16068,19 +16602,22 @@  lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			   &bind_body, &dlist, ctx, NULL);
   lower_omp (gimple_omp_body_ptr (teams_stmt), ctx);
   lower_reduction_clauses (gimple_omp_teams_clauses (teams_stmt), &olist, ctx);
-  gimple_seq_add_stmt (&bind_body, teams_stmt);
-
-  location_t loc = gimple_location (teams_stmt);
-  tree decl = builtin_decl_explicit (BUILT_IN_GOMP_TEAMS);
-  gimple *call = gimple_build_call (decl, 2, num_teams, thread_limit);
-  gimple_set_location (call, loc);
-  gimple_seq_add_stmt (&bind_body, call);
+  if (!gimple_omp_teams_kernel_phony (teams_stmt))
+    {
+      gimple_seq_add_stmt (&bind_body, teams_stmt);
+      location_t loc = gimple_location (teams_stmt);
+      tree decl = builtin_decl_explicit (BUILT_IN_GOMP_TEAMS);
+      gimple *call = gimple_build_call (decl, 2, num_teams, thread_limit);
+      gimple_set_location (call, loc);
+      gimple_seq_add_stmt (&bind_body, call);
+    }
 
   gimple_seq_add_seq (&bind_body, gimple_omp_body (teams_stmt));
   gimple_omp_set_body (teams_stmt, NULL);
   gimple_seq_add_seq (&bind_body, olist);
   gimple_seq_add_seq (&bind_body, dlist);
-  gimple_seq_add_stmt (&bind_body, gimple_build_omp_return (true));
+  if (!gimple_omp_teams_kernel_phony (teams_stmt))
+    gimple_seq_add_stmt (&bind_body, gimple_build_omp_return (true));
   gimple_bind_set_body (bind, bind_body);
 
   pop_gimplify_context (bind);
@@ -16091,6 +16628,17 @@  lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     TREE_USED (block) = 1;
 }
 
+/* Lower code within an artificial GPUKERNEL OMP construct.  */
+
+static void
+lower_omp_gpukernel (gimple_stmt_iterator *gsi_p, omp_context *ctx)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  lower_omp (gimple_omp_body_ptr (stmt), ctx);
+  gimple_seq_add_stmt (gimple_omp_body_ptr (stmt),
+		       gimple_build_omp_return (false));
+}
+
 
 /* Callback for lower_omp_1.  Return non-NULL if *tp needs to be
    regimplified.  If DATA is non-NULL, lower_omp_1 is outside
@@ -16302,6 +16850,11 @@  lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       gcc_assert (ctx);
       lower_omp_teams (gsi_p, ctx);
       break;
+    case GIMPLE_OMP_GPUKERNEL:
+      ctx = maybe_lookup_ctx (stmt);
+      gcc_assert (ctx);
+      lower_omp_gpukernel (gsi_p, ctx);
+      break;
     case GIMPLE_CALL:
       tree fndecl;
       call_stmt = as_a <gcall *> (stmt);
@@ -16391,7 +16944,654 @@  lower_omp (gimple_seq *body, omp_context *ctx)
       fold_stmt (&gsi);
   input_location = saved_location;
 }
-
+
+/* Return true if STMT is an assignment of a register-type value to a local
+   VAR_DECL.  */
+
+static bool
+reg_assignment_to_local_var_p (gimple *stmt)
+{
+  gassign *assign = dyn_cast <gassign *> (stmt);
+  if (!assign)
+    return false;
+  tree lhs = gimple_assign_lhs (assign);
+  if (TREE_CODE (lhs) != VAR_DECL
+      || !is_gimple_reg_type (TREE_TYPE (lhs))
+      || is_global_var (lhs))
+    return false;
+  return true;
+}
+
+/* Return true if all statements in SEQ are assignments to local register-type
+   variables.  */
+
+static bool
+seq_only_contains_local_assignments (gimple_seq seq)
+{
+  if (!seq)
+    return true;
+
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
+    if (!reg_assignment_to_local_var_p (gsi_stmt (gsi)))
+      return false;
+  return true;
+}
+
+
+/* Scan statements in SEQ and call itself recursively on any bind.  If during
+   the whole search only assignments to register-type local variables and a
+   single OMP statement are encountered, return true, otherwise return false.
+   RET is where we store any OMP statement encountered.  TARGET_LOC and NAME
+   are used for dumping a note about a failure.  */
+
+static bool
+find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
+				     const char *name, gimple **ret)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      if (reg_assignment_to_local_var_p (stmt))
+	continue;
+      if (gbind *bind = dyn_cast <gbind *> (stmt))
+	{
+	  if (!find_single_omp_among_assignments_1 (gimple_bind_body (bind),
+						    target_loc, name, ret))
+	      return false;
+	}
+      else if (is_gimple_omp (stmt))
+	{
+	  if (*ret)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, target_loc,
+				 "Will not turn target construct into a simple "
+				 "GPGPU kernel because %s construct contains "
+				 "multiple OpenMP constructs\n", name);
+	      return false;
+	    }
+	  *ret = stmt;
+	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, target_loc,
+			     "Will not turn target construct into a simple "
+			     "GPGPU kernel because %s construct contains "
+			     "a complex statement\n", name);
+	  return false;
+	}
+    }
+  return true;
+}
+
+/* Scan statements in SEQ and make sure that it and any binds in it contain
+   only assignments to local register-type variables and one OMP construct.  If
+   so, return that construct, otherwise return NULL.  If dumping is enabled
+   and the function fails, use TARGET_LOC and NAME to dump a note with the
+   reason for the failure.  */
+
+static gimple *
+find_single_omp_among_assignments (gimple_seq seq, location_t target_loc,
+				   const char *name)
+{
+  if (!seq)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, target_loc,
+			 "Will not turn target construct into a simple "
+			 "GPGPU kernel because %s construct has an empty "
+			 "body\n",
+			 name);
+      return NULL;
+    }
+
+  gimple *ret = NULL;
+  if (find_single_omp_among_assignments_1 (seq, target_loc, name, &ret))
+    {
+      if (!ret && dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, target_loc,
+			 "Will not turn target construct into a simple "
+			 "GPGPU kernel because %s construct does not contain "
+			 "any other OpenMP construct\n", name);
+      return ret;
+    }
+  else
+    return NULL;
+}
+
+/* Walker function looking for statements that there is no point in gridifying
+   (and for noreturn function calls, which we cannot handle).  Return non-NULL
+   if such a statement is found.  */
+
+static tree
+find_ungridifiable_statement (gimple_stmt_iterator *gsi, bool *handled_ops_p,
+			      struct walk_stmt_info *)
+{
+  *handled_ops_p = false;
+  gimple *stmt = gsi_stmt (*gsi);
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_CALL:
+      if (gimple_call_noreturn_p (as_a <gcall *> (stmt)))
+	{
+	  *handled_ops_p = true;
+	  return error_mark_node;
+	}
+      break;
+
+    /* We may reduce the following list if we find a way to implement the
+       clauses, but for now there is no point in trying further.  */
+    case GIMPLE_OMP_CRITICAL:
+    case GIMPLE_OMP_TASKGROUP:
+    case GIMPLE_OMP_TASK:
+    case GIMPLE_OMP_SECTION:
+    case GIMPLE_OMP_SECTIONS:
+    case GIMPLE_OMP_SECTIONS_SWITCH:
+    case GIMPLE_OMP_TARGET:
+    case GIMPLE_OMP_ORDERED:
+      *handled_ops_p = true;
+      return error_mark_node;
+
+    default:
+      break;
+    }
+  return NULL;
+}
+
+
+/* If TARGET follows a pattern that can be turned into a gridified GPGPU
+   kernel, return true, otherwise return false.  In the case of success, also
+   fill in GROUP_SIZE_P with the requested group size or NULL if there is
+   none.  */
+
+static bool
+target_follows_gridifiable_pattern (gomp_target *target, tree *group_size_p)
+{
+  if (gimple_omp_target_kind (target) != GF_OMP_TARGET_KIND_REGION)
+    return false;
+
+  location_t tloc = gimple_location (target);
+  gimple *stmt = find_single_omp_among_assignments (gimple_omp_body (target),
+						    tloc, "target");
+  if (!stmt)
+    return false;
+  gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
+  tree group_size = NULL;
+  if (!teams)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a simple "
+			 "GPGPU kernel because it does not have a sole teams "
+			 "construct in it\n");
+      return false;
+    }
+
+  tree clauses = gimple_omp_teams_clauses (teams);
+  while (clauses)
+    {
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_NUM_TEAMS:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because we cannot "
+			     "handle num_teams clause of teams "
+			     "construct\n");
+	  return false;
+
+	case OMP_CLAUSE_REDUCTION:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a reduction "
+			     "clause is present\n");
+	  return false;
+
+	case OMP_CLAUSE_THREAD_LIMIT:
+	  group_size = OMP_CLAUSE_OPERAND (clauses, 0);
+	  break;
+
+	default:
+	  break;
+	}
+      clauses = OMP_CLAUSE_CHAIN (clauses);
+    }
+
+  stmt = find_single_omp_among_assignments (gimple_omp_body (teams), tloc,
+					    "teams");
+  if (!stmt)
+    return false;
+  gomp_for *dist = dyn_cast <gomp_for *> (stmt);
+  if (!dist)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a simple "
+			 "GPGPU kernel because the teams construct does not "
+			 "have a sole distribute construct in it\n");
+      return false;
+    }
+
+  gcc_assert (gimple_omp_for_kind (dist) == GF_OMP_FOR_KIND_DISTRIBUTE);
+  if (!gimple_omp_for_combined_p (dist))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because we cannot handle a standalone "
+			 "distribute construct\n");
+      return false;
+    }
+  if (dist->collapse > 1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because the distribute construct contains "
+			 "a collapse clause\n");
+      return false;
+    }
+  struct omp_for_data fd;
+  extract_omp_for_data (dist, &fd, NULL);
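+  /* A dist_schedule chunk size, if any, must agree with the thread_limit
+     because both end up being used as the kernel group size.  */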
+  if (fd.chunk_size)
+    {
+      if (group_size && !operand_equal_p (group_size, fd.chunk_size, 0))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because the teams "
+			     "thread limit is different from distribute "
+			     "schedule chunk\n");
+	  return false;
+	}
+      group_size = fd.chunk_size;
+    }
+  stmt = find_single_omp_among_assignments (gimple_omp_body (dist), tloc,
+					    "distribute");
+  gomp_parallel *par;
+  if (!stmt || !(par = dyn_cast <gomp_parallel *> (stmt)))
+    return false;
+
+  clauses = gimple_omp_parallel_clauses (par);
+  while (clauses)
+    {
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_NUM_THREADS:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because there is a "
+			     "num_threads clause of the parallel construct\n");
+	  return false;
+	case OMP_CLAUSE_REDUCTION:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a reduction "
+			     "clause is present\n");
+	  return false;
+	default:
+	  break;
+	}
+      clauses = OMP_CLAUSE_CHAIN (clauses);
+    }
+
+  stmt = find_single_omp_among_assignments (gimple_omp_body (par), tloc,
+					    "parallel");
+  gomp_for *gfor;
+  if (!stmt || !(gfor = dyn_cast <gomp_for *> (stmt)))
+    return false;
+
+  if (gimple_omp_for_kind (gfor) != GF_OMP_FOR_KIND_FOR)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because the inner loop is not a simple for "
+			 "loop\n");
+      return false;
+    }
+  if (gfor->collapse > 1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because the inner loop contains a collapse "
+			 "clause\n");
+      return false;
+    }
+
+  if (!seq_only_contains_local_assignments (gimple_omp_for_pre_body (gfor)))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, tloc,
+			 "Will not turn target construct into a gridified GPGPU "
+			 "kernel because the inner loop pre_body contains "
+			 "a complex instruction\n");
+      return false;
+    }
+
+  clauses = gimple_omp_for_clauses (gfor);
+  while (clauses)
+    {
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_SCHEDULE:
+	  if (OMP_CLAUSE_SCHEDULE_KIND (clauses) != OMP_CLAUSE_SCHEDULE_AUTO)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, tloc,
+				 "Will not turn target construct into a "
+				 "gridified GPGPU kernel because the inner "
+				 "loop has a non-automatic scheduling clause\n");
+	      return false;
+	    }
+	  break;
+
+	case OMP_CLAUSE_REDUCTION:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a "
+			     "gridified GPGPU kernel because a reduction "
+			     "clause is present\n");
+	  return false;
+
+	default:
+	  break;
+	}
+      clauses = OMP_CLAUSE_CHAIN (clauses);
+    }
+
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  if (gimple *bad = walk_gimple_seq (gimple_omp_body (gfor),
+				     find_ungridifiable_statement,
+				     NULL, &wi))
+    {
+      if (dump_enabled_p ())
+	{
+	  if (is_gimple_call (bad))
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a gridified "
+			     "GPGPU kernel because the inner loop contains "
+			     "a call to a noreturn function\n");
+	  else
+	    dump_printf_loc (MSG_NOTE, tloc,
+			     "Will not turn target construct into a gridified "
+			     "GPGPU kernel because the inner loop contains "
+			     "statement %s which cannot be transformed\n",
+			     gimple_code_name[(int) gimple_code (bad)]);
+	}
+      return false;
+    }
+
+  *group_size_p = group_size;
+  return true;
+}
+
+/* Operand walker, used to remap pre-body declarations according to a hash map
+   provided in DATA.  */
+
+static tree
+remap_prebody_decls (tree *tp, int *walk_subtrees, void *data)
+{
+  tree t = *tp;
+
+  if (DECL_P (t) || TYPE_P (t))
+    *walk_subtrees = 0;
+  else
+    *walk_subtrees = 1;
+
+  if (TREE_CODE (t) == VAR_DECL)
+    {
+      struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+      hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
+      tree *repl = declmap->get (t);
+      if (repl)
+	*tp = *repl;
+    }
+  return NULL_TREE;
+}
+
+/* Copy leading register-type assignments to local variables in SRC to just
+   before DST, creating temporaries, adjusting the mapping of operands in WI
+   and remapping operands as necessary.  Add any new temporaries to TGT_BIND.
+   Return the first statement that does not conform to
+   reg_assignment_to_local_var_p, or NULL if there is none.  */
+
+static gimple *
+copy_leading_local_assignments (gimple_seq src, gimple_stmt_iterator *dst,
+				gbind *tgt_bind, struct walk_stmt_info *wi)
+{
+  hash_map<tree, tree> *declmap = (hash_map<tree, tree> *) wi->info;
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (src); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (gbind *bind = dyn_cast <gbind *> (stmt))
+	{
+	  gimple *r = copy_leading_local_assignments (gimple_bind_body (bind),
+						      dst, tgt_bind, wi);
+	  if (r)
+	    return r;
+	  else
+	    continue;
+	}
+      if (!reg_assignment_to_local_var_p (stmt))
+	return stmt;
+      tree lhs = gimple_assign_lhs (as_a <gassign *> (stmt));
+      tree repl = copy_var_decl (lhs, create_tmp_var_name (NULL),
+				 TREE_TYPE (lhs));
+      DECL_CONTEXT (repl) = current_function_decl;
+      gimple_bind_append_vars (tgt_bind, repl);
+
+      declmap->put (lhs, repl);
+      gassign *copy = as_a <gassign *> (gimple_copy (stmt));
+      walk_gimple_op (copy, remap_prebody_decls, wi);
+      gsi_insert_before (dst, copy, GSI_SAME_STMT);
+    }
+  return NULL;
+}
+
+/* Given a freshly copied top level kernel SEQ, identify the individual OMP
+   components, mark them as parts of the kernel, copy the assignments leading
+   up to them to just before DST, remapping them using WI and adding any new
+   temporaries to TGT_BIND, and return the inner loop.  */
+
+static gomp_for *
+process_kernel_body_copy (gimple_seq seq, gimple_stmt_iterator *dst,
+			  gbind *tgt_bind, struct walk_stmt_info *wi)
+{
+  gimple *stmt = copy_leading_local_assignments (seq, dst, tgt_bind, wi);
+  gomp_teams *teams = dyn_cast <gomp_teams *> (stmt);
+  gcc_assert (teams);
+  gimple_omp_teams_set_kernel_phony (teams, true);
+  stmt = copy_leading_local_assignments (gimple_omp_body (teams), dst,
+					 tgt_bind, wi);
+  gcc_checking_assert (stmt);
+  gomp_for *dist = dyn_cast <gomp_for *> (stmt);
+  gcc_assert (dist);
+  gimple_seq prebody = gimple_omp_for_pre_body (dist);
+  if (prebody)
+    copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
+  gimple_omp_for_set_kernel_phony (dist, true);
+  stmt = copy_leading_local_assignments (gimple_omp_body (dist), dst,
+					 tgt_bind, wi);
+  gcc_checking_assert (stmt);
+
+  gomp_parallel *parallel = as_a <gomp_parallel *> (stmt);
+  gimple_omp_parallel_set_kernel_phony (parallel, true);
+  stmt = copy_leading_local_assignments (gimple_omp_body (parallel), dst,
+					 tgt_bind, wi);
+  gomp_for *inner_loop = as_a <gomp_for *> (stmt);
+  gimple_omp_for_set_kind (inner_loop, GF_OMP_FOR_KIND_KERNEL_BODY);
+  prebody = gimple_omp_for_pre_body (inner_loop);
+  if (prebody)
+    copy_leading_local_assignments (prebody, dst, tgt_bind, wi);
+
+  return inner_loop;
+}
+
+/* If TARGET points to a GOMP_TARGET which follows a gridifiable pattern,
+   create a GPU kernel for it.  GSI must point to the same statement; TGT_BIND
+   is the bind into which temporaries inserted before TARGET should be
+   added.  */
+
+static void
+attempt_target_gridification (gomp_target *target, gimple_stmt_iterator *gsi,
+			      gbind *tgt_bind)
+{
+  tree group_size;
+  if (!target || !target_follows_gridifiable_pattern (target, &group_size))
+    return;
+
+  location_t loc = gimple_location (target);
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+		     "Target construct will be turned into a gridified GPGPU "
+		     "kernel\n");
+
+  /* Copy the target body to a GPUKERNEL construct.  */
+  gimple_seq kernel_seq = copy_gimple_seq_and_replace_locals
+    (gimple_omp_body (target));
+
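+  /* Map of local variables in copied pre-bodies and interleaving assignments
+     to the fresh temporaries created in front of the target statement.  */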
+  hash_map<tree, tree> *declmap = new hash_map<tree, tree>;
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (struct walk_stmt_info));
+  wi.info = declmap;
+
+  /* Copy the assignments in between OMP statements to just before the target,
+     and mark the OMP statements within the copy appropriately.  */
+  gomp_for *inner_loop = process_kernel_body_copy (kernel_seq, gsi, tgt_bind,
+						   &wi);
+
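+  /* Graft the block of the copied bind into the block tree as a sub-block of
+     the block enclosing the original target body.  */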
+  gbind *old_bind = as_a <gbind *> (gimple_seq_first (gimple_omp_body (target)));
+  gbind *new_bind = as_a <gbind *> (gimple_seq_first (kernel_seq));
+  tree new_block = gimple_bind_block (new_bind);
+  tree enc_block = BLOCK_SUPERCONTEXT (gimple_bind_block (old_bind));
+  BLOCK_CHAIN (new_block) = BLOCK_SUBBLOCKS (enc_block);
+  BLOCK_SUBBLOCKS (enc_block) = new_block;
+  BLOCK_SUPERCONTEXT (new_block) = enc_block;
+  gimple *gpukernel = gimple_build_omp_gpukernel (kernel_seq);
+  gimple_seq_add_stmt
+    (gimple_bind_body_ptr (as_a <gbind *> (gimple_omp_body (target))),
+     gpukernel);
+
+  walk_tree (&group_size, remap_prebody_decls, &wi, NULL);
+  push_gimplify_context ();
+  size_t collapse = gimple_omp_for_collapse (inner_loop);
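+  /* For each dimension of the inner loop, compute its grid size (the number
+     of loop iterations) and group size and attach them to the target
+     statement as artificial _griddim_ clauses.  */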
+  for (size_t i = 0; i < collapse; i++)
+    {
+      tree itype, type = TREE_TYPE (gimple_omp_for_index (inner_loop, i));
+      if (POINTER_TYPE_P (type))
+	itype = signed_type_for (type);
+      else
+	itype = type;
+
+      enum tree_code cond_code = gimple_omp_for_cond (inner_loop, i);
+      tree n1 = unshare_expr (gimple_omp_for_initial (inner_loop, i));
+      walk_tree (&n1, remap_prebody_decls, &wi, NULL);
+      tree n2 = unshare_expr (gimple_omp_for_final (inner_loop, i));
+      walk_tree (&n2, remap_prebody_decls, &wi, NULL);
+      adjust_for_condition (loc, &cond_code, &n2);
+      tree step;
+      step = get_omp_for_step_from_incr (loc,
+					 gimple_omp_for_incr (inner_loop, i));
+      gimple_seq tmpseq = NULL;
+      n1 = fold_convert (itype, n1);
+      n2 = fold_convert (itype, n2);
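+      /* Compute the grid size as the number of iterations of the inner loop,
+	 i.e. (N2 - N1 + STEP + (COND_CODE == LT_EXPR ? -1 : 1)) / STEP.  For
+	 example, N1 = 0, N2 = 10 and STEP = 3 yield a grid size of 4.  */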
+      tree t = build_int_cst (itype, (cond_code == LT_EXPR ? -1 : 1));
+      t = fold_build2 (PLUS_EXPR, itype, step, t);
+      t = fold_build2 (PLUS_EXPR, itype, t, n2);
+      t = fold_build2 (MINUS_EXPR, itype, t, n1);
+      if (TYPE_UNSIGNED (itype) && cond_code == GT_EXPR)
+	t = fold_build2 (TRUNC_DIV_EXPR, itype,
+			 fold_build1 (NEGATE_EXPR, itype, t),
+			 fold_build1 (NEGATE_EXPR, itype, step));
+      else
+	t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
+      tree gs = fold_convert (uint32_type_node, t);
+      gimplify_expr (&gs, &tmpseq, NULL, is_gimple_val, fb_rvalue);
+      if (!gimple_seq_empty_p (tmpseq))
+	gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT);
+
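+      /* Any requested group size applies only to the first dimension.  */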
+      tree ws;
+      if (i == 0 && group_size)
+	{
+	  ws = fold_convert (uint32_type_node, group_size);
+	  tmpseq = NULL;
+	  gimplify_expr (&ws, &tmpseq, NULL, is_gimple_val, fb_rvalue);
+	  if (!gimple_seq_empty_p (tmpseq))
+	    gsi_insert_seq_before (gsi, tmpseq, GSI_SAME_STMT);
+	}
+      else
+	ws = build_zero_cst (uint32_type_node);
+
+      tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE__GRIDDIM_);
+      OMP_CLAUSE_SET_GRIDDIM_DIMENSION (c, (unsigned int) i);
+      OMP_CLAUSE_GRIDDIM_SIZE (c) = gs;
+      OMP_CLAUSE_GRIDDIM_GROUP (c) = ws;
+      OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (target);
+      gimple_omp_target_set_clauses (target, c);
+    }
+  pop_gimplify_context (tgt_bind);
+  delete declmap;
+  return;
+}
+
+/* Walker function doing all the work for create_target_gpukernels.  */
+
+static tree
+create_target_gpukernel_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
+			      struct walk_stmt_info *incoming)
+{
+  *handled_ops_p = false;
+
+  gimple *stmt = gsi_stmt (*gsi);
+  gomp_target *target = dyn_cast <gomp_target *> (stmt);
+  if (target)
+    {
+      gbind *tgt_bind = (gbind *) incoming->info;
+      gcc_checking_assert (tgt_bind);
+      attempt_target_gridification (target, gsi, tgt_bind);
+      return NULL_TREE;
+    }
+  gbind *bind = dyn_cast <gbind *> (stmt);
+  if (bind)
+    {
+      *handled_ops_p = true;
+      struct walk_stmt_info wi;
+      memset (&wi, 0, sizeof (wi));
+      wi.info = bind;
+      walk_gimple_seq_mod (gimple_bind_body_ptr (bind),
+			   create_target_gpukernel_stmt, NULL, &wi);
+    }
+  return NULL_TREE;
+}
+
+/* Prepare all target constructs in BODY_P for GPU kernel generation, if they
+   follow a gridifiable pattern.  All such targets will have their bodies
+   duplicated, with the new copy being put into a gpukernel.  All
+   kernel-related constructs within the gpukernel will be marked with phony
+   flags or kernel kinds.  Moreover, some re-structuring is often needed, such
+   as copying pre-bodies before the target construct so that kernel grid sizes
+   can be computed.  */
+
+static void
+create_target_gpukernels (gimple_seq *body_p)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  walk_gimple_seq_mod (body_p, create_target_gpukernel_stmt, NULL, &wi);
+}
+
+
 /* Main entry point.  */
 
 static unsigned int
@@ -16411,6 +17611,10 @@  execute_lower_omp (void)
 				 delete_omp_context);
 
   body = gimple_body (current_function_decl);
+
+  if (hsa_gen_requested_p ())
+    create_target_gpukernels (&body);
+
   scan_omp (&body, NULL);
   gcc_assert (taskreg_nesting_level == 0);
   FOR_EACH_VEC_ELT (taskreg_contexts, i, ctx)
@@ -16748,6 +17952,7 @@  make_gimple_omp_edges (basic_block bb, struct omp_region **region,
     case GIMPLE_OMP_TASKGROUP:
     case GIMPLE_OMP_CRITICAL:
     case GIMPLE_OMP_SECTION:
+    case GIMPLE_OMP_GPUKERNEL:
       cur_region = new_omp_region (bb, code, cur_region);
       fallthru = true;
       break;
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 9cc64d9..858f220 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -460,7 +460,11 @@  enum omp_clause_code {
   OMP_CLAUSE_VECTOR_LENGTH,
 
   /* OpenACC clause: tile ( size-expr-list ).  */
-  OMP_CLAUSE_TILE
+  OMP_CLAUSE_TILE,
+
+  /* OpenMP internal-only clause to specify grid dimensions of a gridified
+     kernel.  */
+  OMP_CLAUSE__GRIDDIM_
 };
 
 #undef DEFTREESTRUCT
@@ -1377,6 +1381,9 @@  struct GTY(()) tree_omp_clause {
     enum tree_code                 reduction_code;
     enum omp_clause_linear_kind    linear_kind;
     enum tree_code                 if_modifier;
+    /* The dimension an OMP_CLAUSE__GRIDDIM_ clause of a gridified target
+       construct describes.  */
+    unsigned int		   dimension;
   } GTY ((skip)) subcode;
 
   /* The gimplification of OMP_CLAUSE_REDUCTION_{INIT,MERGE} for omp-low's
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index caec760..ad5cfdb 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -945,6 +945,18 @@  dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
       pp_right_paren (pp);
       break;
 
+    case OMP_CLAUSE__GRIDDIM_:
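+      /* Print the clause as _griddim_(dimension:size,group).  */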
+      pp_string (pp, "_griddim_(");
+      pp_unsigned_wide_integer (pp, OMP_CLAUSE_GRIDDIM_DIMENSION (clause));
+      pp_colon (pp);
+      dump_generic_node (pp, OMP_CLAUSE_GRIDDIM_SIZE (clause), spc, flags,
+			 false);
+      pp_comma (pp);
+      dump_generic_node (pp, OMP_CLAUSE_GRIDDIM_GROUP (clause), spc, flags,
+			 false);
+      pp_right_paren (pp);
+      break;
+
     default:
       /* Should never happen.  */
       dump_generic_node (pp, clause, spc, flags, false);
diff --git a/gcc/tree.c b/gcc/tree.c
index 2387deb..3a74982 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -329,6 +329,7 @@  unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_NUM_WORKERS  */
   1, /* OMP_CLAUSE_VECTOR_LENGTH  */
   1, /* OMP_CLAUSE_TILE  */
+  2, /* OMP_CLAUSE__GRIDDIM_  */
 };
 
 const char * const omp_clause_code_name[] =
@@ -400,7 +401,8 @@  const char * const omp_clause_code_name[] =
   "num_gangs",
   "num_workers",
   "vector_length",
-  "tile"
+  "tile",
+  "griddim"
 };
 
 
@@ -11603,6 +11605,7 @@  walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
       switch (OMP_CLAUSE_CODE (*tp))
 	{
 	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE__GRIDDIM_:
 	  WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
 	  /* FALLTHRU */
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 0c1602e..7b9bcb3 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1636,6 +1636,17 @@  extern void protected_set_expr_location (tree, location_t);
 #define OMP_CLAUSE_TILE_LIST(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
 
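+/* Accessors for the internal OMP_CLAUSE__GRIDDIM_ clause; operand 0 holds the
+   grid size of the dimension it describes and operand 1 its group size.  */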
+#define OMP_CLAUSE_GRIDDIM_DIMENSION(NODE) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
+   ->omp_clause.subcode.dimension)
+#define OMP_CLAUSE_SET_GRIDDIM_DIMENSION(NODE, DIMENSION) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
+   ->omp_clause.subcode.dimension = (DIMENSION))
+#define OMP_CLAUSE_GRIDDIM_SIZE(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 0)
+#define OMP_CLAUSE_GRIDDIM_GROUP(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 1)
+
 /* SSA_NAME accessors.  */
 
 /* Returns the IDENTIFIER_NODE giving the SSA name a name or NULL_TREE