diff mbox series

[OpenACC,v2] Non-contiguous array support for OpenACC data clauses

Message ID cc59377b-94ff-4bb2-41cd-cebc29290b1c@mentor.com
State New
Headers show
Series [OpenACC,v2] Non-contiguous array support for OpenACC data clauses | expand

Commit Message

Chung-Lin Tang Nov. 5, 2019, 2:35 p.m. UTC
Hi Thomas,
after your last round of review, I realized that the bulk of the compiler omp-low work was
simply a case of dumb over-engineering in the wrong direction :P
(although it did painstakingly function correctly)

Instead of making code changes for bias adjustment in the child function code in the omp-low
phase, this should simply be done by the libgomp runtime map preparation (similar to how the
current single-dimension array biases are handled)

So this updated patch (1) discards away a large part of the last omp-low.c patch, and
(2) adjusts the libgomp/target.c patch to do the per-dimensional adjustments.

Also, the bit of C/C++ front-end logic you mentioned that was questionable was removed.
After looking closely, it wasn't needed; the relaxing of pointers for OpenACC was enough.
Still some aspects of handling arrays inside the multi-dimension type still need some
more work, e.g. see the catching in the omp-low.c part. A compiler dg-scan testcase
was also added.

However, the issue of ACC_DEVICE_TYPE=host not working (and hence "!openacc_host_selected"
in the testcases) actually is a bit more sophisticated than I thought:

The reason it doesn't work for the host device, is because we use the map pointer (i.e.
a hostaddrs[] entry when passed into libgomp) to point to an array descriptor to pass
the whole array information, and rely on code inside gomp_map_vars_* to setup things,
and place the final on-device address of the non-contig. array into devaddrs[], therefore
only using a single map entry (something I thought was quite clever)

However, this broke down on the host and host-fallback devices, simply because, there
we do NOT do any gomp_map_vars processing; our current code in GOACC_parallel_keyed
simply skips it and passes the offload function the original hostaddrs[] contents.
Lacking the processing to transform the descriptor pointer into a proper array ref,
things of course segfault.

So I think we have three options for this (which may have some interactions with say,
the "proper" host-side parallelization we eventually need to implement for OpenACC 2.7)

(1) The simplest solution: implement a processing which searches and reverts such
non-contiguous array map entries in GOACC_parallel_keyed.
(note: I have implemented this in the current attached "v2" patch)

(2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes;
i.e. still do the proper gomp_map_vars processing for all cases.

(3) Modify the non-contiguous array map conventions: a possible solution is to use
two maps placed together: one for the array pointer, another for the array descriptor (as
opposed to the current style of using only one map) This needs more further elaborate
compiler/runtime work.

The first two options will pessimize host-mode performance somewhat. The third I have
some WIP patches, but it's still buggy ATM. Seeking your opinion on what we should do.

Thanks,
Chung-Lin

	gcc/c/
	* c-typeck.c (handle_omp_array_sections_1): Add 'bool &non_contiguous'
	parameter, adjust recursive call site, add cases for allowing
	pointer based multi-dimensional arrays for OpenACC.
	(handle_omp_array_sections): Adjust handle_omp_array_sections_1 call,
	handle non-contiguous case to create dynamic array map.

	gcc/cp/
	* semantics.c (handle_omp_array_sections_1): Add 'bool &non_contiguous'
	parameter, adjust recursive call site, add cases for allowing
	pointer based multi-dimensional arrays for OpenACC.
	(handle_omp_array_sections): Adjust handle_omp_array_sections_1 call,
	handle non-contiguous case to create dynamic array map.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): For non-contiguous array map kinds,
	make sure bias in each dimension are put into firstprivate variables.

	* omp-low.c (append_field_to_record_type): New function.
	(create_noncontig_array_descr_type): Likewise.
	(create_noncontig_array_descr_init_code): Likewise.
	(scan_sharing_clauses): For non-contiguous array map kinds, check for
	supported dimension structure, and install non-contiguous array variable into
	current omp_context.
	(reorder_noncontig_array_clauses): New function.
	(scan_omp_target): Call reorder_noncontig_array_clauses to place
	non-contiguous array map clauses at beginning of clause sequence.
	(lower_omp_target): Add handling for non-contiguous array map kinds.

	* tree-pretty-print.c (dump_omp_clauses): Add cases for printing
	GOMP_MAP_NONCONTIG_ARRAY map kinds.

	include/
	* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define.
	(enum gomp_map_kind): Add GOMP_MAP_NONCONTIG_ARRAY,
	GOMP_MAP_NONCONTIG_ARRAY_TO, GOMP_MAP_NONCONTIG_ARRAY_FROM,
	GOMP_MAP_NONCONTIG_ARRAY_TOFROM, GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO,
	GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM, GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM,
	GOMP_MAP_NONCONTIG_ARRAY_ALLOC, GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC,
	GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT.
	(GOMP_MAP_NONCONTIG_ARRAY_P): Define.

	gcc/testsuite/
	* c-c++-common/goacc/noncontig_array-1.c: New test.

	libgomp/
	* target.c (struct gomp_ncarray_dim): New struct declaration.
	(struct gomp_ncarray_descr_type): Likewise.
	(struct ncarray_info): Likewise.
	(gomp_noncontig_array_count_rows): New function.
	(gomp_noncontig_array_compute_info): Likewise.
	(gomp_noncontig_array_fill_rows_1): Likewise.
	(gomp_noncontig_array_fill_rows): Likewise.
	(gomp_noncontig_array_create_ptrblock): Likewise.
	(gomp_map_vars_internal): Add code to handle non-contiguous array map
	kinds.
	* oacc-parallel.c (revert_noncontig_array_map_pointers): New function.
	(GOACC_parallel_keyed): Call revert_noncontig_array_map_pointers
	when executing for host-modes.

	* testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/noncontig_array-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/noncontig_array-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/noncontig_array-4.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/noncontig_array-utils.h: Support
	header for new tests.

Comments

Thomas Schwinge Nov. 7, 2019, 12:48 a.m. UTC | #1
Hi Chung-Lin!

On 2019-11-05T22:35:43+0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> Hi Thomas,
> after your last round of review, I realized that the bulk of the compiler omp-low work was
> simply a case of dumb over-engineering in the wrong direction :P
> (although it did painstakingly function correctly)

Hehe -- that happens.  ;-)

> However, the issue of ACC_DEVICE_TYPE=host not working (and hence "!openacc_host_selected"
> in the testcases)

Actually not just for that, but also generally for any shared-memory
models that may come into existance at some point, such as CUDA Unified
Memory, for example?

> actually is a bit more sophisticated than I thought:
>
> The reason it doesn't work for the host device, is because we use the map pointer (i.e.
> a hostaddrs[] entry when passed into libgomp) to point to an array descriptor to pass
> the whole array information, and rely on code inside gomp_map_vars_* to setup things,
> and place the final on-device address of the non-contig. array into devaddrs[], therefore
> only using a single map entry (something I thought was quite clever)
>
> However, this broke down on the host and host-fallback devices, simply because, there
> we do NOT do any gomp_map_vars processing; our current code in GOACC_parallel_keyed
> simply skips it and passes the offload function the original hostaddrs[] contents.
> Lacking the processing to transform the descriptor pointer into a proper array ref,
> things of course segfault.
>
> So I think we have three options for this (which may have some interactions with say,
> the "proper" host-side parallelization we eventually need to implement for OpenACC 2.7)
>
> (1) The simplest solution: implement a processing which searches and reverts such
> non-contiguous array map entries in GOACC_parallel_keyed.
> (note: I have implemented this in the current attached "v2" patch)
>
> (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes;
> i.e. still do the proper gomp_map_vars processing for all cases.
>
> (3) Modify the non-contiguous array map conventions: a possible solution is to use
> two maps placed together: one for the array pointer, another for the array descriptor (as
> opposed to the current style of using only one map) This needs more further elaborate
> compiler/runtime work.
>
> The first two options will pessimize host-mode performance somewhat. The third I have
> some WIP patches, but it's still buggy ATM. Seeking your opinion on what we should do.

I'll have to think about it some more, but variant (1) doesn't seem so
bad actually, for a first take.  While it's not nice to pessimize in
particular directives with 'if (false)' clauses, at least it does work,
the run-time overhead should not be too bad (also compared to variant
(2), I suppose), and variant (3) can still be implemented later.


A few comments/questions:

Please reference PR76739 in your submission/ChangeLog updates.

> --- gcc/c/c-typeck.c	(revision 277827)
> +++ gcc/c/c-typeck.c	(working copy)
> @@ -12868,7 +12868,7 @@ c_finish_omp_cancellation_point (location_t loc, t
>  static tree
>  handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
>  			     bool &maybe_zero_len, unsigned int &first_non_one,
> -			     enum c_omp_region_type ort)
> +			     bool &non_contiguous, enum c_omp_region_type ort)
>  {
>    tree ret, low_bound, length, type;
>    if (TREE_CODE (t) != TREE_LIST)

> @@ -13160,14 +13161,21 @@ handle_omp_array_sections_1 (tree c, tree t, vec<t
>  	  return error_mark_node;
>  	}
>        /* If there is a pointer type anywhere but in the very first
> -	 array-section-subscript, the array section can't be contiguous.  */
> +	 array-section-subscript, the array section can't be contiguous.
> +	 Note that OpenACC does accept these kinds of non-contiguous pointer
> +	 based arrays.  */

That comment update should instead be moved to the function comment
before the 'handle_omp_array_sections_1' function definition, and should
then also explain the new 'non_contiguous' out variable.  The latter
needs to be done anyway, and the former (no comment here) is easy enough
to tell from the code:

>        if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
>  	  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
>  	{
> -	  error_at (OMP_CLAUSE_LOCATION (c),
> -		    "array section is not contiguous in %qs clause",
> -		    omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
> -	  return error_mark_node;
> +	  if (ort == C_ORT_ACC)
> +	    non_contiguous = true;
> +	  else
> +	    {
> +	      error_at (OMP_CLAUSE_LOCATION (c),
> +			"array section is not contiguous in %qs clause",
> +			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
> +	      return error_mark_node;
> +	    }
>  	}

> @@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>        unsigned int num = types.length (), i;
>        tree t, side_effects = NULL_TREE, size = NULL_TREE;
>        tree condition = NULL_TREE;
> +      tree ncarray_dims = NULL_TREE;
>  
>        if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
>  	maybe_zero_len = true;
> @@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>  	    length = fold_convert (sizetype, length);
>  	  if (low_bound == NULL_TREE)
>  	    low_bound = integer_zero_node;
> +
> +	  if (non_contiguous)
> +	    {
> +	      ncarray_dims = tree_cons (low_bound, length, ncarray_dims);
> +	      continue;
> +	    }
> +
>  	  if (!maybe_zero_len && i > first_non_one)
>  	    {
>  	      if (integer_nonzerop (low_bound))

I'm not at all familiar with this array sections code, will trust your
understanding that we don't need any of the processing that you're
skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length
expressions, and other things.

> @@ -13357,6 +13374,14 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>  		size = size_binop (MULT_EXPR, size, l);
>  	    }
>  	}
> +      if (non_contiguous)
> +	{
> +	  int kind = OMP_CLAUSE_MAP_KIND (c);
> +	  OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_NONCONTIG_ARRAY);
> +	  OMP_CLAUSE_DECL (c) = t;
> +	  OMP_CLAUSE_SIZE (c) = ncarray_dims;
> +	  return false;
> +	}
>        if (side_effects)
>  	size = build2 (COMPOUND_EXPR, sizetype, side_effects, size);
>        if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION

Likewise for all the code being skipped here ('return false').

> --- gcc/cp/semantics.c	(revision 277827)
> +++ gcc/cp/semantics.c	(working copy)

Analoguous to the C front end.

> --- gcc/gimplify.c	(revision 277827)
> +++ gcc/gimplify.c	(working copy)
> @@ -8622,9 +8622,17 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se
>  	  if (OMP_CLAUSE_SIZE (c) == NULL_TREE)
>  	    OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl)
>  				  : TYPE_SIZE_UNIT (TREE_TYPE (decl));
> +	  if (OMP_CLAUSE_SIZE (c)
> +	      && TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST
> +	      && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))

Per the code above, 'OMP_CLAUSE_SIZE (c)' will always be set to
something, so no point in checking that here?

Isn't the 'GOMP_MAP_NONCONTIG_ARRAY_P' check alone sufficient already?
And then maybe 'assert (TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST' in
here:

>  	    {
> +	      /* For non-contiguous array maps, OMP_CLAUSE_SIZE is a TREE_LIST
> +		 of the individual array dimensions, which gimplify_expr doesn't
> +		 handle, so skip the call to gimplify_expr here.  */
> +	    }

> -	  if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,
> -			     NULL, is_gimple_val, fb_rvalue) == GS_ERROR)
> +	  else if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,
> +				  NULL, is_gimple_val, fb_rvalue) == GS_ERROR)
> +	    {
>  	      remove = true;
>  	      break;
>  	    }

Again, that means we're skipping other code here; don't understand yet.

Your ChangeLog update says:

> 	* gimplify.c (gimplify_scan_omp_clauses): For non-contiguous array map kinds,
> 	make sure bias in each dimension are put into firstprivate variables.

I'm not yet seeing how that's happening.

Ah, I see that ChangeLog comment is probably just a remnant from the
previous version.

> --- gcc/omp-low.c	(revision 277827)
> +++ gcc/omp-low.c	(working copy)

Have not yet reviewed in detail.

> @@ -1367,6 +1498,38 @@ scan_sharing_clauses (tree clauses, omp_context *c
>  	      install_var_local (decl, ctx);
>  	      break;
>  	    }
> +
> +	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> +	      && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
> +	    {
> +	      tree array_decl = OMP_CLAUSE_DECL (c);
> +	      tree array_type = TREE_TYPE (array_decl);
> +	      bool by_ref = (TREE_CODE (array_type) == ARRAY_TYPE
> +			     ? true : false);
> +
> +	      /* Checking code to ensure we only have arrays at top dimension.
> +		 This limitation might be lifted in the future.  */

Please reference PR76739 here, and in PR76739 also add a comment about
this limitation.  (As well as any other limitations, of course.)

> +	      if (TREE_CODE (array_type) == REFERENCE_TYPE)
> +		array_type = TREE_TYPE (array_type);
> +	      tree t = array_type, prev_t = NULL_TREE;
> +	      while (t)
> +		{
> +		  if (TREE_CODE (t) == ARRAY_TYPE && prev_t)
> +		    {
> +		      error_at (gimple_location (ctx->stmt), "array types are"
> +				" only allowed at outermost dimension of"
> +				" non-contiguous array");
> +		      break;
> +		    }
> +		  prev_t = t;
> +		  t = TREE_TYPE (t);
> +		}
> +
> +	      install_var_field (array_decl, by_ref, 3, ctx);
> +	      install_var_local (array_decl, ctx);
> +	      break;
> +	    }
> +

Assuming this intentionally means to skip ('break' just above) the
following 'if (DECL_P (decl))' and its 'else' branch, then maybe remove
the 'break' just above, and instead do 'else if (DECL_P (decl))'?

>  	  if (DECL_P (decl))
>  	    {
>  	      if (DECL_SIZE (decl)

> @@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o
>        gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
>      }
> 
> +  /* If is OpenACC construct, put non-contiguous array clauses (if any)
> +     in front of clause chain. The runtime can then test the first to see
> +     if the additional map processing for them is required.  */
> +  if (is_gimple_omp_oacc (stmt))
> +    reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt));

Should that be deemed unsuitable for any reason, then add a new
'GOACC_FLAG_*' flag to indicate existance of non-contiguous arrays.

> --- include/gomp-constants.h	(revision 277827)
> +++ include/gomp-constants.h	(working copy)
> @@ -40,6 +40,7 @@
>  #define GOMP_MAP_FLAG_SPECIAL_0		(1 << 2)
>  #define GOMP_MAP_FLAG_SPECIAL_1		(1 << 3)
>  #define GOMP_MAP_FLAG_SPECIAL_2		(1 << 4)
> +#define GOMP_MAP_FLAG_SPECIAL_3		(1 << 5)
>  #define GOMP_MAP_FLAG_SPECIAL		(GOMP_MAP_FLAG_SPECIAL_1 \
>  					 | GOMP_MAP_FLAG_SPECIAL_0)
>  /* Flag to force a specific behavior (or else, trigger a run-time error).  */
> @@ -127,6 +128,26 @@ enum gomp_map_kind
>      /* Decrement usage count and deallocate if zero.  */
>      GOMP_MAP_RELEASE =			(GOMP_MAP_FLAG_SPECIAL_2
>  					 | GOMP_MAP_DELETE),
> +    /* Mapping kinds for non-contiguous arrays.  */
> +    GOMP_MAP_NONCONTIG_ARRAY =		(GOMP_MAP_FLAG_SPECIAL_3),
> +    GOMP_MAP_NONCONTIG_ARRAY_TO =	(GOMP_MAP_NONCONTIG_ARRAY
> +					 | GOMP_MAP_TO),
> +    GOMP_MAP_NONCONTIG_ARRAY_FROM =	(GOMP_MAP_NONCONTIG_ARRAY
> +					 | GOMP_MAP_FROM),
> +    GOMP_MAP_NONCONTIG_ARRAY_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY
> +					 | GOMP_MAP_TOFROM),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO =	(GOMP_MAP_NONCONTIG_ARRAY_TO
> +					 | GOMP_MAP_FLAG_FORCE),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM =	(GOMP_MAP_NONCONTIG_ARRAY_FROM
> +						 | GOMP_MAP_FLAG_FORCE),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY_TOFROM
> +						 | GOMP_MAP_FLAG_FORCE),
> +    GOMP_MAP_NONCONTIG_ARRAY_ALLOC =		(GOMP_MAP_NONCONTIG_ARRAY
> +						 | GOMP_MAP_ALLOC),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC =	(GOMP_MAP_NONCONTIG_ARRAY
> +						 | GOMP_MAP_FORCE_ALLOC),
> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT =	(GOMP_MAP_NONCONTIG_ARRAY
> +						 | GOMP_MAP_FORCE_PRESENT),

Just an idea: instead of this long list, would it maybe be better (if
feasible at all?) to have a single "lead-in" mapping
'GOMP_MAP_NONCONTIG_ARRAY_MODE', which specifies how many of the
following (normal) mappings belong to that "non-contiguous array mode".
(Roughly similar to what 'GOMP_MAP_TO_PSET' is doing with any
'GOMP_MAP_POINTER's following it.)  Might that make some things simpler,
or even more complicated (more internal state to keep)?

> --- libgomp/oacc-parallel.c	(revision 277827)
> +++ libgomp/oacc-parallel.c	(working copy)

> +static inline void
> +revert_noncontig_array_map_pointers (size_t mapnum, void **hostaddrs,
> +				     unsigned short *kinds)
> +{
> +  for (int i = 0; i < mapnum; i++)
> +    {
> +      if (GOMP_MAP_NONCONTIG_ARRAY_P (kinds[i] & 0xff))
> +	hostaddrs[i] = *((void **)hostaddrs[i]);

Can we be (or, do we make) sure that 'hostaddrs' will never be in
read-only memory?

And, it's permissible to alter 'hostaddrs'?

Ah, other code (including 'libgomp/target.c') is doing such things, too,
so it must be fine.

> +      else
> +	/* We assume all non-contiguous array map entries are placed at the
> +	   start; first other map kind means we can exit.  */
> +	break;
> +    }
> +}

> --- libgomp/target.c	(revision 277827)
> +++ libgomp/target.c	(working copy)

Have not yet reviewed in detail.

> @@ -533,9 +679,37 @@ gomp_map_vars_internal (struct gomp_device_descr *
>    const int typemask = short_mapkind ? 0xff : 0x7;
>    struct splay_tree_s *mem_map = &devicep->mem_map;
>    struct splay_tree_key_s cur_node;
> -  struct target_mem_desc *tgt
> -    = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
> -  tgt->list_count = mapnum;
> +  struct target_mem_desc *tgt;
> +
> +  bool process_noncontig_arrays = false;
> +  size_t nca_data_row_num = 0, row_start = 0;
> +  size_t nca_info_num = 0, nca_index;
> +  struct ncarray_info *nca_info = NULL;
> +  struct target_var_desc *row_desc;
> +  uintptr_t target_row_addr;
> +  void **host_data_rows = NULL, **target_data_rows = NULL;
> +  void *row;
> +
> +  if (mapnum > 0)
> +    {

Also add such a comment here: "We assume all non-contiguous array map
entries are placed at the start".

> +      int kind = get_kind (short_mapkind, kinds, 0);
> +      process_noncontig_arrays = GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask);
> +    }
> +
> +  if (process_noncontig_arrays)
> +    for (i = 0; i < mapnum; i++)
> +      {
> +	int kind = get_kind (short_mapkind, kinds, i);
> +	if (GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
> +	  {
> +	    nca_data_row_num += gomp_noncontig_array_count_rows (hostaddrs[i]);
> +	    nca_info_num += 1;
> +	  }
> +      }

Or, actually, can the 'if (mapnum > 0)' above and the 'for' loop here
again be simplified to just one loop with 'break', like you've done in
'libgomp/oacc-parallel.c:revert_noncontig_array_map_pointers'?

> +
> +  tgt = gomp_malloc (sizeof (*tgt)
> +		     + sizeof (tgt->list[0]) * (mapnum + nca_data_row_num));
> +  tgt->list_count = mapnum + nca_data_row_num;
>    tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
>    tgt->device_descr = devicep;
>    struct gomp_coalesce_buf cbuf, *cbufp = NULL;

> @@ -735,6 +931,56 @@ gomp_map_vars_internal (struct gomp_device_descr *
>  	}
>      }
>  
> +  /* For non-contiguous arrays. Each data row is one target item, separated
> +     from the normal map clause items, hence we order them after mapnum.  */
> +  if (process_noncontig_arrays)
> +    for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++)
> +      {
> +	int kind = get_kind (short_mapkind, kinds, i);
> +	if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
> +	  continue;

Can instead 'break' again?

> @@ -1044,8 +1299,112 @@ gomp_map_vars_internal (struct gomp_device_descr *
>  		array++;
>  	      }
>  	  }
> +
> +      /* Processing of non-contiguous array rows.  */
> +      if (process_noncontig_arrays)
> +	{
> +	  for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++)
> +	    {
> +	      int kind = get_kind (short_mapkind, kinds, i);
> +	      if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
> +		continue;

Likewise?


It's now gotten too late; more review to follow later.


Grüße
 Thomas
Chung-Lin Tang Nov. 12, 2019, 12:35 p.m. UTC | #2
Hi Thomas,
thanks for the first review. I'm still working on another revision,
but wanted to respond to some of the issues you raised first:

On 2019/11/7 8:48 AM, Thomas Schwinge wrote:
>> (1) The simplest solution: implement a processing which searches and reverts such
>> non-contiguous array map entries in GOACC_parallel_keyed.
>> (note: I have implemented this in the current attached "v2" patch)
>>
>> (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes;
>> i.e. still do the proper gomp_map_vars processing for all cases.
>>
>> (3) Modify the non-contiguous array map conventions: a possible solution is to use
>> two maps placed together: one for the array pointer, another for the array descriptor (as
>> opposed to the current style of using only one map) This needs more further elaborate
>> compiler/runtime work.
>>
>> The first two options will pessimize host-mode performance somewhat. The third I have
>> some WIP patches, but it's still buggy ATM. Seeking your opinion on what we should do.
> I'll have to think about it some more, but variant (1) doesn't seem so
> bad actually, for a first take.  While it's not nice to pessimize in
> particular directives with 'if (false)' clauses, at least it does work,
> the run-time overhead should not be too bad (also compared to variant
> (2), I suppose), and variant (3) can still be implemented later.

The issue is that (1),(2) vs (3) have different binary interfaces, so a decision has to be
made first, lest we again have compatibility issues later.

Also, (1) vs (2) also may be somewhat different do to the memory copying effects of
gomp_map_vars()  (possible semantic difference versus the usual shared memory expectations?)

I'm currently working on another way of implementing something similar to (3),
but using the variadic arguments of GOACC_parallel_keyed instead of maps, WDYT?

>> @@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>>         unsigned int num = types.length (), i;
>>         tree t, side_effects = NULL_TREE, size = NULL_TREE;
>>         tree condition = NULL_TREE;
>> +      tree ncarray_dims = NULL_TREE;
>>   
>>         if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
>>   	maybe_zero_len = true;
>> @@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>>   	    length = fold_convert (sizetype, length);
>>   	  if (low_bound == NULL_TREE)
>>   	    low_bound = integer_zero_node;
>> +
>> +	  if (non_contiguous)
>> +	    {
>> +	      ncarray_dims = tree_cons (low_bound, length, ncarray_dims);
>> +	      continue;
>> +	    }
>> +
>>   	  if (!maybe_zero_len && i > first_non_one)
>>   	    {
>>   	      if (integer_nonzerop (low_bound))
> I'm not at all familiar with this array sections code, will trust your
> understanding that we don't need any of the processing that you're
> skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length
> expressions, and other things.

I will re-check on this.

Ditto for the other minor issues you raised.

>>   	  if (DECL_P (decl))
>>   	    {
>>   	      if (DECL_SIZE (decl)
>> @@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o
>>         gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
>>       }
>>
>> +  /* If is OpenACC construct, put non-contiguous array clauses (if any)
>> +     in front of clause chain. The runtime can then test the first to see
>> +     if the additional map processing for them is required.  */
>> +  if (is_gimple_omp_oacc (stmt))
>> +    reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt));
> Should that be deemed unsuitable for any reason, then add a new
> 'GOACC_FLAG_*' flag to indicate existance of non-contiguous arrays.

I'm considering using that convention unconditionally, not sure if it's faster
though, since that means we can't do the 'early breaking' you mentioned when
scanning through maps looking for GOMP_MAP_NONCONTIG_ARRAY_P.

>> --- include/gomp-constants.h	(revision 277827)
>> +++ include/gomp-constants.h	(working copy)
>> @@ -40,6 +40,7 @@
>>   #define GOMP_MAP_FLAG_SPECIAL_0		(1 << 2)
>>   #define GOMP_MAP_FLAG_SPECIAL_1		(1 << 3)
>>   #define GOMP_MAP_FLAG_SPECIAL_2		(1 << 4)
>> +#define GOMP_MAP_FLAG_SPECIAL_3		(1 << 5)
>>   #define GOMP_MAP_FLAG_SPECIAL		(GOMP_MAP_FLAG_SPECIAL_1 \
>>   					 | GOMP_MAP_FLAG_SPECIAL_0)
>>   /* Flag to force a specific behavior (or else, trigger a run-time error).  */
>> @@ -127,6 +128,26 @@ enum gomp_map_kind
>>       /* Decrement usage count and deallocate if zero.  */
>>       GOMP_MAP_RELEASE =			(GOMP_MAP_FLAG_SPECIAL_2
>>   					 | GOMP_MAP_DELETE),
>> +    /* Mapping kinds for non-contiguous arrays.  */
>> +    GOMP_MAP_NONCONTIG_ARRAY =		(GOMP_MAP_FLAG_SPECIAL_3),
>> +    GOMP_MAP_NONCONTIG_ARRAY_TO =	(GOMP_MAP_NONCONTIG_ARRAY
>> +					 | GOMP_MAP_TO),
>> +    GOMP_MAP_NONCONTIG_ARRAY_FROM =	(GOMP_MAP_NONCONTIG_ARRAY
>> +					 | GOMP_MAP_FROM),
>> +    GOMP_MAP_NONCONTIG_ARRAY_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY
>> +					 | GOMP_MAP_TOFROM),
>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO =	(GOMP_MAP_NONCONTIG_ARRAY_TO
>> +					 | GOMP_MAP_FLAG_FORCE),
>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM =	(GOMP_MAP_NONCONTIG_ARRAY_FROM
>> +						 | GOMP_MAP_FLAG_FORCE),
>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY_TOFROM
>> +						 | GOMP_MAP_FLAG_FORCE),
>> +    GOMP_MAP_NONCONTIG_ARRAY_ALLOC =		(GOMP_MAP_NONCONTIG_ARRAY
>> +						 | GOMP_MAP_ALLOC),
>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC =	(GOMP_MAP_NONCONTIG_ARRAY
>> +						 | GOMP_MAP_FORCE_ALLOC),
>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT =	(GOMP_MAP_NONCONTIG_ARRAY
>> +						 | GOMP_MAP_FORCE_PRESENT),
> Just an idea: instead of this long list, would it maybe be better (if
> feasible at all?) to have a single "lead-in" mapping
> 'GOMP_MAP_NONCONTIG_ARRAY_MODE', which specifies how many of the
> following (normal) mappings belong to that "non-contiguous array mode".
> (Roughly similar to what 'GOMP_MAP_TO_PSET' is doing with any
> 'GOMP_MAP_POINTER's following it.)  Might that make some things simpler,
> or even more complicated (more internal state to keep)?

I prefer not, wrangling with multiple-map sequences in the complex gomp_map_vars code
is proving to be a tedious task; my now given-up version of method (3) above tried using
two map kinds (an 'array' and an 'array descriptor'). Haven't yet got it to work properly.

Also, a non-contiguous array is just a data clause specification feature, and should support
all modes (copy/in/out,present,alloc,etc.) Using a whole GOMP_MAP_FLAG_SPECIAL_3 bit in
combination with other flags independently should be warranted.


>> --- libgomp/oacc-parallel.c	(revision 277827)
>> +++ libgomp/oacc-parallel.c	(working copy)
>> +static inline void
>> +revert_noncontig_array_map_pointers (size_t mapnum, void **hostaddrs,
>> +				     unsigned short *kinds)
>> +{
>> +  for (int i = 0; i < mapnum; i++)
>> +    {
>> +      if (GOMP_MAP_NONCONTIG_ARRAY_P (kinds[i] & 0xff))
>> +	hostaddrs[i] = *((void **)hostaddrs[i]);
> Can we be (or, do we make) sure that 'hostaddrs' will never be in
> read-only memory?
> 
> And, it's permissible to alter 'hostaddrs'?
> 
> Ah, other code (including 'libgomp/target.c') is doing such things, too,
> so it must be fine.

The hostaddrs[] array is the 'receiver' record built on stack by omp-low,
so it should always be safe to modify, I think.

Thanks again for the review!
Chung-Lin
diff mbox series

Patch

Index: gcc/c/c-typeck.c
===================================================================
--- gcc/c/c-typeck.c	(revision 277827)
+++ gcc/c/c-typeck.c	(working copy)
@@ -12868,7 +12868,7 @@  c_finish_omp_cancellation_point (location_t loc, t
 static tree
 handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
 			     bool &maybe_zero_len, unsigned int &first_non_one,
-			     enum c_omp_region_type ort)
+			     bool &non_contiguous, enum c_omp_region_type ort)
 {
   tree ret, low_bound, length, type;
   if (TREE_CODE (t) != TREE_LIST)
@@ -12953,7 +12953,8 @@  handle_omp_array_sections_1 (tree c, tree t, vec<t
     }
 
   ret = handle_omp_array_sections_1 (c, TREE_CHAIN (t), types,
-				     maybe_zero_len, first_non_one, ort);
+				     maybe_zero_len, first_non_one,
+				     non_contiguous, ort);
   if (ret == error_mark_node || ret == NULL_TREE)
     return ret;
 
@@ -13160,14 +13161,21 @@  handle_omp_array_sections_1 (tree c, tree t, vec<t
 	  return error_mark_node;
 	}
       /* If there is a pointer type anywhere but in the very first
-	 array-section-subscript, the array section can't be contiguous.  */
+	 array-section-subscript, the array section can't be contiguous.
+	 Note that OpenACC does accept these kinds of non-contiguous pointer
+	 based arrays.  */
       if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
 	  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
 	{
-	  error_at (OMP_CLAUSE_LOCATION (c),
-		    "array section is not contiguous in %qs clause",
-		    omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
-	  return error_mark_node;
+	  if (ort == C_ORT_ACC)
+	    non_contiguous = true;
+	  else
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"array section is not contiguous in %qs clause",
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      return error_mark_node;
+	    }
 	}
     }
   else
@@ -13196,6 +13204,7 @@  handle_omp_array_sections (tree c, enum c_omp_regi
 {
   bool maybe_zero_len = false;
   unsigned int first_non_one = 0;
+  bool non_contiguous = false;
   auto_vec<tree, 10> types;
   tree *tp = &OMP_CLAUSE_DECL (c);
   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
@@ -13205,7 +13214,7 @@  handle_omp_array_sections (tree c, enum c_omp_regi
     tp = &TREE_VALUE (*tp);
   tree first = handle_omp_array_sections_1 (c, *tp, types,
 					    maybe_zero_len, first_non_one,
-					    ort);
+					    non_contiguous, ort);
   if (first == error_mark_node)
     return true;
   if (first == NULL_TREE)
@@ -13238,6 +13247,7 @@  handle_omp_array_sections (tree c, enum c_omp_regi
       unsigned int num = types.length (), i;
       tree t, side_effects = NULL_TREE, size = NULL_TREE;
       tree condition = NULL_TREE;
+      tree ncarray_dims = NULL_TREE;
 
       if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
 	maybe_zero_len = true;
@@ -13261,6 +13271,13 @@  handle_omp_array_sections (tree c, enum c_omp_regi
 	    length = fold_convert (sizetype, length);
 	  if (low_bound == NULL_TREE)
 	    low_bound = integer_zero_node;
+
+	  if (non_contiguous)
+	    {
+	      ncarray_dims = tree_cons (low_bound, length, ncarray_dims);
+	      continue;
+	    }
+
 	  if (!maybe_zero_len && i > first_non_one)
 	    {
 	      if (integer_nonzerop (low_bound))
@@ -13357,6 +13374,14 @@  handle_omp_array_sections (tree c, enum c_omp_regi
 		size = size_binop (MULT_EXPR, size, l);
 	    }
 	}
+      if (non_contiguous)
+	{
+	  int kind = OMP_CLAUSE_MAP_KIND (c);
+	  OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_NONCONTIG_ARRAY);
+	  OMP_CLAUSE_DECL (c) = t;
+	  OMP_CLAUSE_SIZE (c) = ncarray_dims;
+	  return false;
+	}
       if (side_effects)
 	size = build2 (COMPOUND_EXPR, sizetype, side_effects, size);
       if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 277827)
+++ gcc/cp/semantics.c	(working copy)
@@ -4732,7 +4732,7 @@  omp_privatize_field (tree t, bool shared)
 static tree
 handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
 			     bool &maybe_zero_len, unsigned int &first_non_one,
-			     enum c_omp_region_type ort)
+			     bool &non_contiguous, enum c_omp_region_type ort)
 {
   tree ret, low_bound, length, type;
   if (TREE_CODE (t) != TREE_LIST)
@@ -4817,7 +4817,8 @@  handle_omp_array_sections_1 (tree c, tree t, vec<t
       && TREE_CODE (TREE_CHAIN (t)) == FIELD_DECL)
     TREE_CHAIN (t) = omp_privatize_field (TREE_CHAIN (t), false);
   ret = handle_omp_array_sections_1 (c, TREE_CHAIN (t), types,
-				     maybe_zero_len, first_non_one, ort);
+				     maybe_zero_len, first_non_one,
+				     non_contiguous, ort);
   if (ret == error_mark_node || ret == NULL_TREE)
     return ret;
 
@@ -5036,14 +5037,21 @@  handle_omp_array_sections_1 (tree c, tree t, vec<t
 	  return error_mark_node;
 	}
       /* If there is a pointer type anywhere but in the very first
-	 array-section-subscript, the array section can't be contiguous.  */
+	 array-section-subscript, the array section can't be contiguous.
+	 Note that OpenACC does accept these kinds of non-contiguous pointer
+	 based arrays.  */
       if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
 	  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
 	{
-	  error_at (OMP_CLAUSE_LOCATION (c),
-		    "array section is not contiguous in %qs clause",
-		    omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
-	  return error_mark_node;
+	  if (ort == C_ORT_ACC)
+	    non_contiguous = true;
+	  else
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"array section is not contiguous in %qs clause",
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      return error_mark_node;
+	    }
 	}
     }
   else
@@ -5083,6 +5091,7 @@  handle_omp_array_sections (tree c, enum c_omp_regi
 {
   bool maybe_zero_len = false;
   unsigned int first_non_one = 0;
+  bool non_contiguous = false;
   auto_vec<tree, 10> types;
   tree *tp = &OMP_CLAUSE_DECL (c);
   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
@@ -5092,7 +5101,7 @@  handle_omp_array_sections (tree c, enum c_omp_regi
     tp = &TREE_VALUE (*tp);
   tree first = handle_omp_array_sections_1 (c, *tp, types,
 					    maybe_zero_len, first_non_one,
-					    ort);
+					    non_contiguous, ort);
   if (first == error_mark_node)
     return true;
   if (first == NULL_TREE)
@@ -5126,6 +5135,7 @@  handle_omp_array_sections (tree c, enum c_omp_regi
       unsigned int num = types.length (), i;
       tree t, side_effects = NULL_TREE, size = NULL_TREE;
       tree condition = NULL_TREE;
+      tree ncarray_dims = NULL_TREE;
 
       if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
 	maybe_zero_len = true;
@@ -5151,6 +5161,13 @@  handle_omp_array_sections (tree c, enum c_omp_regi
 	    length = fold_convert (sizetype, length);
 	  if (low_bound == NULL_TREE)
 	    low_bound = integer_zero_node;
+
+	  if (non_contiguous)
+	    {
+	      ncarray_dims = tree_cons (low_bound, length, ncarray_dims);
+	      continue;
+	    }
+
 	  if (!maybe_zero_len && i > first_non_one)
 	    {
 	      if (integer_nonzerop (low_bound))
@@ -5242,6 +5259,14 @@  handle_omp_array_sections (tree c, enum c_omp_regi
 	}
       if (!processing_template_decl)
 	{
+	  if (non_contiguous)
+	    {
+	      int kind = OMP_CLAUSE_MAP_KIND (c);
+	      OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_NONCONTIG_ARRAY);
+	      OMP_CLAUSE_DECL (c) = t;
+	      OMP_CLAUSE_SIZE (c) = ncarray_dims;
+	      return false;
+	    }
 	  if (side_effects)
 	    size = build2 (COMPOUND_EXPR, sizetype, side_effects, size);
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 277827)
+++ gcc/gimplify.c	(working copy)
@@ -8622,9 +8622,17 @@  gimplify_scan_omp_clauses (tree *list_p, gimple_se
 	  if (OMP_CLAUSE_SIZE (c) == NULL_TREE)
 	    OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl)
 				  : TYPE_SIZE_UNIT (TREE_TYPE (decl));
-	  if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,
-			     NULL, is_gimple_val, fb_rvalue) == GS_ERROR)
+	  if (OMP_CLAUSE_SIZE (c)
+	      && TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST
+	      && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
 	    {
+	      /* For non-contiguous array maps, OMP_CLAUSE_SIZE is a TREE_LIST
+		 of the individual array dimensions, which gimplify_expr doesn't
+		 handle, so skip the call to gimplify_expr here.  */
+	    }
+	  else if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,
+				  NULL, is_gimple_val, fb_rvalue) == GS_ERROR)
+	    {
 	      remove = true;
 	      break;
 	    }
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 277827)
+++ gcc/omp-low.c	(working copy)
@@ -894,6 +894,137 @@  omp_copy_decl (tree var, copy_body_data *cb)
   return error_mark_node;
 }
 
+/* Helper function for create_noncontig_array_descr_type(), to append a new field
+   to a record type.  */
+
+static void
+append_field_to_record_type (tree record_type, tree fld_ident, tree fld_type)
+{
+  tree *p, fld = build_decl (UNKNOWN_LOCATION, FIELD_DECL, fld_ident, fld_type);
+  DECL_CONTEXT (fld) = record_type;
+
+  for (p = &TYPE_FIELDS (record_type); *p; p = &DECL_CHAIN (*p))
+    ;
+  *p = fld;
+}
+
+/* Create type for non-contiguous array descriptor. Returns created type, and
+   returns the number of dimensions in *DIM_NUM.  */
+
+static tree
+create_noncontig_array_descr_type (tree decl, tree dims, int *dim_num)
+{
+  int n = 0;
+  tree array_descr_type, name, x;
+  gcc_assert (TREE_CODE (dims) == TREE_LIST);
+
+  array_descr_type = lang_hooks.types.make_type (RECORD_TYPE);
+  name = create_tmp_var_name (".omp_noncontig_array_descr_type");
+  name = build_decl (UNKNOWN_LOCATION, TYPE_DECL, name, array_descr_type);
+  DECL_ARTIFICIAL (name) = 1;
+  DECL_NAMELESS (name) = 1;
+  TYPE_NAME (array_descr_type) = name;
+  TYPE_ARTIFICIAL (array_descr_type) = 1;
+
+  /* Main starting pointer/array.  */
+  tree main_var_type = TREE_TYPE (decl);
+  if (TREE_CODE (main_var_type) == REFERENCE_TYPE)
+    main_var_type = TREE_TYPE (main_var_type);
+  append_field_to_record_type (array_descr_type, DECL_NAME (decl),
+			       (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
+				? main_var_type
+				: build_pointer_type (main_var_type)));
+  /* Number of dimensions.  */
+  append_field_to_record_type (array_descr_type, get_identifier ("__dim_num"),
+			       sizetype);
+
+  for (x = dims; x; x = TREE_CHAIN (x), n++)
+    {
+      char *fldname;
+      /* One for the start index.  */
+      ASM_FORMAT_PRIVATE_NAME (fldname, "__dim_base", n);
+      append_field_to_record_type (array_descr_type, get_identifier (fldname),
+				   sizetype);
+      /* One for the length.  */
+      ASM_FORMAT_PRIVATE_NAME (fldname, "__dim_length", n);
+      append_field_to_record_type (array_descr_type, get_identifier (fldname),
+				   sizetype);
+      /* One for the element size.  */
+      ASM_FORMAT_PRIVATE_NAME (fldname, "__dim_elem_size", n);
+      append_field_to_record_type (array_descr_type, get_identifier (fldname),
+				   sizetype);
+      /* One for is_array flag.  */
+      ASM_FORMAT_PRIVATE_NAME (fldname, "__dim_is_array", n);
+      append_field_to_record_type (array_descr_type, get_identifier (fldname),
+				   sizetype);
+    }
+
+  layout_type (array_descr_type);
+  *dim_num = n;
+  return array_descr_type;
+}
+
+/* Generate code sequence for initializing non-contiguous array descriptor.  */
+
+static void
+create_noncontig_array_descr_init_code (tree array_descr, tree array_var,
+					tree dimensions, int dim_num,
+					gimple_seq *ilist)
+{
+  tree fld, fldref;
+  tree array_descr_type = TREE_TYPE (array_descr);
+  tree dim_type = TREE_TYPE (array_var);
+
+  fld = TYPE_FIELDS (array_descr_type);
+  fldref = omp_build_component_ref (array_descr, fld);
+  gimplify_assign (fldref, (TREE_CODE (dim_type) == ARRAY_TYPE
+			    ? build_fold_addr_expr (array_var) : array_var),
+		   ilist);
+
+  if (TREE_CODE (dim_type) == REFERENCE_TYPE)
+    dim_type = TREE_TYPE (dim_type);
+
+  fld = TREE_CHAIN (fld);
+  fldref = omp_build_component_ref (array_descr, fld);
+  gimplify_assign (fldref, build_int_cst (sizetype, dim_num), ilist);
+
+  while (dimensions)
+    {
+      tree dim_base = fold_convert (sizetype, TREE_PURPOSE (dimensions));
+      tree dim_length = fold_convert (sizetype, TREE_VALUE (dimensions));
+      tree dim_elem_size = TYPE_SIZE_UNIT (TREE_TYPE (dim_type));
+      tree dim_is_array = (TREE_CODE (dim_type) == ARRAY_TYPE
+			   ? integer_one_node : integer_zero_node);
+      /* Set base.  */
+      fld = TREE_CHAIN (fld);
+      fldref = omp_build_component_ref (array_descr, fld);
+      dim_base = fold_build2 (MULT_EXPR, sizetype, dim_base, dim_elem_size);
+      gimplify_assign (fldref, dim_base, ilist);
+
+      /* Set length.  */
+      fld = TREE_CHAIN (fld);
+      fldref = omp_build_component_ref (array_descr, fld);
+      dim_length = fold_build2 (MULT_EXPR, sizetype, dim_length, dim_elem_size);
+      gimplify_assign (fldref, dim_length, ilist);
+
+      /* Set elem_size.  */
+      fld = TREE_CHAIN (fld);
+      fldref = omp_build_component_ref (array_descr, fld);
+      dim_elem_size = fold_convert (sizetype, dim_elem_size);
+      gimplify_assign (fldref, dim_elem_size, ilist);
+
+      /* Set is_array flag.  */
+      fld = TREE_CHAIN (fld);
+      fldref = omp_build_component_ref (array_descr, fld);
+      dim_is_array = fold_convert (sizetype, dim_is_array);
+      gimplify_assign (fldref, dim_is_array, ilist);
+
+      dimensions = TREE_CHAIN (dimensions);
+      dim_type = TREE_TYPE (dim_type);
+    }
+  gcc_assert (TREE_CHAIN (fld) == NULL_TREE);
+}
+
 /* Create a new context, with OUTER_CTX being the surrounding context.  */
 
 static omp_context *
@@ -1367,6 +1498,38 @@  scan_sharing_clauses (tree clauses, omp_context *c
 	      install_var_local (decl, ctx);
 	      break;
 	    }
+
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+	      && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
+	    {
+	      tree array_decl = OMP_CLAUSE_DECL (c);
+	      tree array_type = TREE_TYPE (array_decl);
+	      bool by_ref = (TREE_CODE (array_type) == ARRAY_TYPE
+			     ? true : false);
+
+	      /* Checking code to ensure we only have arrays at top dimension.
+		 This limitation might be lifted in the future.  */
+	      if (TREE_CODE (array_type) == REFERENCE_TYPE)
+		array_type = TREE_TYPE (array_type);
+	      tree t = array_type, prev_t = NULL_TREE;
+	      while (t)
+		{
+		  if (TREE_CODE (t) == ARRAY_TYPE && prev_t)
+		    {
+		      error_at (gimple_location (ctx->stmt), "array types are"
+				" only allowed at outermost dimension of"
+				" non-contiguous array");
+		      break;
+		    }
+		  prev_t = t;
+		  t = TREE_TYPE (t);
+		}
+
+	      install_var_field (array_decl, by_ref, 3, ctx);
+	      install_var_local (array_decl, ctx);
+	      break;
+	    }
+
 	  if (DECL_P (decl))
 	    {
 	      if (DECL_SIZE (decl)
@@ -2597,6 +2760,50 @@  scan_omp_single (gomp_single *stmt, omp_context *o
     layout_type (ctx->record_type);
 }
 
+/* Reorder clauses so that non-contiguous array map clauses are placed at the very
+   front of the chain.  */
+
+static void
+reorder_noncontig_array_clauses (tree *clauses_ptr)
+{
+  tree c, clauses = *clauses_ptr;
+  tree prev_clause = NULL_TREE, next_clause;
+  tree array_clauses = NULL_TREE, array_clauses_tail = NULL_TREE;
+
+  for (c = clauses; c; c = next_clause)
+    {
+      next_clause = OMP_CLAUSE_CHAIN (c);
+
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+	  && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
+	{
+	  /* Unchain c from clauses.  */
+	  if (c == clauses)
+	    clauses = next_clause;
+
+	  /* Link on to array_clauses.  */
+	  if (array_clauses_tail)
+	    OMP_CLAUSE_CHAIN (array_clauses_tail) = c;
+	  else
+	    array_clauses = c;
+	  array_clauses_tail = c;
+
+	  if (prev_clause)
+	    OMP_CLAUSE_CHAIN (prev_clause) = next_clause;
+	  continue;
+	}
+
+      prev_clause = c;
+    }  
+
+  /* Place non-contiguous array clauses at the start of the clause list.  */
+  if (array_clauses)
+    {
+      OMP_CLAUSE_CHAIN (array_clauses_tail) = clauses;
+      *clauses_ptr = array_clauses;
+    }
+}
+
 /* Scan a GIMPLE_OMP_TARGET.  */
 
 static void
@@ -2605,7 +2812,6 @@  scan_omp_target (gomp_target *stmt, omp_context *o
   omp_context *ctx;
   tree name;
   bool offloaded = is_gimple_omp_offloaded (stmt);
-  tree clauses = gimple_omp_target_clauses (stmt);
 
   ctx = new_omp_context (stmt, outer_ctx);
   ctx->field_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
@@ -2624,6 +2830,14 @@  scan_omp_target (gomp_target *stmt, omp_context *o
       gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
     }
 
+  /* If is OpenACC construct, put non-contiguous array clauses (if any)
+     in front of clause chain. The runtime can then test the first to see
+     if the additional map processing for them is required.  */
+  if (is_gimple_omp_oacc (stmt))
+    reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt));
+
+  tree clauses = gimple_omp_target_clauses (stmt);
+  
   scan_sharing_clauses (clauses, ctx);
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
 
@@ -11335,6 +11549,15 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp
 	  case GOMP_MAP_FORCE_PRESENT:
 	  case GOMP_MAP_FORCE_DEVICEPTR:
 	  case GOMP_MAP_DEVICE_RESIDENT:
+	  case GOMP_MAP_NONCONTIG_ARRAY_TO:
+	  case GOMP_MAP_NONCONTIG_ARRAY_FROM:
+	  case GOMP_MAP_NONCONTIG_ARRAY_TOFROM:
+	  case GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO:
+	  case GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM:
+	  case GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM:
+	  case GOMP_MAP_NONCONTIG_ARRAY_ALLOC:
+	  case GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC:
+	  case GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT:
 	  case GOMP_MAP_LINK:
 	    gcc_assert (is_gimple_omp_oacc (stmt));
 	    break;
@@ -11397,7 +11620,14 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp
 	if (offloaded && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 			   && OMP_CLAUSE_MAP_IN_REDUCTION (c)))
 	  {
-	    x = build_receiver_ref (var, true, ctx);
+	    tree var_type = TREE_TYPE (var);
+	    bool rcv_by_ref =
+	      (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+	       && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c))
+	       && TREE_CODE (var_type) != ARRAY_TYPE
+	       ? false : true);
+
+	    x = build_receiver_ref (var, rcv_by_ref, ctx);
 	    tree new_var = lookup_decl (var, ctx);
 
 	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
@@ -11647,6 +11877,24 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp
 		    avar = build_fold_addr_expr (avar);
 		    gimplify_assign (x, avar, &ilist);
 		  }
+		else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+			 && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
+		  {
+		    int dim_num;
+		    tree dimensions = OMP_CLAUSE_SIZE (c);
+
+		    tree array_descr_type =
+		      create_noncontig_array_descr_type (OMP_CLAUSE_DECL (c),
+							 dimensions, &dim_num);
+		    tree array_descr =
+		      create_tmp_var_raw (array_descr_type, ".omp_noncontig_array_descr");
+		    gimple_add_tmp_var (array_descr);
+
+		    create_noncontig_array_descr_init_code
+		      (array_descr, ovar, dimensions, dim_num, &ilist);
+
+		    gimplify_assign (x, build_fold_addr_expr (array_descr), &ilist);
+		  }
 		else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
 		  {
 		    gcc_assert (is_gimple_omp_oacc (ctx->stmt));
@@ -11718,6 +11966,9 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp
 		  s = TREE_TYPE (s);
 		s = TYPE_SIZE_UNIT (s);
 	      }
+	    else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		     && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
+	      s = NULL_TREE;
 	    else
 	      s = OMP_CLAUSE_SIZE (c);
 	    if (s == NULL_TREE)
Index: gcc/testsuite/c-c++-common/goacc/noncontig_array-1.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/noncontig_array-1.c	(nonexistent)
+++ gcc/testsuite/c-c++-common/goacc/noncontig_array-1.c	(working copy)
@@ -0,0 +1,25 @@ 
+/* { dg-do compile } */
+
+void foo (void)
+{
+  int array_of_array[10][10];
+  int **ptr_to_ptr;
+  int *array_of_ptr[10];
+  int (*ptr_to_array)[10];
+ 
+  #pragma acc parallel copy (array_of_array[2:4][0:10])
+    array_of_array[5][5] = 1;
+
+  #pragma acc parallel copy (ptr_to_ptr[2:4][1:7])
+    ptr_to_ptr[5][5] = 1;
+
+  #pragma acc parallel copy (array_of_ptr[2:4][1:7])
+    array_of_ptr[5][5] = 1;
+
+  #pragma acc parallel copy (ptr_to_array[2:4][1:7]) /* { dg-error "array section is not contiguous in 'map' clause" } */
+    ptr_to_array[5][5] = 1;
+}
+/* { dg-final { scan-tree-dump-times {#pragma omp target oacc_parallel map\(tofrom:array_of_array} 1 gimple } } */
+/* { dg-final { scan-tree-dump-times {#pragma omp target oacc_parallel map\(tofrom,noncontig_array:ptr_to_ptr \[dimensions: 2 4, 1 7\]} 1 gimple } } */
+/* { dg-final { scan-tree-dump-times {#pragma omp target oacc_parallel map\(tofrom,noncontig_array:array_of_ptr \[dimensions: 2 4, 1 7\]} 1 gimple } } */
+/* { dg-final { scan-tree-dump-times {#pragma omp target oacc_parallel map\(tofrom,noncontig_array:ptr_to_array \[dimensions: 2 4, 1 7\]} 1 gimple { xfail *-*-* } } } */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 277827)
+++ gcc/tree-pretty-print.c	(working copy)
@@ -849,6 +849,33 @@  dump_omp_clause (pretty_printer *pp, tree clause,
 	case GOMP_MAP_LINK:
 	  pp_string (pp, "link");
 	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_TO:
+	  pp_string (pp, "to,noncontig_array");
+	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_FROM:
+	  pp_string (pp, "from,noncontig_array");
+	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_TOFROM:
+	  pp_string (pp, "tofrom,noncontig_array");
+	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO:
+	  pp_string (pp, "force_to,noncontig_array");
+	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM:
+	  pp_string (pp, "force_from,noncontig_array");
+	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM:
+	  pp_string (pp, "force_tofrom,noncontig_array");
+	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_ALLOC:
+	  pp_string (pp, "alloc,noncontig_array");
+	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC:
+	  pp_string (pp, "force_alloc,noncontig_array");
+	  break;
+	case GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT:
+	  pp_string (pp, "force_present,noncontig_array");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -859,8 +886,15 @@  dump_omp_clause (pretty_printer *pp, tree clause,
       if (OMP_CLAUSE_SIZE (clause))
 	{
 	  switch (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP
-		  ? OMP_CLAUSE_MAP_KIND (clause) : GOMP_MAP_TO)
+		  ? (GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (clause))
+		     ? GOMP_MAP_NONCONTIG_ARRAY
+		     : OMP_CLAUSE_MAP_KIND (clause))
+		  : GOMP_MAP_TO)
 	    {
+	    case GOMP_MAP_NONCONTIG_ARRAY:
+	      gcc_assert (TREE_CODE (OMP_CLAUSE_SIZE (clause)) == TREE_LIST);
+	      pp_string (pp, " [dimensions: ");
+	      break;
 	    case GOMP_MAP_POINTER:
 	    case GOMP_MAP_FIRSTPRIVATE_POINTER:
 	    case GOMP_MAP_FIRSTPRIVATE_REFERENCE:
Index: include/gomp-constants.h
===================================================================
--- include/gomp-constants.h	(revision 277827)
+++ include/gomp-constants.h	(working copy)
@@ -40,6 +40,7 @@ 
 #define GOMP_MAP_FLAG_SPECIAL_0		(1 << 2)
 #define GOMP_MAP_FLAG_SPECIAL_1		(1 << 3)
 #define GOMP_MAP_FLAG_SPECIAL_2		(1 << 4)
+#define GOMP_MAP_FLAG_SPECIAL_3		(1 << 5)
 #define GOMP_MAP_FLAG_SPECIAL		(GOMP_MAP_FLAG_SPECIAL_1 \
 					 | GOMP_MAP_FLAG_SPECIAL_0)
 /* Flag to force a specific behavior (or else, trigger a run-time error).  */
@@ -127,6 +128,26 @@  enum gomp_map_kind
     /* Decrement usage count and deallocate if zero.  */
     GOMP_MAP_RELEASE =			(GOMP_MAP_FLAG_SPECIAL_2
 					 | GOMP_MAP_DELETE),
+    /* Mapping kinds for non-contiguous arrays.  */
+    GOMP_MAP_NONCONTIG_ARRAY =		(GOMP_MAP_FLAG_SPECIAL_3),
+    GOMP_MAP_NONCONTIG_ARRAY_TO =	(GOMP_MAP_NONCONTIG_ARRAY
+					 | GOMP_MAP_TO),
+    GOMP_MAP_NONCONTIG_ARRAY_FROM =	(GOMP_MAP_NONCONTIG_ARRAY
+					 | GOMP_MAP_FROM),
+    GOMP_MAP_NONCONTIG_ARRAY_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY
+					 | GOMP_MAP_TOFROM),
+    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO =	(GOMP_MAP_NONCONTIG_ARRAY_TO
+					 | GOMP_MAP_FLAG_FORCE),
+    GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM =	(GOMP_MAP_NONCONTIG_ARRAY_FROM
+						 | GOMP_MAP_FLAG_FORCE),
+    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY_TOFROM
+						 | GOMP_MAP_FLAG_FORCE),
+    GOMP_MAP_NONCONTIG_ARRAY_ALLOC =		(GOMP_MAP_NONCONTIG_ARRAY
+						 | GOMP_MAP_ALLOC),
+    GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC =	(GOMP_MAP_NONCONTIG_ARRAY
+						 | GOMP_MAP_FORCE_ALLOC),
+    GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT =	(GOMP_MAP_NONCONTIG_ARRAY
+						 | GOMP_MAP_FORCE_PRESENT),
 
     /* Internal to GCC, not used in libgomp.  */
     /* Do not map, but pointer assign a pointer instead.  */
@@ -155,6 +176,8 @@  enum gomp_map_kind
 #define GOMP_MAP_ALWAYS_P(X) \
   (GOMP_MAP_ALWAYS_TO_P (X) || ((X) == GOMP_MAP_ALWAYS_FROM))
 
+#define GOMP_MAP_NONCONTIG_ARRAY_P(X) \
+  ((X) & GOMP_MAP_NONCONTIG_ARRAY)
 
 /* Asynchronous behavior.  Keep in sync with
    libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
Index: libgomp/oacc-parallel.c
===================================================================
--- libgomp/oacc-parallel.c	(revision 277827)
+++ libgomp/oacc-parallel.c	(working copy)
@@ -111,6 +111,21 @@  handle_ftn_pointers (size_t mapnum, void **hostadd
     }
 }
 
+static inline void
+revert_noncontig_array_map_pointers (size_t mapnum, void **hostaddrs,
+				     unsigned short *kinds)
+{
+  for (int i = 0; i < mapnum; i++)
+    {
+      if (GOMP_MAP_NONCONTIG_ARRAY_P (kinds[i] & 0xff))
+	hostaddrs[i] = *((void **)hostaddrs[i]);
+      else
+	/* We assume all non-contiguous array map entries are placed at the
+	   start; first other map kind means we can exit.  */
+	break;
+    }
+}
+
 static void goacc_wait (int async, int num_waits, va_list *ap);
 
 
@@ -212,6 +227,7 @@  GOACC_parallel_keyed (int flags_m, void (*fn) (voi
       prof_info.device_type = acc_device_host;
       api_info.device_type = prof_info.device_type;
       goacc_save_and_set_bind (acc_device_host);
+      revert_noncontig_array_map_pointers (mapnum, hostaddrs, kinds);
       fn (hostaddrs);
       goacc_restore_bind ();
       goto out_prof;
@@ -218,6 +234,7 @@  GOACC_parallel_keyed (int flags_m, void (*fn) (voi
     }
   else if (acc_device_type (acc_dev->type) == acc_device_host)
     {
+      revert_noncontig_array_map_pointers (mapnum, hostaddrs, kinds);
       fn (hostaddrs);
       goto out_prof;
     }
Index: libgomp/target.c
===================================================================
--- libgomp/target.c	(revision 277827)
+++ libgomp/target.c	(working copy)
@@ -520,6 +520,152 @@  gomp_map_val (struct target_mem_desc *tgt, void **
     }
 }
 
+/* Definitions for data structures describing non-contiguous arrays
+   (Note: interfaces with compiler)
+
+   The compiler generates a descriptor for each such array, places the
+   descriptor on stack, and passes the address of the descriptor to the libgomp
+   runtime as a normal map argument. The runtime then processes the array
+   data structure setup, and replaces the argument with the new actual
+   array address for the child function.
+
+   Care must be taken such that the struct field and layout assumptions
+   of struct gomp_ncarray_dim, gomp_ncarray_descr_type inside the compiler
+   be consistant with the below declarations.  */
+
+struct gomp_ncarray_dim {
+  size_t base;
+  size_t length;
+  size_t elem_size;
+  size_t is_array;
+};
+
+struct gomp_ncarray_descr_type {
+  void *ptr;
+  size_t ndims;
+  struct gomp_ncarray_dim dims[];
+};
+
+/* Internal non-contiguous array info struct, used only here inside the runtime. */
+
+struct ncarray_info
+{
+  struct gomp_ncarray_descr_type *descr;
+  size_t map_index;
+  size_t ptrblock_size;
+  size_t data_row_num;
+  size_t data_row_size;
+};
+
+static size_t
+gomp_noncontig_array_count_rows (struct gomp_ncarray_descr_type *descr)
+{
+  size_t nrows = 1;
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+    nrows *= descr->dims[d].length / sizeof (void *);
+  return nrows;
+}
+
+static void
+gomp_noncontig_array_compute_info (struct ncarray_info *nca)
+{
+  size_t d, n = 1;
+  struct gomp_ncarray_descr_type *descr = nca->descr;
+
+  nca->ptrblock_size = 0;
+  for (d = 0; d < descr->ndims - 1; d++)
+    {
+      size_t dim_count = descr->dims[d].length / descr->dims[d].elem_size;
+      size_t dim_ptrblock_size = (descr->dims[d + 1].is_array
+				  ? 0 : descr->dims[d].length * n);
+      nca->ptrblock_size += dim_ptrblock_size;
+      n *= dim_count;
+    }
+  nca->data_row_num = n;
+  nca->data_row_size = descr->dims[d].length;
+}
+
+static void
+gomp_noncontig_array_fill_rows_1 (struct gomp_ncarray_descr_type *descr, void *nca,
+				  size_t d, void ***row_ptr, size_t *count)
+{
+  if (d < descr->ndims - 1)
+    {
+      size_t elsize = descr->dims[d].elem_size;
+      size_t n = descr->dims[d].length / elsize;
+      void *p = nca + descr->dims[d].base;
+      for (size_t i = 0; i < n; i++)
+	{
+	  void *ptr = p + i * elsize;
+	  /* Deref if next dimension is not array.  */
+	  if (!descr->dims[d + 1].is_array)
+	    ptr = *((void **) ptr);
+	  gomp_noncontig_array_fill_rows_1 (descr, ptr, d + 1, row_ptr, count);
+	}
+    }
+  else
+    {
+      **row_ptr = nca + descr->dims[d].base;
+      *row_ptr += 1;
+      *count += 1;
+    }
+}
+
+static size_t
+gomp_noncontig_array_fill_rows (struct gomp_ncarray_descr_type *descr, void *rows[])
+{
+  size_t count = 0;
+  void **p = rows;
+  gomp_noncontig_array_fill_rows_1 (descr, descr->ptr, 0, &p, &count);
+  return count;
+}
+
+static void *
+gomp_noncontig_array_create_ptrblock (struct ncarray_info *nca,
+				      void *tgt_addr, void *tgt_data_rows[])
+{
+  struct gomp_ncarray_descr_type *descr = nca->descr;
+  void *ptrblock = gomp_malloc (nca->ptrblock_size);
+  void **curr_dim_ptrblock = (void **) ptrblock;
+  size_t n = 1;
+
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+    {
+      int curr_dim_len = descr->dims[d].length;
+      int next_dim_len = descr->dims[d + 1].length;
+      int curr_dim_num = curr_dim_len / sizeof (void *);
+      size_t next_dim_bias = descr->dims[d + 1].base;
+
+      void *next_dim_ptrblock
+	= (void *)(curr_dim_ptrblock + n * curr_dim_num);
+
+      for (int b = 0; b < n; b++)
+        for (int i = 0; i < curr_dim_num; i++)
+	  {
+	    if (d < descr->ndims - 2)
+	      {
+		void *ptr = (next_dim_ptrblock
+			     + b * curr_dim_num * next_dim_len
+			     + i * next_dim_len);
+		void *tgt_ptr = tgt_addr + (ptr - ptrblock) - next_dim_bias;
+		curr_dim_ptrblock[b * curr_dim_num + i] = tgt_ptr;
+	      }
+	    else
+	      {
+		curr_dim_ptrblock[b * curr_dim_num + i]
+		  = tgt_data_rows[b * curr_dim_num + i] - next_dim_bias;
+	      }
+	    void *addr = &curr_dim_ptrblock[b * curr_dim_num + i];
+	    assert (ptrblock <= addr && addr < ptrblock + nca->ptrblock_size);
+	  }
+
+      n *= curr_dim_num;
+      curr_dim_ptrblock = next_dim_ptrblock;
+    }
+  assert (n == nca->data_row_num);
+  return ptrblock;
+}
+
 static inline __attribute__((always_inline)) struct target_mem_desc *
 gomp_map_vars_internal (struct gomp_device_descr *devicep,
 			struct goacc_asyncqueue *aq, size_t mapnum,
@@ -533,9 +679,37 @@  gomp_map_vars_internal (struct gomp_device_descr *
   const int typemask = short_mapkind ? 0xff : 0x7;
   struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
-  struct target_mem_desc *tgt
-    = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
-  tgt->list_count = mapnum;
+  struct target_mem_desc *tgt;
+
+  bool process_noncontig_arrays = false;
+  size_t nca_data_row_num = 0, row_start = 0;
+  size_t nca_info_num = 0, nca_index;
+  struct ncarray_info *nca_info = NULL;
+  struct target_var_desc *row_desc;
+  uintptr_t target_row_addr;
+  void **host_data_rows = NULL, **target_data_rows = NULL;
+  void *row;
+
+  if (mapnum > 0)
+    {
+      int kind = get_kind (short_mapkind, kinds, 0);
+      process_noncontig_arrays = GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask);
+    }
+
+  if (process_noncontig_arrays)
+    for (i = 0; i < mapnum; i++)
+      {
+	int kind = get_kind (short_mapkind, kinds, i);
+	if (GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
+	  {
+	    nca_data_row_num += gomp_noncontig_array_count_rows (hostaddrs[i]);
+	    nca_info_num += 1;
+	  }
+      }
+
+  tgt = gomp_malloc (sizeof (*tgt)
+		     + sizeof (tgt->list[0]) * (mapnum + nca_data_row_num));
+  tgt->list_count = mapnum + nca_data_row_num;
   tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
   tgt->device_descr = devicep;
   struct gomp_coalesce_buf cbuf, *cbufp = NULL;
@@ -547,6 +721,14 @@  gomp_map_vars_internal (struct gomp_device_descr *
       return tgt;
     }
 
+  if (nca_info_num)
+    nca_info = gomp_alloca (sizeof (struct ncarray_info) * nca_info_num);
+  if (nca_data_row_num)
+    {
+      host_data_rows = gomp_malloc (2 * sizeof (void *) * nca_data_row_num);
+      target_data_rows = &host_data_rows[nca_data_row_num];
+    }
+
   tgt_align = sizeof (void *);
   tgt_size = 0;
   cbuf.chunks = NULL;
@@ -578,7 +760,7 @@  gomp_map_vars_internal (struct gomp_device_descr *
       return NULL;
     }
 
-  for (i = 0; i < mapnum; i++)
+  for (i = 0, nca_index = 0; i < mapnum; i++)
     {
       int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL
@@ -667,6 +849,20 @@  gomp_map_vars_internal (struct gomp_device_descr *
 	  has_firstprivate = true;
 	  continue;
 	}
+      else if (GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
+	{
+	  /* Ignore non-contiguous arrays for now, we process them together
+	     later.  */
+	  tgt->list[i].key = NULL;
+	  tgt->list[i].offset = 0;
+	  not_found_cnt++;
+
+	  struct ncarray_info *nca = &nca_info[nca_index++];
+	  nca->descr = (struct gomp_ncarray_descr_type *) hostaddrs[i];
+	  nca->map_index = i;
+	  continue;
+	}
+
       cur_node.host_start = (uintptr_t) hostaddrs[i];
       if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
@@ -735,6 +931,56 @@  gomp_map_vars_internal (struct gomp_device_descr *
 	}
     }
 
+  /* For non-contiguous arrays. Each data row is one target item, separated
+     from the normal map clause items, hence we order them after mapnum.  */
+  if (process_noncontig_arrays)
+    for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++)
+      {
+	int kind = get_kind (short_mapkind, kinds, i);
+	if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
+	  continue;
+
+	struct ncarray_info *nca = &nca_info[nca_index++];
+	struct gomp_ncarray_descr_type *descr = nca->descr;
+	size_t nr;
+
+	gomp_noncontig_array_compute_info (nca);
+
+	/* We have allocated space in host/target_data_rows to place all the
+	   row data block pointers, now we can start filling them in.  */
+	nr = gomp_noncontig_array_fill_rows (descr, &host_data_rows[row_start]);
+	assert (nr == nca->data_row_num);
+
+	size_t align = (size_t) 1 << (kind >> rshift);
+	if (tgt_align < align)
+	  tgt_align = align;
+	tgt_size = (tgt_size + align - 1) & ~(align - 1);
+	tgt_size += nca->ptrblock_size;
+
+	for (size_t j = 0; j < nca->data_row_num; j++)
+	  {
+	    row = host_data_rows[row_start + j];
+	    row_desc = &tgt->list[mapnum + row_start + j];
+
+	    cur_node.host_start = (uintptr_t) row;
+	    cur_node.host_end = cur_node.host_start + nca->data_row_size;
+	    splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+	    if (n)
+	      {
+		assert (n->refcount != REFCOUNT_LINK);
+		gomp_map_vars_existing (devicep, aq, n, &cur_node, row_desc,
+					kind & typemask, /* TODO: cbuf? */ NULL);
+	      }
+	    else
+	      {
+		tgt_size = (tgt_size + align - 1) & ~(align - 1);
+		tgt_size += nca->data_row_size;
+		not_found_cnt++;
+	      }
+	  }
+	row_start += nca->data_row_num;
+      }
+
   if (devaddrs)
     {
       if (mapnum != 1)
@@ -895,6 +1141,15 @@  gomp_map_vars_internal (struct gomp_device_descr *
 	      default:
 		break;
 	      }
+
+	    if (GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
+	      {
+		tgt->list[i].key = &array->key;
+		tgt->list[i].key->tgt = tgt;
+		array++;
+		continue;
+	      }
+
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
 	    if (!GOMP_MAP_POINTER_P (kind & typemask))
@@ -1044,8 +1299,112 @@  gomp_map_vars_internal (struct gomp_device_descr *
 		array++;
 	      }
 	  }
+
+      /* Processing of non-contiguous array rows.  */
+      if (process_noncontig_arrays)
+	{
+	  for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++)
+	    {
+	      int kind = get_kind (short_mapkind, kinds, i);
+	      if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))
+		continue;
+
+	      struct ncarray_info *nca = &nca_info[nca_index++];
+	      assert (nca->descr == hostaddrs[i]);
+
+	      /* The map for the non-contiguous array itself is never copied from
+		 during unmapping, its the data rows that count. Set copy-from
+		 flags to false here.  */
+	      tgt->list[i].copy_from = false;
+	      tgt->list[i].always_copy_from = false;
+
+	      size_t align = (size_t) 1 << (kind >> rshift);
+	      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+
+	      /* For the map of the non-contiguous array itself, adjust so that
+		 the passed device address points to the beginning of the
+		 ptrblock. Remember to adjust the first-dimension's bias here.   */
+	      tgt->list[i].key->tgt_offset = tgt_size - nca->descr->dims[0].base;
+
+	      void *target_ptrblock = (void*) tgt->tgt_start + tgt_size;
+	      tgt_size += nca->ptrblock_size;
+
+	      /* Add splay key for each data row in current non-contiguous
+		 array.  */
+	      for (size_t j = 0; j < nca->data_row_num; j++)
+		{
+		  row = host_data_rows[row_start + j];
+		  row_desc = &tgt->list[mapnum + row_start + j];
+
+		  cur_node.host_start = (uintptr_t) row;
+		  cur_node.host_end = cur_node.host_start + nca->data_row_size;
+		  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+		  if (n)
+		    {
+		      assert (n->refcount != REFCOUNT_LINK);
+		      gomp_map_vars_existing (devicep, aq, n, &cur_node, row_desc,
+					      kind & typemask, cbufp);
+		      target_row_addr = n->tgt->tgt_start + n->tgt_offset;
+		    }
+		  else
+		    {
+		      tgt->refcount++;
+
+		      splay_tree_key k = &array->key;
+		      k->host_start = (uintptr_t) row;
+		      k->host_end = k->host_start + nca->data_row_size;
+
+		      k->tgt = tgt;
+		      k->refcount = 1;
+		      k->link_key = NULL;
+		      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+		      target_row_addr = tgt->tgt_start + tgt_size;
+		      k->tgt_offset = tgt_size;
+		      tgt_size += nca->data_row_size;
+
+		      row_desc->key = k;
+		      row_desc->copy_from
+			= GOMP_MAP_COPY_FROM_P (kind & typemask);
+		      row_desc->always_copy_from
+			= GOMP_MAP_COPY_FROM_P (kind & typemask);
+		      row_desc->offset = 0;
+		      row_desc->length = nca->data_row_size;
+
+		      array->left = NULL;
+		      array->right = NULL;
+		      splay_tree_insert (mem_map, array);
+
+		      if (GOMP_MAP_COPY_TO_P (kind & typemask))
+			gomp_copy_host2dev (devicep, aq,
+					    (void *) tgt->tgt_start + k->tgt_offset,
+					    (void *) k->host_start,
+					    nca->data_row_size, cbufp);
+		      array++;
+		    }
+		  target_data_rows[row_start + j] = (void *) target_row_addr;
+		}
+
+	      /* Now we have the target memory allocated, and target offsets of all
+		 row blocks assigned and calculated, we can construct the
+		 accelerator side ptrblock and copy it in.  */
+	      if (nca->ptrblock_size)
+		{
+		  void *ptrblock = gomp_noncontig_array_create_ptrblock
+		    (nca, target_ptrblock, target_data_rows + row_start);
+		  gomp_copy_host2dev (devicep, aq, target_ptrblock, ptrblock,
+				      nca->ptrblock_size, cbufp);
+		  free (ptrblock);
+		}
+
+	      row_start += nca->data_row_num;
+	    }
+	  assert (row_start == nca_data_row_num && nca_index == nca_info_num);
+	}
     }
 
+  if (nca_data_row_num)
+    free (host_data_rows);
+
   if (pragma_kind == GOMP_MAP_VARS_TARGET)
     {
       for (i = 0; i < mapnum; i++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c	(nonexistent)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c	(working copy)
@@ -0,0 +1,103 @@ 
+/* { dg-do run } */
+
+#include <stdlib.h>
+#include <assert.h>
+
+#define n 100
+#define m 100
+
+int b[n][m];
+
+void
+test1 (void)
+{
+  int i, j, *a[100];
+
+  /* Array of pointers form test.  */
+  for (i = 0; i < n; i++)
+    {
+      a[i] = (int *)malloc (sizeof (int) * m);
+      for (j = 0; j < m; j++)
+	b[i][j] = j - i;
+    }
+
+  #pragma acc parallel loop copyout(a[0:n][0:m]) copyin(b)
+  for (i = 0; i < n; i++)
+    #pragma acc loop
+    for (j = 0; j < m; j++)
+      a[i][j] = b[i][j];
+
+  for (i = 0; i < n; i++)
+    {
+      for (j = 0; j < m; j++)
+	assert (a[i][j] == b[i][j]);
+      /* Clean up.  */
+      free (a[i]);
+    }
+}
+
+void
+test2 (void)
+{
+  int i, j, **a = (int **) malloc (sizeof (int *) * n);
+
+  /* Separately allocated blocks.  */
+  for (i = 0; i < n; i++)
+    {
+      a[i] = (int *)malloc (sizeof (int) * m);
+      for (j = 0; j < m; j++)
+	b[i][j] = j - i;
+    }
+
+  #pragma acc parallel loop copyout(a[0:n][0:m]) copyin(b)
+  for (i = 0; i < n; i++)
+    #pragma acc loop
+    for (j = 0; j < m; j++)
+      a[i][j] = b[i][j];
+
+  for (i = 0; i < n; i++)
+    {
+      for (j = 0; j < m; j++)
+	assert (a[i][j] == b[i][j]);
+      /* Clean up.  */
+      free (a[i]);
+    }
+  free (a);
+}
+
+void
+test3 (void)
+{
+  int i, j, **a = (int **) malloc (sizeof (int *) * n);
+  a[0] = (int *) malloc (sizeof (int) * n * m);
+
+  /* Rows allocated in one contiguous block.  */
+  for (i = 0; i < n; i++)
+    {
+      a[i] = *a + i * m;
+      for (j = 0; j < m; j++)
+	b[i][j] = j - i;
+    }
+
+  #pragma acc parallel loop copyout(a[0:n][0:m]) copyin(b)
+  for (i = 0; i < n; i++)
+    #pragma acc loop
+    for (j = 0; j < m; j++)
+      a[i][j] = b[i][j];
+
+  for (i = 0; i < n; i++)
+    for (j = 0; j < m; j++)
+      assert (a[i][j] == b[i][j]);
+
+  free (a[0]);
+  free (a);
+}
+
+int
+main (void)
+{
+  test1 ();
+  test2 ();
+  test3 ();
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-2.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-2.c	(nonexistent)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-2.c	(working copy)
@@ -0,0 +1,37 @@ 
+/* { dg-do run } */
+
+#include <assert.h>
+#include "noncontig_array-utils.h"
+
+int
+main (void)
+{
+  int n = 10;
+  int ***a = (int ***) create_ncarray (sizeof (int), n, 3);
+  int ***b = (int ***) create_ncarray (sizeof (int), n, 3);
+  int ***c = (int ***) create_ncarray (sizeof (int), n, 3);
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	{
+	  a[i][j][k] = i + j * k + k;
+	  b[i][j][k] = j + k * i + i * j;
+	  c[i][j][k] = a[i][j][k];
+	}
+
+  #pragma acc parallel copy (a[0:n][0:n][0:n]) copyin (b[0:n][0:n][0:n])
+  {
+    for (int i = 0; i < n; i++)
+      for (int j = 0; j < n; j++)
+	for (int k = 0; k < n; k++)
+	  a[i][j][k] += b[k][j][i] + i + j + k;
+  }
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	assert (a[i][j][k] == c[i][j][k] + b[k][j][i] + i + j + k);
+
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-3.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-3.c	(nonexistent)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-3.c	(working copy)
@@ -0,0 +1,45 @@ 
+/* { dg-do run } */
+
+#include <assert.h>
+#include "noncontig_array-utils.h"
+
+int main (void)
+{
+  int n = 20, x = 5, y = 12;
+  int *****a = (int *****) create_ncarray (sizeof (int), n, 5);
+
+  int sum1 = 0, sum2 = 0, sum3 = 0;
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	for (int l = 0; l < n; l++)
+	  for (int m = 0; m < n; m++)
+	    {
+	      a[i][j][k][l][m] = 1;
+	      sum1++;
+	    }
+
+  #pragma acc parallel copy (a[x:y][x:y][x:y][x:y][x:y]) copy(sum2)
+  {
+    for (int i = x; i < x + y; i++)
+      for (int j = x; j < x + y; j++)
+	for (int k = x; k < x + y; k++)
+	  for (int l = x; l < x + y; l++)
+	    for (int m = x; m < x + y; m++)
+	      {
+		a[i][j][k][l][m] = 0;
+		sum2++;
+	      }
+  }
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	for (int l = 0; l < n; l++)
+	  for (int m = 0; m < n; m++)
+	    sum3 += a[i][j][k][l][m];
+
+  assert (sum1 == sum2 + sum3);
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-4.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-4.c	(nonexistent)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-4.c	(working copy)
@@ -0,0 +1,36 @@ 
+/* { dg-do run } */
+
+#include <assert.h>
+#include "noncontig_array-utils.h"
+
+int main (void)
+{
+  int n = 128;
+  double ***a = (double ***) create_ncarray (sizeof (double), n, 3);
+  double ***b = (double ***) create_ncarray (sizeof (double), n, 3);
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	a[i][j][k] = i + j + k + i * j * k;
+
+  /* This test exercises async copyout of non-contiguous array rows.  */
+  #pragma acc parallel copyin(a[0:n][0:n][0:n]) copyout(b[0:n][0:n][0:n]) async(5)
+  {
+    #pragma acc loop gang
+    for (int i = 0; i < n; i++)
+      #pragma acc loop vector
+      for (int j = 0; j < n; j++)
+	for (int k = 0; k < n; k++)
+	  b[i][j][k] = a[i][j][k] * 2.0;
+  }
+
+  #pragma acc wait (5)
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	assert (b[i][j][k] == a[i][j][k] * 2.0);
+
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-utils.h
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-utils.h	(nonexistent)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-utils.h	(working copy)
@@ -0,0 +1,44 @@ 
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include <stdint.h>
+
+/* Allocate and create a pointer based NDIMS-dimensional array,
+   each dimension DIMLEN long, with ELSIZE sized data elements.  */
+void *
+create_ncarray (size_t elsize, int dimlen, int ndims)
+{
+  size_t blk_size = 0;
+  size_t n = 1;
+
+  for (int i = 0; i < ndims - 1; i++)
+    {
+      n *= dimlen;
+      blk_size += sizeof (void *) * n;
+    }
+  size_t data_rows_num = n;
+  size_t data_rows_offset = blk_size;
+  blk_size += elsize * n * dimlen;
+
+  void *blk = (void *) malloc (blk_size);
+  memset (blk, 0, blk_size);
+  void **curr_dim = (void **) blk;
+  n = 1;
+
+  for (int d = 0; d < ndims - 1; d++)
+    {
+      uintptr_t next_dim = (uintptr_t) (curr_dim + n * dimlen);
+      size_t next_dimlen = dimlen * (d < ndims - 2 ? sizeof (void *) : elsize);
+
+      for (int b = 0; b < n; b++)
+        for (int i = 0; i < dimlen; i++)
+	  if (d < ndims - 1)
+	    curr_dim[b * dimlen + i]
+	      = (void*) (next_dim + b * dimlen * next_dimlen + i * next_dimlen);
+
+      n *= dimlen;
+      curr_dim = (void**) next_dim;
+    }
+  assert (n == data_rows_num);
+  return blk;
+}