[OpenACC,7/8] Multi-dimensional dynamic array support for OpenACC data clauses, libgomp support

Message ID f5365601-d072-6486-02df-94764a6724ee@mentor.com
State New
Series Multi-dimensional dynamic array support for OpenACC data clauses

Commit Message

Chung-Lin Tang Oct. 16, 2018, 12:57 p.m. UTC
This part is the libgomp runtime handling for OpenACC dynamic arrays.

We handle such arrays by creating a "pointer block" that emulates the first N-1 dimensions,
and then treating each data row of the final Nth dimension as an individual object
mapped in the TGT. The rows are appended after all the other map-kind objects and
processed there.
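
As a rough illustration of the layout described above (a sketch under stated assumptions, not the patch's code): for a hypothetical `int ***a` with extents 2 x 3 x 4, the first two dimensions are pointer levels, and each of the 2 * 3 = 6 final rows of 4 ints becomes a separately mapped object. The names `struct dim`, `count_rows` and `ptrblock_bytes` are made up for this example; `struct dim` mirrors only the `length` and `elem_size` fields of the posted `struct da_dim`, with `length` taken to be a byte count, and every non-final level is assumed to be an array of pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-dimension record: LENGTH is the byte size of
   one block at this level, ELEM_SIZE the size of one element in it.  */
struct dim
{
  size_t length;
  size_t elem_size;
};

/* Number of final-dimension data rows: the product of the element counts
   of the first NDIMS - 1 dimensions.  */
static size_t
count_rows (const struct dim *dims, size_t ndims)
{
  assert (ndims > 0);
  size_t n = 1;
  for (size_t d = 0; d + 1 < ndims; d++)
    n *= dims[d].length / dims[d].elem_size;
  return n;
}

/* Bytes needed for a pointer block emulating the first NDIMS - 1
   dimensions, assuming every such level is an array of pointers.  */
static size_t
ptrblock_bytes (const struct dim *dims, size_t ndims)
{
  assert (ndims > 0);
  size_t bytes = 0, n = 1;
  for (size_t d = 0; d + 1 < ndims; d++)
    {
      bytes += dims[d].length * n;	/* N blocks exist at level D.  */
      n *= dims[d].length / dims[d].elem_size;
    }
  return bytes;
}
```

For the 2 x 3 x 4 case this gives 6 data rows and a pointer block of 2 + 6 = 8 pointers.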

Thanks,
Chung-Lin

	libgomp/
	* target.c (struct da_dim): New struct declaration.
	(struct da_descr_type): Likewise.
	(struct da_info): Likewise.
	(gomp_dynamic_array_count_rows): New function.
	(gomp_dynamic_array_compute_info): Likewise.
	(gomp_dynamic_array_fill_rows_1): Likewise.
	(gomp_dynamic_array_fill_rows): Likewise.
	(gomp_dynamic_array_create_ptrblock): Likewise.
	(gomp_map_vars): Add code to handle dynamic array map kinds.

Comments

Jakub Jelinek Oct. 16, 2018, 1:13 p.m. UTC | #1
On Tue, Oct 16, 2018 at 08:57:00PM +0800, Chung-Lin Tang wrote:
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -490,6 +490,140 @@ gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
>    return tgt->tgt_start + tgt->list[i].offset;
>  }
>  
> +/* Dynamic array related data structures, interfaces with the compiler.  */
> +
> +struct da_dim {
> +  size_t base;
> +  size_t length;
> +  size_t elem_size;
> +  size_t is_array;
> +};
> +
> +struct da_descr_type {
> +  void *ptr;
> +  size_t ndims;
> +  struct da_dim dims[];
> +};

Why do you call the non-contiguous arrays dynamic arrays?  Is that some OpenACC term?
I'd also prefix those with gomp_ and it is important to make it clear what
is the ABI type shared with the compiler and what are the internal types.
struct gomp_array_descr would look more natural to me.

> +  for (i = 0; i < mapnum; i++)
> +    {
> +      int kind = get_kind (short_mapkind, kinds, i);
> +      if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
> +	{
> +	  da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
> +	  da_info_num += 1;
> +	}
> +    }

I'm not really happy about adding several extra loops which will not do
anything in the case there are no non-contiguous arrays being mapped (for
now always for OpenMP (OpenMP 5 has support for non-contiguous target update
to/from though) and I guess rarely for OpenACC).
Can't you use some flag bit in flags passed to GOMP_target* etc. and do the
above loop only if the compiler indicated there are any?

> +  tgt = gomp_malloc (sizeof (*tgt)
> +		     + sizeof (tgt->list[0]) * (mapnum + da_data_row_num));
> +  tgt->list_count = mapnum + da_data_row_num;
>    tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
>    tgt->device_descr = devicep;
>    struct gomp_coalesce_buf cbuf, *cbufp = NULL;

> @@ -687,6 +863,55 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
>  	}
>      }
>  
> +  /* For dynamic arrays. Each data row is one target item, separated from
> +     the normal map clause items, hence we order them after mapnum.  */
> +  for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)

Even if nothing is in flags, you could just avoid this loop if the previous
loop(s) haven't found any noncontiguous arrays.

> @@ -976,6 +1210,108 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
>  		array++;
>  	      }
>  	  }
> +
> +      /* Processing of dynamic array rows.  */
> +      for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
> +	{
> +	  int kind = get_kind (short_mapkind, kinds, i);
> +	  if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
> +	    continue;

Again.

	Jakub
Chung-Lin Tang Dec. 6, 2018, 2:19 p.m. UTC | #2
Hi Jakub, thanks for the swift review a few weeks ago, and apologies I haven't been able
to respond sooner.

On 2018/10/16 9:13 PM, Jakub Jelinek wrote:
>> +/* Dynamic array related data structures, interfaces with the compiler.  */
>> +
>> +struct da_dim {
>> +  size_t base;
>> +  size_t length;
>> +  size_t elem_size;
>> +  size_t is_array;
>> +};
>> +
>> +struct da_descr_type {
>> +  void *ptr;
>> +  size_t ndims;
>> +  struct da_dim dims[];
>> +};
> 
> Why do you call the non-contiguous arrays dynamic arrays?  Is that some OpenACC term?
> I'd also prefix those with gomp_ and it is important to make it clear what
> is the ABI type shared with the compiler and what are the internal types.
> struct gomp_array_descr would look more natural to me.

Well it's not particularly an OpenACC term, just that non-contiguous arrays are
often multi-dimensional arrays dynamically allocated and created through (arrays of) pointers.
Are you strongly opposed to this naming? If so, I can adjust this part.

I think the suggested 'gomp_array_descr' identifier looks descriptive, I'll revise that in an update,
as well as add more comments to better describe its ABI significance with the compiler.

>> +  for (i = 0; i < mapnum; i++)
>> +    {
>> +      int kind = get_kind (short_mapkind, kinds, i);
>> +      if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
>> +	{
>> +	  da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
>> +	  da_info_num += 1;
>> +	}
>> +    }
> 
> I'm not really happy about adding several extra loops which will not do
> anything in the case there are no non-contiguous arrays being mapped (for
> now always for OpenMP (OpenMP 5 has support for non-contiguous target update
> to/from though) and I guess rarely for OpenACC).
> Can't you use some flag bit in flags passed to GOMP_target* etc. and do the
> above loop only if the compiler indicated there are any?

I originally strived to not have that loop, but because each row in the last dimension
is mapped as its own target_var_desc, we need to count them at this stage to allocate
the right number at start. Otherwise a realloc later seems even more ugly...
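
The allocation constraint can be sketched as follows. This is a minimal illustration, not libgomp code: `struct item`, `struct desc` and `alloc_desc` are hypothetical stand-ins for a descriptor that ends in a flexible array member, which is why the total entry count (ordinary map entries plus data rows) has to be known before the single allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-entry record.  */
struct item
{
  void *key;
};

/* Hypothetical descriptor ending in a flexible array member.  */
struct desc
{
  size_t list_count;
  struct item list[];
};

/* Allocate room for MAPNUM ordinary entries plus EXTRA_ROWS row entries in
   one block, so the list never needs to be grown afterwards.  */
static struct desc *
alloc_desc (size_t mapnum, size_t extra_rows)
{
  size_t count = mapnum + extra_rows;
  struct desc *d = malloc (sizeof (*d) + sizeof (d->list[0]) * count);
  assert (d != NULL);
  d->list_count = count;
  return d;
}
```

With this shape, row entries can simply occupy indices `mapnum` and up, which matches the ordering the patch comment describes.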

We currently don't have a suitable flag word argument in GOMP_target*, GOACC_parallel*, etc.
I am not sure if such a feature warrants changing the interface.

If you are wary of OpenMP being affected, I can add a condition to restrict such processing
to only (pragma_kind == GOMP_MAP_VARS_OPENACC). Is that okay? (at least before making any
larger runtime interface adjustments)

>> +  tgt = gomp_malloc (sizeof (*tgt)
>> +		     + sizeof (tgt->list[0]) * (mapnum + da_data_row_num));
>> +  tgt->list_count = mapnum + da_data_row_num;
>>     tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
>>     tgt->device_descr = devicep;
>>     struct gomp_coalesce_buf cbuf, *cbufp = NULL;
> 
>> @@ -687,6 +863,55 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
>>   	}
>>       }
>>   
>> +  /* For dynamic arrays. Each data row is one target item, separated from
>> +     the normal map clause items, hence we order them after mapnum.  */
>> +  for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
> 
> Even if nothing is in flags, you could just avoid this loop if the previous
> loop(s) haven't found any noncontiguous arrays.

I'll add a bit more checking to avoid these cases.

Thanks,
Chung-Lin
Jakub Jelinek Dec. 6, 2018, 2:43 p.m. UTC | #3
On Thu, Dec 06, 2018 at 10:19:43PM +0800, Chung-Lin Tang wrote:
> > Why do you call the non-contiguous arrays dynamic arrays?  Is that some OpenACC term?
> > I'd also prefix those with gomp_ and it is important to make it clear what
> > is the ABI type shared with the compiler and what are the internal types.
> > struct gomp_array_descr would look more natural to me.
> 
> Well it's not particularly an OpenACC term, just that non-contiguous arrays are
> often multi-dimensional arrays dynamically allocated and created through (arrays of) pointers.
> Are you strongly opposed to this naming? If so, I can adjust this part.

The way those arrays are created (and they don't have to be dynamically
allocated) doesn't affect their representation.
There are various terms that describe various data structures, like Iliffe
vectors, jagged/ragged arrays, dope vectors.
I guess it depends on what kind of data structures this new framework
supports: Iliffe vectors (arrays of pointers), or just flat but
strided arrays, etc.
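
To make the distinction between the two representations concrete, here is a small sketch; the function names `iliffe_get` and `strided_get` are illustrative only and do not appear in the patch. An Iliffe vector reaches an element through a per-row pointer, while a flat strided array computes the address from a single base.

```c
#include <assert.h>
#include <stddef.h>

/* Iliffe vector: ROWS is an array of pointers, each pointing at a
   separately stored row.  Rows need not be adjacent in memory.  */
static int
iliffe_get (int *const *rows, size_t i, size_t j)
{
  assert (rows != NULL);
  return rows[i][j];
}

/* Flat but strided array: one contiguous block in which element (i, j)
   lives at BASE[i * ROW_STRIDE + j]; ROW_STRIDE may exceed the number of
   columns actually used.  */
static int
strided_get (const int *base, size_t row_stride, size_t i, size_t j)
{
  assert (j < row_stride);
  return base[i * row_stride + j];
}
```

Only the first form needs a device-side block of translated pointers; the second can be described by a base address, extents and strides.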

> > > +  for (i = 0; i < mapnum; i++)
> > > +    {
> > > +      int kind = get_kind (short_mapkind, kinds, i);
> > > +      if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
> > > +	{
> > > +	  da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
> > > +	  da_info_num += 1;
> > > +	}
> > > +    }
> > 
> > I'm not really happy about adding several extra loops which will not do
> > anything in the case there are no non-contiguous arrays being mapped (for
> > now always for OpenMP (OpenMP 5 has support for non-contiguous target update
> > to/from though) and I guess rarely for OpenACC).
> > Can't you use some flag bit in flags passed to GOMP_target* etc. and do the
> > above loop only if the compiler indicated there are any?
> 
> I originally strived to not have that loop, but because each row in the last dimension
> is mapped as its own target_var_desc, we need to count them at this stage to allocate
> the right number at start. Otherwise a realloc later seems even more ugly...
> 
> We currently don't have a suitable flag word argument in GOMP_target*, GOACC_parallel*, etc.
> I am not sure if such a feature warrants changing the interface.
> 
> If you are wary of OpenMP being affected, I can add a condition to restrict such processing
> to only (pragma_kind == GOMP_MAP_VARS_OPENACC). Is that okay? (at least before making any
> larger runtime interface adjustments)

That will still cost you doing that loop for OpenACC constructs that don't
have any of these non-contiguous arrays.  GOMP_target_ext has flags
argument, but GOACC_parallel_keyed doesn't.  It has ... and you could perhaps
encode some flag in there.  Or, could these array descriptors be passed
first in the list of vars, so instead of a loop to check for these you could
just check the first one?
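
The "descriptors first" idea can be sketched like this, assuming the compiler guarantees that all such descriptors are sorted to the front of the map list. The `enum map_kind` values and both function names are hypothetical stand-ins; the real map-kind encoding shared between compiler and runtime is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for map kinds.  */
enum map_kind { MAP_ORDINARY, MAP_NONCONTIG_ARRAY };

/* With descriptors sorted first, inspecting entry 0 is enough to know
   whether any are present.  */
static bool
has_noncontig_arrays (const enum map_kind *kinds, size_t mapnum)
{
  assert (kinds != NULL || mapnum == 0);
  return mapnum > 0 && kinds[0] == MAP_NONCONTIG_ARRAY;
}

/* Count the descriptors.  The scan stops at the first ordinary entry, so
   it costs a single comparison when none are present.  */
static size_t
count_noncontig_arrays (const enum map_kind *kinds, size_t mapnum)
{
  assert (kinds != NULL || mapnum == 0);
  size_t n = 0;
  while (n < mapnum && kinds[n] == MAP_NONCONTIG_ARRAY)
    n++;
  return n;
}
```

The trade-off is that the ordering becomes part of the compiler/runtime contract: a descriptor emitted after an ordinary entry would be silently missed by either function.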

> > > +  tgt = gomp_malloc (sizeof (*tgt)
> > > +		     + sizeof (tgt->list[0]) * (mapnum + da_data_row_num));
> > > +  tgt->list_count = mapnum + da_data_row_num;
> > >     tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
> > >     tgt->device_descr = devicep;
> > >     struct gomp_coalesce_buf cbuf, *cbufp = NULL;
> > 
> > > @@ -687,6 +863,55 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
> > >   	}
> > >       }
> > > +  /* For dynamic arrays. Each data row is one target item, separated from
> > > +     the normal map clause items, hence we order them after mapnum.  */
> > > +  for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
> > 
> > Even if nothing is in flags, you could just avoid this loop if the previous
> > loop(s) haven't found any noncontiguous arrays.
> 
> I'll add a bit more checking to avoid these cases.

	Jakub
Chung-Lin Tang Dec. 13, 2018, 2:52 p.m. UTC | #4
On 2018/12/6 10:43 PM, Jakub Jelinek wrote:
> On Thu, Dec 06, 2018 at 10:19:43PM +0800, Chung-Lin Tang wrote:
>>> Why do you call the non-contiguous arrays dynamic arrays?  Is that some OpenACC term?
>>> I'd also prefix those with gomp_ and it is important to make it clear what
>>> is the ABI type shared with the compiler and what are the internal types.
>>> struct gomp_array_descr would look more natural to me.
>>
>> Well it's not particularly an OpenACC term, just that non-contiguous arrays are
>> often multi-dimensional arrays dynamically allocated and created through (arrays of) pointers.
>> Are you strongly opposed to this naming? If so, I can adjust this part.
> 
> The way those arrays are created (and they don't have to be dynamically
> allocated) doesn't affect their representation.
> There are various terms that describe various data structures, like Iliffe
> vectors, jagged/ragged arrays, dope vectors.
> I guess it depends on what kind of data structures this new framework
> supports: Iliffe vectors (arrays of pointers), or just flat but
> strided arrays, etc.
> 
>>>> +  for (i = 0; i < mapnum; i++)
>>>> +    {
>>>> +      int kind = get_kind (short_mapkind, kinds, i);
>>>> +      if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
>>>> +	{
>>>> +	  da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
>>>> +	  da_info_num += 1;
>>>> +	}
>>>> +    }
>>>
>>> I'm not really happy about adding several extra loops which will not do
>>> anything in the case there are no non-contiguous arrays being mapped (for
>>> now always for OpenMP (OpenMP 5 has support for non-contiguous target update
>>> to/from though) and I guess rarely for OpenACC).
>>> Can't you use some flag bit in flags passed to GOMP_target* etc. and do the
>>> above loop only if the compiler indicated there are any?
>>
>> I originally strived to not have that loop, but because each row in the last dimension
>> is mapped as its own target_var_desc, we need to count them at this stage to allocate
>> the right number at start. Otherwise a realloc later seems even more ugly...
>>
>> We currently don't have a suitable flag word argument in GOMP_target*, GOACC_parallel*, etc.
>> I am not sure if such a feature warrants changing the interface.
>>
>> If you are wary of OpenMP being affected, I can add a condition to restrict such processing
>> to only (pragma_kind == GOMP_MAP_VARS_OPENACC). Is that okay? (at least before making any
>> larger runtime interface adjustments)
> 
> That will still cost you doing that loop for OpenACC constructs that don't
> have any of these non-contiguous arrays.  GOMP_target_ext has flags
> argument, but GOACC_parallel_keyed doesn't.  It has ... and you could perhaps
> encode some flag in there.  Or, could these array descriptors be passed
> first in the list of vars, so instead of a loop to check for these you could
> just check the first one?

Hi Jakub,
I have revised the patch to rename the main struct da_* types into struct gomp_array_* and
added more detailed descriptions in the comments (though admittedly the "dynamic array" term
is not purged completely).

I have opted for the place-at-start-of-chain route; this should avoid all the tests and
additional iterating when such arrays are not used. There's also another omp-low.c update in
another patch.

Besides the revised whole patch, I have also attached a v1-v2 diff showing the changes in between.
Tested with offloading to ensure no regressions.

Thanks,
Chung-Lin
Index: libgomp/target.c
===================================================================
--- libgomp/target.c	(revision 267050)
+++ libgomp/target.c	(working copy)
@@ -477,6 +477,151 @@ gomp_map_val (struct target_mem_desc *tgt, void **
   return tgt->tgt_start + tgt->list[i].offset;
 }
 
+/* Definitions for data structures describing dynamic, non-contiguous arrays
+   (Note: interfaces with compiler)
+
+   The compiler generates a descriptor for each such array, places the
+   descriptor on stack, and passes the address of the descriptor to the libgomp
+   runtime as a normal map argument. The runtime then processes the array
+   data structure setup, and replaces the argument with the new actual
+   array address for the child function.
+
+   Care must be taken such that the struct field and layout assumptions
+   of struct gomp_array_dim, gomp_array_descr_type inside the compiler
+   be consistent with the below declarations.  */
+
+struct gomp_array_dim {
+  size_t base;
+  size_t length;
+  size_t elem_size;
+  size_t is_array;
+};
+
+struct gomp_array_descr_type {
+  void *ptr;
+  size_t ndims;
+  struct gomp_array_dim dims[];
+};
+
+/* Internal dynamic array info struct, used only here inside the runtime. */
+
+struct da_info
+{
+  struct gomp_array_descr_type *descr;
+  size_t map_index;
+  size_t ptrblock_size;
+  size_t data_row_num;
+  size_t data_row_size;
+};
+
+static size_t
+gomp_dynamic_array_count_rows (struct gomp_array_descr_type *descr)
+{
+  size_t nrows = 1;
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+    nrows *= descr->dims[d].length / sizeof (void *);
+  return nrows;
+}
+
+static void
+gomp_dynamic_array_compute_info (struct da_info *da)
+{
+  size_t d, n = 1;
+  struct gomp_array_descr_type *descr = da->descr;
+
+  da->ptrblock_size = 0;
+  for (d = 0; d < descr->ndims - 1; d++)
+    {
+      size_t dim_count = descr->dims[d].length / descr->dims[d].elem_size;
+      size_t dim_ptrblock_size = (descr->dims[d + 1].is_array
+				  ? 0 : descr->dims[d].length * n);
+      da->ptrblock_size += dim_ptrblock_size;
+      n *= dim_count;
+    }
+  da->data_row_num = n;
+  da->data_row_size = descr->dims[d].length;
+}
+
+static void
+gomp_dynamic_array_fill_rows_1 (struct gomp_array_descr_type *descr, void *da,
+				size_t d, void ***row_ptr, size_t *count)
+{
+  if (d < descr->ndims - 1)
+    {
+      size_t elsize = descr->dims[d].elem_size;
+      size_t n = descr->dims[d].length / elsize;
+      void *p = da + descr->dims[d].base;
+      for (size_t i = 0; i < n; i++)
+	{
+	  void *ptr = p + i * elsize;
+	  /* Deref if next dimension is not array.  */
+	  if (!descr->dims[d + 1].is_array)
+	    ptr = *((void **) ptr);
+	  gomp_dynamic_array_fill_rows_1 (descr, ptr, d + 1, row_ptr, count);
+	}
+    }
+  else
+    {
+      **row_ptr = da + descr->dims[d].base;
+      *row_ptr += 1;
+      *count += 1;
+    }
+}
+
+static size_t
+gomp_dynamic_array_fill_rows (struct gomp_array_descr_type *descr, void *rows[])
+{
+  size_t count = 0;
+  void **p = rows;
+  gomp_dynamic_array_fill_rows_1 (descr, descr->ptr, 0, &p, &count);
+  return count;
+}
+
+static void *
+gomp_dynamic_array_create_ptrblock (struct da_info *da,
+				    void *tgt_addr, void *tgt_data_rows[])
+{
+  struct gomp_array_descr_type *descr = da->descr;
+  void *ptrblock = gomp_malloc (da->ptrblock_size);
+  void **curr_dim_ptrblock = (void **) ptrblock;
+  size_t n = 1;
+
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+    {
+      int curr_dim_len = descr->dims[d].length;
+      int next_dim_len = descr->dims[d + 1].length;
+      int curr_dim_num = curr_dim_len / sizeof (void *);
+
+      void *next_dim_ptrblock
+	= (void *)(curr_dim_ptrblock + n * curr_dim_num);
+
+      for (int b = 0; b < n; b++)
+        for (int i = 0; i < curr_dim_num; i++)
+	  {
+	    if (d < descr->ndims - 2)
+	      {
+		void *ptr = (next_dim_ptrblock
+			     + b * curr_dim_num * next_dim_len
+			     + i * next_dim_len);
+		void *tgt_ptr = tgt_addr + (ptr - ptrblock);
+		curr_dim_ptrblock[b * curr_dim_num + i] = tgt_ptr;
+	      }
+	    else
+	      {
+		curr_dim_ptrblock[b * curr_dim_num + i]
+		  = tgt_data_rows[b * curr_dim_num + i];
+	      }
+	    void *addr = &curr_dim_ptrblock[b * curr_dim_num + i];
+	    assert (ptrblock <= addr && addr < ptrblock + da->ptrblock_size);
+	  }
+
+      n *= curr_dim_num;
+      curr_dim_ptrblock = next_dim_ptrblock;
+    }
+  assert (n == da->data_row_num);
+  return ptrblock;
+}
+
 attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
@@ -488,9 +633,37 @@ gomp_map_vars (struct gomp_device_descr *devicep,
   const int typemask = short_mapkind ? 0xff : 0x7;
   struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
-  struct target_mem_desc *tgt
-    = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
-  tgt->list_count = mapnum;
+  struct target_mem_desc *tgt;
+
+  bool process_dynarrays = false;
+  size_t da_data_row_num = 0, row_start = 0;
+  size_t da_info_num = 0, da_index;
+  struct da_info *da_info = NULL;
+  struct target_var_desc *row_desc;
+  uintptr_t target_row_addr;
+  void **host_data_rows = NULL, **target_data_rows = NULL;
+  void *row;
+
+  if (mapnum > 0)
+    {
+      int kind = get_kind (short_mapkind, kinds, 0);
+      process_dynarrays = GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask);
+    }
+
+  if (process_dynarrays)
+    for (i = 0; i < mapnum; i++)
+      {
+	int kind = get_kind (short_mapkind, kinds, i);
+	if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	  {
+	    da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
+	    da_info_num += 1;
+	  }
+      }
+
+  tgt = gomp_malloc (sizeof (*tgt)
+		     + sizeof (tgt->list[0]) * (mapnum + da_data_row_num));
+  tgt->list_count = mapnum + da_data_row_num;
   tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
   tgt->device_descr = devicep;
   struct gomp_coalesce_buf cbuf, *cbufp = NULL;
@@ -502,6 +675,14 @@ gomp_map_vars (struct gomp_device_descr *devicep,
       return tgt;
     }
 
+  if (da_info_num)
+    da_info = gomp_alloca (sizeof (struct da_info) * da_info_num);
+  if (da_data_row_num)
+    {
+      host_data_rows = gomp_malloc (sizeof (void *) * da_data_row_num);
+      target_data_rows = gomp_malloc (sizeof (void *) * da_data_row_num);
+    }
+
   tgt_align = sizeof (void *);
   tgt_size = 0;
   cbuf.chunks = NULL;
@@ -533,7 +714,7 @@ gomp_map_vars (struct gomp_device_descr *devicep,
       return NULL;
     }
 
-  for (i = 0; i < mapnum; i++)
+  for (i = 0, da_index = 0; i < mapnum; i++)
     {
       int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL
@@ -606,6 +787,20 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 	  has_firstprivate = true;
 	  continue;
 	}
+      else if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	{
+	  /* Ignore dynamic arrays for now, we process them together
+	     later.  */
+	  tgt->list[i].key = NULL;
+	  tgt->list[i].offset = 0;
+	  not_found_cnt++;
+
+	  struct da_info *da = &da_info[da_index++];
+	  da->descr = (struct gomp_array_descr_type *) hostaddrs[i];
+	  da->map_index = i;
+	  continue;
+	}
+
       cur_node.host_start = (uintptr_t) hostaddrs[i];
       if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
@@ -674,6 +869,56 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 	}
     }
 
+  /* For dynamic arrays. Each data row is one target item, separated from
+     the normal map clause items, hence we order them after mapnum.  */
+  if (process_dynarrays)
+    for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+      {
+	int kind = get_kind (short_mapkind, kinds, i);
+	if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	  continue;
+
+	struct da_info *da = &da_info[da_index++];
+	struct gomp_array_descr_type *descr = da->descr;
+	size_t nr;
+
+	gomp_dynamic_array_compute_info (da);
+
+	/* We have allocated space in host/target_data_rows to place all the
+	   row data block pointers, now we can start filling them in.  */
+	nr = gomp_dynamic_array_fill_rows (descr, &host_data_rows[row_start]);
+	assert (nr == da->data_row_num);
+
+	size_t align = (size_t) 1 << (kind >> rshift);
+	if (tgt_align < align)
+	  tgt_align = align;
+	tgt_size = (tgt_size + align - 1) & ~(align - 1);
+	tgt_size += da->ptrblock_size;
+
+	for (size_t j = 0; j < da->data_row_num; j++)
+	  {
+	    row = host_data_rows[row_start + j];
+	    row_desc = &tgt->list[mapnum + row_start + j];
+
+	    cur_node.host_start = (uintptr_t) row;
+	    cur_node.host_end = cur_node.host_start + da->data_row_size;
+	    splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+	    if (n)
+	      {
+		assert (n->refcount != REFCOUNT_LINK);
+		gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
+					kind & typemask, /* TODO: cbuf? */ NULL);
+	      }
+	    else
+	      {
+		tgt_size = (tgt_size + align - 1) & ~(align - 1);
+		tgt_size += da->data_row_size;
+		not_found_cnt++;
+	      }
+	  }
+	row_start += da->data_row_num;
+      }
+
   if (devaddrs)
     {
       if (mapnum != 1)
@@ -817,6 +1062,15 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 	      default:
 		break;
 	      }
+
+	    if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	      {
+		tgt->list[i].key = &array->key;
+		tgt->list[i].key->tgt = tgt;
+		array++;
+		continue;
+	      }
+
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
 	    if (!GOMP_MAP_POINTER_P (kind & typemask))
@@ -965,8 +1219,113 @@ gomp_map_vars (struct gomp_device_descr *devicep,
 		array++;
 	      }
 	  }
+
+      /* Processing of dynamic array rows.  */
+      if (process_dynarrays)
+	{
+	  for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+	    {
+	      int kind = get_kind (short_mapkind, kinds, i);
+	      if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+		continue;
+
+	      struct da_info *da = &da_info[da_index++];
+	      assert (da->descr == hostaddrs[i]);
+
+	      /* The map for the dynamic array itself is never copied from during
+		 unmapping; it is the data rows that count.  The copy-from flags
+		 are therefore set to false here.  */
+	      tgt->list[i].copy_from = false;
+	      tgt->list[i].always_copy_from = false;
+
+	      size_t align = (size_t) 1 << (kind >> rshift);
+	      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+
+	      /* For the map of the dynamic array itself, adjust so that the passed
+		 device address points to the beginning of the ptrblock.  */
+	      tgt->list[i].key->tgt_offset = tgt_size;
+
+	      void *target_ptrblock = (void*) tgt->tgt_start + tgt_size;
+	      tgt_size += da->ptrblock_size;
+
+	      /* Add splay key for each data row in current DA.  */
+	      for (size_t j = 0; j < da->data_row_num; j++)
+		{
+		  row = host_data_rows[row_start + j];
+		  row_desc = &tgt->list[mapnum + row_start + j];
+
+		  cur_node.host_start = (uintptr_t) row;
+		  cur_node.host_end = cur_node.host_start + da->data_row_size;
+		  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+		  if (n)
+		    {
+		      assert (n->refcount != REFCOUNT_LINK);
+		      gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
+					      kind & typemask, cbufp);
+		      target_row_addr = n->tgt->tgt_start + n->tgt_offset;
+		    }
+		  else
+		    {
+		      tgt->refcount++;
+
+		      splay_tree_key k = &array->key;
+		      k->host_start = (uintptr_t) row;
+		      k->host_end = k->host_start + da->data_row_size;
+
+		      k->tgt = tgt;
+		      k->refcount = 1;
+		      k->link_key = NULL;
+		      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+		      target_row_addr = tgt->tgt_start + tgt_size;
+		      k->tgt_offset = tgt_size;
+		      tgt_size += da->data_row_size;
+
+		      row_desc->key = k;
+		      row_desc->copy_from
+			= GOMP_MAP_COPY_FROM_P (kind & typemask);
+		      row_desc->always_copy_from
+			= GOMP_MAP_COPY_FROM_P (kind & typemask);
+		      row_desc->offset = 0;
+		      row_desc->length = da->data_row_size;
+
+		      array->left = NULL;
+		      array->right = NULL;
+		      splay_tree_insert (mem_map, array);
+
+		      if (GOMP_MAP_COPY_TO_P (kind & typemask))
+			gomp_copy_host2dev (devicep,
+					    (void *) tgt->tgt_start + k->tgt_offset,
+					    (void *) k->host_start,
+					    da->data_row_size, cbufp);
+		      array++;
+		    }
+		  target_data_rows[row_start + j] = (void *) target_row_addr;
+		}
+
+	      /* Now we have the target memory allocated, and target offsets of all
+		 row blocks assigned and calculated, we can construct the
+		 accelerator side ptrblock and copy it in.  */
+	      if (da->ptrblock_size)
+		{
+		  void *ptrblock = gomp_dynamic_array_create_ptrblock
+		    (da, target_ptrblock, target_data_rows + row_start);
+		  gomp_copy_host2dev (devicep, target_ptrblock, ptrblock,
+				      da->ptrblock_size, cbufp);
+		  free (ptrblock);
+		}
+
+	      row_start += da->data_row_num;
+	    }
+	  assert (row_start == da_data_row_num && da_index == da_info_num);
+	}
     }
 
+  if (da_data_row_num)
+    {
+      free (host_data_rows);
+      free (target_data_rows);
+    }
+
   if (pragma_kind == GOMP_MAP_VARS_TARGET)
     {
       for (i = 0; i < mapnum; i++)
--- trunk-orig/libgomp/target.c	2018-12-12 18:19:51.020618265 +0800
+++ trunk-work/libgomp/target.c	2018-12-12 22:05:49.197617036 +0800
@@ -477,26 +477,37 @@
   return tgt->tgt_start + tgt->list[i].offset;
 }
 
-/* Dynamic array related data structures, interfaces with the compiler.  */
+/* Definitions for data structures describing dynamic, non-contiguous arrays
+   (Note: interfaces with compiler)
 
-struct da_dim {
+   The compiler generates a descriptor for each such array, places the
+   descriptor on stack, and passes the address of the descriptor to the libgomp
+   runtime as a normal map argument. The runtime then processes the array
+   data structure setup, and replaces the argument with the new actual
+   array address for the child function.
+
+   Care must be taken such that the struct field and layout assumptions
+   of struct gomp_array_dim, gomp_array_descr_type inside the compiler
+   be consistent with the below declarations.  */
+
+struct gomp_array_dim {
   size_t base;
   size_t length;
   size_t elem_size;
   size_t is_array;
 };
 
-struct da_descr_type {
+struct gomp_array_descr_type {
   void *ptr;
   size_t ndims;
-  struct da_dim dims[];
+  struct gomp_array_dim dims[];
 };
 
 /* Internal dynamic array info struct, used only here inside the runtime. */
 
 struct da_info
 {
-  struct da_descr_type *descr;
+  struct gomp_array_descr_type *descr;
   size_t map_index;
   size_t ptrblock_size;
   size_t data_row_num;
@@ -504,7 +515,7 @@
 };
 
 static size_t
-gomp_dynamic_array_count_rows (struct da_descr_type *descr)
+gomp_dynamic_array_count_rows (struct gomp_array_descr_type *descr)
 {
   size_t nrows = 1;
   for (size_t d = 0; d < descr->ndims - 1; d++)
@@ -516,7 +527,7 @@
 gomp_dynamic_array_compute_info (struct da_info *da)
 {
   size_t d, n = 1;
-  struct da_descr_type *descr = da->descr;
+  struct gomp_array_descr_type *descr = da->descr;
 
   da->ptrblock_size = 0;
   for (d = 0; d < descr->ndims - 1; d++)
@@ -532,7 +543,7 @@
 }
 
 static void
-gomp_dynamic_array_fill_rows_1 (struct da_descr_type *descr, void *da,
+gomp_dynamic_array_fill_rows_1 (struct gomp_array_descr_type *descr, void *da,
 				size_t d, void ***row_ptr, size_t *count)
 {
   if (d < descr->ndims - 1)
@@ -558,7 +569,7 @@
 }
 
 static size_t
-gomp_dynamic_array_fill_rows (struct da_descr_type *descr, void *rows[])
+gomp_dynamic_array_fill_rows (struct gomp_array_descr_type *descr, void *rows[])
 {
   size_t count = 0;
   void **p = rows;
@@ -570,7 +581,7 @@
 gomp_dynamic_array_create_ptrblock (struct da_info *da,
 				    void *tgt_addr, void *tgt_data_rows[])
 {
-  struct da_descr_type *descr = da->descr;
+  struct gomp_array_descr_type *descr = da->descr;
   void *ptrblock = gomp_malloc (da->ptrblock_size);
   void **curr_dim_ptrblock = (void **) ptrblock;
   size_t n = 1;
@@ -624,6 +635,7 @@
   struct splay_tree_key_s cur_node;
   struct target_mem_desc *tgt;
 
+  bool process_dynarrays = false;
   size_t da_data_row_num = 0, row_start = 0;
   size_t da_info_num = 0, da_index;
   struct da_info *da_info = NULL;
@@ -632,16 +644,23 @@
   void **host_data_rows = NULL, **target_data_rows = NULL;
   void *row;
 
-  for (i = 0; i < mapnum; i++)
+  if (mapnum > 0)
     {
-      int kind = get_kind (short_mapkind, kinds, i);
-      if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
-	{
-	  da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
-	  da_info_num += 1;
-	}
+      int kind = get_kind (short_mapkind, kinds, 0);
+      process_dynarrays = GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask);
     }
 
+  if (process_dynarrays)
+    for (i = 0; i < mapnum; i++)
+      {
+	int kind = get_kind (short_mapkind, kinds, i);
+	if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	  {
+	    da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
+	    da_info_num += 1;
+	  }
+      }
+
   tgt = gomp_malloc (sizeof (*tgt)
 		     + sizeof (tgt->list[0]) * (mapnum + da_data_row_num));
   tgt->list_count = mapnum + da_data_row_num;
@@ -777,7 +796,7 @@
 	  not_found_cnt++;
 
 	  struct da_info *da = &da_info[da_index++];
-	  da->descr = (struct da_descr_type *) hostaddrs[i];
+	  da->descr = (struct gomp_array_descr_type *) hostaddrs[i];
 	  da->map_index = i;
 	  continue;
 	}
@@ -852,52 +871,53 @@
 
   /* For dynamic arrays. Each data row is one target item, separated from
      the normal map clause items, hence we order them after mapnum.  */
-  for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
-    {
-      int kind = get_kind (short_mapkind, kinds, i);
-      if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
-	continue;
-
-      struct da_info *da = &da_info[da_index++];
-      struct da_descr_type *descr = da->descr;
-      size_t nr;
-
-      gomp_dynamic_array_compute_info (da);
-
-      /* We have allocated space in host/target_data_rows to place all the
-	 row data block pointers, now we can start filling them in.  */
-      nr = gomp_dynamic_array_fill_rows (descr, &host_data_rows[row_start]);
-      assert (nr == da->data_row_num);
+  if (process_dynarrays)
+    for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+      {
+	int kind = get_kind (short_mapkind, kinds, i);
+	if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	  continue;
 
-      size_t align = (size_t) 1 << (kind >> rshift);
-      if (tgt_align < align)
-	tgt_align = align;
-      tgt_size = (tgt_size + align - 1) & ~(align - 1);
-      tgt_size += da->ptrblock_size;
+	struct da_info *da = &da_info[da_index++];
+	struct gomp_array_descr_type *descr = da->descr;
+	size_t nr;
+
+	gomp_dynamic_array_compute_info (da);
+
+	/* We have allocated space in host/target_data_rows to place all the
+	   row data block pointers, now we can start filling them in.  */
+	nr = gomp_dynamic_array_fill_rows (descr, &host_data_rows[row_start]);
+	assert (nr == da->data_row_num);
+
+	size_t align = (size_t) 1 << (kind >> rshift);
+	if (tgt_align < align)
+	  tgt_align = align;
+	tgt_size = (tgt_size + align - 1) & ~(align - 1);
+	tgt_size += da->ptrblock_size;
 
-      for (size_t j = 0; j < da->data_row_num; j++)
-	{
-	  row = host_data_rows[row_start + j];
-	  row_desc = &tgt->list[mapnum + row_start + j];
+	for (size_t j = 0; j < da->data_row_num; j++)
+	  {
+	    row = host_data_rows[row_start + j];
+	    row_desc = &tgt->list[mapnum + row_start + j];
 
-	  cur_node.host_start = (uintptr_t) row;
-	  cur_node.host_end = cur_node.host_start + da->data_row_size;
-	  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
-	  if (n)
-	    {
-	      assert (n->refcount != REFCOUNT_LINK);
-	      gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
-				      kind & typemask, /* TODO: cbuf? */ NULL);
-	    }
-	  else
-	    {
-	      tgt_size = (tgt_size + align - 1) & ~(align - 1);
-	      tgt_size += da->data_row_size;
-	      not_found_cnt++;
-	    }
-	}
-      row_start += da->data_row_num;
-    }
+	    cur_node.host_start = (uintptr_t) row;
+	    cur_node.host_end = cur_node.host_start + da->data_row_size;
+	    splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+	    if (n)
+	      {
+		assert (n->refcount != REFCOUNT_LINK);
+		gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
+					kind & typemask, /* TODO: cbuf? */ NULL);
+	      }
+	    else
+	      {
+		tgt_size = (tgt_size + align - 1) & ~(align - 1);
+		tgt_size += da->data_row_size;
+		not_found_cnt++;
+	      }
+	  }
+	row_start += da->data_row_num;
+      }
 
   if (devaddrs)
     {
@@ -1201,100 +1221,103 @@
 	  }
 
       /* Processing of dynamic array rows.  */
-      for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+      if (process_dynarrays)
 	{
-	  int kind = get_kind (short_mapkind, kinds, i);
-	  if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
-	    continue;
+	  for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+	    {
+	      int kind = get_kind (short_mapkind, kinds, i);
+	      if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+		continue;
 
-	  struct da_info *da = &da_info[da_index++];
-	  assert (da->descr == hostaddrs[i]);
+	      struct da_info *da = &da_info[da_index++];
+	      assert (da->descr == hostaddrs[i]);
 
-	  /* The map for the dynamic array itself is never copied from during
-	     unmapping, its the data rows that count. Set copy from flags are
-	     set to false here.  */
-	  tgt->list[i].copy_from = false;
-	  tgt->list[i].always_copy_from = false;
+	      /* The map for the dynamic array itself is never copied from during
+		 unmapping; it's the data rows that count.  The copy-from flags
+		 are therefore set to false here.  */
+	      tgt->list[i].copy_from = false;
+	      tgt->list[i].always_copy_from = false;
 
-	  size_t align = (size_t) 1 << (kind >> rshift);
-	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
-
-	  /* For the map of the dynamic array itself, adjust so that the passed
-	     device address points to the beginning of the ptrblock.  */
-	  tgt->list[i].key->tgt_offset = tgt_size;
+	      size_t align = (size_t) 1 << (kind >> rshift);
+	      tgt_size = (tgt_size + align - 1) & ~(align - 1);
 
-	  void *target_ptrblock = (void*) tgt->tgt_start + tgt_size;
-	  tgt_size += da->ptrblock_size;
+	      /* For the map of the dynamic array itself, adjust so that the passed
+		 device address points to the beginning of the ptrblock.  */
+	      tgt->list[i].key->tgt_offset = tgt_size;
 
-	  /* Add splay key for each data row in current DA.  */
-	  for (size_t j = 0; j < da->data_row_num; j++)
-	    {
-	      row = host_data_rows[row_start + j];
-	      row_desc = &tgt->list[mapnum + row_start + j];
+	      void *target_ptrblock = (void*) tgt->tgt_start + tgt_size;
+	      tgt_size += da->ptrblock_size;
 
-	      cur_node.host_start = (uintptr_t) row;
-	      cur_node.host_end = cur_node.host_start + da->data_row_size;
-	      splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
-	      if (n)
-		{
-		  assert (n->refcount != REFCOUNT_LINK);
-		  gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
-					  kind & typemask, cbufp);
-		  target_row_addr = n->tgt->tgt_start + n->tgt_offset;
-		}
-	      else
+	      /* Add splay key for each data row in current DA.  */
+	      for (size_t j = 0; j < da->data_row_num; j++)
 		{
-		  tgt->refcount++;
+		  row = host_data_rows[row_start + j];
+		  row_desc = &tgt->list[mapnum + row_start + j];
 
-		  splay_tree_key k = &array->key;
-		  k->host_start = (uintptr_t) row;
-		  k->host_end = k->host_start + da->data_row_size;
-
-		  k->tgt = tgt;
-		  k->refcount = 1;
-		  k->link_key = NULL;
-		  tgt_size = (tgt_size + align - 1) & ~(align - 1);
-		  target_row_addr = tgt->tgt_start + tgt_size;
-		  k->tgt_offset = tgt_size;
-		  tgt_size += da->data_row_size;
-
-		  row_desc->key = k;
-		  row_desc->copy_from
-		    = GOMP_MAP_COPY_FROM_P (kind & typemask);
-		  row_desc->always_copy_from
-		    = GOMP_MAP_COPY_FROM_P (kind & typemask);
-		  row_desc->offset = 0;
-		  row_desc->length = da->data_row_size;
-
-		  array->left = NULL;
-		  array->right = NULL;
-		  splay_tree_insert (mem_map, array);
+		  cur_node.host_start = (uintptr_t) row;
+		  cur_node.host_end = cur_node.host_start + da->data_row_size;
+		  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+		  if (n)
+		    {
+		      assert (n->refcount != REFCOUNT_LINK);
+		      gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
+					      kind & typemask, cbufp);
+		      target_row_addr = n->tgt->tgt_start + n->tgt_offset;
+		    }
+		  else
+		    {
+		      tgt->refcount++;
 
-		  if (GOMP_MAP_COPY_TO_P (kind & typemask))
-		    gomp_copy_host2dev (devicep,
-					(void *) tgt->tgt_start + k->tgt_offset,
-					(void *) k->host_start,
-					da->data_row_size, cbufp);
-		  array++;
+		      splay_tree_key k = &array->key;
+		      k->host_start = (uintptr_t) row;
+		      k->host_end = k->host_start + da->data_row_size;
+
+		      k->tgt = tgt;
+		      k->refcount = 1;
+		      k->link_key = NULL;
+		      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+		      target_row_addr = tgt->tgt_start + tgt_size;
+		      k->tgt_offset = tgt_size;
+		      tgt_size += da->data_row_size;
+
+		      row_desc->key = k;
+		      row_desc->copy_from
+			= GOMP_MAP_COPY_FROM_P (kind & typemask);
+		      row_desc->always_copy_from
+			= GOMP_MAP_COPY_FROM_P (kind & typemask);
+		      row_desc->offset = 0;
+		      row_desc->length = da->data_row_size;
+
+		      array->left = NULL;
+		      array->right = NULL;
+		      splay_tree_insert (mem_map, array);
+
+		      if (GOMP_MAP_COPY_TO_P (kind & typemask))
+			gomp_copy_host2dev (devicep,
+					    (void *) tgt->tgt_start + k->tgt_offset,
+					    (void *) k->host_start,
+					    da->data_row_size, cbufp);
+		      array++;
+		    }
+		  target_data_rows[row_start + j] = (void *) target_row_addr;
 		}
-	      target_data_rows[row_start + j] = (void *) target_row_addr;
-	    }
 
-	  /* Now we have the target memory allocated, and target offsets of all
-	     row blocks assigned and calculated, we can construct the
-	     accelerator side ptrblock and copy it in.  */
-	  if (da->ptrblock_size)
-	    {
-	      void *ptrblock = gomp_dynamic_array_create_ptrblock
-		(da, target_ptrblock, target_data_rows + row_start);
-	      gomp_copy_host2dev (devicep, target_ptrblock, ptrblock,
-				  da->ptrblock_size, cbufp);
-	      free (ptrblock);
-	    }
+	      /* Now we have the target memory allocated, and target offsets of all
+		 row blocks assigned and calculated, we can construct the
+		 accelerator side ptrblock and copy it in.  */
+	      if (da->ptrblock_size)
+		{
+		  void *ptrblock = gomp_dynamic_array_create_ptrblock
+		    (da, target_ptrblock, target_data_rows + row_start);
+		  gomp_copy_host2dev (devicep, target_ptrblock, ptrblock,
+				      da->ptrblock_size, cbufp);
+		  free (ptrblock);
+		}
 
-	  row_start += da->data_row_num;
+	      row_start += da->data_row_num;
+	    }
+	  assert (row_start == da_data_row_num && da_index == da_info_num);
 	}
-      assert (row_start == da_data_row_num && da_index == da_info_num);
     }
 
   if (da_data_row_num)
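
For readers following the row bookkeeping, here is a minimal standalone sketch of the arithmetic in gomp_dynamic_array_compute_info and a two-dimensional special case of gomp_dynamic_array_fill_rows. The ex_ names are hypothetical stand-ins, not libgomp symbols. The field meanings (base and length as byte quantities) are assumed from how the patch uses them, not taken from a documented ABI. The sketch uses char * arithmetic where the patch relies on GNU void * arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the patch's descriptor.  The field meanings
   are inferred from how the patch uses them (an assumption).  */
struct ex_dim
{
  size_t base;       /* Byte offset of the first mapped element.  */
  size_t length;     /* Byte length of the mapped part of this dimension.  */
  size_t elem_size;  /* Byte size of one element of this dimension.  */
  size_t is_array;   /* Nonzero if this dimension is a true array.  */
};

struct ex_descr
{
  void *ptr;
  size_t ndims;
  struct ex_dim dims[2];  /* Fixed at two dimensions for the example.  */
};

/* Same arithmetic as gomp_dynamic_array_compute_info: accumulate the
   pointer-block size over the first ndims - 1 dimensions and count the
   data rows of the last one.  */
static void
ex_compute_info (const struct ex_descr *descr, size_t *ptrblock_size,
                 size_t *row_num, size_t *row_size)
{
  size_t d, n = 1;
  *ptrblock_size = 0;
  for (d = 0; d < descr->ndims - 1; d++)
    {
      size_t dim_count = descr->dims[d].length / descr->dims[d].elem_size;
      if (!descr->dims[d + 1].is_array)
        *ptrblock_size += descr->dims[d].length * n;
      n *= dim_count;
    }
  *row_num = n;
  *row_size = descr->dims[d].length;
}

/* Two-dimensional special case of gomp_dynamic_array_fill_rows: collect
   the host address of every data row.  Returns the number of rows.  */
static size_t
ex_fill_rows_2d (const struct ex_descr *descr, void *rows[])
{
  size_t elsize = descr->dims[0].elem_size;
  size_t n = descr->dims[0].length / elsize;
  char *p = (char *) descr->ptr + descr->dims[0].base;
  for (size_t i = 0; i < n; i++)
    {
      void *ptr = p + i * elsize;
      /* Dereference when the next dimension is reached through a pointer.  */
      if (!descr->dims[1].is_array)
        ptr = *(void **) ptr;
      rows[i] = (char *) ptr + descr->dims[1].base;
    }
  return n;
}
```

For an int *a[3] whose rows each hold 4 ints, with both is_array fields zero, this gives a pointer block of 3 * sizeof (int *) bytes and 3 data rows of 4 * sizeof (int) bytes each. The addresses collected are the three row pointers stored in a. Each of those rows then becomes its own entry in tgt->list after the first mapnum entries, as in the hunk above.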

Patch

diff --git a/libgomp/target.c b/libgomp/target.c
index 4c9fae0..071dc70 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -490,6 +490,140 @@  gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
   return tgt->tgt_start + tgt->list[i].offset;
 }
 
+/* Dynamic array related data structures, interfaces with the compiler.  */
+
+struct da_dim {
+  size_t base;
+  size_t length;
+  size_t elem_size;
+  size_t is_array;
+};
+
+struct da_descr_type {
+  void *ptr;
+  size_t ndims;
+  struct da_dim dims[];
+};
+
+/* Internal dynamic array info struct, used only here inside the runtime. */
+
+struct da_info
+{
+  struct da_descr_type *descr;
+  size_t map_index;
+  size_t ptrblock_size;
+  size_t data_row_num;
+  size_t data_row_size;
+};
+
+static size_t
+gomp_dynamic_array_count_rows (struct da_descr_type *descr)
+{
+  size_t nrows = 1;
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+    nrows *= descr->dims[d].length / sizeof (void *);
+  return nrows;
+}
+
+static void
+gomp_dynamic_array_compute_info (struct da_info *da)
+{
+  size_t d, n = 1;
+  struct da_descr_type *descr = da->descr;
+
+  da->ptrblock_size = 0;
+  for (d = 0; d < descr->ndims - 1; d++)
+    {
+      size_t dim_count = descr->dims[d].length / descr->dims[d].elem_size;
+      size_t dim_ptrblock_size = (descr->dims[d + 1].is_array
+				  ? 0 : descr->dims[d].length * n);
+      da->ptrblock_size += dim_ptrblock_size;
+      n *= dim_count;
+    }
+  da->data_row_num = n;
+  da->data_row_size = descr->dims[d].length;
+}
+
+static void
+gomp_dynamic_array_fill_rows_1 (struct da_descr_type *descr, void *da,
+				size_t d, void ***row_ptr, size_t *count)
+{
+  if (d < descr->ndims - 1)
+    {
+      size_t elsize = descr->dims[d].elem_size;
+      size_t n = descr->dims[d].length / elsize;
+      void *p = da + descr->dims[d].base;
+      for (size_t i = 0; i < n; i++)
+	{
+	  void *ptr = p + i * elsize;
+	  /* Deref if next dimension is not array.  */
+	  if (!descr->dims[d + 1].is_array)
+	    ptr = *((void **) ptr);
+	  gomp_dynamic_array_fill_rows_1 (descr, ptr, d + 1, row_ptr, count);
+	}
+    }
+  else
+    {
+      **row_ptr = da + descr->dims[d].base;
+      *row_ptr += 1;
+      *count += 1;
+    }
+}
+
+static size_t
+gomp_dynamic_array_fill_rows (struct da_descr_type *descr, void *rows[])
+{
+  size_t count = 0;
+  void **p = rows;
+  gomp_dynamic_array_fill_rows_1 (descr, descr->ptr, 0, &p, &count);
+  return count;
+}
+
+static void *
+gomp_dynamic_array_create_ptrblock (struct da_info *da,
+				    void *tgt_addr, void *tgt_data_rows[])
+{
+  struct da_descr_type *descr = da->descr;
+  void *ptrblock = gomp_malloc (da->ptrblock_size);
+  void **curr_dim_ptrblock = (void **) ptrblock;
+  size_t n = 1;
+
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+    {
+      int curr_dim_len = descr->dims[d].length;
+      int next_dim_len = descr->dims[d + 1].length;
+      int curr_dim_num = curr_dim_len / sizeof (void *);
+
+      void *next_dim_ptrblock
+	= (void *)(curr_dim_ptrblock + n * curr_dim_num);
+
+      for (int b = 0; b < n; b++)
+        for (int i = 0; i < curr_dim_num; i++)
+	  {
+	    if (d < descr->ndims - 2)
+	      {
+		void *ptr = (next_dim_ptrblock
+			     + b * curr_dim_num * next_dim_len
+			     + i * next_dim_len);
+		void *tgt_ptr = tgt_addr + (ptr - ptrblock);
+		curr_dim_ptrblock[b * curr_dim_num + i] = tgt_ptr;
+	      }
+	    else
+	      {
+		curr_dim_ptrblock[b * curr_dim_num + i]
+		  = tgt_data_rows[b * curr_dim_num + i];
+	      }
+	    void *addr = &curr_dim_ptrblock[b * curr_dim_num + i];
+	    assert (ptrblock <= addr && addr < ptrblock + da->ptrblock_size);
+	  }
+
+      n *= curr_dim_num;
+      curr_dim_ptrblock = next_dim_ptrblock;
+    }
+  assert (n == da->data_row_num);
+  return ptrblock;
+}
+
 attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
@@ -501,9 +635,29 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   const int typemask = short_mapkind ? 0xff : 0x7;
   struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
-  struct target_mem_desc *tgt
-    = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
-  tgt->list_count = mapnum;
+  struct target_mem_desc *tgt;
+
+  size_t da_data_row_num = 0, row_start = 0;
+  size_t da_info_num = 0, da_index;
+  struct da_info *da_info = NULL;
+  struct target_var_desc *row_desc;
+  uintptr_t target_row_addr;
+  void **host_data_rows = NULL, **target_data_rows = NULL;
+  void *row;
+
+  for (i = 0; i < mapnum; i++)
+    {
+      int kind = get_kind (short_mapkind, kinds, i);
+      if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	{
+	  da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
+	  da_info_num += 1;
+	}
+    }
+
+  tgt = gomp_malloc (sizeof (*tgt)
+		     + sizeof (tgt->list[0]) * (mapnum + da_data_row_num));
+  tgt->list_count = mapnum + da_data_row_num;
   tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
   tgt->device_descr = devicep;
   struct gomp_coalesce_buf cbuf, *cbufp = NULL;
@@ -515,6 +669,14 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       return tgt;
     }
 
+  if (da_info_num)
+    da_info = gomp_alloca (sizeof (struct da_info) * da_info_num);
+  if (da_data_row_num)
+    {
+      host_data_rows = gomp_malloc (sizeof (void *) * da_data_row_num);
+      target_data_rows = gomp_malloc (sizeof (void *) * da_data_row_num);
+    }
+
   tgt_align = sizeof (void *);
   tgt_size = 0;
   cbuf.chunks = NULL;
@@ -546,7 +708,7 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
       return NULL;
     }
 
-  for (i = 0; i < mapnum; i++)
+  for (i = 0, da_index = 0; i < mapnum; i++)
     {
       int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL
@@ -619,6 +781,20 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	  has_firstprivate = true;
 	  continue;
 	}
+      else if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	{
+	  /* Ignore dynamic arrays for now, we process them together
+	     later.  */
+	  tgt->list[i].key = NULL;
+	  tgt->list[i].offset = 0;
+	  not_found_cnt++;
+
+	  struct da_info *da = &da_info[da_index++];
+	  da->descr = (struct da_descr_type *) hostaddrs[i];
+	  da->map_index = i;
+	  continue;
+	}
+
       cur_node.host_start = (uintptr_t) hostaddrs[i];
       if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
@@ -687,6 +863,55 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
+  /* For dynamic arrays. Each data row is one target item, separated from
+     the normal map clause items, hence we order them after mapnum.  */
+  for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+    {
+      int kind = get_kind (short_mapkind, kinds, i);
+      if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	continue;
+
+      struct da_info *da = &da_info[da_index++];
+      struct da_descr_type *descr = da->descr;
+      size_t nr;
+
+      gomp_dynamic_array_compute_info (da);
+
+      /* We have allocated space in host/target_data_rows to place all the
+	 row data block pointers, now we can start filling them in.  */
+      nr = gomp_dynamic_array_fill_rows (descr, &host_data_rows[row_start]);
+      assert (nr == da->data_row_num);
+
+      size_t align = (size_t) 1 << (kind >> rshift);
+      if (tgt_align < align)
+	tgt_align = align;
+      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+      tgt_size += da->ptrblock_size;
+
+      for (size_t j = 0; j < da->data_row_num; j++)
+	{
+	  row = host_data_rows[row_start + j];
+	  row_desc = &tgt->list[mapnum + row_start + j];
+
+	  cur_node.host_start = (uintptr_t) row;
+	  cur_node.host_end = cur_node.host_start + da->data_row_size;
+	  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+	  if (n)
+	    {
+	      assert (n->refcount != REFCOUNT_LINK);
+	      gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
+				      kind & typemask, /* TODO: cbuf? */ NULL);
+	    }
+	  else
+	    {
+	      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+	      tgt_size += da->data_row_size;
+	      not_found_cnt++;
+	    }
+	}
+      row_start += da->data_row_num;
+    }
+
   if (devaddrs)
     {
       if (mapnum != 1)
@@ -830,6 +1055,15 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	      default:
 		break;
 	      }
+
+	    if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	      {
+		tgt->list[i].key = &array->key;
+		tgt->list[i].key->tgt = tgt;
+		array++;
+		continue;
+	      }
+
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
 	    if (!GOMP_MAP_POINTER_P (kind & typemask))
@@ -976,6 +1210,108 @@  gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		array++;
 	      }
 	  }
+
+      /* Processing of dynamic array rows.  */
+      for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+	{
+	  int kind = get_kind (short_mapkind, kinds, i);
+	  if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	    continue;
+
+	  struct da_info *da = &da_info[da_index++];
+	  assert (da->descr == hostaddrs[i]);
+
+	  /* The map for the dynamic array itself is never copied from during
+	     unmapping, its the data rows that count. Set copy from flags are
+	     set to false here.  */
+	  tgt->list[i].copy_from = false;
+	  tgt->list[i].always_copy_from = false;
+
+	  size_t align = (size_t) 1 << (kind >> rshift);
+	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
+
+	  /* For the map of the dynamic array itself, adjust so that the passed
+	     device address points to the beginning of the ptrblock.  */
+	  tgt->list[i].key->tgt_offset = tgt_size;
+
+	  void *target_ptrblock = (void*) tgt->tgt_start + tgt_size;
+	  tgt_size += da->ptrblock_size;
+
+	  /* Add splay key for each data row in current DA.  */
+	  for (size_t j = 0; j < da->data_row_num; j++)
+	    {
+	      row = host_data_rows[row_start + j];
+	      row_desc = &tgt->list[mapnum + row_start + j];
+
+	      cur_node.host_start = (uintptr_t) row;
+	      cur_node.host_end = cur_node.host_start + da->data_row_size;
+	      splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+	      if (n)
+		{
+		  assert (n->refcount != REFCOUNT_LINK);
+		  gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
+					  kind & typemask, cbufp);
+		  target_row_addr = n->tgt->tgt_start + n->tgt_offset;
+		}
+	      else
+		{
+		  tgt->refcount++;
+
+		  splay_tree_key k = &array->key;
+		  k->host_start = (uintptr_t) row;
+		  k->host_end = k->host_start + da->data_row_size;
+
+		  k->tgt = tgt;
+		  k->refcount = 1;
+		  k->link_key = NULL;
+		  tgt_size = (tgt_size + align - 1) & ~(align - 1);
+		  target_row_addr = tgt->tgt_start + tgt_size;
+		  k->tgt_offset = tgt_size;
+		  tgt_size += da->data_row_size;
+
+		  row_desc->key = k;
+		  row_desc->copy_from
+		    = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		  row_desc->always_copy_from
+		    = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		  row_desc->offset = 0;
+		  row_desc->length = da->data_row_size;
+
+		  array->left = NULL;
+		  array->right = NULL;
+		  splay_tree_insert (mem_map, array);
+
+		  if (GOMP_MAP_COPY_TO_P (kind & typemask))
+		    gomp_copy_host2dev (devicep,
+					(void *) tgt->tgt_start + k->tgt_offset,
+					(void *) k->host_start,
+					da->data_row_size, cbufp);
+		  array++;
+		}
+	      target_data_rows[row_start + j] = (void *) target_row_addr;
+	    }
+
+	  /* Now we have the target memory allocated, and target offsets of all
+	     row blocks assigned and calculated, we can construct the
+	     accelerator side ptrblock and copy it in.  */
+	  if (da->ptrblock_size)
+	    {
+	      void *ptrblock = gomp_dynamic_array_create_ptrblock
+		(da, target_ptrblock, target_data_rows + row_start);
+	      gomp_copy_host2dev (devicep, target_ptrblock, ptrblock,
+				  da->ptrblock_size, cbufp);
+	      free (ptrblock);
+	    }
+
+	  row_start += da->data_row_num;
+	}
+      assert (row_start == da_data_row_num && da_index == da_info_num);
+    }
+
+  if (da_data_row_num)
+    {
+      free (host_data_rows);
+      free (target_data_rows);
     }
 
   if (pragma_kind == GOMP_MAP_VARS_TARGET)