
[OpenMP,5.0] Implement structure element mapping changes in 5.0

Message ID 8b9a99a1-c9ba-7f5d-fad1-6f90c23e9e4c@codesourcery.com

Commit Message

Chung-Lin Tang Oct. 23, 2020, 10:24 a.m. UTC
Hi Jakub,
this patch set implements more of OpenMP 5.0 mapping, specifically this part in 2.19.7.1 map Clause:

"If a list item in a map clause is a structure element then all other structure elements of the
  containing structure variable form a structure sibling list. The map clause and the structure sibling
  list are associated with the same construct. If a corresponding list item of the structure sibling list
  item is present in the device data environment when the construct is encountered then:

  * If the structure sibling list item does not appear in a map clause on the construct then:
    – If the construct is a target, target data, or target enter data construct then the
      structure sibling list item is treated as if it is a list item in a map clause on the construct with a
      map-type of alloc.
    – If the construct is target exit data construct, then the structure sibling list item is treated
      as if it is a list item in a map clause on the construct with a map-type of release."

While really wordy, I believe this simply means that maps of structure element fields have their reference
counts increased/decreased in a uniform fashion, i.e. they are alloc/release'd together, instead of
having parts of the structure possibly deallocated while others still exist on the device.

In general, upon encountering a construct, we can't statically determine and insert alloc/release maps
for each element of a structure variable, since we don't really know which region of the structure is
currently mapped or not, hence this probably can't be properly implemented in the compiler.

Instead this patch tries to do the equivalent in the runtime: I've modified the handling of the
(GOMP_MAP_STRUCT, <field-map1>, <field-map2>, ...) sequence to:

   (1) Create just a single splay_tree_key to represent the entire structure's mapped-region
       (all element target_var_desc's now reference this same key instead of creating their own), and

   (2) Associate an increment/decrement of the splay_tree_key refcount only with the leading GOMP_MAP_STRUCT,
       not with each individual struct element, i.e. those element tgt_var_desc's are now mainly used only to
       execute the host<->device copying logic.

This implies that GOMP_MAP_STRUCT is also needed in "exit data" directives, so a small patch in
gcc/gimplify.c has been made to NOT remove this map for OpenMP target exit data. OpenACC has not been touched.

(Some parts of the libgomp changes are related to differentiating the OpenMP/OpenACC cases;
these also exist in the last 5.0-mapping patch, and are included here for completeness.)

This patch contains three libgomp testcases, the first one of which was also included in the last 5.0-mapping
patch set as an XFAIL, but now passes with this patch.

Tobias had an earlier issue with Fortran arrays (I forget which kind it was called) where sub-struct fields
clashed with each other, causing a libgomp runtime mapping failure. That problem should be fixed by this
patch, since capturing all tgt_var_desc key references into a single splay_tree_key inherently avoids
the multiple-overlapping-key behavior. I have lightly tested the testcase Tobias gave me earlier,
and this part of the issue appears to be solved; however, it still needs the first 5.0-mapping patch combined
with this patch to work completely, since the Fortran array struct needs pointer attachment/detachment of the
data before and after the target region.

This patch has been tested on x86_64-linux with nvptx offloading with no regressions, and is currently
also being tested on powerpc64le-linux. Seeking approval for trunk.

(BTW Jakub, thanks for your review of the first patch set, I will be working on that revision next.)

Thanks,
Chung-Lin

2020-10-23  Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* gimplify.c (gimplify_adjust_omp_clauses): Do not remove
	GOMP_MAP_STRUCT clauses for OpenMP target exit data constructs.

	libgomp/
	* libgomp.h (struct target_var_desc): New 'bool is_struct' field,
	update comments for 'length' field.
	(enum gomp_map_vars_kind): Adjust enum values to be bit-flag
	usable.
	* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to
	'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
	(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
	(goacc_enter_data_internal): Likewise.
	* target.c (gomp_map_vars_existing): Add 'bool inc_ref' parameter,
	conditionalize refcount increase on inc_ref. Initialize is_struct
	field for tgt_var.
	(gomp_map_fields_existing): Add 'bool fld_inc_ref' parameter, adjust
	calls to gomp_map_vars_existing.
	(gomp_map_vars_internal): Change checks of GOMP_MAP_VARS_ENTER_DATA
	to use bit-and (&). Adjust OpenMP handling of GOMP_MAP_STRUCT.
	(gomp_var_unref): New function, factored from code in
	gomp_unmap_vars_internal.
	(gomp_var_copy_back): Likewise.
	(gomp_var_unref_tgt): Likewise.
	(gomp_unmap_vars_internal): Reorganize unmapping logic into above
	three functions, handle case when 'is_struct' is true.
	(gomp_exit_data): Handle GOMP_MAP_STRUCT.

	* testsuite/libgomp.c-c++-common/struct-elem-1.c: New testcase.
	* testsuite/libgomp.c-c++-common/struct-elem-2.c: New testcase.
	* testsuite/libgomp.c-c++-common/struct-elem-3.c: New testcase.

Comments

Jakub Jelinek Oct. 23, 2020, 12:13 p.m. UTC | #1
On Fri, Oct 23, 2020 at 06:24:20PM +0800, Chung-Lin Tang wrote:
> this patch set implements more of OpenMP 5.0 mapping, specifically this part in 2.19.7.1 map Clause:
> 
> "If a list item in a map clause is a structure element then all other structure elements of the
>  containing structure variable form a structure sibling list. The map clause and the structure sibling
>  list are associated with the same construct. If a corresponding list item of the structure sibling list
>  item is present in the device data environment when the construct is encountered then:
> 
>  * If the structure sibling list item does not appear in a map clause on the construct then:
>    – If the construct is a target, target data, or target enter data construct then the
>      structure sibling list item is treated as if it is a list item in a map clause on the construct with a
>      map-type of alloc.
>    – If the construct is target exit data construct, then the structure sibling list item is treated
>      as if it is a list item in a map clause on the construct with a map-type of release."
> 
> While really wordy, I believe this simply means that maps of structure element fields have their reference
> counts increased/decreased in a uniform fashion, i.e. they are alloc/release'd together, instead of
> having parts of the structure possibly deallocated while others still exist on the device.

I think part of the reason for the above wording being so long is the
declare mapper stuff which caused the mapping of the whole struct to be
implicitly treated as mapping of all the elements individually (which we
don't actually want to implement that way unless we have to, e.g. due to
references in there, different declare mappers etc.).

> In general, upon encountering a construct, we can't statically determine and insert alloc/release maps
> for each element of a structure variable, since we don't really know which region of the structure is
> currently mapped or not, hence this probably can't be properly implemented in the compiler.
> 
> Instead this patch tries to do the equivalent in the runtime: I've modified the handling of the
> (GOMP_MAP_STRUCT, <field-map1>, <field-map2>, ...) sequence to:
> 
>   (1) Create just a single splay_tree_key to represent the entire structure's mapped-region
>       (all element target_var_desc's now reference this same key instead of creating their own), and

I'm not sure that is what we want.  If we create just a single
splay_tree_key spanning the whole structure mapped region, then we can't
diagnose various mapping errors.  E.g. if I have:
void bar (struct S *);
struct S { int a, b, c, d, e; };
void foo (struct S s)
{
  #pragma omp target data map(tofrom: s.b, s.d)
  #pragma omp target map (s.b, s.c)
  bar (&s);
}
then target data maps the &s.b to &s.d + 1 region of the struct, but s.c
wasn't mapped and so the target region's mapping should fail, even when it
is in the middle of the mapped region.

The structure mapping wording was written in a way to give implementations a
choice, either map the whole struct (space inefficient), or the region from
the first to last element in the struct that needs mapping (what GCC
implements, also space inefficient, but less so), or only map the fields
individually and somehow remap all uses of the struct in the region (I think
that is only theoretically possible if one can analyze the whole target
region and rewrite anything that could refer to it in there).

So, I'd think instead of having just one splay_tree_key, we need multiple
(we could merge adjacent ones though if we want), but we need some way to tie
them together: e.g. represent them as one master entry (perhaps the first one)
and slave entries, and use the refcount of the master entry for all of them.

There are other OpenMP 5.0 changes which are very tightly related to that
though, namely that OpenMP 4.5 disallowed mapping the same variable multiple
times in the same region and therefore had the simple rule that each mapping
bumps the refcount by one.  As OpenMP 5.0 dropped that, we have instead:
"If the corresponding list item’s reference count was not already incremented because of the
effect of a map clause on the construct then:
a) The corresponding list item’s reference count is incremented by one;"

So, additionally we need to ensure that we don't bump refcounts again that
we've already bumped in the same GOMP_* call, and similarly at the end of a
region when unmapping.  Additionally there is the complication that for enter
data and exit data we actually call gomp_map_vars multiple times, so we'll
need to track it somehow even across those calls.

Though, the refcount is only on the target_mem_desc struct not on the
target_var_desc.  But we need to bump the refcount for each separate
target_var_desc in there unless it is one of these slave entries, otherwise
e.g.
#pragma omp target data map (x, y, z)
{
#pragma omp target enter data map (to: x)
#pragma omp target enter data map (to: y)
#pragma omp target enter data map (to: z)
#pragma omp target exit data map (from: x, y, z)
}
would keep the target_mem_desc mapped when it shouldn't (if we'd just
increment or decrement refcount in each target_mem_desc once per construct
(well, twice for target data and target, once upon entry, once upon exit),
then the above would set refcount of the block containing all of x, y, z
to 1, then 2, 3, 4, then decrease to 3 and finally decrease to 2).
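That arithmetic can be spelled out in a toy model (just a recount of the scenario above under the broken one-bump-per-construct rule; the helper name is invented):

```c
#include <assert.h>

/* Recount the x/y/z example assuming each construct bumps the shared
   target_mem_desc refcount exactly once on entry and once on exit.  */
static int
naive_refcount_after_region (void)
{
  int rc = 0;
  rc++;   /* target data map (x, y, z): entry           -> 1 */
  rc++;   /* target enter data map (to: x)              -> 2 */
  rc++;   /* target enter data map (to: y)              -> 3 */
  rc++;   /* target enter data map (to: z)              -> 4 */
  rc--;   /* target exit data map (from: x, y, z): one
	     decrement for the whole construct          -> 3 */
  rc--;   /* end of the target data region              -> 2 */
  return rc;   /* nonzero: the block holding x, y, z stays mapped */
}
```

The nonzero result is exactly the leak described above, which is why the bump has to happen per distinct target_var_desc rather than per construct.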

For data structures, perhaps change the 4 bool fields in target_var_desc
into bitfields, so that we don't grow the structure on 32-bit architectures
and add one bit for slave entries.  We'd need to ensure the keys for the
same GOMP_MAP_STRUCT are consecutive in the list[] array, but I think that
should be the case already, so finding the corresponding master would be
while (k->slave) k--;

And then dunno, perhaps push the addresses of all the target_var_desc that
should have refcount increased or decreased into a vector, qsort it and
only bump refcount on the first entry in the array or if the previous
address in the vector was different than the current one?  Or a hash table
recording what has been bumped already?  Though perhaps with some cheaper
way to handle the most common case of only a few mapped vars, because hash
table creation would be too expensive in that case?
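The sort-then-skip-duplicates variant could look roughly like this (a standalone sketch under those assumptions, not actual libgomp code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Order refcount addresses so that duplicates become adjacent.  */
static int
refcount_ptr_cmp (const void *p1, const void *p2)
{
  uintptr_t a = (uintptr_t) *(uintptr_t *const *) p1;
  uintptr_t b = (uintptr_t) *(uintptr_t *const *) p2;
  return (a > b) - (a < b);
}

/* Decrement every distinct refcount exactly once, however many map
   clauses on the construct referred to it.  */
static void
decrement_each_once (uintptr_t **refcounts, size_t n)
{
  qsort (refcounts, n, sizeof *refcounts, refcount_ptr_cmp);
  for (size_t i = 0; i < n; i++)
    if ((i == 0 || refcounts[i - 1] != refcounts[i]) && *refcounts[i] != 0)
      (*refcounts[i])--;
}
```

The per-construct array can live on the stack (alloca'd to mapnum entries), so the common few-mapped-vars case stays cheap compared to building a hash table.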

As far as the merging of adjacent fields goes, perhaps that is something
that can be done as an optional optimization in the compiler (gimplify.c)
when we have the stuff sorted for GOMP_MAP_STRUCT.  But we need to take
into account different mapping kinds.

I'm very sorry about this, I really appreciate your work on this.

	Jakub
Chung-Lin Tang Oct. 23, 2020, 5:43 p.m. UTC | #2
Hi Jakub, thanks for the prompt review.

On 2020/10/23 8:13 PM, Jakub Jelinek wrote:
>> In general, upon encountering a construct, we can't statically determine and insert alloc/release maps
>> for each element of a structure variable, since we don't really know which region of the structure is
>> currently mapped or not, hence this probably can't be properly implemented in the compiler.
>>
>> Instead this patch tries to do the equivalent in the runtime: I've modified the handling of the
>> (GOMP_MAP_STRUCT, <field-map1>, <field-map2>, ...) sequence to:
>>
>>    (1) Create just a single splay_tree_key to represent the entire structure's mapped-region
>>        (all element target_var_desc's now reference this same key instead of creating their own), and
> I'm not sure that is what we want.  If we create just a single
> splay_tree_key spanning the whole structure mapped region, then we can't
> diagnose various mapping errors.  E.g. if I have:
> void bar (struct S *);
> struct S { int a, b, c, d, e; };
> void foo (struct S s)
> {
>    #pragma omp target data map(tofrom: s.b, s.d)
>    #pragma omp target map (s.b, s.c)
>    bar (&s);
> }
> then target data maps the &s.b to &s.d + 1 region of the struct, but s.c
> wasn't mapped and so the target region's mapping should fail, even when it
> is in the middle of the mapped region.

Are you really sure this is what we want? I don't quite see anything harmful
about implicitly mapping "middle fields" like s.c, in fact the corresponding
memory is actually "mapped" anyways.

> The structure mapping wording was written in a way to give implementations a
> choice, either map the whole struct (space inefficient), or the region from
> the first to last element in the struct that needs mapping (what GCC
> implements, also space inefficient, but less so), or only map the fields
> individually and somehow remap all uses of the struct in the region (I think
> that is only theoretically possible if one can analyze the whole target
> region and rewrite anything that could refer to it in there).

That seems to imply that rejecting "middle fields" is not really required
behavior.

> So, I'd think instead of having just one splay_tree_key, we need multiple
> (we could merge adjacent ones though if we want) but we need some way to tie
> them together: e.g. represent them as one master entry (perhaps the first one) and
> slave entries, and use the refcount of the master entry for all of them.

I did think of that route before, but it's just too complex and unwieldy compared
to an elegant solution like using one single splay_tree_key. I can try to think
more about a "composite-key" like design, but please reconsider the current patch.
It's already very close to the 5.0 spec, with what you mention not really "that"
large an issue.

Thanks,
Chung-Lin
Jakub Jelinek Oct. 26, 2020, 8:10 a.m. UTC | #3
On Sat, Oct 24, 2020 at 01:43:26AM +0800, Chung-Lin Tang wrote:
> On 2020/10/23 8:13 PM, Jakub Jelinek wrote:
> > > In general, upon encountering a construct, we can't statically determine and insert alloc/release maps
> > > for each element of a structure variable, since we don't really know which region of the structure is
> > > currently mapped or not, hence this probably can't be properly implemented in the compiler.
> > > 
> > > Instead this patch tries to do the equivalent in the runtime: I've modified the handling of the
> > > (GOMP_MAP_STRUCT, <field-map1>, <field-map2>, ...) sequence to:
> > > 
> > >    (1) Create just a single splay_tree_key to represent the entire structure's mapped-region
> > >        (all element target_var_desc's now reference this same key instead of creating their own), and
> > I'm not sure that is what we want.  If we create just a single
> > splay_tree_key spanning the whole structure mapped region, then we can't
> > diagnose various mapping errors.  E.g. if I have:
> > void bar (struct S *);
> > struct S { int a, b, c, d, e; };
> > void foo (struct S s)
> > {
> >    #pragma omp target data map(tofrom: s.b, s.d)
> >    #pragma omp target map (s.b, s.c)
> >    bar (&s);
> > }
> > then target data maps the &s.b to &s.d + 1 region of the struct, but s.c
> > wasn't mapped and so the target region's mapping should fail, even when it
> > is in the middle of the mapped region.
> 
> Are you really sure this is what we want? I don't quite see anything harmful
> about implicitly mapping "middle fields" like s.c, in fact the corresponding
> memory is actually "mapped" anyways.

Yes, it is a QoI issue and it is important not to regress on that.
Furthermore, the more we diverge from what the spec says, the harder it will
be for us to implement, not just now, but in the future too.
What I wrote about the actual implementation is actually not accurate, we
need the master and slaves to be the struct splay_tree_key_s objects.
And that one already has the aux field that could be used for the slaves,
so we could e.g. use another magic value of refcount, e.g. REFCOUNT_SLAVE
~(uintptr_t) 2, and in that case aux would point to the master
splay_tree_key_s.

And the 
"If the corresponding list item’s reference count was not already incremented because of the
effect of a map clause on the construct then:
a) The corresponding list item’s reference count is incremented by one;"
and
"If the map-type is not delete and the corresponding list item’s reference count is finite and
was not already decremented because of the effect of a map clause on the construct then:
a) The corresponding list item’s reference count is decremented by one;"
rules we need to implement in any case, I don't see a way around that.
The same list item can now be mapped (or unmapped) multiple times on the same
construct.

	Jakub
Jakub Jelinek Oct. 30, 2020, 2:05 p.m. UTC | #4
On Mon, Oct 26, 2020 at 09:10:08AM +0100, Jakub Jelinek via Gcc-patches wrote:
> Yes, it is a QoI and it is important not to regress about that.
> Furthermore, the more we diverge from what the spec says, it will be harder
> for us to implement, not just now, but in the future too.
> What I wrote about the actual implementation is actually not accurate, we
> need the master and slaves to be the struct splay_tree_key_s objects.
> And that one already has the aux field that could be used for the slaves,
> so we could e.g. use another magic value of refcount, e.g. REFCOUNT_SLAVE
> ~(uintptr_t) 2, and in that case aux would point to the master
> splay_tree_key_s.
> 
> And the 
> "If the corresponding list item’s reference count was not already incremented because of the
> effect of a map clause on the construct then:
> a) The corresponding list item’s reference count is incremented by one;"
> and
> "If the map-type is not delete and the corresponding list item’s reference count is finite and
> was not already decremented because of the effect of a map clause on the construct then:
> a) The corresponding list item’s reference count is decremented by one;"
> rules we need to implement in any case, I don't see a way around that.
> The same list item can now be mapped (or unmapped) multiple times on the same
> construct.

To show up what exactly I meant, here is a proof of concept (but unfinished)
patch.
For OpenMP only (I believe OpenACC ATM has neither the concept of
structure sibling lists nor the OpenMP 5.0 requirement that on one construct
one refcount isn't incremented multiple times nor decremented multiple
times) it uses the dynamic_refcount field otherwise only used in OpenACC
for the structure sibling lists; in particular, all but the first mapping
in a structure sibling list will have refcount == REFCOUNT_SIBLING and
dynamic_refcount pointing to their master's refcount field.  And
the master has dynamic_refcount set to the number of REFCOUNT_SIBLING
following those.
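In a cut-down model (simplified two-field stand-in for struct splay_tree_key_s, names invented except REFCOUNT_SIBLING), the indirection behaves like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REFCOUNT_SIBLING (~(uintptr_t) 2)

/* Cut-down stand-in for struct splay_tree_key_s.  */
struct key { uintptr_t refcount, dynamic_refcount; };

/* keys[0] becomes the master; keys[1..n-1] become REFCOUNT_SIBLING
   entries whose dynamic_refcount points at the master's refcount,
   while the master's dynamic_refcount counts its siblings.  */
static void
link_sibling_list (struct key *keys, size_t n)
{
  keys[0].refcount = 1;
  keys[0].dynamic_refcount = n - 1;
  for (size_t i = 1; i < n; i++)
    {
      keys[i].refcount = REFCOUNT_SIBLING;
      keys[i].dynamic_refcount = (uintptr_t) &keys[0].refcount;
    }
}

/* Any mapping in the list resolves to the single shared refcount.  */
static uintptr_t *
effective_refcount (struct key *k)
{
  return k->refcount == REFCOUNT_SIBLING
	 ? (uintptr_t *) k->dynamic_refcount
	 : &k->refcount;
}
```

Bumping through any sibling then updates the one master refcount, which is what keeps the whole structure sibling list alloc'd and released in sync.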

In the patch I've only changed the construction of such splay_tree_keys
and changed gomp_exit_data to deal with those (that is the very easy
part), plus implemented the OpenMP 5.0 rule that one refcount isn't decremented
more than once.
What would need to be done is handle the rest, in particular (for OpenMP
only) adjust the refcount (splay_tree_key only, not target_mem_desc), such
that for the just created splay_tree_keys (refcount pointers in between
tgt->array and the end of the array; perhaps we should add a field for how many
elts the array has) it doesn't bump anything - just rely on the refcount = 1
we do elsewhere, and for other refcounts, if REFCOUNT_SIBLING, use the
dynamic_refcount pointer and if not REFCOUNT_INFINITY, instead of bumping
the refcount queue it for later increments (again, with allocaed list).
And when unmapping at the end of target or target data, do something similar
to what gomp_exit_data does in the patch (perhaps with some helper
functions).

At least from omp-lang discussions, the intent is that e.g. on
struct S { int a, b, c, d, e; } s = { 1, 2, 3, 4, 5};
#pragma omp target enter data map (s)
// same thing as
// #pragma omp target enter data map (s.a, s.b, s.c, s.d, s.e)
// The above at least theoretically creates 5 mappings, with
// refcount set to 1 for each (but with all those refcount behaving
// in sync), but I'd strongly prefer to create just one with one refcount.
int *p = &s.b;
int *q = &s.d;
#pragma omp target enter data map (p[:1]) map (q[:1])
// Above needs to bump either the refcounts of all of s.a, s.b, s.c, s.d and
// s.e by 1, or when it all has just a single refcount, bump it also just by
// 1.

int a;
#pragma omp target enter data map (a)   // This creates just one mapping and sets refcount to 1
// as int is not an aggregate
char *r, *s;
r = (char *) &a;
s = r + 2;
#pragma omp target enter data map (r[:1], s[:1])
// The above should bump the refcount of a just once, not twice in OpenMP
// 5.0.

For both testcases, I guess one can try to construct from them user-observable
tests where the refcount will result in copying the data back at
certain points (or not).
And for the non-contiguous structure element mappings, the idea would
be that we still use a single refcount for the whole structure sibling list
defined in the spec.

--- libgomp/libgomp.h.jj	2020-10-30 12:57:16.176284101 +0100
+++ libgomp/libgomp.h	2020-10-30 12:57:40.264014514 +0100
@@ -1002,6 +1002,10 @@ struct target_mem_desc {
 /* Special value for refcount - tgt_offset contains target address of the
    artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (~(uintptr_t) 1)
+/* Special value for refcount - structure sibling list item other than
+   the first one.  *(uintptr_t *)dynamic_refcount is the actual refcount
+   for it.  */
+#define REFCOUNT_SIBLING (~(uintptr_t) 2)
 
 /* Special offset values.  */
 #define OFFSET_INLINED (~(uintptr_t) 0)
--- libgomp/target.c.jj	2020-10-30 12:57:19.926242130 +0100
+++ libgomp/target.c	2020-10-30 14:45:04.016809943 +0100
@@ -1022,6 +1022,7 @@ gomp_map_vars_internal (struct gomp_devi
       splay_tree_node array = tgt->array;
       size_t j, field_tgt_offset = 0, field_tgt_clear = FIELD_TGT_EMPTY;
       uintptr_t field_tgt_base = 0;
+      splay_tree_key field_tgt_sibling = NULL;
 
       for (i = 0; i < mapnum; i++)
 	if (has_always_ptrset
@@ -1153,6 +1154,7 @@ gomp_map_vars_internal (struct gomp_devi
 		    field_tgt_base = (uintptr_t) hostaddrs[first];
 		    field_tgt_offset = tgt_size;
 		    field_tgt_clear = last;
+		    field_tgt_sibling = NULL;
 		    tgt_size += cur_node.host_end
 				- (uintptr_t) hostaddrs[first];
 		    continue;
@@ -1251,12 +1253,29 @@ gomp_map_vars_internal (struct gomp_devi
 		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i].key = k;
 		k->tgt = tgt;
+		k->refcount = 1;
+		k->dynamic_refcount = 0;
 		if (field_tgt_clear != FIELD_TGT_EMPTY)
 		  {
 		    k->tgt_offset = k->host_start - field_tgt_base
 				    + field_tgt_offset;
+		    if (pragma_kind != GOMP_MAP_VARS_OPENACC)
+		      {
+			if (field_tgt_sibling == 0)
+			  field_tgt_sibling = k;
+			else
+			  {
+			    k->refcount = REFCOUNT_SIBLING;
+			    k->dynamic_refcount
+			      = (uintptr_t) &field_tgt_sibling->refcount;
+			    field_tgt_sibling->dynamic_refcount++;
+			  }
+		      }
 		    if (i == field_tgt_clear)
-		      field_tgt_clear = FIELD_TGT_EMPTY;
+		      {
+			field_tgt_clear = FIELD_TGT_EMPTY;
+			field_tgt_sibling = 0;
+		      }
 		  }
 		else
 		  {
@@ -1270,8 +1289,6 @@ gomp_map_vars_internal (struct gomp_devi
 		tgt->list[i].is_attach = false;
 		tgt->list[i].offset = 0;
 		tgt->list[i].length = k->host_end - k->host_start;
-		k->refcount = 1;
-		k->dynamic_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
@@ -2462,12 +2479,27 @@ GOMP_target_update_ext (int device, size
   gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, true);
 }
 
+static int
+gomp_uintptr_t_cmp (const void *p1, const void *p2)
+{
+  if (*(const uintptr_t **) p1 < *(const uintptr_t **) p2)
+    return -1;
+  if (*(const uintptr_t **) p1 > *(const uintptr_t **) p2)
+    return 1;
+  return 0;
+}
+
 static void
 gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
 		void **hostaddrs, size_t *sizes, unsigned short *kinds)
 {
   const int typemask = 0xff;
   size_t i;
+  uintptr_t **refcounts = gomp_alloca (mapnum * sizeof (uintptr_t *));
+  size_t nrefcounts = 0;
+  splay_tree_key *keys = gomp_alloca (mapnum * sizeof (splay_tree_key));
+  bool any_deletes = false;
+  bool any_from = false;
   gomp_mutex_lock (&devicep->lock);
   if (devicep->state == GOMP_DEVICE_FINALIZED)
     {
@@ -2482,6 +2514,8 @@ gomp_exit_data (struct gomp_device_descr
       switch (kind)
 	{
 	case GOMP_MAP_FROM:
+	  any_from = true;
+	  /* FALLTHRU */
 	case GOMP_MAP_ALWAYS_FROM:
 	case GOMP_MAP_DELETE:
 	case GOMP_MAP_RELEASE:
@@ -2493,26 +2527,31 @@ gomp_exit_data (struct gomp_device_descr
 			      || kind == GOMP_MAP_ZERO_LEN_ARRAY_SECTION)
 	    ? gomp_map_0len_lookup (&devicep->mem_map, &cur_node)
 	    : splay_tree_lookup (&devicep->mem_map, &cur_node);
+	  keys[i] = k;
 	  if (!k)
 	    continue;
 
-	  if (k->refcount > 0 && k->refcount != REFCOUNT_INFINITY)
-	    k->refcount--;
-	  if ((kind == GOMP_MAP_DELETE
-	       || kind == GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION)
-	      && k->refcount != REFCOUNT_INFINITY)
-	    k->refcount = 0;
+	  if (k->refcount != REFCOUNT_INFINITY)
+	    {
+	      uintptr_t *refcount = &k->refcount;
+	      if (k->refcount == REFCOUNT_SIBLING)
+		refcount = (uintptr_t *)k->dynamic_refcount;
+	      if (kind == GOMP_MAP_DELETE
+		  || kind == GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION)
+		*refcount = 0;
+	      if (nrefcounts && refcounts[nrefcounts - 1] == refcount)
+		/* Already queued for refcount decrease.  */;
+	      else
+		/* Otherwise queue the mapping for refcount decrement.  */
+		refcounts[nrefcounts++] = refcount;
+	    }
 
-	  if ((kind == GOMP_MAP_FROM && k->refcount == 0)
-	      || kind == GOMP_MAP_ALWAYS_FROM)
+	  if (kind == GOMP_MAP_ALWAYS_FROM)
 	    gomp_copy_dev2host (devicep, NULL, (void *) cur_node.host_start,
 				(void *) (k->tgt->tgt_start + k->tgt_offset
 					  + cur_node.host_start
 					  - k->host_start),
 				cur_node.host_end - cur_node.host_start);
-	  if (k->refcount == 0)
-	    gomp_remove_var (devicep, k);
-
 	  break;
 	default:
 	  gomp_mutex_unlock (&devicep->lock);
@@ -2521,6 +2560,82 @@ gomp_exit_data (struct gomp_device_descr
 	}
     }
 
+  /* Sort the refcount pointers.  */
+  if (nrefcounts > 1)
+    qsort (refcounts, nrefcounts, sizeof (uintptr_t *), gomp_uintptr_t_cmp);
+
+  /* So that we can decrease each separate refcount just once.  */
+  for (i = 0; i < nrefcounts; i++)
+    if (i == 0 || refcounts[i - 1] != refcounts[i])
+      {
+	uintptr_t val = *refcounts[i];
+	if (val != 0)
+	  *refcounts[i] = val - 1;
+	if (val <= 1)
+	  any_deletes = true;
+      }
+
+  /* If anything has been decremented to 0 and there are from map-kind
+     map clauses, copy the data to host.  For always, from we've already
+     done it earlier.  */
+  if (any_deletes && any_from)
+    for (i = 0; i < mapnum; i++)
+      {
+	struct splay_tree_key_s cur_node;
+	unsigned char kind = kinds[i] & typemask;
+	if (kind == GOMP_MAP_FROM)
+	  {
+	    cur_node.host_start = (uintptr_t) hostaddrs[i];
+	    cur_node.host_end = cur_node.host_start + sizes[i];
+	    splay_tree_key k = keys[i];
+	    if (k == NULL || k->refcount == REFCOUNT_INFINITY)
+	      continue;
+
+	    uintptr_t *refcount = &k->refcount;
+	    if (k->refcount == REFCOUNT_SIBLING)
+	      refcount = (uintptr_t *) k->dynamic_refcount;
+
+	    if (*refcount == 0)
+	      gomp_copy_dev2host (devicep, NULL, (void *) cur_node.host_start,
+				  (void *) (k->tgt->tgt_start + k->tgt_offset
+					    + cur_node.host_start
+					    - k->host_start),
+				  cur_node.host_end - cur_node.host_start);
+	  }
+      }
+
+  /* And finally remove any mappings that reached refcount 0.  */
+  if (any_deletes)
+    for (i = 0; i < nrefcounts; i++)
+      if ((i == 0 || refcounts[i - 1] != refcounts[i])
+	  && *refcounts[i] == 0)
+	{
+	  splay_tree_key k
+	    = (splay_tree_key) ((char *) refcounts[i]
+				- offsetof (struct splay_tree_key_s,
+					    refcount));
+	  if (k->dynamic_refcount)
+	    {
+	      /* For OpenMP structure sibling lists, remove all following
+		 REFCOUNT_SIBLING mappings before finally removing the first
+		 one.  */
+	      splay_tree_key k2;
+	      for (k2 = k + 1; k->dynamic_refcount;
+		   k2++, k->dynamic_refcount--)
+		{
+		  if (k2->refcount != REFCOUNT_SIBLING
+		      || k2->dynamic_refcount != (uintptr_t) &k->refcount)
+		    {
+		      gomp_mutex_unlock (&devicep->lock);
+		      gomp_fatal ("internal error in structure sibling "
+				  "list handling");
+		    }
+		  gomp_remove_var (devicep, k2);
+		}
+	    }
+	  gomp_remove_var (devicep, k);
+	}
+
   gomp_mutex_unlock (&devicep->lock);
 }
 


	Jakub
Chung-Lin Tang Nov. 2, 2020, 9:54 a.m. UTC | #5
Thank you Jakub, I'll need some time to look at this.

Thanks.
Chung-Lin

On 2020/10/30 10:05 PM, Jakub Jelinek wrote:
> On Mon, Oct 26, 2020 at 09:10:08AM +0100, Jakub Jelinek via Gcc-patches wrote:
>> Yes, it is a QoI and it is important not to regress about that.
>> Furthermore, the more we diverge from what the spec says, it will be harder
>> for us to implement, not just now, but in the future too.
>> What I wrote about the actual implementation is actually not accurate, we
>> need the master and slaves to be the struct splay_tree_key_s objects.
>> And that one already has the aux field that could be used for the slaves,
>> so we could e.g. use another magic value of refcount, e.g. REFCOUNT_SLAVE
>> ~(uintptr_t) 2, and in that case aux would point to the master
>> splay_tree_key_s.
>>
>> And the
>> "If the corresponding list item’s reference count was not already incremented because of the
>> effect of a map clause on the construct then:
>> a) The corresponding list item’s reference count is incremented by one;"
>> and
>> "If the map-type is not delete and the corresponding list item’s reference count is finite and
>> was not already decremented because of the effect of a map clause on the construct then:
>> a) The corresponding list item’s reference count is decremented by one;"
>> rules we need to implement in any case, I don't see a way around that.
>> The same list item can now be mapped (or unmapped) multiple times on the same
>> construct.
> 
> To show up what exactly I meant, here is a proof of concept (but unfinished)
> patch.
> For OpenMP only (I believe OpenACC ATM doesn't have such a concept of
> structure sibling lists, nor the OpenMP 5.0 requirement that on one construct
> one refcount isn't incremented multiple times nor decremented multiple
> times) it uses the dynamic_refcount field otherwise only used in OpenACC
> for the structure sibling lists; in particular, all but the first mapping
> in a structure sibling list will have refcount == REFCOUNT_SIBLING and
> dynamic_refcount pointing to their master's refcount field.  And
> the master has dynamic_refcount set to the number of REFCOUNT_SIBLING
> following those.
> 
> In the patch I've only changed the construction of such splay_tree_keys
> and changed gomp_exit_data to deal with those (that is the very easy
> part) plus implement the OpenMP 5.0 rule that one refcount isn't decremented
> more than once.
> What would need to be done is handle the rest, in particular (for OpenMP
> only) adjust the refcount (splay_tree_key only, not target_mem_desc), such
> that for the just created splay_tree_keys (refcount pointers in between
> tgt->array and end of the array (perhaps we should add a field for how many
> elts the array has) it doesn't bump anything - just rely on the refcount = 1
> we do elsewhere, and for other refcounts, if REFCOUNT_SIBLING, use the
> dynamic_refcount pointer and if not REFCOUNT_INFINITY, instead of bumping
> the refcount queue it for later increments (again, with allocaed list).
> And when unmapping at the end of target or target data, do something similar
> to what gomp_exit_data does in the patch (perhaps with some helper
> functions).
> 
> At least from omp-lang discussions, the intent is that e.g. on
> struct S { int a, b, c, d, e; } s = { 1, 2, 3, 4, 5};
> #pragma omp target enter data map (s)
> // same thing as
> // #pragma omp target enter data map (s.a, s.b, s.c, s.d, s.e)
> // The above at least theoretically creates 5 mappings, with
> // refcount set to 1 for each (but with all those refcount behaving
> // in sync), but I'd strongly prefer to create just one with one refcount.
> int *p = &s.b;
> int *q = &s.d;
> #pragma omp target enter data map (p[:1]) map (q[:1])
> // Above needs to bump either the refcounts of all of s.a, s.b, s.c, s.d and
> // s.e by 1, or when it all has just a single refcount, bump it also just by
> // 1.
> 
> int a;
> #pragma omp target enter data map (a)   // This creates just one mapping and sets refcount to 1
> // as int is not an aggregate
> char *r, *s;
> r = (char *) &a;
> s = r + 2;
> #pragma omp target enter data map (r[:1], s[:1])
> // The above should bump the refcount of a just once, not twice in OpenMP
> // 5.0.
> 
> For both testcases, I guess one can try to construct from them user-observable
> tests where the refcount will result in copying the data back at
> certain points (or not).
> And for the non-contiguous structure element mappings, the idea would
> be that we still use a single refcount for the whole structure sibling list
> defined in the spec.
> 
> --- libgomp/libgomp.h.jj	2020-10-30 12:57:16.176284101 +0100
> +++ libgomp/libgomp.h	2020-10-30 12:57:40.264014514 +0100
> @@ -1002,6 +1002,10 @@ struct target_mem_desc {
>   /* Special value for refcount - tgt_offset contains target address of the
>      artificial pointer to "omp declare target link" object.  */
>   #define REFCOUNT_LINK (~(uintptr_t) 1)
> +/* Special value for refcount - structure sibling list item other than
> +   the first one.  *(uintptr_t *)dynamic_refcount is the actual refcount
> +   for it.  */
> +#define REFCOUNT_SIBLING (~(uintptr_t) 2)
>   
>   /* Special offset values.  */
>   #define OFFSET_INLINED (~(uintptr_t) 0)
> --- libgomp/target.c.jj	2020-10-30 12:57:19.926242130 +0100
> +++ libgomp/target.c	2020-10-30 14:45:04.016809943 +0100
> @@ -1022,6 +1022,7 @@ gomp_map_vars_internal (struct gomp_devi
>         splay_tree_node array = tgt->array;
>         size_t j, field_tgt_offset = 0, field_tgt_clear = FIELD_TGT_EMPTY;
>         uintptr_t field_tgt_base = 0;
> +      splay_tree_key field_tgt_sibling = NULL;
>   
>         for (i = 0; i < mapnum; i++)
>   	if (has_always_ptrset
> @@ -1153,6 +1154,7 @@ gomp_map_vars_internal (struct gomp_devi
>   		    field_tgt_base = (uintptr_t) hostaddrs[first];
>   		    field_tgt_offset = tgt_size;
>   		    field_tgt_clear = last;
> +		    field_tgt_sibling = NULL;
>   		    tgt_size += cur_node.host_end
>   				- (uintptr_t) hostaddrs[first];
>   		    continue;
> @@ -1251,12 +1253,29 @@ gomp_map_vars_internal (struct gomp_devi
>   		size_t align = (size_t) 1 << (kind >> rshift);
>   		tgt->list[i].key = k;
>   		k->tgt = tgt;
> +		k->refcount = 1;
> +		k->dynamic_refcount = 0;
>   		if (field_tgt_clear != FIELD_TGT_EMPTY)
>   		  {
>   		    k->tgt_offset = k->host_start - field_tgt_base
>   				    + field_tgt_offset;
> +		    if (pragma_kind != GOMP_MAP_VARS_OPENACC)
> +		      {
> +			if (field_tgt_sibling == 0)
> +			  field_tgt_sibling = k;
> +			else
> +			  {
> +			    k->refcount = REFCOUNT_SIBLING;
> +			    k->dynamic_refcount
> +			      = (uintptr_t) &field_tgt_sibling->refcount;
> +			    field_tgt_sibling->dynamic_refcount++;
> +			  }
> +		      }
>   		    if (i == field_tgt_clear)
> -		      field_tgt_clear = FIELD_TGT_EMPTY;
> +		      {
> +			field_tgt_clear = FIELD_TGT_EMPTY;
> +			field_tgt_sibling = 0;
> +		      }
>   		  }
>   		else
>   		  {
> @@ -1270,8 +1289,6 @@ gomp_map_vars_internal (struct gomp_devi
>   		tgt->list[i].is_attach = false;
>   		tgt->list[i].offset = 0;
>   		tgt->list[i].length = k->host_end - k->host_start;
> -		k->refcount = 1;
> -		k->dynamic_refcount = 0;
>   		tgt->refcount++;
>   		array->left = NULL;
>   		array->right = NULL;
> @@ -2462,12 +2479,27 @@ GOMP_target_update_ext (int device, size
>     gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, true);
>   }
>   
> +static int
> +gomp_uintptr_t_cmp (const void *p1, const void *p2)
> +{
> +  if (*(const uintptr_t **) p1 < *(const uintptr_t **) p2)
> +    return -1;
> +  if (*(const uintptr_t **) p1 > *(const uintptr_t **) p2)
> +    return 1;
> +  return 0;
> +}
> +
>   static void
>   gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
>   		void **hostaddrs, size_t *sizes, unsigned short *kinds)
>   {
>     const int typemask = 0xff;
>     size_t i;
> +  uintptr_t **refcounts = gomp_alloca (mapnum * sizeof (uintptr_t *));
> +  size_t nrefcounts = 0;
> +  splay_tree_key *keys = gomp_alloca (mapnum * sizeof (splay_tree_key));
> +  bool any_deletes = false;
> +  bool any_from = false;
>     gomp_mutex_lock (&devicep->lock);
>     if (devicep->state == GOMP_DEVICE_FINALIZED)
>       {
> @@ -2482,6 +2514,8 @@ gomp_exit_data (struct gomp_device_descr
>         switch (kind)
>   	{
>   	case GOMP_MAP_FROM:
> +	  any_from = true;
> +	  /* FALLTHRU */
>   	case GOMP_MAP_ALWAYS_FROM:
>   	case GOMP_MAP_DELETE:
>   	case GOMP_MAP_RELEASE:
> @@ -2493,26 +2527,31 @@ gomp_exit_data (struct gomp_device_descr
>   			      || kind == GOMP_MAP_ZERO_LEN_ARRAY_SECTION)
>   	    ? gomp_map_0len_lookup (&devicep->mem_map, &cur_node)
>   	    : splay_tree_lookup (&devicep->mem_map, &cur_node);
> +	  keys[i] = k;
>   	  if (!k)
>   	    continue;
>   
> -	  if (k->refcount > 0 && k->refcount != REFCOUNT_INFINITY)
> -	    k->refcount--;
> -	  if ((kind == GOMP_MAP_DELETE
> -	       || kind == GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION)
> -	      && k->refcount != REFCOUNT_INFINITY)
> -	    k->refcount = 0;
> +	  if (k->refcount != REFCOUNT_INFINITY)
> +	    {
> +	      uintptr_t *refcount = &k->refcount;
> +	      if (k->refcount == REFCOUNT_SIBLING)
> +		refcount = (uintptr_t *)k->dynamic_refcount;
> +	      if (kind == GOMP_MAP_DELETE
> +		  || kind == GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION)
> +		*refcount = 0;
> +	      if (nrefcounts && refcounts[nrefcounts - 1] == refcount)
> +		/* Already queued for refcount decrease.  */;
> +	      else
> +		/* Otherwise queue the mapping for refcount decrement.  */
> +		refcounts[nrefcounts++] = refcount;
> +	    }
>   
> -	  if ((kind == GOMP_MAP_FROM && k->refcount == 0)
> -	      || kind == GOMP_MAP_ALWAYS_FROM)
> +	  if (kind == GOMP_MAP_ALWAYS_FROM)
>   	    gomp_copy_dev2host (devicep, NULL, (void *) cur_node.host_start,
>   				(void *) (k->tgt->tgt_start + k->tgt_offset
>   					  + cur_node.host_start
>   					  - k->host_start),
>   				cur_node.host_end - cur_node.host_start);
> -	  if (k->refcount == 0)
> -	    gomp_remove_var (devicep, k);
> -
>   	  break;
>   	default:
>   	  gomp_mutex_unlock (&devicep->lock);
> @@ -2521,6 +2560,82 @@ gomp_exit_data (struct gomp_device_descr
>   	}
>       }
>   
> +  /* Sort the refcount pointers.  */
> +  if (nrefcounts > 1)
> +    qsort (refcounts, nrefcounts, sizeof (uintptr_t *), gomp_uintptr_t_cmp);
> +
> +  /* So that we can decrease each separate refcount just once.  */
> +  for (i = 0; i < nrefcounts; i++)
> +    if (i == 0 || refcounts[i - 1] != refcounts[i])
> +      {
> +	uintptr_t val = *refcounts[i];
> +	if (val != 0)
> +	  *refcounts[i] = val - 1;
> +	if (val <= 1)
> +	  any_deletes = true;
> +      }
> +
> +  /* If anything has been decremented to 0 and there are from map-kind
> +     map clauses, copy the data to host.  For always, from we've already
> +     done it earlier.  */
> +  if (any_deletes && any_from)
> +    for (i = 0; i < mapnum; i++)
> +      {
> +	struct splay_tree_key_s cur_node;
> +	unsigned char kind = kinds[i] & typemask;
> +	if (kind == GOMP_MAP_FROM)
> +	  {
> +	    cur_node.host_start = (uintptr_t) hostaddrs[i];
> +	    cur_node.host_end = cur_node.host_start + sizes[i];
> +	    splay_tree_key k = keys[i];
> +	    if (k == NULL || k->refcount == REFCOUNT_INFINITY)
> +	      continue;
> +
> +	    uintptr_t *refcount = &k->refcount;
> +	    if (k->refcount == REFCOUNT_SIBLING)
> +	      refcount = (uintptr_t *) k->dynamic_refcount;
> +
> +	    if (*refcount == 0)
> +	      gomp_copy_dev2host (devicep, NULL, (void *) cur_node.host_start,
> +				  (void *) (k->tgt->tgt_start + k->tgt_offset
> +					    + cur_node.host_start
> +					    - k->host_start),
> +				  cur_node.host_end - cur_node.host_start);
> +	  }
> +      }
> +
> +  /* And finally remove any mappings that reached refcount 0.  */
> +  if (any_deletes)
> +    for (i = 0; i < nrefcounts; i++)
> +      if ((i == 0 || refcounts[i - 1] != refcounts[i])
> +	  && *refcounts[i] == 0)
> +	{
> +	  splay_tree_key k
> +	    = (splay_tree_key) ((char *) refcounts[i]
> +				- offsetof (struct splay_tree_key_s,
> +					    refcount));
> +	  if (k->dynamic_refcount)
> +	    {
> +	      /* For OpenMP structure sibling lists, remove all following
> +		 REFCOUNT_SIBLING mappings before finally removing the first
> +		 one.  */
> +	      splay_tree_key k2;
> +	      for (k2 = k + 1; k->dynamic_refcount;
> +		   k2++, k->dynamic_refcount--)
> +		{
> +		  if (k2->refcount != REFCOUNT_SIBLING
> +		      || k2->dynamic_refcount != (uintptr_t) &k->refcount)
> +		    {
> +		      gomp_mutex_unlock (&devicep->lock);
> +		      gomp_fatal ("internal error in structure sibling "
> +				  "list handling");
> +		    }
> +		  gomp_remove_var (devicep, k2);
> +		}
> +	    }
> +	  gomp_remove_var (devicep, k);
> +	}
> +
>     gomp_mutex_unlock (&devicep->lock);
>   }
>   
> 
> 
> 	Jakub
>

Patch

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 29f385c9368..4878f71ac61 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -10429,8 +10429,9 @@  gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 		}
 	    }
 	  else if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT
-		   && (code == OMP_TARGET_EXIT_DATA
-		       || code == OACC_EXIT_DATA))
+		   /* Note: we keep GOMP_MAP_STRUCT for OpenMP target exit data
+		      directives, so only remove for OpenACC exit data.  */
+		   && code == OACC_EXIT_DATA)
 	    remove = true;
 	  else if (DECL_SIZE (decl)
 		   && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index da7ac037dcd..3e03f52f70a 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -964,12 +964,15 @@  struct target_var_desc {
   bool always_copy_from;
   /* True if this is for OpenACC 'attach'.  */
   bool is_attach;
+  /* True if this is a structure map. */
+  bool is_struct;
   /* If GOMP_MAP_TO_PSET had a NULL pointer; used for Fortran descriptors,
      which were initially unallocated.  */
   bool has_null_ptr_assoc;
   /* Relative offset against key host_start.  */
   uintptr_t offset;
-  /* Actual length.  */
+  /* Actual length, or number of following structure elements
+     if is_struct == true.  */
   uintptr_t length;
 };
 
@@ -1162,10 +1165,10 @@  struct gomp_device_descr
 /* Kind of the pragma, for which gomp_map_vars () is called.  */
 enum gomp_map_vars_kind
 {
-  GOMP_MAP_VARS_OPENACC,
-  GOMP_MAP_VARS_TARGET,
-  GOMP_MAP_VARS_DATA,
-  GOMP_MAP_VARS_ENTER_DATA
+  GOMP_MAP_VARS_OPENACC    = 1,
+  GOMP_MAP_VARS_TARGET     = 2,
+  GOMP_MAP_VARS_DATA       = 4,
+  GOMP_MAP_VARS_ENTER_DATA = 8
 };
 
 extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 65757ab2ffc..8dc521ac6d6 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -403,7 +403,8 @@  acc_map_data (void *h, void *d, size_t s)
 
       struct target_mem_desc *tgt
 	= gomp_map_vars (acc_dev, mapnum, &hostaddrs, &devaddrs, &sizes,
-			 &kinds, true, GOMP_MAP_VARS_ENTER_DATA);
+			 &kinds, true,
+			 GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA);
       assert (tgt);
       assert (tgt->list_count == 1);
       splay_tree_key n = tgt->list[0].key;
@@ -572,7 +573,8 @@  goacc_enter_datum (void **hostaddrs, size_t *sizes, void *kinds, int async)
 
       struct target_mem_desc *tgt
 	= gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs, NULL, sizes,
-			       kinds, true, GOMP_MAP_VARS_ENTER_DATA);
+			       kinds, true,
+			       GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA);
       assert (tgt);
       assert (tgt->list_count == 1);
       n = tgt->list[0].key;
@@ -1202,7 +1204,7 @@  goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	  struct target_mem_desc *tgt
 	    = gomp_map_vars_async (acc_dev, aq, groupnum, &hostaddrs[i], NULL,
 				   &sizes[i], &kinds[i], true,
-				   GOMP_MAP_VARS_ENTER_DATA);
+				   GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA);
 	  assert (tgt);
 
 	  gomp_mutex_lock (&acc_dev->lock);
diff --git a/libgomp/target.c b/libgomp/target.c
index 1a8c67c2df5..32d571337ab 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -369,7 +369,7 @@  gomp_map_vars_existing (struct gomp_device_descr *devicep,
 			struct goacc_asyncqueue *aq, splay_tree_key oldn,
 			splay_tree_key newn, struct target_var_desc *tgt_var,
 			unsigned char kind, bool always_to_flag,
-			struct gomp_coalesce_buf *cbuf)
+			bool inc_ref, struct gomp_coalesce_buf *cbuf)
 {
   assert (kind != GOMP_MAP_ATTACH);
 
@@ -377,6 +377,7 @@  gomp_map_vars_existing (struct gomp_device_descr *devicep,
   tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);
   tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);
   tgt_var->is_attach = false;
+  tgt_var->is_struct = false;
   tgt_var->offset = newn->host_start - oldn->host_start;
   tgt_var->length = newn->host_end - newn->host_start;
 
@@ -398,7 +399,7 @@  gomp_map_vars_existing (struct gomp_device_descr *devicep,
 			(void *) newn->host_start,
 			newn->host_end - newn->host_start, cbuf);
 
-  if (oldn->refcount != REFCOUNT_INFINITY)
+  if (inc_ref && oldn->refcount != REFCOUNT_INFINITY)
     oldn->refcount++;
 }
 
@@ -453,6 +454,7 @@  gomp_map_fields_existing (struct target_mem_desc *tgt,
 			  struct goacc_asyncqueue *aq, splay_tree_key n,
 			  size_t first, size_t i, void **hostaddrs,
 			  size_t *sizes, void *kinds,
+			  bool fld_inc_ref,
 			  struct gomp_coalesce_buf *cbuf)
 {
   struct gomp_device_descr *devicep = tgt->device_descr;
@@ -471,7 +473,7 @@  gomp_map_fields_existing (struct target_mem_desc *tgt,
       && n2->host_start - n->host_start == n2->tgt_offset - n->tgt_offset)
     {
       gomp_map_vars_existing (devicep, aq, n2, &cur_node, &tgt->list[i],
-			      kind & typemask, false, cbuf);
+			      kind & typemask, false, fld_inc_ref, cbuf);
       return;
     }
   if (sizes[i] == 0)
@@ -487,7 +489,8 @@  gomp_map_fields_existing (struct target_mem_desc *tgt,
 		 == n2->tgt_offset - n->tgt_offset)
 	    {
 	      gomp_map_vars_existing (devicep, aq, n2, &cur_node, &tgt->list[i],
-				      kind & typemask, false, cbuf);
+				      kind & typemask, false, fld_inc_ref,
+				      cbuf);
 	      return;
 	    }
 	}
@@ -499,7 +502,7 @@  gomp_map_fields_existing (struct target_mem_desc *tgt,
 	  && n2->host_start - n->host_start == n2->tgt_offset - n->tgt_offset)
 	{
 	  gomp_map_vars_existing (devicep, aq, n2, &cur_node, &tgt->list[i],
-				  kind & typemask, false, cbuf);
+				  kind & typemask, false, fld_inc_ref, cbuf);
 	  return;
 	}
     }
@@ -676,6 +679,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
   size_t i, tgt_align, tgt_size, not_found_cnt = 0;
   bool has_firstprivate = false;
   bool has_always_ptrset = false;
+  bool openmp_p = (pragma_kind & GOMP_MAP_VARS_OPENACC) == 0;
   const int rshift = short_mapkind ? 8 : 3;
   const int typemask = short_mapkind ? 0xff : 0x7;
   struct splay_tree_s *mem_map = &devicep->mem_map;
@@ -683,7 +687,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
-  tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
+  tgt->refcount = (pragma_kind & GOMP_MAP_VARS_ENTER_DATA) ? 0 : 1;
   tgt->device_descr = devicep;
   tgt->prev = NULL;
   struct gomp_coalesce_buf cbuf, *cbufp = NULL;
@@ -798,6 +802,10 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	      tgt_size = (tgt_size + align - 1) & ~(align - 1);
 	      tgt_size += cur_node.host_end - cur_node.host_start;
 	      not_found_cnt += last - i;
+	      /* For OpenMP, we also create an entry for the struct map
+		 itself, besides the elements.  */
+	      if (openmp_p)
+		not_found_cnt += 1;
 	      for (i = first; i <= last; i++)
 		{
 		  tgt->list[i].key = NULL;
@@ -811,9 +819,29 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	      i--;
 	      continue;
 	    }
+
+	  /* For OpenMP, structure elements do not increment refcount of the
+	     splay_tree_key, only the heading struct map entry does. This is
+	     to create the uniform alloc/release behavior specified in OpenMP
+	     5.0, i.e. map/unmap of just one structure element field will
+	     behave the same as having alloc/release maps for all
+	     (already mapped) element fields.  */
+	  if (openmp_p)
+	    {
+	      tgt->list[i].key = n;
+	      tgt->list[i].offset = 0; /* Note: not OFFSET_STRUCT.  */
+	      tgt->list[i].copy_from = false;
+	      tgt->list[i].always_copy_from = false;
+	      tgt->list[i].is_attach = false;
+	      tgt->list[i].is_struct = true;
+	      tgt->list[i].length = sizes[i];
+
+	      if (n->refcount != REFCOUNT_INFINITY)
+		n->refcount++;
+	    }
 	  for (i = first; i <= last; i++)
 	    gomp_map_fields_existing (tgt, aq, n, first, i, hostaddrs,
-				      sizes, kinds, NULL);
+				      sizes, kinds, !openmp_p, NULL);
 	  i--;
 	  continue;
 	}
@@ -909,7 +937,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		}
 	    }
 	  gomp_map_vars_existing (devicep, aq, n, &cur_node, &tgt->list[i],
-				  kind & typemask, always_to_cnt > 0, NULL);
+				  kind & typemask, always_to_cnt > 0, true, NULL);
 	  i += always_to_cnt;
 	}
       else
@@ -1064,6 +1092,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		    tgt->list[j].copy_from = false;
 		    tgt->list[j].always_copy_from = false;
 		    tgt->list[j].is_attach = false;
+		    tgt->list[j].is_struct = false;
 		    if (k->refcount != REFCOUNT_INFINITY)
 		      k->refcount++;
 		    gomp_map_pointer (k->tgt, aq,
@@ -1155,11 +1184,48 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		    field_tgt_clear = last;
 		    tgt_size += cur_node.host_end
 				- (uintptr_t) hostaddrs[first];
+		    if (openmp_p)
+		      {
+			/* When first mapping a struct, create a region
+			   encompassing all to be mapped structure elements.  */
+			splay_tree_key k = &array->key;
+			k->host_start = cur_node.host_start;
+			k->host_end = cur_node.host_end;
+			k->aux = NULL;
+			k->refcount = 1;
+			k->dynamic_refcount = 0;
+			k->tgt = tgt;
+			tgt->refcount++;
+			/* Locate target address of what should be the start of
+			   the entire structure.  */
+			k->tgt_offset = (field_tgt_offset
+					 - (hostaddrs[first] - hostaddrs[i]));
+
+			/* Save number of elements here.  */
+			tgt->list[i].length = sizes[i];
+
+			tgt->list[i].copy_from = false;
+			tgt->list[i].always_copy_from = false;
+			tgt->list[i].is_attach = false;
+			tgt->list[i].is_struct = true;
+			tgt->list[i].offset = 0;
+			tgt->list[i].key = k;
+
+			array->left = NULL;
+			array->right = NULL;
+			splay_tree_insert (mem_map, array);
+			array++;
+		      }
 		    continue;
 		  }
+
+		if (openmp_p
+		    && n->refcount != REFCOUNT_INFINITY)
+		  n->refcount++;
+
 		for (i = first; i <= last; i++)
 		  gomp_map_fields_existing (tgt, aq, n, first, i, hostaddrs,
-					    sizes, kinds, cbufp);
+					    sizes, kinds, !openmp_p, cbufp);
 		i--;
 		continue;
 	      case GOMP_MAP_ALWAYS_POINTER:
@@ -1209,6 +1275,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		      tgt->list[i].copy_from = false;
 		      tgt->list[i].always_copy_from = false;
 		      tgt->list[i].is_attach = true;
+		      tgt->list[i].is_struct = false;
 		      /* OpenACC 'attach'/'detach' doesn't affect
 			 structured/dynamic reference counts ('n->refcount',
 			 'n->dynamic_refcount').  */
@@ -1234,8 +1301,28 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	      k->host_end = k->host_start + sizeof (void *);
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
 	    if (n && n->refcount != REFCOUNT_LINK)
-	      gomp_map_vars_existing (devicep, aq, n, k, &tgt->list[i],
-				      kind & typemask, false, cbufp);
+	      {
+		if (!openmp_p)
+		  gomp_map_vars_existing (devicep, aq, n, k, &tgt->list[i],
+					  kind & typemask, false, true, cbufp);
+		else
+		  {
+		    bool inc_ref = (field_tgt_clear == FIELD_TGT_EMPTY);
+
+		    gomp_map_vars_existing (devicep, aq, n, k, &tgt->list[i],
+					    kind & typemask, false, inc_ref, cbufp);
+		    if (field_tgt_clear != FIELD_TGT_EMPTY)
+		      {
+			k->tgt = tgt;
+			k->tgt_offset = (k->host_start
+					 - field_tgt_base + field_tgt_offset);
+			if (i == field_tgt_clear)
+			  field_tgt_clear = FIELD_TGT_EMPTY;
+			k->aux = NULL;
+			goto copy_map;
+		      }
+		  }
+	      }
 	    else
 	      {
 		k->aux = NULL;
@@ -1268,6 +1355,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		tgt->list[i].always_copy_from
 		  = GOMP_MAP_ALWAYS_FROM_P (kind & typemask);
 		tgt->list[i].is_attach = false;
+		tgt->list[i].is_struct = false;
 		tgt->list[i].offset = 0;
 		tgt->list[i].length = k->host_end - k->host_start;
 		k->refcount = 1;
@@ -1276,6 +1364,8 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		array->left = NULL;
 		array->right = NULL;
 		splay_tree_insert (mem_map, array);
+
+	      copy_map:
 		switch (kind & typemask)
 		  {
 		  case GOMP_MAP_ALLOC:
@@ -1326,6 +1416,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 			    tgt->list[j].copy_from = false;
 			    tgt->list[j].always_copy_from = false;
 			    tgt->list[j].is_attach = false;
+			    tgt->list[j].is_struct = false;
 			    tgt->list[i].has_null_ptr_assoc |= !(*(void **) hostaddrs[j]);
 			    if (k->refcount != REFCOUNT_INFINITY)
 			      k->refcount++;
@@ -1415,7 +1506,7 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
   /* If the variable from "omp target enter data" map-list was already mapped,
      tgt is not needed.  Otherwise tgt will be freed by gomp_unmap_vars or
      gomp_exit_data.  */
-  if (pragma_kind == GOMP_MAP_VARS_ENTER_DATA && tgt->refcount == 0)
+  if ((pragma_kind & GOMP_MAP_VARS_ENTER_DATA) && tgt->refcount == 0)
     {
       free (tgt);
       tgt = NULL;
@@ -1523,6 +1614,50 @@  gomp_remove_var_async (struct gomp_device_descr *devicep, splay_tree_key k,
   (void) gomp_remove_var_internal (devicep, k, aq);
 }
 
+static bool
+gomp_var_unref (splay_tree_key k)
+{
+  bool do_unmap = false;
+  if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
+    k->refcount--;
+  else if (k->refcount == 1)
+    {
+      k->refcount--;
+      do_unmap = true;
+    }
+  return do_unmap;
+}
+
+static void
+gomp_var_copy_back (struct target_mem_desc *tgt, bool do_unmap_and_copyfrom,
+		    struct goacc_asyncqueue *aq, size_t i)
+{
+  struct gomp_device_descr *devicep = tgt->device_descr;
+  splay_tree_key k = tgt->list[i].key;
+
+  if ((do_unmap_and_copyfrom && tgt->list[i].copy_from)
+      || tgt->list[i].always_copy_from)
+    gomp_copy_dev2host (devicep, aq,
+			(void *) (k->host_start + tgt->list[i].offset),
+			(void *) (k->tgt->tgt_start + k->tgt_offset
+				  + tgt->list[i].offset),
+			tgt->list[i].length);
+}
+
+static void
+gomp_var_unref_tgt (struct target_mem_desc *tgt, splay_tree_key k)
+{
+  struct gomp_device_descr *devicep = tgt->device_descr;
+  struct target_mem_desc *k_tgt = k->tgt;
+
+  bool is_tgt_unmapped = gomp_remove_var (devicep, k);
+
+  /* It would be bad if TGT got unmapped while we're still iterating
+     over its LIST_COUNT, and also expect to use it in the following
+     code.  */
+  assert (!is_tgt_unmapped || k_tgt != tgt);
+}
+
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
    variables back from device to host: if it is false, it is assumed that this
    has been done already.  */
@@ -1561,42 +1696,40 @@  gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
 			     false, NULL);
     }
 
-  for (i = 0; i < tgt->list_count; i++)
+  for (i = 0; i < tgt->list_count;)
     {
-      splay_tree_key k = tgt->list[i].key;
-      if (k == NULL)
-	continue;
-
-      /* OpenACC 'attach'/'detach' doesn't affect structured/dynamic reference
-	 counts ('n->refcount', 'n->dynamic_refcount').  */
-      if (tgt->list[i].is_attach)
-	continue;
-
-      bool do_unmap = false;
-      if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
-	k->refcount--;
-      else if (k->refcount == 1)
+      if (tgt->list[i].key == NULL
+	  /* OpenACC 'attach'/'detach' doesn't affect structured/dynamic
+	     reference counts ('n->refcount', 'n->dynamic_refcount').  */
+	  || tgt->list[i].is_attach)
 	{
-	  k->refcount--;
-	  do_unmap = true;
+	  i += 1;
+	  continue;
 	}
 
-      if ((do_unmap && do_copyfrom && tgt->list[i].copy_from)
-	  || tgt->list[i].always_copy_from)
-	gomp_copy_dev2host (devicep, aq,
-			    (void *) (k->host_start + tgt->list[i].offset),
-			    (void *) (k->tgt->tgt_start + k->tgt_offset
-				      + tgt->list[i].offset),
-			    tgt->list[i].length);
-      if (do_unmap)
+      if (tgt->list[i].is_struct)
+	{
+	  size_t j, num_elem = tgt->list[i].length;
+
+	  /* Release the struct map's reference on the splay_tree_key.  */
+	  bool do_unmap = gomp_var_unref (tgt->list[i].key);
+
+	  for (j = i + 1; j <= i + num_elem; j++)
+	    gomp_var_copy_back (tgt, do_unmap && do_copyfrom, aq, j);
+
+	  if (do_unmap)
+	    gomp_var_unref_tgt (tgt, tgt->list[i].key);
+	  i = j;
+	}
+      else
 	{
-	  struct target_mem_desc *k_tgt = k->tgt;
-	  bool is_tgt_unmapped = gomp_remove_var (devicep, k);
-	  /* It would be bad if TGT got unmapped while we're still iterating
-	     over its LIST_COUNT, and also expect to use it in the following
-	     code.  */
-	  assert (!is_tgt_unmapped
-		  || k_tgt != tgt);
+	  splay_tree_key k = tgt->list[i].key;
+	  bool do_unmap = gomp_var_unref (k);
+
+	  gomp_var_copy_back (tgt, do_unmap && do_copyfrom, aq, i);
+	  if (do_unmap)
+	    gomp_var_unref_tgt (tgt, k);
+	  i += 1;
 	}
     }
 
@@ -2512,8 +2645,46 @@  gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
 				cur_node.host_end - cur_node.host_start);
 	  if (k->refcount == 0)
 	    gomp_remove_var (devicep, k);
+	  break;
+
+	case GOMP_MAP_STRUCT:
+	  {
+	    size_t num_elem = sizes[i];
+	    cur_node.host_start = (uintptr_t) hostaddrs[i];
+	    cur_node.host_end = cur_node.host_start + 1;
+
+	    splay_tree_key k = splay_tree_lookup (&devicep->mem_map, &cur_node);
+	    if (k)
+	      {
+		for (size_t j = i + 1; j <= i + num_elem; j++)
+		  {
+		    cur_node.host_start = (uintptr_t) hostaddrs[j];
+		    cur_node.host_end = cur_node.host_start + sizes[j];
+		    splay_tree_key ek = splay_tree_lookup (&devicep->mem_map,
+							   &cur_node);
+		    /* All fields should lookup to same splay_tree_key.  */
+		    assert (ek == k);
+		  }
 
+		bool do_unmap = gomp_var_unref (k);
+		for (size_t j = i + 1; j <= i + num_elem; j++)
+		  {
+		    unsigned char ekind = kinds[j] & typemask;
+		    if ((ekind == GOMP_MAP_FROM && do_unmap)
+			|| ekind == GOMP_MAP_ALWAYS_FROM)
+		      gomp_copy_dev2host (devicep, NULL, hostaddrs[j],
+					  (void *) (k->tgt->tgt_start
+						    + k->tgt_offset
+						    + (uintptr_t) hostaddrs[j]
+						    - k->host_start), sizes[j]);
+		  }
+		if (do_unmap)
+		  gomp_remove_var (devicep, k);
+	      }
+	    i += num_elem;
+	  }
 	  break;
+
 	default:
 	  gomp_mutex_unlock (&devicep->lock);
 	  gomp_fatal ("GOMP_target_enter_exit_data unhandled kind 0x%.2x",
diff --git a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-1.c b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-1.c
new file mode 100644
index 00000000000..c49d8c12c05
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-1.c
@@ -0,0 +1,29 @@ 
+#include <omp.h>
+#include <stdlib.h>
+
+struct S
+{
+  int a, b;
+};
+typedef struct S S;
+
+int main (void)
+{
+  int d = omp_get_default_device ();
+  int id = omp_get_initial_device ();
+
+  if (d < 0 || d >= omp_get_num_devices ())
+    d = id;
+
+  S s;
+  #pragma omp target enter data map (alloc: s.a, s.b)
+  #pragma omp target exit data map (release: s.b)
+
+  /* OpenMP 5.0 structure element mapping rules describe that elements of same
+     structure variable should allocate/deallocate in a uniform fashion, so
+     "s.a" should be removed together by above 'exit data'.  */
+  if (omp_target_is_present (&s.a, d))
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-2.c b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-2.c
new file mode 100644
index 00000000000..555c6e3e8e0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-2.c
@@ -0,0 +1,44 @@ 
+#include <omp.h>
+#include <stdlib.h>
+
+struct S
+{
+  int a, b, c, d;
+};
+typedef struct S S;
+
+int main (void)
+{
+  int d = omp_get_default_device ();
+  int id = omp_get_initial_device ();
+
+  if (d < 0 || d >= omp_get_num_devices ())
+    d = id;
+
+  S s;
+  #pragma omp target enter data map (alloc: s.a, s.b, s.c, s.d)
+  #pragma omp target enter data map (alloc: s.c)
+  #pragma omp target enter data map (alloc: s.b, s.d)
+  #pragma omp target enter data map (alloc: s.a, s.c, s.b)
+
+  #pragma omp target exit data map (release: s.a)
+  #pragma omp target exit data map (release: s.d)
+  #pragma omp target exit data map (release: s.c)
+  #pragma omp target exit data map (release: s.b)
+
+  /* OpenMP 5.0 structure element mapping rules describe that elements of same
+     structure variable should allocate/deallocate in a uniform fashion, so
+     all elements of 's' should be removed together by above 'exit data's.  */
+  if (omp_target_is_present (&s, d))
+    abort ();
+  if (omp_target_is_present (&s.a, d))
+    abort ();
+  if (omp_target_is_present (&s.b, d))
+    abort ();
+  if (omp_target_is_present (&s.c, d))
+    abort ();
+  if (omp_target_is_present (&s.d, d))
+    abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-3.c b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-3.c
new file mode 100644
index 00000000000..4850eabd879
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-3.c
@@ -0,0 +1,63 @@ 
+#include <omp.h>
+#include <stdlib.h>
+
+struct S
+{
+  int a, b, c, d;
+};
+typedef struct S S;
+
+int main (void)
+{
+  int d = omp_get_default_device ();
+  int id = omp_get_initial_device ();
+
+  if (d < 0 || d >= omp_get_num_devices ())
+    d = id;
+
+  S s;
+
+  #pragma omp target enter data map (alloc: s)
+  #pragma omp target enter data map (alloc: s)
+
+  #pragma omp target exit data map (release: s.a)
+  #pragma omp target exit data map (release: s.b)
+
+  /* OpenMP 5.0 structure element mapping rules describe that elements of same
+     structure variable should allocate/deallocate in a uniform fashion, so
+     all elements of 's' should be removed together by above 'exit data's.  */
+  if (omp_target_is_present (&s, d))
+    abort ();
+  if (omp_target_is_present (&s.a, d))
+    abort ();
+  if (omp_target_is_present (&s.b, d))
+    abort ();
+  if (omp_target_is_present (&s.c, d))
+    abort ();
+  if (omp_target_is_present (&s.d, d))
+    abort ();
+
+  #pragma omp target enter data map (alloc: s.a, s.b)
+  #pragma omp target enter data map (alloc: s.a)
+  #pragma omp target enter data map (alloc: s.b)
+
+  #pragma omp target exit data map (release: s)
+  #pragma omp target exit data map (release: s)
+  #pragma omp target exit data map (release: s)
+
+  /* OpenMP 5.0 structure element mapping rules describe that elements of same
+     structure variable should allocate/deallocate in a uniform fashion, so
+     all elements of 's' should be removed together by above 'exit data's.  */
+  if (omp_target_is_present (&s, d))
+    abort ();
+  if (omp_target_is_present (&s.a, d))
+    abort ();
+  if (omp_target_is_present (&s.b, d))
+    abort ();
+  if (omp_target_is_present (&s.c, d))
+    abort ();
+  if (omp_target_is_present (&s.d, d))
+    abort ();
+
+  return 0;
+}