diff mbox series

OpenACC 'attach'/'detach' has no business affecting user-visible reference counting (was: [PATCH 07/13] OpenACC 2.6 deep copy: libgomp parts)

Message ID 87k10gr06m.fsf@euler.schwinge.homeip.net
State New
Headers show
Series OpenACC 'attach'/'detach' has no business affecting user-visible reference counting (was: [PATCH 07/13] OpenACC 2.6 deep copy: libgomp parts) | expand

Commit Message

Thomas Schwinge June 9, 2020, 10:41 a.m. UTC
Hi Julian!

On 2020-06-05T21:31:08+0100, Julian Brown <julian@codesourcery.com> wrote:
> On Fri, 5 Jun 2020 13:17:09 +0200
> Thomas Schwinge <thomas@codesourcery.com> wrote:
>> On 2019-12-17T21:03:47-0800, Julian Brown <julian@codesourcery.com>
>> wrote:
>> > This part contains the libgomp runtime support for the
>> > GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds
>>
>> > --- a/libgomp/target.c
>> > +++ b/libgomp/target.c
>>
>> > @@ -1203,6 +1211,32 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
>>
>> > +        case GOMP_MAP_ATTACH:
>> > +          {
>> > +            cur_node.host_start = (uintptr_t) hostaddrs[i];
>> > +            cur_node.host_end = cur_node.host_start + sizeof (void *);
>> > +            splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
>> > +            if (n != NULL)
>> > +              {
>> > +                tgt->list[i].key = n;
>> > +                tgt->list[i].offset = cur_node.host_start - n->host_start;
>> > +                tgt->list[i].length = n->host_end - n->host_start;
>> > +                tgt->list[i].copy_from = false;
>> > +                tgt->list[i].always_copy_from = false;
>> > +                tgt->list[i].do_detach
>> > +                  = (pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
>> > +                n->refcount++;
>> > +              }
>> > +            else
>> > +              {
>> > +                gomp_mutex_unlock (&devicep->lock);
>> > +                gomp_fatal ("outer struct not mapped for attach");
>> > +              }
>> > +            gomp_attach_pointer (devicep, aq, mem_map, n,
>> > +                                 (uintptr_t) hostaddrs[i], sizes[i],
>> > +                                 cbufp);
>> > +            continue;
>> > +          }
>>
>> For the OpenACC runtime API 'acc_attach' etc. routines they don't, so
>> what's the conceptual reason that for the corresponding OpenACC
>> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in
>> reference counting ('n->refcount++' above)?  I understand OpenACC
>> 'attach'/'detach' clauses to be simple "executable clauses", which
>> just update some values somewhere (say, like
>> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,
>> thus wouldn't appear to need reference counting?
>
> IIUC, n->refcount is not directly the "structural reference count" as
> seen at source level, but rather counts the number of target_var_descs
> in the lists appended to each target_mem_desc -- and GOMP_MAP_ATTACH
> have variable entries in those lists.

That may be OK if that's purely an implementation detail that isn't
visible to the user, however:

> That's not the case for the API
> routines.

As I had mentioned, the problem is: in contrast to 'acc_attach', an
OpenACC 'enter data' directive with 'attach' clause currently uses this
same reference-counted code path, and thus such an 'attach' without
corresponding 'detach' inhibits unmapping; see
'libgomp.oacc-c-c++-common/mdc-refcount-1.c' in the attached patch
"OpenACC 'attach'/'detach' has no business affecting user-visible
reference counting".

That patch seemed to be the logical next step then, to unify the code
paths for 'acc_attach' and 'enter data' directive with 'attach' clause
(which have to act in the same way).  That's (conceptually) somewhat
similar to what you had proposed as part of
<http://mid.mail-archive.com/b23ea71697f77d8214411a3e1348e9dee496e5a6.1590182783.git.julian@codesourcery.com>.
(But all these things really need to be discussed individually...)

However, that patch regresses
'libgomp.oacc-fortran/deep-copy-6-no_finalize.F90', and also the
'deep-copy-7b2f-2.c', and 'deep-copy-7cf.c' that I'm attaching here.  I
have not yet made an attempts to understand these regressions.  It may be
that a Detach Action actually effects an (attached) device pointer being
copied back to the host, and then disturbing things -- and if that, then
it may be a bug in libgomp, or in the test case.  ;-)


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

Comments

Julian Brown June 9, 2020, 12:23 p.m. UTC | #1
On Tue, 9 Jun 2020 12:41:21 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!
> 
> On 2020-06-05T21:31:08+0100, Julian Brown <julian@codesourcery.com>
> wrote:
> >> For the OpenACC runtime API 'acc_attach' etc. routines they don't,
> >> so what's the conceptual reason that for the corresponding OpenACC
> >> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in
> >> reference counting ('n->refcount++' above)?  I understand OpenACC
> >> 'attach'/'detach' clauses to be simple "executable clauses", which
> >> just update some values somewhere (say, like
> >> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,
> >> thus wouldn't appear to need reference counting?  
> >
> > IIUC, n->refcount is not directly the "structural reference count"
> > as seen at source level, but rather counts the number of
> > target_var_descs in the lists appended to each target_mem_desc --
> > and GOMP_MAP_ATTACH have variable entries in those lists.  
> 
> That may be OK if that's purely an implementation detail that isn't
> visible to the user, however:
> 
> > That's not the case for the API
> > routines.  
> 
> As I had mentioned, the problem is: in contrast to 'acc_attach', an
> OpenACC 'enter data' directive with 'attach' clause currently uses
> this same reference-counted code path, and thus such an 'attach'
> without corresponding 'detach' inhibits unmapping; see
> 'libgomp.oacc-c-c++-common/mdc-refcount-1.c' in the attached patch
> "OpenACC 'attach'/'detach' has no business affecting user-visible
> reference counting".

Hmm, right. That's quite a problem from an implementation perspective:
the "attach" clause in the target_mem_desc's var list is what triggers
the "detach" operation (for structured data lifetimes). Having those
references "not count" is quite an ugly wrinkle.

I'll think about that some more...

> That patch seemed to be the logical next step then, to unify the code
> paths for 'acc_attach' and 'enter data' directive with 'attach' clause
> (which have to act in the same way).  That's (conceptually) somewhat
> similar to what you had proposed as part of
> <http://mid.mail-archive.com/b23ea71697f77d8214411a3e1348e9dee496e5a6.1590182783.git.julian@codesourcery.com>.
> (But all these things really need to be discussed individually...)
> 
> However, that patch regresses
> 'libgomp.oacc-fortran/deep-copy-6-no_finalize.F90', and also the
> 'deep-copy-7b2f-2.c', and 'deep-copy-7cf.c' that I'm attaching here.
> I have not yet made an attempts to understand these regressions.  It
> may be that a Detach Action actually effects an (attached) device
> pointer being copied back to the host, and then disturbing things --
> and if that, then it may be a bug in libgomp, or in the test case.
> ;-)

I haven't (even) quite absorbed what you are trying to test with the "no
finalize" version of the deep-copy-6.f90 test case... I probably need
to go back and re-read the spec. IIRC, my understanding was that
copying out a data item that still has multiple attachments would *not*
automatically perform a detachment. Thus, attaches & detaches have to
balance (at least without "finalize"). But maybe I was wrong about that!

Thanks,

Julian
Julian Brown June 18, 2020, 6:21 p.m. UTC | #2
Hi!

On Tue, 9 Jun 2020 12:41:21 +0200
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi Julian!
> 
> On 2020-06-05T21:31:08+0100, Julian Brown <julian@codesourcery.com>
> wrote:
> > On Fri, 5 Jun 2020 13:17:09 +0200
> > Thomas Schwinge <thomas@codesourcery.com> wrote:  
> >> On 2019-12-17T21:03:47-0800, Julian Brown <julian@codesourcery.com>
> >> wrote:  
> >> > This part contains the libgomp runtime support for the
> >> > GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds    
> >>   
> >> > --- a/libgomp/target.c
> >> > +++ b/libgomp/target.c    
> >>   
> >> > @@ -1203,6 +1211,32 @@ gomp_map_vars_internal (struct
> >> > gomp_device_descr *devicep,    
> >>   
> >> > +	      case GOMP_MAP_ATTACH:
> >> > +		{
> >> > +		  cur_node.host_start = (uintptr_t)
> >> > hostaddrs[i];
> >> > +		  cur_node.host_end = cur_node.host_start +
> >> > sizeof (void *);
> >> > +		  splay_tree_key n = splay_tree_lookup
> >> > (mem_map, &cur_node);
> >> > +		  if (n != NULL)
> >> > +		    {
> >> > +		      tgt->list[i].key = n;
> >> > +		      tgt->list[i].offset = cur_node.host_start
> >> > - n->host_start;
> >> > +		      tgt->list[i].length = n->host_end -
> >> > n->host_start;
> >> > +		      tgt->list[i].copy_from = false;
> >> > +		      tgt->list[i].always_copy_from = false;
> >> > +		      tgt->list[i].do_detach
> >> > +			= (pragma_kind !=
> >> > GOMP_MAP_VARS_OPENACC_ENTER_DATA);
> >> > +		      n->refcount++;
> >> > +		    }
> >> > +		  else
> >> > +		    {
> >> > +		      gomp_mutex_unlock (&devicep->lock);
> >> > +		      gomp_fatal ("outer struct not mapped for
> >> > attach");
> >> > +		    }
> >> > +		  gomp_attach_pointer (devicep, aq, mem_map, n,
> >> > +				       (uintptr_t)
> >> > hostaddrs[i], sizes[i],
> >> > +				       cbufp);
> >> > +		  continue;
> >> > +		}    
> >> 
> >> For the OpenACC runtime API 'acc_attach' etc. routines they don't,
> >> so what's the conceptual reason that for the corresponding OpenACC
> >> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in
> >> reference counting ('n->refcount++' above)?  I understand OpenACC
> >> 'attach'/'detach' clauses to be simple "executable clauses", which
> >> just update some values somewhere (say, like
> >> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,
> >> thus wouldn't appear to need reference counting?  
> >
> > IIUC, n->refcount is not directly the "structural reference count"
> > as seen at source level, but rather counts the number of
> > target_var_descs in the lists appended to each target_mem_desc --
> > and GOMP_MAP_ATTACH have variable entries in those lists.  
> 
> That may be OK if that's purely an implementation detail that isn't
> visible to the user, however:
> 
> > That's not the case for the API
> > routines.  
> 
> As I had mentioned, the problem is: in contrast to 'acc_attach', an
> OpenACC 'enter data' directive with 'attach' clause currently uses
> this same reference-counted code path, and thus such an 'attach'
> without corresponding 'detach' inhibits unmapping; [...]

The attached patch stops attach/detach operations from affecting
reference counts (either structured or dynamic). This isn't as invasive
as I'd imagined: we can extend the use of the "do_detach" flag in
target_mem_descs' variable lists to mark mappings that correspond to
attach operations, then use that flag to avoid refcount
increment/decrements. (The flag should possibly be renamed now.)

I've modified the refcount self-testing code successfully to work with
this new scheme too, in case that's helpful. I'll send the patches for
that separately.

Tested with offloading to NVPTX. OK?

Thanks,

Julian

ChangeLog

	libgomp/
	* oacc-mem.c (goacc_enter_data_internal): Don't affect
	reference counts for attach mappings.
	(goacc_exit_data_internal): Don't affect reference counts for
	detach mappings.
	* target.c (gomp_map_vars_existing): Don't affect reference
	counts for attach mappings.
	(gomp_map_vars_internal): Set do_detach flag unconditionally to
	mark attach mappings.
	(gomp_unmap_vars_internal): Use above flag to prevent affecting
	reference count for attach mappings.
	* testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90:
	Mark test as shouldfail.
	* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail
	gracefully in no-finalize mode.
Thomas Schwinge July 16, 2020, 8:35 a.m. UTC | #3
Hi Julian!

On 2020-06-18T19:21:57+0100, Julian Brown <julian@codesourcery.com> wrote:
> On Tue, 9 Jun 2020 12:41:21 +0200
> Thomas Schwinge <thomas@codesourcery.com> wrote:
>> On 2020-06-05T21:31:08+0100, Julian Brown <julian@codesourcery.com>
>> wrote:
>> > On Fri, 5 Jun 2020 13:17:09 +0200
>> > Thomas Schwinge <thomas@codesourcery.com> wrote:
>> >> On 2019-12-17T21:03:47-0800, Julian Brown <julian@codesourcery.com>
>> >> wrote:
>> >> > This part contains the libgomp runtime support for the
>> >> > GOMP_MAP_ATTACH and GOMP_MAP_DETACH mapping kinds
>> >>
>> >> > --- a/libgomp/target.c
>> >> > +++ b/libgomp/target.c
>> >>
>> >> > @@ -1203,6 +1211,32 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
>> >>
>> >> > +             case GOMP_MAP_ATTACH:
>> >> > +               {
>> >> > +                 cur_node.host_start = (uintptr_t) hostaddrs[i];
>> >> > +                 cur_node.host_end = cur_node.host_start + sizeof (void *);
>> >> > +                 splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
>> >> > +                 if (n != NULL)
>> >> > +                   {
>> >> > +                     tgt->list[i].key = n;
>> >> > +                     tgt->list[i].offset = cur_node.host_start - n->host_start;
>> >> > +                     tgt->list[i].length = n->host_end - n->host_start;
>> >> > +                     tgt->list[i].copy_from = false;
>> >> > +                     tgt->list[i].always_copy_from = false;
>> >> > +                     tgt->list[i].do_detach
>> >> > +                       = (pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
>> >> > +                     n->refcount++;
>> >> > +                   }
>> >> > +                 else
>> >> > +                   {
>> >> > +                     gomp_mutex_unlock (&devicep->lock);
>> >> > +                     gomp_fatal ("outer struct not mapped for attach");
>> >> > +                   }
>> >> > +                 gomp_attach_pointer (devicep, aq, mem_map, n,
>> >> > +                                      (uintptr_t) hostaddrs[i], sizes[i],
>> >> > +                                      cbufp);
>> >> > +                 continue;
>> >> > +               }
>> >>
>> >> For the OpenACC runtime API 'acc_attach' etc. routines they don't,
>> >> so what's the conceptual reason that for the corresponding OpenACC
>> >> directive variants, 'GOMP_MAP_ATTACH' etc. here participate in
>> >> reference counting ('n->refcount++' above)?  I understand OpenACC
>> >> 'attach'/'detach' clauses to be simple "executable clauses", which
>> >> just update some values somewhere (say, like
>> >> 'GOMP_MAP_ALWAYS_POINTER'), but they don't alter any mapping state,
>> >> thus wouldn't appear to need reference counting?
>> >
>> > IIUC, n->refcount is not directly the "structural reference count"
>> > as seen at source level, but rather counts the number of
>> > target_var_descs in the lists appended to each target_mem_desc --
>> > and GOMP_MAP_ATTACH have variable entries in those lists.
>>
>> That may be OK if that's purely an implementation detail that isn't
>> visible to the user, however:
>>
>> > That's not the case for the API
>> > routines.
>>
>> As I had mentioned, the problem is: in contrast to 'acc_attach', an
>> OpenACC 'enter data' directive with 'attach' clause currently uses
>> this same reference-counted code path, and thus such an 'attach'
>> without corresponding 'detach' inhibits unmapping; [...]
>
> The attached patch stops attach/detach operations from affecting
> reference counts (either structured or dynamic). This isn't as invasive
> as I'd imagined: we can extend the use of the "do_detach" flag in
> target_mem_descs' variable lists to mark mappings that correspond to
> attach operations, then use that flag to avoid refcount
> increment/decrements.

Thanks, ACK.

> (The flag should possibly be renamed now.)

How about:

    -  /* True if variable should be detached at end of region.  */
    -  bool do_detach;
    +  /* True if this is for OpenACC 'attach'.  */
    +  bool is_attach;

(Changing that similarly is obvious/pre-approved.)

> Tested with offloading to NVPTX. OK?

I've adjusted the patch for current GCC sources, and did some further
changes/cleanup; see below, and attached "[OpenACC] Deep copy
attach/detach should not affect reference counts".  If you're happy with
that, that's OK for master and releases/gcc-10 (once un-frozen) branches.

> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c

> @@ -1131,7 +1134,9 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
>           if (tgt->list[j].key == n)
>             {
>               for (size_t k = 0; k < groupnum; k++)
> -               if (j + k < tgt->list_count && tgt->list[j + k].key)
> +               if (j + k < tgt->list_count
> +                   && tgt->list[j + k].key
> +                   && !tgt->list[j + k].do_detach)
>                   {
>                     tgt->list[j + k].key->refcount++;
>                     tgt->list[j + k].key->dynamic_refcount++;
> @@ -1156,7 +1161,7 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
>         for (size_t j = 0; j < tgt->list_count; j++)
>           {
>             n = tgt->list[j].key;
> -           if (n)
> +           if (n && !tgt->list[j].do_detach)
>               n->dynamic_refcount++;
>           }
>       }

If I understand correctly, relatedly, we can also "strengthen" the
'is_tgt_unmapped' checking (nowadays centralized in 'goacc_exit_datum_1')
by excluding any 'do_detach' ones from '++num_mappings'.  Done.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -382,7 +382,7 @@ gomp_map_vars_existing (struct gomp_device_descr *devicep,
>                       (void *) newn->host_start,
>                       newn->host_end - newn->host_start, cbuf);
>
> -  if (oldn->refcount != REFCOUNT_INFINITY)
> +  if (oldn->refcount != REFCOUNT_INFINITY && kind != GOMP_MAP_ATTACH)
>      oldn->refcount++;
>  }

That's always-true.  Removed.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-1.c
> @@ -0,0 +1,50 @@
> +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
> +
> +#include <openacc.h>
> +#include <assert.h>
> +
> +#define N 1024
> +
> +struct mystr {
> +  int pad;
> +  int *data;
> +};

The 'pad' is no longer needed with PR95270 "OpenACC 'enter data attach'
looks up target memory object displaced by pointer size" fixed.

> +[...]

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/attach-detach-rc-2.c
> @@ -0,0 +1,4 @@
> +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
> +/* { dg-additional-options "-DATTACH_VIA_DIRECTIVE" } */
> +
> +#include "attach-detach-rc-1.c"

I've merged/extended 'libgomp.oacc-c-c++-common/attach-detach-rc-1.c',
'libgomp.oacc-c-c++-common/attach-detach-rc-2.c' into
'libgomp.oacc-c-c++-common/mdc-refcount-1.c', and further added
'libgomp.oacc-c-c++-common/mdc-refcount-2.c', and
'libgomp.oacc-c-c++-common/mdc-refcount-3.c'.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
diff mbox series

Patch

From d99a701387054259419292b95462f3646a00d6d9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 8 Jun 2020 21:35:32 +0200
Subject: [PATCH] OpenACC 'attach'/'detach' has no business affecting
 user-visible reference counting

In particular, an 'attach' without 'detach' must not inhibit unmapping.

	libgomp/
	* oacc-mem.c (goacc_attach_internal): New function, split out of
	'acc_attach_async'.
	(acc_attach, goacc_enter_data_internal): Use it.
	(goacc_exit_data_internal) <GOMP_MAP_DETACH,
	GOMP_MAP_FORCE_DETACH>: Skip unmapping.
	* target.c (gomp_map_vars_existing): Assert not 'GOMP_MAP_ATTACH'.
	(gomp_map_vars_internal) <GOMP_MAP_ATTACH>: Assert this
	is not an 'enter data'.
	* testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c: New file.
	* testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90: Adjust.
---
 libgomp/oacc-mem.c                            |  51 +++++---
 libgomp/target.c                              |  21 ++-
 .../mdc-refcount-1.c                          | 123 ++++++++++++++++++
 .../mdc-refcount-1-4-1.f90                    |   7 +-
 4 files changed, 176 insertions(+), 26 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c

diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 936ae649dd9..0758f59ec3c 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -881,12 +881,11 @@  acc_update_self_async (void *h, size_t s, int async)
   update_dev_host (0, h, s, async);
 }
 
-void
-acc_attach_async (void **hostaddr, int async)
+static void
+goacc_attach_internal (goacc_aq aq, void **hostaddr, size_t bias)
 {
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
-  goacc_aq aq = get_goacc_asyncqueue (async);
 
   struct splay_tree_key_s cur_node;
   splay_tree_key n;
@@ -907,15 +906,22 @@  acc_attach_async (void **hostaddr, int async)
     }
 
   gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n, (uintptr_t) hostaddr,
-		       0, NULL);
+		       bias, NULL);
 
   gomp_mutex_unlock (&acc_dev->lock);
 }
 
+void
+acc_attach_async (void **hostaddr, int async)
+{
+  goacc_aq aq = get_goacc_asyncqueue (async);
+  goacc_attach_internal (aq, hostaddr, 0);
+}
+
 void
 acc_attach (void **hostaddr)
 {
-  acc_attach_async (hostaddr, acc_async_sync);
+  goacc_attach_internal (NULL, hostaddr, 0);
 }
 
 static void
@@ -1034,11 +1040,22 @@  goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
     {
       int group_last = find_group_last (i, mapnum, sizes, kinds);
 
-      gomp_map_vars_async (acc_dev, aq,
-			   (group_last - i) + 1,
-			   &hostaddrs[i], NULL,
-			   &sizes[i], &kinds[i], true,
-			   GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+      unsigned char kind = kinds[i] & 0xff;
+      switch (kind)
+	{
+	case GOMP_MAP_ATTACH:
+	  assert (group_last == i);
+	  goacc_attach_internal (aq, /*TODO is that type cast alright? */ (void **) hostaddrs[i], sizes[i]);
+	  /* Doesn't use reference counting.  */
+	  break;
+	default:
+	  gomp_map_vars_async (acc_dev, aq,
+			       (group_last - i) + 1,
+			       &hostaddrs[i], NULL,
+			       &sizes[i], &kinds[i], true,
+			       GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+	  break;
+	}
 
       i = group_last;
     }
@@ -1094,12 +1111,16 @@  goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
       bool finalize = false;
 
       if (kind == GOMP_MAP_FORCE_FROM
-	  || kind == GOMP_MAP_DELETE
-	  || kind == GOMP_MAP_FORCE_DETACH)
+	  || kind == GOMP_MAP_DELETE)
 	finalize = true;
 
       switch (kind)
 	{
+	case GOMP_MAP_DETACH:
+	case GOMP_MAP_FORCE_DETACH:
+	  /* Handled above; doesn't use reference counting.  */
+	  break;
+
 	case GOMP_MAP_FROM:
 	case GOMP_MAP_FORCE_FROM:
 	case GOMP_MAP_ALWAYS_FROM:
@@ -1110,14 +1131,10 @@  goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	case GOMP_MAP_POINTER:
 	case GOMP_MAP_DELETE:
 	case GOMP_MAP_RELEASE:
-	case GOMP_MAP_DETACH:
-	case GOMP_MAP_FORCE_DETACH:
 	  {
 	    struct splay_tree_key_s cur_node;
 	    size_t size;
-	    if (kind == GOMP_MAP_POINTER
-		|| kind == GOMP_MAP_DETACH
-		|| kind == GOMP_MAP_FORCE_DETACH)
+	    if (kind == GOMP_MAP_POINTER)
 	      size = sizeof (void *);
 	    else
 	      size = sizes[i];
diff --git a/libgomp/target.c b/libgomp/target.c
index 36425477dcb..2197067a9a3 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -357,10 +357,12 @@  gomp_map_vars_existing (struct gomp_device_descr *devicep,
 			splay_tree_key newn, struct target_var_desc *tgt_var,
 			unsigned char kind, struct gomp_coalesce_buf *cbuf)
 {
+  assert (kind != GOMP_MAP_ATTACH);
+
   tgt_var->key = oldn;
   tgt_var->copy_from = GOMP_MAP_COPY_FROM_P (kind);
   tgt_var->always_copy_from = GOMP_MAP_ALWAYS_FROM_P (kind);
-  tgt_var->do_detach = kind == GOMP_MAP_ATTACH;
+  tgt_var->do_detach = false; //TODO Not 'newn->do_detach', right?
   tgt_var->offset = newn->host_start - oldn->host_start;
   tgt_var->length = newn->host_end - newn->host_start;
 
@@ -810,13 +812,15 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	}
       else if ((kind & typemask) == GOMP_MAP_ATTACH)
 	{
+	  assert (pragma_kind != GOMP_MAP_VARS_ENTER_DATA
+		  && pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+
 	  tgt->list[i].key = NULL;
 	  has_firstprivate = true;
 	  continue;
 	}
       cur_node.host_start = (uintptr_t) hostaddrs[i];
-      if (!GOMP_MAP_POINTER_P (kind & typemask)
-	  && (kind & typemask) != GOMP_MAP_ATTACH)
+      if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
       else
 	cur_node.host_end = cur_node.host_start + sizeof (void *);
@@ -1083,6 +1087,9 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		continue;
 	      case GOMP_MAP_ATTACH:
 		{
+		  assert (pragma_kind != GOMP_MAP_VARS_ENTER_DATA
+			  && pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+
 		  cur_node.host_start = (uintptr_t) hostaddrs[i];
 		  cur_node.host_end = cur_node.host_start + sizeof (void *);
 		  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
@@ -1093,8 +1100,12 @@  gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		      tgt->list[i].length = n->host_end - n->host_start;
 		      tgt->list[i].copy_from = false;
 		      tgt->list[i].always_copy_from = false;
-		      tgt->list[i].do_detach
-			= (pragma_kind != GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+		      tgt->list[i].do_detach = true;
+		      /* OpenACC 'attach'/'detach' has no business affecting
+			 user-visible reference counting, but the following
+			 adjustment of the structured reference counter ('data'
+			 construct), this is just an implementation detail,
+			 isn't visible to the user.  */
 		      n->refcount++;
 		    }
 		  else
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c
new file mode 100644
index 00000000000..d5eb167ca07
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c
@@ -0,0 +1,123 @@ 
+/* Verify that OpenACC 'attach'/'detach' doesn't interfere with reference
+   counting.  */
+
+#include <assert.h>
+#include <stdlib.h>
+#include <openacc.h>
+
+/* Need to shared this (and, in particular, implicit '&data_work' in
+   'attach'/'detach' clauses) between 'test' and 'test_'.  */
+static unsigned char *data_work;
+
+static void test_(unsigned variant,
+		  unsigned char *data,
+		  void *data_d)
+{
+  assert(acc_is_present(&data_work, sizeof data_work));
+  assert(data_work == data);
+
+  acc_update_self(&data_work, sizeof data_work);
+  assert(data_work == data);
+
+  if (variant & 1)
+    {
+#pragma acc enter data attach(data_work)
+    }
+  else
+    acc_attach((void **) &data_work);
+  acc_update_self(&data_work, sizeof data_work);
+  assert(data_work == data_d);
+
+  if (variant & 4)
+    {
+      if (variant & 2)
+	{ // attach some more
+	  data_work = data;
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+#pragma acc enter data attach(data_work)
+#pragma acc enter data attach(data_work)
+	  acc_attach((void **) &data_work);
+	  acc_attach((void **) &data_work);
+#pragma acc enter data attach(data_work)
+	}
+      else
+	{}
+    }
+  else
+    { // detach
+      data_work = data;
+      if (variant & 2)
+	{
+#pragma acc exit data detach(data_work)
+	}
+      else
+	acc_detach((void **) &data_work);
+      acc_update_self(&data_work, sizeof data_work);
+      assert(data_work == data);
+
+      // now not attached anymore
+
+#if 0
+      if (TODO)
+	{
+	  acc_detach(&data_work); //TODO PR95203 "libgomp: attach count underflow"
+	  acc_update_self(&data_work, sizeof data_work);
+	  assert(data_work == data);
+	}
+#endif
+    }
+
+  assert(acc_is_present(&data_work, sizeof data_work));
+}
+
+static void test(unsigned variant)
+{
+  const int size = sizeof (void *) + 1; // In sweet memory of PR95270.
+  unsigned char *data = (unsigned char *) malloc(size);
+  assert(data);
+  void *data_d = acc_create(data, size);
+  assert(data_d);
+  assert(acc_is_present(data, size));
+
+  data_work = data;
+
+  if (variant & 8)
+    {
+#pragma acc data copyin(data_work)
+      test_(variant, data, data_d);
+    }
+  else
+    {
+      acc_copyin(&data_work, sizeof data_work);
+      test_(variant, data, data_d);
+      acc_delete(&data_work, sizeof data_work);
+    }
+#if ACC_MEM_SHARED
+  assert(acc_is_present(&data_work, sizeof data_work));
+#else
+  assert(!acc_is_present(&data_work, sizeof data_work));
+#endif
+  data_work = NULL;
+
+  assert(acc_is_present(data, size));
+  acc_delete(data, size);
+  data_d = NULL;
+#if ACC_MEM_SHARED
+  assert(acc_is_present(data, size));
+#else
+  assert(!acc_is_present(data, size));
+#endif
+  free(data);
+  data = NULL;
+}
+
+int main()
+{
+  for (size_t i = 0; i < 16; ++i)
+    test(i);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90
index b22e411567f..fbd52373946 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90
@@ -23,16 +23,15 @@  program main
   if (.not. acc_is_present(var%a)) stop 1
   if (.not. acc_is_present(var)) stop 2
 
+  !$acc exit data detach(var%a) finalize
   print *, "CheCKpOInT1"
   ! { dg-output ".*CheCKpOInT1(\n|\r\n|\r)" }
-  !$acc exit data detach(var%a) finalize
-  !TODO     goacc_exit_data_internal: Assertion `is_tgt_unmapped || num_mappings > 1' failed.
-  !TODO { dg-output ".*\[Aa\]ssert.*is_tgt_unmapped" { target { ! openacc_host_selected } } } ! Scan for what we expect in the "XFAILed" case (without actually XFAILing).
+  !$acc exit data delete(var%a)
+  !TODO { dg-output "(\n|\r\n|\r)libgomp: attach count underflow(\n|\r\n|\r)$" { target { ! openacc_host_selected } } } ! Scan for what we expect in the "XFAILed" case (without actually XFAILing).
   !TODO { dg-shouldfail "XFAILed" { ! openacc_host_selected } } ! ... instead of 'dg-xfail-run-if' so that 'dg-output' is evaluated at all.
   !TODO { dg-final { if { [dg-process-target { xfail { ! openacc_host_selected } }] == "F" } { xfail "[testname-for-summary] really is XFAILed" } } } ! ... so that we still get an XFAIL visible in the log.
   print *, "CheCKpOInT2"
   ! { dg-output ".CheCKpOInT2(\n|\r\n|\r)" { target { openacc_host_selected } } }
-  !$acc exit data delete(var%a)
   if (acc_is_present(var%a)) stop 3
   if (.not. acc_is_present(var)) stop 4
 
-- 
2.17.1