Message ID | 71cbd367-249c-420d-87b6-4291764ddddb@baylibre.com |
---|---|
State | New |
Headers | show |
Series | [OpenACC,2.7,v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters | expand |
Hi Chung-Lin! On 2024-04-11T22:08:47+0800, Chung-Lin Tang <cltang@baylibre.com> wrote: > On 2024/3/15 7:24 PM, Thomas Schwinge wrote: >> - if (n->refcount != REFCOUNT_INFINITY) >> + if (n->refcount != REFCOUNT_INFINITY >> + && n->refcount != REFCOUNT_ACC_MAP_DATA) >> n->refcount--; >> n->dynamic_refcount--; >> } >> >> + /* Mappings created by 'acc_map_data' may only be deleted by >> + 'acc_unmap_data'. */ >> + if (n->refcount == REFCOUNT_ACC_MAP_DATA >> + && n->dynamic_refcount == 0) >> + n->dynamic_refcount = 1; >> + >> if (n->refcount == 0) >> { >> bool copyout = (kind == GOMP_MAP_FROM >> >> ..., which really should have the same semantics? No strong opinion on >> which of the two variants you now chose. > > My guess is that breaking off the REFCOUNT_ACC_MAP_DATA case separately will > be lighter on any branch predictors (faster performing overall) Eh, OK... > so I will > stick with my version here. >>>> It's not clear to me why you need this handling -- instead of just >>>> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is, >>>> early 'return'? >>>> >>>> Per my understanding, this code is for OpenACC only exercised for >>>> structured data regions, and it seems strange (unnecessary?) to adjust >>>> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data? Or am I >>>> missing anything? >>> >>> No, that is not true. It goes through almost everything through gomp_map_vars_existing/_internal. >>> This is what happens when you acc_create/acc_copyin on a mapping created by acc_map_data. I still don't follow. If you 'acc_map_data' something, and then 'acc_create' the same memory region, then that's handled, with 'dynamic_refcount', via 'acc_create' -> 'goacc_enter_datum' -> 'goacc_map_var_existing', all in 'libgomp/oacc-mem.c'. Agree? >> But I don't understand what you foresee breaking with the following (on >> top of your v2): >> >> --- a/libgomp/target.c >> +++ b/libgomp/target.c >> @@ -476,14 +476,14 @@ gomp_free_device_memory (struct gomp_device_descr *devicep, void *devptr) >> static inline void >> gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set) >> { >> - if (k == NULL || k->refcount == REFCOUNT_INFINITY) >> + if (k == NULL >> + || k->refcount == REFCOUNT_INFINITY >> + || k->refcount == REFCOUNT_ACC_MAP_DATA) >> return; >> >> uintptr_t *refcount_ptr = &k->refcount; >> >> - if (k->refcount == REFCOUNT_ACC_MAP_DATA) >> - refcount_ptr = &k->dynamic_refcount; >> - else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) >> + if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) >> refcount_ptr = &k->structelem_refcount; > ... >> Can you please show a test case? That is, a test case where the 'libgomp/target.c:gomp_increment_refcount' etc. handling is relevant. Those test cases: > I have re-tested the patch *without* the gomp_increment/decrement_refcount changes, > and have these regressions (just to demonstrate what is affected): > +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test > +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test > +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test > +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test > +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test > +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test > +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test > +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test ... are cases where we 'acc_map_data' something, and then invoke an OpenACC compute constuct with a data clause for the same memory region... > Now, I have also re-tested your version (aka, just break early and return when k->refcount == REFCOUNT_ACC_MAP_DATA) > And for the record, that also works (no regressions). > > However, I strongly suggest we use my version here where we adjust the dynamic_refcount ..., and it's confusing to me why such an OpenACC compute constuct (which is to use the structured reference counter) should then use the dynamic reference counter, for 'acc_map_data'-mapped data? > simply because: *It is the whole point of this project item in OpenACC 2.7* > > The 2.7 spec articulated how increment/decrement interacts with acc_map_data/acc_unmap_data and this patch was supposed to make libgomp more conforming to it implementation-wise. > (otherwise, no point in working on this at all, as there wasn't really anything behaviorally wrong about our implementation before) That is, in my understanding, those 'gomp_increment_refcount' changes don't affect the 'acc_map_data' reference counting, but instead, they change the reference counting for OpenACC constructs that are originally using structured reference counter to instead use the dynamic reference counter. This doesn't seem conceptually right to me. (..., even if not observable from the outside.) Grüße Thomas > diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h > index f98cccd8b66..089393846d1 100644 > --- a/libgomp/libgomp.h > +++ b/libgomp/libgomp.h > @@ -1163,6 +1163,8 @@ struct target_mem_desc; > /* Special value for refcount - tgt_offset contains target address of the > artificial pointer to "omp declare target link" object. */ > #define REFCOUNT_LINK (REFCOUNT_SPECIAL | 1) > +/* Special value for refcount - created through acc_map_data. */ > +#define REFCOUNT_ACC_MAP_DATA (REFCOUNT_SPECIAL | 2) > > /* Special value for refcount - structure element sibling list items. > All such key refounts have REFCOUNT_STRUCTELEM bits set, with _FLAG_FIRST > diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c > index ba48a1ccbb7..d590806b5cd 100644 > --- a/libgomp/oacc-mem.c > +++ b/libgomp/oacc-mem.c > @@ -411,7 +411,8 @@ acc_map_data (void *h, void *d, size_t s) > assert (n->refcount == 1); > assert (n->dynamic_refcount == 0); > /* Special reference counting behavior. */ > - n->refcount = REFCOUNT_INFINITY; > + n->refcount = REFCOUNT_ACC_MAP_DATA; > + n->dynamic_refcount = 1; > > if (profiling_p) > { > @@ -455,12 +456,7 @@ acc_unmap_data (void *h) > gomp_fatal ("[%p,%d] surrounds %p", > (void *) n->host_start, (int) host_size, (void *) h); > } > - /* TODO This currently doesn't catch 'REFCOUNT_INFINITY' usage different from > - 'acc_map_data'. Maybe 'dynamic_refcount' can be used for disambiguating > - the different 'REFCOUNT_INFINITY' cases, or simply separate > - 'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA' > - etc.)? */ > - else if (n->refcount != REFCOUNT_INFINITY) > + else if (n->refcount != REFCOUNT_ACC_MAP_DATA) > { > gomp_mutex_unlock (&acc_dev->lock); > gomp_fatal ("refusing to unmap block [%p,+%d] that has not been mapped" > @@ -468,13 +464,12 @@ acc_unmap_data (void *h) > (void *) h, (int) host_size); > } > > - struct target_mem_desc *tgt = n->tgt; > + /* This should hold for all mappings created by acc_map_data. We are however > + removing this mapping in a "finalize" manner, so dynamic_refcount > 1 does > + not matter. */ > + assert (n->dynamic_refcount >= 1); > > - if (tgt->refcount == REFCOUNT_INFINITY) > - { > - gomp_mutex_unlock (&acc_dev->lock); > - gomp_fatal ("cannot unmap target block"); > - } > + struct target_mem_desc *tgt = n->tgt; > > /* Above, we've verified that the mapping must have been set up by > 'acc_map_data'. */ > @@ -519,7 +514,8 @@ goacc_map_var_existing (struct gomp_device_descr *acc_dev, void *hostaddr, > } > > assert (n->refcount != REFCOUNT_LINK); > - if (n->refcount != REFCOUNT_INFINITY) > + if (n->refcount != REFCOUNT_INFINITY > + && n->refcount != REFCOUNT_ACC_MAP_DATA) > n->refcount++; > n->dynamic_refcount++; > > @@ -683,13 +679,30 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s, > > assert (n->refcount != REFCOUNT_LINK); > if (n->refcount != REFCOUNT_INFINITY > + && n->refcount != REFCOUNT_ACC_MAP_DATA > && n->refcount < n->dynamic_refcount) > { > gomp_mutex_unlock (&acc_dev->lock); > gomp_fatal ("Dynamic reference counting assert fail\n"); > } > > - if (finalize) > + if (n->refcount == REFCOUNT_ACC_MAP_DATA) > + { > + if (finalize) > + { > + /* Mappings created by acc_map_data are returned to initial > + dynamic_refcount of 1. Can only be deleted by acc_unmap_data. */ > + n->dynamic_refcount = 1; > + } > + else if (n->dynamic_refcount) > + { > + /* When mapping is created by acc_map_data, dynamic_refcount must be > + maintained at >= 1. */ > + if (n->dynamic_refcount > 1) > + n->dynamic_refcount--; > + } > + } > + else if (finalize) > { > if (n->refcount != REFCOUNT_INFINITY) > n->refcount -= n->dynamic_refcount; > @@ -1131,7 +1144,8 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, > } > /* This is a special case because we must increment the refcount by > the number of mapped struct elements, rather than by one. */ > - if (n->refcount != REFCOUNT_INFINITY) > + if (n->refcount != REFCOUNT_INFINITY > + && n->refcount != REFCOUNT_ACC_MAP_DATA) > n->refcount += groupnum - 1; > n->dynamic_refcount += groupnum - 1; > } > @@ -1203,7 +1217,8 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, > processed = true; > } > else > - assert (n->refcount != REFCOUNT_INFINITY); > + assert (n->refcount != REFCOUNT_INFINITY > + && n->refcount != REFCOUNT_ACC_MAP_DATA); > > for (size_t j = 0; j < tgt->list_count; j++) > if (tgt->list[j].key == n) > diff --git a/libgomp/target.c b/libgomp/target.c > index bcc86051601..c9dcb8761e5 100644 > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -481,7 +481,9 @@ gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set) > > uintptr_t *refcount_ptr = &k->refcount; > > - if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) > + if (k->refcount == REFCOUNT_ACC_MAP_DATA) > + refcount_ptr = &k->dynamic_refcount; > + else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) > refcount_ptr = &k->structelem_refcount; > else if (REFCOUNT_STRUCTELEM_P (k->refcount)) > refcount_ptr = k->structelem_refcount_ptr; > @@ -528,7 +530,9 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p, > > uintptr_t *refcount_ptr = &k->refcount; > > - if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) > + if (k->refcount == REFCOUNT_ACC_MAP_DATA) > + refcount_ptr = &k->dynamic_refcount; > + else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) > refcount_ptr = &k->structelem_refcount; > else if (REFCOUNT_STRUCTELEM_P (k->refcount)) > refcount_ptr = k->structelem_refcount_ptr; > @@ -561,6 +565,10 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p, > else if (*refcount_ptr > 0) > *refcount_ptr -= 1; > > + /* Force back to 1 if this is an acc_map_data mapping. */ > + if (k->refcount == REFCOUNT_ACC_MAP_DATA && *refcount_ptr == 0) > + *refcount_ptr = 1; > + > end: > if (*refcount_ptr == 0) > { > diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c > new file mode 100644 > index 00000000000..7bc55b94f33 > --- /dev/null > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c > @@ -0,0 +1,36 @@ > +/* { dg-do run } */ > +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */ > + > +/* Test if acc_unmap_data has implicit finalize semantics. */ > + > +#include <stdlib.h> > +#include <openacc.h> > + > +int > +main (int argc, char **argv) > +{ > + const int N = 256; > + unsigned char *h; > + void *d; > + > + h = (unsigned char *) malloc (N); > + > + d = acc_malloc (N); > + > + acc_map_data (h, d, N); > + > + acc_copyin (h, N); > + acc_copyin (h, N); > + acc_copyin (h, N); > + > + acc_unmap_data (h); > + > + if (acc_is_present (h, N)) > + abort (); > + > + acc_free (d); > + > + free (h); > + > + return 0; > +} > diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c > index 872f0c1de5c..9ed9fa7e413 100644 > --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c > @@ -10,7 +10,7 @@ main (int argc, char *argv[]) > { > acc_init (acc_device_default); > acc_unmap_data ((void *) foo); > -/* { dg-output "libgomp: cannot unmap target block" } */ > +/* { dg-output "libgomp: refusing to unmap block \\\[\[0-9a-fA-FxX\]+,\\\+64\\\] that has not been mapped by 'acc_map_data'" } */ > return 0; > } >
On 2024/4/12 3:14 PM, Thomas Schwinge wrote: >> I have re-tested the patch *without* the gomp_increment/decrement_refcount changes, >> and have these regressions (just to demonstrate what is affected): >> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test >> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test >> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test >> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test >> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test >> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test >> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test >> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test > ... are cases where we 'acc_map_data' something, and then invoke an > OpenACC compute constuct with a data clause for the same memory region... > >> Now, I have also re-tested your version (aka, just break early and return when k->refcount == REFCOUNT_ACC_MAP_DATA) >> And for the record, that also works (no regressions). >> >> However, I strongly suggest we use my version here where we adjust the dynamic_refcount > ..., and it's confusing to me why such an OpenACC compute constuct (which > is to use the structured reference counter) should then use the dynamic > reference counter, for 'acc_map_data'-mapped data? > >> simply because: *It is the whole point of this project item in OpenACC 2.7* >> >> The 2.7 spec articulated how increment/decrement interacts with acc_map_data/acc_unmap_data and this patch was supposed to make libgomp more conforming to it implementation-wise. >> (otherwise, no point in working on this at all, as there wasn't really anything behaviorally wrong about our implementation before) > That is, in my understanding, those 'gomp_increment_refcount' changes > don't affect the 'acc_map_data' reference counting, but instead, they > change the reference counting for OpenACC constructs that are originally > using structured reference counter to instead use the dynamic reference > counter. This doesn't seem conceptually right to me. (..., even if not > observable from the outside.) Okay, I've committed the attached patch, with the "early return upon k->refcount == REFCOUNT_ACC_MAP_DATA" in gomp_increment/decrement_refcount. If we continue to use k->refcount itself as the flag holder of map type, I guess we will not be able to directly determine whether it is a structured or dynamic adjustment at that point. Probably need a new field entirely. I think we don't really need to do that right now. Thanks, Chung-Lin From a7578a077ed8b64b94282aa55faf7037690abbc5 Mon Sep 17 00:00:00 2001 From: Chung-Lin Tang <cltang@baylibre.com> Date: Tue, 16 Apr 2024 09:03:21 +0000 Subject: [PATCH] OpenACC 2.7: Adjust acc_map_data/acc_unmap_data interaction with reference counters This patch adjusts the implementation of acc_map_data/acc_unmap_data API library routines to more fit the description in the OpenACC 2.7 specification. Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA special value to mark acc_map_data-created mappings. Adjustment around mapping related code to respect OpenACC semantics are also added. libgomp/ChangeLog: * libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2). * oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA, initialize dynamic_refcount as 1. (acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA, (goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case. (goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA. (goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case. * target.c (gomp_increment_refcount): Return early for REFCOUNT_ACC_MAP_DATA case. (gomp_decrement_refcount): Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase. * testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust testcase error output scan test. --- libgomp/libgomp.h | 2 + libgomp/oacc-mem.c | 49 ++++++++++++------- libgomp/target.c | 8 ++- .../libgomp.oacc-c-c++-common/lib-96.c | 36 ++++++++++++++ .../unmap-infinity-1.c | 2 +- 5 files changed, 77 insertions(+), 20 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index f98cccd8b66..089393846d1 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1163,6 +1163,8 @@ struct target_mem_desc; /* Special value for refcount - tgt_offset contains target address of the artificial pointer to "omp declare target link" object. */ #define REFCOUNT_LINK (REFCOUNT_SPECIAL | 1) +/* Special value for refcount - created through acc_map_data. */ +#define REFCOUNT_ACC_MAP_DATA (REFCOUNT_SPECIAL | 2) /* Special value for refcount - structure element sibling list items. All such key refounts have REFCOUNT_STRUCTELEM bits set, with _FLAG_FIRST diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index ba48a1ccbb7..d590806b5cd 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -411,7 +411,8 @@ acc_map_data (void *h, void *d, size_t s) assert (n->refcount == 1); assert (n->dynamic_refcount == 0); /* Special reference counting behavior. */ - n->refcount = REFCOUNT_INFINITY; + n->refcount = REFCOUNT_ACC_MAP_DATA; + n->dynamic_refcount = 1; if (profiling_p) { @@ -455,12 +456,7 @@ acc_unmap_data (void *h) gomp_fatal ("[%p,%d] surrounds %p", (void *) n->host_start, (int) host_size, (void *) h); } - /* TODO This currently doesn't catch 'REFCOUNT_INFINITY' usage different from - 'acc_map_data'. Maybe 'dynamic_refcount' can be used for disambiguating - the different 'REFCOUNT_INFINITY' cases, or simply separate - 'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA' - etc.)? */ - else if (n->refcount != REFCOUNT_INFINITY) + else if (n->refcount != REFCOUNT_ACC_MAP_DATA) { gomp_mutex_unlock (&acc_dev->lock); gomp_fatal ("refusing to unmap block [%p,+%d] that has not been mapped" @@ -468,13 +464,12 @@ acc_unmap_data (void *h) (void *) h, (int) host_size); } - struct target_mem_desc *tgt = n->tgt; + /* This should hold for all mappings created by acc_map_data. We are however + removing this mapping in a "finalize" manner, so dynamic_refcount > 1 does + not matter. */ + assert (n->dynamic_refcount >= 1); - if (tgt->refcount == REFCOUNT_INFINITY) - { - gomp_mutex_unlock (&acc_dev->lock); - gomp_fatal ("cannot unmap target block"); - } + struct target_mem_desc *tgt = n->tgt; /* Above, we've verified that the mapping must have been set up by 'acc_map_data'. */ @@ -519,7 +514,8 @@ goacc_map_var_existing (struct gomp_device_descr *acc_dev, void *hostaddr, } assert (n->refcount != REFCOUNT_LINK); - if (n->refcount != REFCOUNT_INFINITY) + if (n->refcount != REFCOUNT_INFINITY + && n->refcount != REFCOUNT_ACC_MAP_DATA) n->refcount++; n->dynamic_refcount++; @@ -683,13 +679,30 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s, assert (n->refcount != REFCOUNT_LINK); if (n->refcount != REFCOUNT_INFINITY + && n->refcount != REFCOUNT_ACC_MAP_DATA && n->refcount < n->dynamic_refcount) { gomp_mutex_unlock (&acc_dev->lock); gomp_fatal ("Dynamic reference counting assert fail\n"); } - if (finalize) + if (n->refcount == REFCOUNT_ACC_MAP_DATA) + { + if (finalize) + { + /* Mappings created by acc_map_data are returned to initial + dynamic_refcount of 1. Can only be deleted by acc_unmap_data. */ + n->dynamic_refcount = 1; + } + else if (n->dynamic_refcount) + { + /* When mapping is created by acc_map_data, dynamic_refcount must be + maintained at >= 1. */ + if (n->dynamic_refcount > 1) + n->dynamic_refcount--; + } + } + else if (finalize) { if (n->refcount != REFCOUNT_INFINITY) n->refcount -= n->dynamic_refcount; @@ -1131,7 +1144,8 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, } /* This is a special case because we must increment the refcount by the number of mapped struct elements, rather than by one. */ - if (n->refcount != REFCOUNT_INFINITY) + if (n->refcount != REFCOUNT_INFINITY + && n->refcount != REFCOUNT_ACC_MAP_DATA) n->refcount += groupnum - 1; n->dynamic_refcount += groupnum - 1; } @@ -1203,7 +1217,8 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, processed = true; } else - assert (n->refcount != REFCOUNT_INFINITY); + assert (n->refcount != REFCOUNT_INFINITY + && n->refcount != REFCOUNT_ACC_MAP_DATA); for (size_t j = 0; j < tgt->list_count; j++) if (tgt->list[j].key == n) diff --git a/libgomp/target.c b/libgomp/target.c index bcc86051601..5ec19ae489e 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -476,7 +476,9 @@ gomp_free_device_memory (struct gomp_device_descr *devicep, void *devptr) static inline void gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set) { - if (k == NULL || k->refcount == REFCOUNT_INFINITY) + if (k == NULL + || k->refcount == REFCOUNT_INFINITY + || k->refcount == REFCOUNT_ACC_MAP_DATA) return; uintptr_t *refcount_ptr = &k->refcount; @@ -520,7 +522,9 @@ static inline void gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p, bool *do_copy, bool *do_remove) { - if (k == NULL || k->refcount == REFCOUNT_INFINITY) + if (k == NULL + || k->refcount == REFCOUNT_INFINITY + || k->refcount == REFCOUNT_ACC_MAP_DATA) { *do_copy = *do_remove = false; return; diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c new file mode 100644 index 00000000000..7bc55b94f33 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */ + +/* Test if acc_unmap_data has implicit finalize semantics. */ + +#include <stdlib.h> +#include <openacc.h> + +int +main (int argc, char **argv) +{ + const int N = 256; + unsigned char *h; + void *d; + + h = (unsigned char *) malloc (N); + + d = acc_malloc (N); + + acc_map_data (h, d, N); + + acc_copyin (h, N); + acc_copyin (h, N); + acc_copyin (h, N); + + acc_unmap_data (h); + + if (acc_is_present (h, N)) + abort (); + + acc_free (d); + + free (h); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c index 872f0c1de5c..9ed9fa7e413 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c @@ -10,7 +10,7 @@ main (int argc, char *argv[]) { acc_init (acc_device_default); acc_unmap_data ((void *) foo); -/* { dg-output "libgomp: cannot unmap target block" } */ +/* { dg-output "libgomp: refusing to unmap block \\\[\[0-9a-fA-FxX\]+,\\\+64\\\] that has not been mapped by 'acc_map_data'" } */ return 0; }
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index f98cccd8b66..089393846d1 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1163,6 +1163,8 @@ struct target_mem_desc; /* Special value for refcount - tgt_offset contains target address of the artificial pointer to "omp declare target link" object. */ #define REFCOUNT_LINK (REFCOUNT_SPECIAL | 1) +/* Special value for refcount - created through acc_map_data. */ +#define REFCOUNT_ACC_MAP_DATA (REFCOUNT_SPECIAL | 2) /* Special value for refcount - structure element sibling list items. All such key refounts have REFCOUNT_STRUCTELEM bits set, with _FLAG_FIRST diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index ba48a1ccbb7..d590806b5cd 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -411,7 +411,8 @@ acc_map_data (void *h, void *d, size_t s) assert (n->refcount == 1); assert (n->dynamic_refcount == 0); /* Special reference counting behavior. */ - n->refcount = REFCOUNT_INFINITY; + n->refcount = REFCOUNT_ACC_MAP_DATA; + n->dynamic_refcount = 1; if (profiling_p) { @@ -455,12 +456,7 @@ acc_unmap_data (void *h) gomp_fatal ("[%p,%d] surrounds %p", (void *) n->host_start, (int) host_size, (void *) h); } - /* TODO This currently doesn't catch 'REFCOUNT_INFINITY' usage different from - 'acc_map_data'. Maybe 'dynamic_refcount' can be used for disambiguating - the different 'REFCOUNT_INFINITY' cases, or simply separate - 'REFCOUNT_INFINITY' values per different usage ('REFCOUNT_ACC_MAP_DATA' - etc.)? */ - else if (n->refcount != REFCOUNT_INFINITY) + else if (n->refcount != REFCOUNT_ACC_MAP_DATA) { gomp_mutex_unlock (&acc_dev->lock); gomp_fatal ("refusing to unmap block [%p,+%d] that has not been mapped" @@ -468,13 +464,12 @@ acc_unmap_data (void *h) (void *) h, (int) host_size); } - struct target_mem_desc *tgt = n->tgt; + /* This should hold for all mappings created by acc_map_data. We are however + removing this mapping in a "finalize" manner, so dynamic_refcount > 1 does + not matter. */ + assert (n->dynamic_refcount >= 1); - if (tgt->refcount == REFCOUNT_INFINITY) - { - gomp_mutex_unlock (&acc_dev->lock); - gomp_fatal ("cannot unmap target block"); - } + struct target_mem_desc *tgt = n->tgt; /* Above, we've verified that the mapping must have been set up by 'acc_map_data'. */ @@ -519,7 +514,8 @@ goacc_map_var_existing (struct gomp_device_descr *acc_dev, void *hostaddr, } assert (n->refcount != REFCOUNT_LINK); - if (n->refcount != REFCOUNT_INFINITY) + if (n->refcount != REFCOUNT_INFINITY + && n->refcount != REFCOUNT_ACC_MAP_DATA) n->refcount++; n->dynamic_refcount++; @@ -683,13 +679,30 @@ goacc_exit_datum_1 (struct gomp_device_descr *acc_dev, void *h, size_t s, assert (n->refcount != REFCOUNT_LINK); if (n->refcount != REFCOUNT_INFINITY + && n->refcount != REFCOUNT_ACC_MAP_DATA && n->refcount < n->dynamic_refcount) { gomp_mutex_unlock (&acc_dev->lock); gomp_fatal ("Dynamic reference counting assert fail\n"); } - if (finalize) + if (n->refcount == REFCOUNT_ACC_MAP_DATA) + { + if (finalize) + { + /* Mappings created by acc_map_data are returned to initial + dynamic_refcount of 1. Can only be deleted by acc_unmap_data. */ + n->dynamic_refcount = 1; + } + else if (n->dynamic_refcount) + { + /* When mapping is created by acc_map_data, dynamic_refcount must be + maintained at >= 1. */ + if (n->dynamic_refcount > 1) + n->dynamic_refcount--; + } + } + else if (finalize) { if (n->refcount != REFCOUNT_INFINITY) n->refcount -= n->dynamic_refcount; @@ -1131,7 +1144,8 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, } /* This is a special case because we must increment the refcount by the number of mapped struct elements, rather than by one. */ - if (n->refcount != REFCOUNT_INFINITY) + if (n->refcount != REFCOUNT_INFINITY + && n->refcount != REFCOUNT_ACC_MAP_DATA) n->refcount += groupnum - 1; n->dynamic_refcount += groupnum - 1; } @@ -1203,7 +1217,8 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, processed = true; } else - assert (n->refcount != REFCOUNT_INFINITY); + assert (n->refcount != REFCOUNT_INFINITY + && n->refcount != REFCOUNT_ACC_MAP_DATA); for (size_t j = 0; j < tgt->list_count; j++) if (tgt->list[j].key == n) diff --git a/libgomp/target.c b/libgomp/target.c index bcc86051601..c9dcb8761e5 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -481,7 +481,9 @@ gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set) uintptr_t *refcount_ptr = &k->refcount; - if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) + if (k->refcount == REFCOUNT_ACC_MAP_DATA) + refcount_ptr = &k->dynamic_refcount; + else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) refcount_ptr = &k->structelem_refcount; else if (REFCOUNT_STRUCTELEM_P (k->refcount)) refcount_ptr = k->structelem_refcount_ptr; @@ -528,7 +530,9 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p, uintptr_t *refcount_ptr = &k->refcount; - if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) + if (k->refcount == REFCOUNT_ACC_MAP_DATA) + refcount_ptr = &k->dynamic_refcount; + else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) refcount_ptr = &k->structelem_refcount; else if (REFCOUNT_STRUCTELEM_P (k->refcount)) refcount_ptr = k->structelem_refcount_ptr; @@ -561,6 +565,10 @@ gomp_decrement_refcount (splay_tree_key k, htab_t *refcount_set, bool delete_p, else if (*refcount_ptr > 0) *refcount_ptr -= 1; + /* Force back to 1 if this is an acc_map_data mapping. */ + if (k->refcount == REFCOUNT_ACC_MAP_DATA && *refcount_ptr == 0) + *refcount_ptr = 1; + end: if (*refcount_ptr == 0) { diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c new file mode 100644 index 00000000000..7bc55b94f33 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-96.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */ + +/* Test if acc_unmap_data has implicit finalize semantics. */ + +#include <stdlib.h> +#include <openacc.h> + +int +main (int argc, char **argv) +{ + const int N = 256; + unsigned char *h; + void *d; + + h = (unsigned char *) malloc (N); + + d = acc_malloc (N); + + acc_map_data (h, d, N); + + acc_copyin (h, N); + acc_copyin (h, N); + acc_copyin (h, N); + + acc_unmap_data (h); + + if (acc_is_present (h, N)) + abort (); + + acc_free (d); + + free (h); + + return 0; +} diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c index 872f0c1de5c..9ed9fa7e413 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c @@ -10,7 +10,7 @@ main (int argc, char *argv[]) { acc_init (acc_device_default); acc_unmap_data ((void *) foo); -/* { dg-output "libgomp: cannot unmap target block" } */ +/* { dg-output "libgomp: refusing to unmap block \\\[\[0-9a-fA-FxX\]+,\\\+64\\\] that has not been mapped by 'acc_map_data'" } */ return 0; }