[PR70895] Add copy mapping for reductions on OpenACC loop directives

Message ID 85006953-86f0-2cb7-8c01-d391ee032a4a@pllab.cs.nthu.edu.tw
State New

Commit Message

Chung-Lin Tang Aug. 18, 2016, 3:07 p.m. UTC
On 2016/8/16 7:06 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 15 Aug 2016 19:25:48 +0800, Chung-Lin Tang <cltang@codesourcery.com> wrote:
>> per the discussion on the bugzilla PR page, reductions on OpenACC loop
>> directives will automatically get a copy clause mapping on an enclosing
>> parallel construct (unless bounded by a local variable or an explicit
>> firstprivate clause).
>>
>> There is also a patch for libgomp testsuite cases. Aside from the
>> Fortran case, which now needs explicit firstprivate clauses to work,
>> the other C/C++ cases have been adjusted to remove explicit copy clauses.
>> (I have not exhaustively searched everywhere to eliminate them, though.)
>>
>> This has been tested using gomp-4_0-branch, which is based on GCC 6,
>> which is what this PR was originally filed for.
>>
>> I will be committing this soon for gomp-4_0-branch,
>> is this okay for gcc-6-branch and trunk as well?
> 
> On Mon, 15 Aug 2016 15:23:14 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
>> The gimplify.c change is ok for trunk and 6.3 (after 6.2 is released).
>> As for the testsuite, I'll leave it to Thomas/Nathan on what they prefer,
>> I'd think that having explicit clauses in e.g. half of the testcases and
>> implicit ones in the other half wouldn't hurt, so that both are tested
>> enough.
> 
> ACK, but from a quick scan it seems as if there's still sufficient
> coverage remaining with explicit usage.
> 
> What I'd like to see changed/added, though, is some libgomp.oacc-fortran
> test coverage of the new implicit copy clauses, and a handful of C/C++ as
> well as Fortran tree-scanning tests in gcc/testsuite/ -- basically, to
> document the expected behavior.

I've added the assertion for the kernels case that you mention below, and
committed the gimplify patch to trunk, gcc-6-branch, and gomp-4_0-branch.
I've also added some of the tests you asked for; see the attached updated
testsuite patches.
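
For readers following along, here is a minimal sketch of the user-visible behavior under discussion. It is only an illustration, not one of the committed test cases, and the function name count_iterations is made up for this example. The parallel construct carries no copy(sum) clause; with the change, the reduction on the loop directive is expected to imply one, so the result reaches the host. Built without -fopenacc, the pragmas are ignored and the loop simply runs serially.

```c
#include <stdio.h>

/* Hypothetical example: add 1 per iteration inside an OpenACC parallel
   region.  No copy(sum) clause is written on the parallel construct; the
   reduction on the loop directive is expected to imply one.  Without
   -fopenacc the pragmas are ignored and this is an ordinary serial loop.  */
int count_iterations(int n)
{
  int sum = 0;

#pragma acc parallel num_gangs(32)
  {
#pragma acc loop gang reduction(+:sum)
    for (int i = 0; i < n; i++)
      sum += 1;
  }

  return sum;
}
```

Calling count_iterations(1024) should give 1024 either way; before the change, the version without an explicit copy(sum) was the case that could lose the result.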

>> +  /* For reductions clauses in OpenACC loop directives, by default create a
>> +     copy clause on the enclosing parallel construct for carrying back the
>> +     results.  */
>> +  if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
> 
> Should this be "ctx->region_type & ORT_ACC" instead of "=="?  I suppose
> the same thing also applies to OpenACC nested parallelism:
> 
>     #pragma acc parallel
>       {
>         [...]
>     #pragma acc parallel reduction([...])
>         {
>           [...]
> 
> ..., which we're not supporting right now, but that'll make it easier to
> spot once adding such support.

I'm leaving this as is for now; this patch only deals with acc loop directives,
hence "== ORT_ACC". For Cesar's code path that adds copy clauses for reductions
on parallel constructs, see the corresponding case in gimplify_adjust_omp_clauses().
IMHO, it's too early to guess how we'll deal with nested parallelism.
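
To make the "==" versus "&" distinction concrete, here is a small sketch. It assumes the region types are bit flags in which the construct-specific OpenACC types contain the ORT_ACC bit, which is what the "&" suggestion implies. The SK_ names and the numeric values are hypothetical and are not taken from gimplify.c.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for illustration; the real enum
   in gimplify.c may use different numbers and more members.  */
enum sk_region_type {
  SK_ORT_TARGET      = 0x10,
  SK_ORT_ACC         = 0x40,                              /* plain acc loop */
  SK_ORT_ACC_KERNELS = SK_ORT_ACC | SK_ORT_TARGET | 0x80  /* composite */
};

/* Exact match: true only for the plain loop region type.  */
bool sk_is_acc_loop(enum sk_region_type t)
{
  return t == SK_ORT_ACC;
}

/* Bit test: true for every region type that has the ACC bit set.  */
bool sk_is_any_acc(enum sk_region_type t)
{
  return (t & SK_ORT_ACC) != 0;
}
```

Under these assumptions, the exact comparison restricts the new code to loop directives, while the bit test would also fire for kernels (and any other composite OpenACC region type).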

> I suppose we can also run into this for ORT_ACC_KERNELS (if not, should
> mark/document that with gcc_unreachable or some such); per OpenACC 2.0a,
> 2.5.2 Kernels Construct, "a scalar variable referenced in the kernels
> construct that does not appear in a data clause for the construct or any
> enclosing data construct will be treated as if it appeared in a copy
> clause", so we should assert that this already is a GOVD_MAP.

Fair enough, I've added that in the gimplify patch.
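
As a rough illustration of the logic described in this thread (not the committed gimplify.c code), the following toy model walks outward from a loop with a reduction. It stops at a local or explicitly specified binding, asserts that a kernels construct already maps the variable, and otherwise gives the enclosing parallel construct a copy-style mapping. All types, flags, and function names here are hypothetical, and the model tracks a single variable only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-variable flags; not GCC's real GOVD_* values.  */
enum { V_LOCAL = 1, V_EXPLICIT = 2, V_FIRSTPRIVATE = 4, V_MAP = 8 };
enum sk_kind { K_LOOP, K_PARALLEL, K_KERNELS };

/* One construct in the nesting chain, tracking a single variable.  */
struct sk_ctx {
  enum sk_kind kind;
  int known;             /* nonzero if the variable has an entry here */
  unsigned flags;        /* meaningful only when known is nonzero */
  struct sk_ctx *outer;  /* enclosing construct, or NULL */
};

/* Called for a loop construct that has a reduction on the variable.  */
void sk_note_loop_reduction(struct sk_ctx *loop)
{
  for (struct sk_ctx *c = loop->outer; c != NULL; c = c->outer) {
    if (c->known) {
      /* Local variables and explicit clauses are left alone.  */
      if (c->flags & (V_LOCAL | V_EXPLICIT))
        break;
      if (c->kind == K_KERNELS)
        /* Scalars on kernels should already be mapped; just verify.  */
        assert((c->flags & V_MAP) && !(c->flags & V_FIRSTPRIVATE));
      else if (c->kind == K_PARALLEL) {
        /* Turn an implicit firstprivate into a copy mapping.  */
        c->flags &= ~(unsigned) V_FIRSTPRIVATE;
        c->flags |= V_MAP;
      }
    } else if (c->kind == K_PARALLEL) {
      /* Variable not seen yet: record a copy mapping on the parallel.  */
      c->known = 1;
      c->flags = V_MAP;
      break;
    }
  }
}
```

In this model, an explicit firstprivate on the parallel construct is preserved, which matches the opt-out that the adjusted Fortran test relies on.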

> Can we also run into this for ORT_ACC_DATA and ORT_ACC_HOST_DATA, but
> nothing needs to be done for these?

Don't we have other mechanisms for enforcing proper nesting? data/host_data
constructs are supposed to be host code. They should not directly enclose loop
directives, which is what we're concerned with here.

Attached are the final patches I committed.

Thanks,
Chung-Lin

2016-08-18  Chung-Lin Tang  <cltang@codesourcery.com>

        PR middle-end/70895
        gcc/
        * gimplify.c (omp_add_variable): Adjust/add variable mapping on
        enclosing parallel construct for reduction variables on OpenACC loop
        directives.

        gcc/testsuite/
        * gfortran.dg/goacc/loop-tree-1.f90: Add gimple scan-tree-dump test.
        * c-c++-common/goacc/reduction-1.c: Likewise.
        * c-c++-common/goacc/reduction-2.c: Likewise.
        * c-c++-common/goacc/reduction-3.c: Likewise.
        * c-c++-common/goacc/reduction-4.c: Likewise.

        libgomp/
        * testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit
        firstprivate clauses.
        * testsuite/libgomp.oacc-fortran/reduction-6.f90: Remove explicit
        copy clauses.
        * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
        * testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise.

Patch

Index: libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90	(revision 239576)
@@ -19,7 +19,7 @@ 
   hs1 = 0
   hs2 = 0
 
-  !$acc parallel num_gangs (1000) copy(gs1, gs2)
+  !$acc parallel num_gangs (1000)
   !$acc loop reduction(+:gs1, gs2) gang
   do i = 1, n
      gs1 = gs1 + 1
@@ -27,7 +27,7 @@ 
   end do
   !$acc end parallel
 
-  !$acc parallel num_workers (4) vector_length (32) copy(ws1, ws2)
+  !$acc parallel num_workers (4) vector_length (32)
   !$acc loop reduction(+:ws1, ws2) worker
   do i = 1, n
      ws1 = ws1 + 1
@@ -35,7 +35,7 @@ 
   end do
   !$acc end parallel
 
-  !$acc parallel vector_length (32) copy(vs1, vs2)
+  !$acc parallel vector_length (32)
   !$acc loop reduction(+:vs1, vs2) vector
   do i = 1, n
      vs1 = vs1 + 1
@@ -43,7 +43,7 @@ 
   end do
   !$acc end parallel
 
-  !$acc parallel num_gangs(8) num_workers(4) vector_length(32) copy(cs1, cs2)
+  !$acc parallel num_gangs(8) num_workers(4) vector_length(32)
   !$acc loop reduction(+:cs1, cs2) gang worker vector
   do i = 1, n
      cs1 = cs1 + 1
@@ -74,7 +74,7 @@ 
   red = 0
   vred = 0
 
-  !$acc parallel num_gangs(10) vector_length(32) copy(red)
+  !$acc parallel num_gangs(10) vector_length(32)
   !$acc loop reduction(+:red) gang
   do i = 1, n/chunksize
      !$acc loop reduction(+:red) vector
Index: libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90	(revision 239576)
@@ -50,7 +50,7 @@ 
 end subroutine redsub_private
 
 
-! Bogus reduction on an impliclitly firstprivate variable.  The results do
+! Bogus reduction on a firstprivate variable.  The results do
 ! survive the parallel region.  The goal here is to ensure that gfortran
 ! doesn't ICE.
 
@@ -58,7 +58,7 @@ 
   integer :: sum, n, arr(n)
   integer :: i
 
-  !$acc parallel
+  !$acc parallel firstprivate(sum)
   !$acc loop gang worker vector reduction (+:sum)
   do i = 1, n
      sum = sum + 1
@@ -72,7 +72,7 @@ 
   integer :: sum, n, arr(n)
   integer :: i, j
 
-  !$acc parallel copy (arr)
+  !$acc parallel copy (arr) firstprivate(sum)
   !$acc loop gang
   do i = 1, n
      sum = i;
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c	(revision 239576)
@@ -8,7 +8,7 @@ 
   int i, j, k, l = 0, f = 0, x = 0;
   int m1 = 4, m2 = -5, m3 = 17;
 
-#pragma acc parallel copy(l)
+#pragma acc parallel
   #pragma acc loop seq collapse(3) reduction(+:l)
     for (i = -2; i < m1; i++)
       for (j = m2; j < -2; j++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c	(revision 239576)
@@ -10,7 +10,7 @@ 
   int ondev = 0;
   int t = 0, h = 0;
   
-#pragma acc parallel num_workers(32) vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ondev)
   {
 #pragma acc loop worker vector reduction (+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c	(revision 239576)
@@ -13,8 +13,7 @@ 
   for (i = 0; i < 1024; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang reduction(+:res)
     for (i = 0; i < 1024; i++)
@@ -28,8 +27,7 @@ 
 
   res = hres = 1;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang reduction(*:res)
     for (i = 0; i < 12; i++)
@@ -53,8 +51,7 @@ 
   for (i = 0; i < 1024; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang vector reduction(+:res)
     for (i = 0; i < 1024; i++)
@@ -78,8 +75,7 @@ 
   for (i = 0; i < 1024; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang worker reduction(+:res)
     for (i = 0; i < 1024; i++)
@@ -103,8 +99,7 @@ 
   for (i = 0; i < 1024; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang worker vector reduction(+:res)
     for (i = 0; i < 1024; i++)
@@ -128,8 +123,7 @@ 
   for (i = 0; i < 32768; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang reduction(+:res)
     for (j = 0; j < 32; j++)
@@ -161,7 +155,7 @@ 
     arr[i] = i;
 
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copyin(arr) copy(res)
+		       copyin(arr)
   {
     #pragma acc loop gang reduction(+:res)
     for (j = 0; j < 32; j++)
@@ -191,8 +185,7 @@ 
   for (i = 0; i < 32768; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res, mres)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang reduction(+:res) reduction(max:mres)
     for (j = 0; j < 32; j++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c	(revision 239576)
@@ -12,7 +12,7 @@ 
   int ondev = 0;
   int t = 0,  h = 0;
 
-#pragma acc parallel vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel vector_length(32) copy(ondev)
   {
 #pragma acc loop vector reduction (+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c	(revision 239576)
@@ -11,7 +11,7 @@ 
 
   memset (b, '\0', sizeof (b));
 
-#pragma acc parallel copy(b[0:3][0:3]) copy(l)
+#pragma acc parallel copy(b[0:3][0:3])
     {
 #pragma acc loop collapse(2) reduction(+:l)
 	for (i = 0; i < 2; i++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c	(revision 239576)
@@ -19,7 +19,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel vector_length(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel vector_length(32) copyin(ary[0:N])
   {
 #pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
@@ -43,7 +43,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_workers(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_workers(32) copyin(ary[0:N])
   {
 #pragma acc loop worker reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
@@ -67,7 +67,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_gangs (32) copyin(ary[0:N])
   {
 #pragma acc loop gang reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c	(revision 239576)
@@ -11,7 +11,7 @@ 
   int ondev = 0;
   int t = 0, h = 0;
   
-#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(ondev)
   {
 #pragma acc loop gang worker vector reduction(+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c	(revision 239576)
@@ -22,7 +22,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel vector_length(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel vector_length(32) copyin(ary[0:N])
   {
 #pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
@@ -46,7 +46,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_workers(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_workers(32) copyin(ary[0:N])
   {
 #pragma acc loop worker reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
@@ -70,7 +70,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_gangs (32) copyin(ary[0:N])
   {
 #pragma acc loop gang reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c	(revision 239576)
@@ -19,7 +19,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel vector_length(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel vector_length(32) copyin(ary[0:N])
   {
 #pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
@@ -43,7 +43,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_workers(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_workers(32) copyin(ary[0:N])
   {
 #pragma acc loop worker reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
@@ -67,7 +67,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_gangs (32) copyin(ary[0:N])
   {
 #pragma acc loop gang reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c	(revision 239576)
@@ -11,7 +11,7 @@ 
   int ondev = 0;
   int t = 0, h = 0;
   
-#pragma acc parallel num_gangs(32) vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel num_gangs(32) vector_length(32) copy(ondev)
   {
 #pragma acc loop gang  reduction (+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c	(revision 239576)
@@ -11,7 +11,7 @@ 
   int ondev = 0;
   int t = 0,  h = 0;
 
-#pragma acc parallel num_workers(32) vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ondev)
   {
 #pragma acc loop worker reduction(+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c	(revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c	(revision 239576)
@@ -22,7 +22,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel vector_length(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel vector_length(32) copyin(ary[0:N])
   {
 #pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
@@ -46,7 +46,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_workers(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_workers(32) copyin(ary[0:N])
   {
 #pragma acc loop worker reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)
@@ -70,7 +70,7 @@ 
 {
   Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_gangs (32) copyin(ary[0:N])
   {
 #pragma acc loop gang reduction(+:tsum) reduction (*:tprod)
     for (int ix = 0; ix < N; ix++)