diff mbox

Use "oacc kernels parallelized" attribute for parallelized OpenACC kernels (was: Use "oacc kernels" attribute for OpenACC kernels)

Message ID 8737cd4ukh.fsf@hertz.schwinge.homeip.net
State New
Headers show

Commit Message

Thomas Schwinge May 9, 2017, 8:57 p.m. UTC
Hi!

On Mon, 08 May 2017 21:29:28 +0200, I wrote:
> On Thu, 4 Aug 2016 16:07:00 +0200, I wrote:
> > Ping.
> > 
> > On Wed, 27 Jul 2016 12:06:59 +0200, I wrote:
> > > On Mon, 25 Jan 2016 16:09:14 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > On Mon, Jan 25, 2016 at 10:06:50AM -0500, Nathan Sidwell wrote:
> > > > > On 01/04/16 10:39, Nathan Sidwell wrote:
> > > > > >There's currently no robust predicate to determine whether an oacc offload
> > > > > >function is for a kernels region (as opposed to a parallel region).
> > > > > >[...]
> > > > > >
> > > > > >This patch marks TREE_PUBLIC on the offload attribute values, to note kernels
> > > > > >regions,  and adds a predicate to check that.  [...]
> > > > > >
> > > > > >Using these predicates improves the dump output of the openacc device lowering
> > > > > >pass too.
> > > 
> > > I just submitted a patch adding "Test cases to check OpenACC offloaded
> > > function's attributes and classification",
> 
> (Pinged in
> <http://mid.mail-archive.com/877f1r1duw.fsf@hertz.schwinge.homeip.net>.)
> 
> > > to actually check the dump output of "oaccdevlow" -- it works.  ;-)
> > > 
> > > > > https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00092.html
> > > > > ping?
> > > > 
> > > > Ok, thanks.
> > > 
> > > It's conceptually and code-wise simpler to just use a "oacc kernels"
> > > attribute for that.  (And, that will make another patch I'm working on
> > > less convoluted.)
> > > 
> > > I'm open to suggestions if there is a better place to set the "oacc
> > > kernels" attribute -- I put it into expand_omp_target, where another
> > > special thing for GF_OMP_TARGET_KIND_OACC_KERNELS is already being done,
> > > and before "rewriting" GF_OMP_TARGET_KIND_OACC_KERNELS (and
> > > GF_OMP_TARGET_KIND_OACC_PARALLEL) into BUILT_IN_GOACC_PARALLEL.  My
> > > reasoning for not setting the attribute earlier (like, in the front
> > > ends), is that at that point in/before expand_omp_target, we still have
> > > the distrinction between OACC_PARALLEL/OACC_KERNELS (tree codes), and
> > > later GF_OMP_TARGET_KIND_OACC_PARALLEL/GF_OMP_TARGET_KIND_OACC_KERNELS
> > > (GIMPLE_OMP_TARGET subcodes).  Another question/possibly cleanup of
> > > course might be to actually do set the "oacc kernels" attribute in the
> > > front end and merge OACC_KERNELS into OACC_PARALLEL, and
> > > GF_OMP_TARGET_KIND_OACC_KERNELS into GF_OMP_TARGET_KIND_OACC_PARALLEL?
> > > 
> > > But anyway, as a first step: OK for trunk?
> 
> commit fac5c3214f58812881635d3fb1e1751446d4b660
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Mon May 8 21:24:46 2017 +0200
> 
>     Use "oacc kernels" attribute for OpenACC kernels

And on top of that, we can then 'use "oacc kernels parallelized"
attribute for parallelized OpenACC kernels', to also make that more
explicit (and pave the way for another change later on).  OK for trunk?

commit b6b5d549089423e3fbe387f63467d052b956f3f7
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Tue May 9 20:14:03 2017 +0200

    Use "oacc kernels parallelized" attribute for parallelized OpenACC kernels
    
            gcc/
            * tree-parloops.c (create_parallel_loop): Set "oacc kernels
            parallelized" attribute for parallelized OpenACC kernels.
            * omp-offload.c (execute_oacc_device_lower): Use it.
            gcc/testsuite/
            * c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
            * c-c++-common/goacc/classify-kernels.c: Likewise.
            * c-c++-common/goacc/kernels-counter-vars-function-scope.c:
            Likewise.
            * c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
            * c-c++-common/goacc/kernels-double-reduction.c: Likewise.
            * c-c++-common/goacc/kernels-loop-2.c: Likewise.
            * c-c++-common/goacc/kernels-loop-3.c: Likewise.
            * c-c++-common/goacc/kernels-loop-g.c: Likewise.
            * c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
            * c-c++-common/goacc/kernels-loop-n.c: Likewise.
            * c-c++-common/goacc/kernels-loop-nest.c: Likewise.
            * c-c++-common/goacc/kernels-loop.c: Likewise.
            * c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
            * c-c++-common/goacc/kernels-reduction.c: Likewise.
            * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
            * gfortran.dg/goacc/classify-kernels.f95: Likewise.
            * gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
            * gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
            * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
            * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
            * gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
            * gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
            * gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
            * gfortran.dg/goacc/kernels-loop.f95: Likewise.
---
 gcc/omp-offload.c                                  | 24 ++++++++++++++++++----
 .../goacc/classify-kernels-unparallelized.c        |  2 +-
 .../c-c++-common/goacc/classify-kernels.c          |  6 +++---
 .../goacc/kernels-counter-vars-function-scope.c    |  3 +--
 .../goacc/kernels-double-reduction-n.c             |  3 +--
 .../c-c++-common/goacc/kernels-double-reduction.c  |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c  |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c  |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c  |  3 +--
 .../c-c++-common/goacc/kernels-loop-mod-not-zero.c |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c  |  3 +--
 .../c-c++-common/goacc/kernels-loop-nest.c         |  3 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop.c    |  3 +--
 .../c-c++-common/goacc/kernels-one-counter-var.c   |  3 +--
 .../c-c++-common/goacc/kernels-reduction.c         |  3 +--
 .../goacc/classify-kernels-unparallelized.f95      |  2 +-
 .../gfortran.dg/goacc/classify-kernels.f95         |  6 +++---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 |  3 +--
 .../gfortran.dg/goacc/kernels-loop-data-2.f95      |  3 +--
 .../goacc/kernels-loop-data-enter-exit-2.f95       |  3 +--
 .../goacc/kernels-loop-data-enter-exit.f95         |  3 +--
 .../gfortran.dg/goacc/kernels-loop-data-update.f95 |  3 +--
 .../gfortran.dg/goacc/kernels-loop-data.f95        |  3 +--
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 |  5 ++---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95   |  3 +--
 gcc/tree-parloops.c                                | 16 ++++++++-------
 26 files changed, 58 insertions(+), 60 deletions(-)



Grüße
 Thomas

Comments

Jakub Jelinek May 10, 2017, 4:30 p.m. UTC | #1
On Tue, May 09, 2017 at 10:57:34PM +0200, Thomas Schwinge wrote:
> commit b6b5d549089423e3fbe387f63467d052b956f3f7
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Tue May 9 20:14:03 2017 +0200
> 
>     Use "oacc kernels parallelized" attribute for parallelized OpenACC kernels
>     
>             gcc/
>             * tree-parloops.c (create_parallel_loop): Set "oacc kernels
>             parallelized" attribute for parallelized OpenACC kernels.
>             * omp-offload.c (execute_oacc_device_lower): Use it.
>             gcc/testsuite/
>             * c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
>             * c-c++-common/goacc/classify-kernels.c: Likewise.
>             * c-c++-common/goacc/kernels-counter-vars-function-scope.c:
>             Likewise.
>             * c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
>             * c-c++-common/goacc/kernels-double-reduction.c: Likewise.
>             * c-c++-common/goacc/kernels-loop-2.c: Likewise.
>             * c-c++-common/goacc/kernels-loop-3.c: Likewise.
>             * c-c++-common/goacc/kernels-loop-g.c: Likewise.
>             * c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
>             * c-c++-common/goacc/kernels-loop-n.c: Likewise.
>             * c-c++-common/goacc/kernels-loop-nest.c: Likewise.
>             * c-c++-common/goacc/kernels-loop.c: Likewise.
>             * c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
>             * c-c++-common/goacc/kernels-reduction.c: Likewise.
>             * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
>             * gfortran.dg/goacc/classify-kernels.f95: Likewise.
>             * gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
>             * gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
>             * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
>             * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
>             * gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
>             * gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
>             * gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
>             * gfortran.dg/goacc/kernels-loop.f95: Likewise.

Ok.

	Jakub
diff mbox

Patch

diff --git gcc/omp-offload.c gcc/omp-offload.c
index 5e1eac4..e954ee9 100644
--- gcc/omp-offload.c
+++ gcc/omp-offload.c
@@ -1445,6 +1445,13 @@  execute_oacc_device_lower ()
       flag_openacc_dims = (char *)&flag_openacc_dims;
     }
 
+  bool is_oacc_kernels
+    = (lookup_attribute ("oacc kernels",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  bool is_oacc_kernels_parallelized
+    = (lookup_attribute ("oacc kernels parallelized",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+
   /* Discover, partition and process the loops.  */
   oacc_loop *loops = oacc_loop_discovery ();
   int fn_level = oacc_fn_attrib_level (attrs);
@@ -1454,17 +1461,26 @@  execute_oacc_device_lower ()
       if (fn_level >= 0)
 	fprintf (dump_file, "Function is OpenACC routine level %d\n",
 		 fn_level);
-      else if (lookup_attribute ("oacc kernels",
-				 DECL_ATTRIBUTES (current_function_decl)))
-	fprintf (dump_file, "Function is OpenACC kernels offload\n");
+      else if (is_oacc_kernels)
+	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+		 (is_oacc_kernels_parallelized
+		  ? "parallelized" : "unparallelized"));
       else
 	fprintf (dump_file, "Function is OpenACC parallel offload\n");
     }
 
   unsigned outer_mask = fn_level >= 0 ? GOMP_DIM_MASK (fn_level) - 1 : 0;
   unsigned used_mask = oacc_loop_partition (loops, outer_mask);
+  /* OpenACC kernels constructs are special: they currently don't use the
+     generic oacc_loop infrastructure and attribute/dimension processing.  */
+  if (is_oacc_kernels && is_oacc_kernels_parallelized)
+    {
+      /* Parallelized OpenACC kernels constructs use gang parallelism.  See
+	 also tree-parloops.c:create_parallel_loop.  */
+      used_mask |= GOMP_DIM_MASK (GOMP_DIM_GANG);
+    }
+
   int dims[GOMP_DIM_MAX];
-
   oacc_validate_dims (current_function_decl, attrs, dims, fn_level, used_mask);
 
   if (dump_file)
diff --git gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index 70ff428..626f6b4 100644
--- gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -34,6 +34,6 @@  void KERNELS ()
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC kernels offload" 1 "oaccdevlow" } }
+   { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
    { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
diff --git gcc/testsuite/c-c++-common/goacc/classify-kernels.c gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index c8b0fda..95037e6 100644
--- gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -25,11 +25,11 @@  void KERNELS ()
 /* Check that exactly one OpenACC kernels construct is analyzed, and that it
    can be parallelized.
    { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
    { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
-   { dg-final { scan-tree-dump-times "(?n)Function is OpenACC kernels offload" 1 "oaccdevlow" } }
+   { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
index 17f240e..c475333 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
@@ -45,9 +45,8 @@  main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
index 750f576..27ea2e9 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
@@ -27,10 +27,9 @@  foo (unsigned int n)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
index df60d6a..0841e90 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
@@ -27,10 +27,9 @@  foo (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
index 913d91f..acef6a1 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -59,11 +59,10 @@  main (void)
 /* Check that only three loops are analyzed, and that all can be
    parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index 1822d2a..75e2bb7 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
@@ -39,9 +39,8 @@  main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index e946319..73b469d 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -7,9 +7,8 @@ 
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
index 9b63b45..5592623 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -43,9 +43,8 @@  main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
index 279f797..e86be1b 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -46,9 +46,8 @@  foo (COUNTERTYPE n)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
index db1071f..2b0e186 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
@@ -30,9 +30,8 @@  main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop.c gcc/testsuite/c-c++-common/goacc/kernels-loop.c
index abf7a3c..9619d53 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -46,9 +46,8 @@  main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
index 95f4817..69539b2 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
@@ -44,9 +44,8 @@  main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
index 6f5a418..4a18272 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -26,9 +26,8 @@  foo (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
-
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 9887d35..4b282ca 100644
--- gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -36,6 +36,6 @@  end program main
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC kernels offload" 1 "oaccdevlow" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccdevlow" } }
 ! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
 ! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } }
diff --git gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index 69c89a9..da025c1 100644
--- gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -27,11 +27,11 @@  end program main
 ! Check that exactly one OpenACC kernels construct is analyzed, and that it
 ! can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(0, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
-! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC kernels offload" 1 "oaccdevlow" } }
+! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccdevlow" } }
 ! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccdevlow" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccdevlow" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
index 865f7a6..516aede 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -34,11 +34,10 @@  end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
-
-! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
index c9f3a62..ff3788a 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -40,11 +40,10 @@  end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
-
-! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
index 3361607..60a5c96 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
@@ -40,11 +40,10 @@  end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
-
-! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
index 5ba56fb..ce04749 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
@@ -38,11 +38,10 @@  end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
-
-! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
index a622a96..d2de138 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
@@ -38,10 +38,9 @@  end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 2 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
-
-! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 2 "parloops1" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
index 4ec2ac3..92872b2 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
@@ -38,11 +38,10 @@  end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.2 " 1 "optimized" } }
-
-! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
index 409fe6f..079712f2 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
@@ -32,10 +32,9 @@  end module test
 
 ! Check that only one loop is analyzed, and that it can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
+! TODO, PR70545.
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function __test_MOD_foo._omp_fn.0 " 1 "optimized" } }
-
-! TODO, PR70545.
-! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" { xfail *-*-* } } }
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
index ae2cac6..cc9a3a9 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
@@ -30,9 +30,8 @@  end program main
 
 ! Check that only one loop is analyzed, and that it can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
 ! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
-
-! { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } }
diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index 6ce9d84..f826154 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -2040,19 +2040,20 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   tree cvar, cvar_init, initvar, cvar_next, cvar_base, type;
   edge exit, nexit, guard, end, e;
 
-  /* Prepare the GIMPLE_OMP_PARALLEL statement.  */
   if (oacc_kernels_p)
     {
       gcc_checking_assert (lookup_attribute ("oacc kernels",
 					     DECL_ATTRIBUTES (cfun->decl)));
-
-      tree clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
-      OMP_CLAUSE_NUM_GANGS_EXPR (clause)
-	= build_int_cst (integer_type_node, n_threads);
-      oacc_set_fn_attrib (cfun->decl, clause, NULL);
+      /* Indicate to later processing that this is a parallelized OpenACC
+	 kernels construct.  */
+      DECL_ATTRIBUTES (cfun->decl)
+	= tree_cons (get_identifier ("oacc kernels parallelized"),
+		     NULL_TREE, DECL_ATTRIBUTES (cfun->decl));
     }
   else
     {
+      /* Prepare the GIMPLE_OMP_PARALLEL statement.  */
+
       basic_block bb = loop_preheader_edge (loop)->src;
       basic_block paral_bb = single_pred (bb);
       gsi = gsi_last_bb (paral_bb);
@@ -2154,7 +2155,8 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 
   /* Emit GIMPLE_OMP_FOR.  */
   if (oacc_kernels_p)
-    /* In combination with the NUM_GANGS on the parallel.  */
+    /* Parallelized OpenACC kernels constructs use gang parallelism.  See also
+       omp-offload.c:execute_oacc_device_lower.  */
     t = build_omp_clause (loc, OMP_CLAUSE_GANG);
   else
     {