Use plain -fopenacc to enable OpenACC kernels processing

Message ID 87si0j5u9o.fsf@hertz.schwinge.homeip.net
State New

Commit Message

Thomas Schwinge Feb. 23, 2016, 3:19 p.m. UTC
Hi!

On Mon, 15 Feb 2016 17:53:58 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 10/02/16 15:40, Thomas Schwinge wrote:
> > On Fri, 5 Feb 2016 13:06:17 +0100, I wrote:
> >> On Mon, 9 Nov 2015 18:39:19 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> >>> On 09/11/15 16:35, Tom de Vries wrote:
> >>>> this patch series for stage1 trunk adds support to:
> >>>> - parallelize oacc kernels regions using parloops, and
> >>>> - map the loops onto the oacc gang dimension.
> >>
> >>> Atm, the parallelization behaviour for the kernels region is controlled
> >>> by flag_tree_parallelize_loops, which is also used to control generic
> >>> auto-parallelization by autopar using omp. That is not ideal, and we may
> >>> want a separate flag (or param) to control the behaviour for oacc
> >>> kernels, f.i. -foacc-kernels-gang-parallelize=<n>. I'm open to suggestions.
> >>
> >> I suggest using plain -fopenacc to enable OpenACC kernels processing
> >> (which just makes sense, I hope) ;-) and have later processing stages
> >> determine the actual parametrization (currently: number of gangs) (that
> >> is, Nathan's recent "Default compute dimensions" patches).
> 
> That makes a lot of sense.  Thanks for working on this.
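
For illustration, here is a minimal sketch of the kind of code this affects. It is a hypothetical example, only loosely modeled on the kernels-loop tests and not copied from the testsuite; the names vec_add and vec_add_selftest are made up. With this change, the kernels region should be picked up with plain -fopenacc, without an additional -ftree-parallelize-loops=<n>.

```c
#include <stddef.h>

/* Hypothetical example (not a testsuite file): add two arrays inside
   an OpenACC kernels region.  */
#define N 1024

/* Build with OpenACC enabled, for example:
     gcc -O2 -fopenacc example.c
   Without -fopenacc the pragma is ignored and the loop simply runs
   sequentially on the host, so the result is the same either way.  */
void
vec_add (const unsigned int *a, const unsigned int *b, unsigned int *c)
{
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
    for (size_t i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  }
}

/* Return 1 if vec_add produced the expected sums, 0 otherwise.  */
int
vec_add_selftest (void)
{
  static unsigned int a[N], b[N], c[N];

  for (size_t i = 0; i < N; i++)
    {
      a[i] = (unsigned int) i * 2;
      b[i] = (unsigned int) i * 4;
    }

  vec_add (a, b, c);

  for (size_t i = 0; i < N; i++)
    if (c[i] != a[i] + b[i])
      return 0;
  return 1;
}
```

Calling vec_add_selftest () from main and checking for a return value of 1 is enough to exercise the region, offloaded or not.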

> >> Originally, I wanted to use:
> >>
> >>      OMP_CLAUSE_NUM_GANGS_EXPR (clause) = build_int_cst (integer_type_node, n_threads == 0 ? -1 : n_threads);
> >>
> >> ... to store -1 "have the compiler decide" (instead of now 0 "have the
> >> run-time decide", which might prevent some code optimizations, as I
> >> understand it) for the n_threads == 0 case, but it seems that for an
> >> offloaded OpenACC kernels region, gcc/omp-low.c:oacc_validate_dims is
> >> called with the parameter "used" set to 0 instead of "gang", and then the
> >> "Default anything left to 1 or a partitioned default" logic will default
> >> dims["gang"] to oacc_min_dims["gang"] (that is, 1) instead of the
> >> oacc_default_dims["gang"] (that is, 32).  Nathan, does that smell like a
> >> bug (and could you look into that)?

<https://gcc.gnu.org/PR69921> filed.  (Nathan?)
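
To make the defaulting problem described in the quoted paragraph easier to follow, here is a toy model in C. It is only a sketch under stated assumptions and not the code in gcc/omp-low.c: all toy_* names are hypothetical, and only the gang values (minimum 1, default 32) come from the description above; the worker and vector entries are placeholders.

```c
/* Toy model of "default anything left" logic; NOT the GCC source.  */
enum { TOY_GANG, TOY_WORKER, TOY_VECTOR, TOY_NDIMS };

/* Gang values as quoted in the thread; the other entries are
   placeholders chosen only for this sketch.  */
static const int toy_min_dims[TOY_NDIMS] = { 1, 1, 1 };
static const int toy_default_dims[TOY_NDIMS] = { 32, 1, 1 };

/* For every dimension still set to -1 ("have the compiler decide"),
   pick the partitioned default if the dimension is marked as used in
   the USED bit mask, and the minimum otherwise.  A value of 0 ("have
   the run-time decide") is left untouched.  */
void
toy_default_dims_apply (int dims[TOY_NDIMS], unsigned used)
{
  for (int i = 0; i < TOY_NDIMS; i++)
    if (dims[i] < 0)
      dims[i] = (used & (1u << i)) ? toy_default_dims[i] : toy_min_dims[i];
}
```

In this model, a gang dimension of -1 with used == 0 ends up as 1, whereas with the gang bit set in used it ends up as 32, which is the difference the question above is about.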

> >> --- gcc/tree-parloops.c
> >> +++ gcc/tree-parloops.c

> The oacc-parloops changes look good to me. I approve them for 6.0 stage 
> 4 (given that using the ftree-parallelize-loops=<n> flag for oacc 
> kernels parallelization was just a placeholder waiting to be 
> replaced by an oacc-based approach). [ And I'd expect that the 
> tree-ssa-loop.c changes and the mechanical testsuite changes can be 
> regarded as trivial. ]
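
As a rough summary of what the oacc-parloops part of the change amounts to, the following standalone C sketch models the new gating and thread-count selection. The struct and the toy_* function names are hypothetical stand-ins for the compiler's flag_openacc and flag_tree_parallelize_loops; this is a simplified model, not the pass code itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two command-line flags involved.  */
struct toy_flags
{
  bool openacc;                     /* models -fopenacc */
  unsigned tree_parallelize_loops;  /* models -ftree-parallelize-loops=<n> */
};

/* Should the parloops pass run?  In OpenACC kernels mode this depends
   only on -fopenacc; otherwise on -ftree-parallelize-loops=<n>, n > 1.  */
bool
toy_parloops_gate (const struct toy_flags *f, bool oacc_kernels_p)
{
  if (oacc_kernels_p)
    return f->openacc;
  return f->tree_parallelize_loops > 1;
}

/* Requested thread count: 0 means "to be determined later", which is
   what OpenACC kernels mode uses.  */
unsigned
toy_parloops_n_threads (const struct toy_flags *f, bool oacc_kernels_p)
{
  return oacc_kernels_p ? 0 : f->tree_parallelize_loops;
}
```

The point of the split is that the two modes no longer share a knob: the OpenACC kernels case ignores the -ftree-parallelize-loops value entirely.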

Thanks; committed (without changes) in r233634:

commit 3a37a410bbfed45d04f06887c348938182369d5a
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Feb 23 15:07:54 2016 +0000

    Use plain -fopenacc to enable OpenACC kernels processing
    
    	gcc/
    	* tree-parloops.c (create_parallel_loop, gen_parallel_loop)
    	(parallelize_loops): In OpenACC kernels mode, set n_threads to
    	zero.
    	(pass_parallelize_loops::gate): In OpenACC kernels mode, gate on
    	flag_openacc.
    	* tree-ssa-loop.c (gate_oacc_kernels): Likewise.
    	gcc/testsuite/
    	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Adjust
    	to -ftree-parallelize-loops/-fopenacc changes.
    	* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
    	* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
    	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
    	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
    	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
    	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
    	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
    	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
    	* c-c++-common/goacc/kernels-loop.c: Likewise.
    	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
    	* c-c++-common/goacc/kernels-reduction.c: Likewise.
    	* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
    	* gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise.
    	libgomp/
    	* oacc-parallel.c (GOACC_parallel_keyed): Initialize dims.
    	* plugin/plugin-nvptx.c (nvptx_exec): Provide default values for
    	dims.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Adjust to
    	-ftree-parallelize-loops/-fopenacc changes.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c:
    	Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233634 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog                                      |  9 ++++++
 gcc/testsuite/ChangeLog                            | 18 ++++++++++++
 .../goacc/kernels-counter-vars-function-scope.c    |  3 +-
 .../goacc/kernels-double-reduction-n.c             |  3 +-
 .../c-c++-common/goacc/kernels-double-reduction.c  |  3 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c  |  3 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c  |  4 +--
 gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c  |  4 +--
 .../c-c++-common/goacc/kernels-loop-mod-not-zero.c |  3 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c  |  4 +--
 .../c-c++-common/goacc/kernels-loop-nest.c         |  3 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop.c    |  4 +--
 .../c-c++-common/goacc/kernels-one-counter-var.c   |  4 +--
 .../c-c++-common/goacc/kernels-reduction.c         |  4 +--
 .../gfortran.dg/goacc/kernels-loop-inner.f95       |  1 -
 .../gfortran.dg/goacc/kernels-loops-adjacent.f95   |  1 -
 gcc/tree-parloops.c                                | 25 ++++++++++++++---
 gcc/tree-ssa-loop.c                                |  7 ++---
 libgomp/ChangeLog                                  | 32 ++++++++++++++++++++++
 libgomp/oacc-parallel.c                            |  4 +++
 libgomp/plugin/plugin-nvptx.c                      | 18 ++++++++++--
 .../libgomp.oacc-c-c++-common/kernels-loop-2.c     |  3 --
 .../libgomp.oacc-c-c++-common/kernels-loop-3.c     |  3 --
 .../kernels-loop-and-seq-2.c                       |  3 --
 .../kernels-loop-and-seq-3.c                       |  3 --
 .../kernels-loop-and-seq-4.c                       |  3 --
 .../kernels-loop-and-seq-5.c                       |  3 --
 .../kernels-loop-and-seq-6.c                       |  3 --
 .../kernels-loop-and-seq.c                         |  3 --
 .../kernels-loop-collapse.c                        |  3 --
 .../libgomp.oacc-c-c++-common/kernels-loop-g.c     |  2 --
 .../kernels-loop-mod-not-zero.c                    |  3 --
 .../libgomp.oacc-c-c++-common/kernels-loop-n.c     |  3 --
 .../libgomp.oacc-c-c++-common/kernels-loop-nest.c  |  3 --
 .../libgomp.oacc-c-c++-common/kernels-loop.c       |  3 --
 .../libgomp.oacc-c-c++-common/kernels-reduction.c  |  3 --
 36 files changed, 114 insertions(+), 87 deletions(-)
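
The libgomp side can be pictured with the following simplified C sketch of the launch-dimension defaulting. It is an illustration only: toy_fill_launch_dims and LAUNCH_NDIMS are hypothetical names, and the fallback of 32 is passed in as a parameter here because the patch itself marks that value as a TODO.

```c
/* Simplified model of the runtime defaulting; not the plugin code.  */
enum { LAUNCH_NDIMS = 3 };

/* DIMS holds the values passed at run time (0 = "let the runtime
   choose"), COMPILED holds the values fixed by the device compiler
   (0 = none).  Compiled values win; anything still 0 afterwards gets
   FALLBACK.  */
void
toy_fill_launch_dims (int dims[LAUNCH_NDIMS],
                      const int compiled[LAUNCH_NDIMS], int fallback)
{
  for (int i = 0; i < LAUNCH_NDIMS; i++)
    if (compiled[i])
      dims[i] = compiled[i];

  for (int i = 0; i < LAUNCH_NDIMS; i++)
    if (!dims[i])
      dims[i] = fallback;
}
```

For example, with dims all zero, compiled values { 0, 8, 0 } and a fallback of 32, the resulting launch dimensions are { 32, 8, 32 }.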



Grüße
 Thomas

Patch

diff --git gcc/ChangeLog gcc/ChangeLog
index ce8d366..0b2149d 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,12 @@ 
+2016-02-23  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* tree-parloops.c (create_parallel_loop, gen_parallel_loop)
+	(parallelize_loops): In OpenACC kernels mode, set n_threads to
+	zero.
+	(pass_parallelize_loops::gate): In OpenACC kernels mode, gate on
+	flag_openacc.
+	* tree-ssa-loop.c (gate_oacc_kernels): Likewise.
+
 2016-02-23  Richard Biener  <rguenther@suse.de>
 
 	* mem-stats.h (struct mem_usage): Use PRIu64 for printing size_t.
diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index 60372ce..17cf40c 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,21 @@ 
+2016-02-23  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Adjust
+	to -ftree-parallelize-loops/-fopenacc changes.
+	* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
+	* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
+	* c-c++-common/goacc/kernels-loop.c: Likewise.
+	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
+	* c-c++-common/goacc/kernels-reduction.c: Likewise.
+	* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise.
+
 2016-02-23  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
 
 	* gcc.target/i386/chkp-hidden-def.c: Require alias support.
diff --git gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
index e8b5357..17f240e 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -51,4 +50,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
index c39d674..750f576 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -34,4 +33,4 @@  foo (unsigned int n)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
index 3501d0d..df60d6a 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -34,4 +33,4 @@  foo (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
index f97584d..913d91f 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -67,4 +66,4 @@  main (void)
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index 530d62a..1822d2a 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -45,5 +44,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index 4f1c2c5..e946319 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -1,6 +1,5 @@ 
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-g" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -13,5 +12,4 @@ 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
index 151db51..9b63b45 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -49,4 +48,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
index bee5f5a..279f797 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -52,5 +51,4 @@  foo (COUNTERTYPE n)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
index ea0e342..db1071f 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -36,4 +35,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop.c gcc/testsuite/c-c++-common/goacc/kernels-loop.c
index ab5dfb9..abf7a3c 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -52,5 +51,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
index b16a8cd..95f4817 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -50,5 +49,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
index 61c5df3..6f5a418 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -32,5 +31,4 @@  foo (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
index 4db3a50..3334741 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
@@ -1,5 +1,4 @@ 
 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-ftree-parallelize-loops=32" }
 
 program main
    implicit none
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
index fef3d10..fb92da8 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
@@ -1,5 +1,4 @@ 
 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-ftree-parallelize-loops=10" }
 
 program main
    implicit none
diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index 139e38c..e498e5b 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -2016,7 +2016,8 @@  transform_to_exit_first_loop (struct loop *loop,
 /* Create the parallel constructs for LOOP as described in gen_parallel_loop.
    LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
    NEW_DATA is the variable that should be initialized from the argument
-   of LOOP_FN.  N_THREADS is the requested number of threads.  */
+   of LOOP_FN.  N_THREADS is the requested number of threads, which can be 0 if
+   that number is to be determined later.  */
 
 static void
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
@@ -2049,6 +2050,7 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
       basic_block paral_bb = single_pred (bb);
       gsi = gsi_last_bb (paral_bb);
 
+      gcc_checking_assert (n_threads != 0);
       t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
       OMP_CLAUSE_NUM_THREADS_EXPR (t)
 	= build_int_cst (integer_type_node, n_threads);
@@ -2221,7 +2223,8 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 }
 
 /* Generates code to execute the iterations of LOOP in N_THREADS
-   threads in parallel.
+   threads in parallel, which can be 0 if that number is to be determined
+   later.
 
    NITER describes number of iterations of LOOP.
    REDUCTION_LIST describes the reductions existent in the LOOP.  */
@@ -2318,6 +2321,7 @@  gen_parallel_loop (struct loop *loop,
       else
 	m_p_thread=MIN_PER_THREAD;
 
+      gcc_checking_assert (n_threads != 0);
       many_iterations_cond =
 	fold_build2 (GE_EXPR, boolean_type_node,
 		     nit, build_int_cst (type, m_p_thread * n_threads));
@@ -3177,7 +3181,7 @@  oacc_entry_exit_ok (struct loop *loop,
 static bool
 parallelize_loops (bool oacc_kernels_p)
 {
-  unsigned n_threads = flag_tree_parallelize_loops;
+  unsigned n_threads;
   bool changed = false;
   struct loop *loop;
   struct loop *skip_loop = NULL;
@@ -3199,6 +3203,13 @@  parallelize_loops (bool oacc_kernels_p)
   if (cfun->has_nonlocal_label)
     return false;
 
+  /* For OpenACC kernels, n_threads will be determined later; otherwise, it's
+     the argument to -ftree-parallelize-loops.  */
+  if (oacc_kernels_p)
+    n_threads = 0;
+  else
+    n_threads = flag_tree_parallelize_loops;
+
   gcc_obstack_init (&parloop_obstack);
   reduction_info_table_type reduction_list (10);
 
@@ -3361,7 +3372,13 @@  public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
+  virtual bool gate (function *)
+  {
+    if (oacc_kernels_p)
+      return flag_openacc;
+    else
+      return flag_tree_parallelize_loops > 1;
+  }
   virtual unsigned int execute (function *);
   opt_pass * clone () { return new pass_parallelize_loops (m_ctxt); }
   void set_pass_param (unsigned int n, bool param)
diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
index bdbade5..4c39fbc 100644
--- gcc/tree-ssa-loop.c
+++ gcc/tree-ssa-loop.c
@@ -148,7 +148,7 @@  make_pass_tree_loop (gcc::context *ctxt)
 static bool
 gate_oacc_kernels (function *fn)
 {
-  if (flag_tree_parallelize_loops <= 1)
+  if (!flag_openacc)
     return false;
 
   tree oacc_function_attr = get_oacc_fn_attrib (fn->decl);
@@ -230,10 +230,9 @@  public:
   virtual bool gate (function *)
   {
     return (optimize
-	    /* Don't bother doing anything if the program has errors.  */
-	    && !seen_error ()
 	    && flag_openacc
-	    && flag_tree_parallelize_loops > 1);
+	    /* Don't bother doing anything if the program has errors.  */
+	    && !seen_error ());
   }
 
 }; // class pass_ipa_oacc
diff --git libgomp/ChangeLog libgomp/ChangeLog
index 1394126..e6a7082 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,35 @@ 
+2016-02-23  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* oacc-parallel.c (GOACC_parallel_keyed): Initialize dims.
+	* plugin/plugin-nvptx.c (nvptx_exec): Provide default values for
+	dims.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Adjust to
+	-ftree-parallelize-loops/-fopenacc changes.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c:
+	Likewise.
+
 2016-02-22  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c-c++-common/vprop.c: New test.
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index bc24651..f795bf7 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -103,6 +103,10 @@  GOACC_parallel_keyed (int device, void (*fn) (void *),
       return;
     }
 
+  /* Default: let the runtime choose.  */
+  for (i = 0; i != GOMP_DIM_MAX; i++)
+    dims[i] = 0;
+
   va_start (ap, kinds);
   /* TODO: This will need amending when device_type is implemented.  */
   while ((tag = va_arg (ap, unsigned)) != 0)
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 7ec1810..3f1bb6d 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -894,9 +894,21 @@  nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
   /* Initialize the launch dimensions.  Typically this is constant,
      provided by the device compiler, but we must permit runtime
      values.  */
-  for (i = 0; i != 3; i++)
-    if (targ_fn->launch->dim[i])
-      dims[i] = targ_fn->launch->dim[i];
+  int seen_zero = 0;
+  for (i = 0; i != GOMP_DIM_MAX; i++)
+    {
+      if (targ_fn->launch->dim[i])
+       dims[i] = targ_fn->launch->dim[i];
+      if (!dims[i])
+       seen_zero = 1;
+    }
+
+  if (seen_zero)
+    {
+      for (i = 0; i != GOMP_DIM_MAX; i++)
+       if (!dims[i])
+         dims[i] = /* TODO */ 32;
+    }
 
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
      the host and the device. HP is a host pointer to the new chunk, and DP is
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
index 13e57bd..c7592d6 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
index f61a74a..31114ac 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
index 2e4100f..d36592f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
index b3e736b..e622971 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
index 8b9affa..c731278 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
index 83d4e7f..67dcce2 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
index 01d5e5e..b8b5dde 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
index 61d1283..9d9308a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
index f7f04cb..997d6c7 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 100
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
index 96b6e4e..88258be 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
@@ -1,5 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-g" } */
 
 #include "kernels-loop.c"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
index 1433cb2..147ebb5 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N ((1024 * 512) + 1)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
index fd0d5b1..9a3eaca 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N ((1024 * 512) + 1)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
index 21d2599..28c725a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 1000
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
index 3762e5a..355123c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
index 511e25f..8647a94 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define n 10000