Use plain -fopenacc to enable OpenACC kernels processing (was: [PATCH, 6/16] Add pass_oacc_kernels)

Message ID 87bn7v4b0m.fsf@kepler.schwinge.homeip.net
State New

Commit Message

Thomas Schwinge Feb. 5, 2016, 12:06 p.m. UTC
Hi!

On Mon, 9 Nov 2015 18:39:19 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 09/11/15 16:35, Tom de Vries wrote:
> > this patch series for stage1 trunk adds support to:
> > - parallelize oacc kernels regions using parloops, and
> > - map the loops onto the oacc gang dimension.

> Atm, the parallelization behaviour for the kernels region is controlled 
> by flag_tree_parallelize_loops, which is also used to control generic 
> auto-parallelization by autopar using omp. That is not ideal, and we may 
> want a separate flag (or param) to control the behaviour for oacc 
> kernels, f.i. -foacc-kernels-gang-parallelize=<n>. I'm open to suggestions.

I suggest using plain -fopenacc to enable OpenACC kernels processing
(which just makes sense, I hope) ;-) and having later processing stages
determine the actual parametrization (currently: the number of gangs);
that is, Nathan's recent "Default compute dimensions" patches.
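
For illustration, a kernels region of the kind this affects might look as
follows.  This is only a sketch: the function name vec_add and the size N
are made up here, and the command line in the comment assumes a GCC
configured with OpenACC support.

```c
/* Hypothetical example; compile with something like:
     gcc -O2 -fopenacc -c vec_add.c
   Without -fopenacc the pragma is ignored and the loop runs on the host.  */

#define N 1024

/* Element-wise sum of two arrays inside an OpenACC kernels region.  */
void
vec_add (const unsigned int *a, const unsigned int *b, unsigned int *c)
{
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
    for (int i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  }
}
```

With the proposed change, -O2 -fopenacc alone would be expected to hand
such a loop to parloops; so far, the test cases also had to pass
-ftree-parallelize-loops=32.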

The code changes are simple enough; OK for trunk?  (This patch depends on
my 'Un-parallelized OpenACC kernels constructs with nvptx offloading:
"avoid offloading"' patch, which is pending review,
<http://news.gmane.org/find-root.php?message_id=%3C87zivg8rcy.fsf%40hertz.schwinge.homeip.net%3E>.)

Originally, I wanted to use:

    OMP_CLAUSE_NUM_GANGS_EXPR (clause) = build_int_cst (integer_type_node, n_threads == 0 ? -1 : n_threads);

... to store -1 "have the compiler decide" (instead of now 0 "have the
run-time decide", which might prevent some code optimizations, as I
understand it) for the n_threads == 0 case, but it seems that for an
offloaded OpenACC kernels region, gcc/omp-low.c:oacc_validate_dims is
called with the parameter "used" set to 0 instead of "gang", and then the
"Default anything left to 1 or a partitioned default" logic will default
dims["gang"] to oacc_min_dims["gang"] (that is, 1) instead of the
oacc_default_dims["gang"] (that is, 32).  Nathan, does that smell like a
bug (and could you look into that)?
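
To illustrate the behaviour I'm describing, here is a simplified,
self-contained model of that defaulting step.  It is not the code in
gcc/omp-low.c: the names DIM_GANG, DIM_MASK, min_dims, default_dims and
default_unset_dims are made up for this sketch, and only the gang values
(1 and 32) are taken from the description above; the worker and vector
values are placeholders.

```c
/* Hypothetical stand-ins for the compute axes; names are illustrative.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(axis) (1u << (axis))

/* Gang values as quoted in the discussion; the others are placeholders.  */
static const int min_dims[DIM_MAX] = { 1, 1, 1 };
static const int default_dims[DIM_MAX] = { 32, 32, 32 };

/* Replace every negative ("have the compiler decide") entry of DIMS.
   An axis whose bit is set in USED gets the partitioned default; any
   other axis falls back to the minimum.  Returns the number of entries
   that were changed.  */
int
default_unset_dims (int dims[DIM_MAX], unsigned used)
{
  int changed = 0;

  for (int ix = 0; ix != DIM_MAX; ix++)
    if (dims[ix] < 0)
      {
        dims[ix] = (used & DIM_MASK (ix)) ? default_dims[ix] : min_dims[ix];
        changed++;
      }

  return changed;
}
```

In this model, calling default_unset_dims with dims[DIM_GANG] == -1 and
used == 0 yields a gang dimension of 1, whereas passing
DIM_MASK (DIM_GANG) as used yields 32, which is what I had expected for a
kernels region.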



Regards
 Thomas

Comments

Thomas Schwinge Feb. 10, 2016, 2:40 p.m. UTC | #1
Hi!

Will this patch be acceptable for GCC trunk in the current development
stage?  In its current incarnation, this patch depends on my
'Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid
offloading"' patch,
<http://news.gmane.org/find-root.php?message_id=%3C87zivg8rcy.fsf%40hertz.schwinge.homeip.net%3E>,
which Bernd suggested "has to be considered after gcc-6".  So, I'll have
to re-work this patch here, hence I'm first checking if it generally
meets approval?

On Fri, 5 Feb 2016 13:06:17 +0100, I wrote:
> On Mon, 9 Nov 2015 18:39:19 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> > On 09/11/15 16:35, Tom de Vries wrote:
> > > this patch series for stage1 trunk adds support to:
> > > - parallelize oacc kernels regions using parloops, and
> > > - map the loops onto the oacc gang dimension.
> 
> > Atm, the parallelization behaviour for the kernels region is controlled 
> > by flag_tree_parallelize_loops, which is also used to control generic 
> > auto-parallelization by autopar using omp. That is not ideal, and we may 
> > want a separate flag (or param) to control the behaviour for oacc 
> > kernels, f.i. -foacc-kernels-gang-parallelize=<n>. I'm open to suggestions.
> 
> I suggest using plain -fopenacc to enable OpenACC kernels processing
> (which just makes sense, I hope) ;-) and have later processing stages
> determine the actual parametrization (currently: number of gangs) (that
> is, Nathan's recent "Default compute dimensions" patches).
> 
> The code changes are simple enough; OK for trunk?  (This patch depends on
> my 'Un-parallelized OpenACC kernels constructs with nvptx offloading:
> "avoid offloading"' pending review,
> <http://news.gmane.org/find-root.php?message_id=%3C87zivg8rcy.fsf%40hertz.schwinge.homeip.net%3E>.)
> 
> Originally, I wanted to use:
> 
>     OMP_CLAUSE_NUM_GANGS_EXPR (clause) = build_int_cst (integer_type_node, n_threads == 0 ? -1 : n_threads);
> 
> ... to store -1 "have the compiler decide" (instead of now 0 "have the
> run-time decide", which might prevent some code optimizations, as I
> understand it) for the n_threads == 0 case, but it seems that for an
> offloaded OpenACC kernels region, gcc/omp-low.c:oacc_validate_dims is
> called with the parameter "used" set to 0 instead of "gang", and then the
> "Default anything left to 1 or a partitioned default" logic will default
> dims["gang"] to oacc_min_dims["gang"] (that is, 1) instead of the
> oacc_default_dims["gang"] (that is, 32).  Nathan, does that smell like a
> bug (and could you look into that)?
> 
> diff --git gcc/tree-parloops.c gcc/tree-parloops.c
> index 139e38c..e498e5b 100644
> --- gcc/tree-parloops.c
> +++ gcc/tree-parloops.c
> @@ -2016,7 +2016,8 @@ transform_to_exit_first_loop (struct loop *loop,
>  /* Create the parallel constructs for LOOP as described in gen_parallel_loop.
>     LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
>     NEW_DATA is the variable that should be initialized from the argument
> -   of LOOP_FN.  N_THREADS is the requested number of threads.  */
> +   of LOOP_FN.  N_THREADS is the requested number of threads, which can be 0 if
> +   that number is to be determined later.  */
>  
>  static void
>  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
> @@ -2049,6 +2050,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
>        basic_block paral_bb = single_pred (bb);
>        gsi = gsi_last_bb (paral_bb);
>  
> +      gcc_checking_assert (n_threads != 0);
>        t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
>        OMP_CLAUSE_NUM_THREADS_EXPR (t)
>  	= build_int_cst (integer_type_node, n_threads);
> @@ -2221,7 +2223,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
>  }
>  
>  /* Generates code to execute the iterations of LOOP in N_THREADS
> -   threads in parallel.
> +   threads in parallel, which can be 0 if that number is to be determined
> +   later.
>  
>     NITER describes number of iterations of LOOP.
>     REDUCTION_LIST describes the reductions existent in the LOOP.  */
> @@ -2318,6 +2321,7 @@ gen_parallel_loop (struct loop *loop,
>        else
>  	m_p_thread=MIN_PER_THREAD;
>  
> +      gcc_checking_assert (n_threads != 0);
>        many_iterations_cond =
>  	fold_build2 (GE_EXPR, boolean_type_node,
>  		     nit, build_int_cst (type, m_p_thread * n_threads));
> @@ -3177,7 +3181,7 @@ oacc_entry_exit_ok (struct loop *loop,
>  static bool
>  parallelize_loops (bool oacc_kernels_p)
>  {
> -  unsigned n_threads = flag_tree_parallelize_loops;
> +  unsigned n_threads;
>    bool changed = false;
>    struct loop *loop;
>    struct loop *skip_loop = NULL;
> @@ -3199,6 +3203,13 @@ parallelize_loops (bool oacc_kernels_p)
>    if (cfun->has_nonlocal_label)
>      return false;
>  
> +  /* For OpenACC kernels, n_threads will be determined later; otherwise, it's
> +     the argument to -ftree-parallelize-loops.  */
> +  if (oacc_kernels_p)
> +    n_threads = 0;
> +  else
> +    n_threads = flag_tree_parallelize_loops;
> +
>    gcc_obstack_init (&parloop_obstack);
>    reduction_info_table_type reduction_list (10);
>  
> @@ -3361,7 +3372,13 @@ public:
>    {}
>  
>    /* opt_pass methods: */
> -  virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
> +  virtual bool gate (function *)
> +  {
> +    if (oacc_kernels_p)
> +      return flag_openacc;
> +    else
> +      return flag_tree_parallelize_loops > 1;
> +  }
>    virtual unsigned int execute (function *);
>    opt_pass * clone () { return new pass_parallelize_loops (m_ctxt); }
>    void set_pass_param (unsigned int n, bool param)
> diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
> index bdbade5..4c39fbc 100644
> --- gcc/tree-ssa-loop.c
> +++ gcc/tree-ssa-loop.c
> @@ -148,7 +148,7 @@ make_pass_tree_loop (gcc::context *ctxt)
>  static bool
>  gate_oacc_kernels (function *fn)
>  {
> -  if (flag_tree_parallelize_loops <= 1)
> +  if (!flag_openacc)
>      return false;
>  
>    tree oacc_function_attr = get_oacc_fn_attrib (fn->decl);
> @@ -230,10 +230,9 @@ public:
>    virtual bool gate (function *)
>    {
>      return (optimize
> -	    /* Don't bother doing anything if the program has errors.  */
> -	    && !seen_error ()
>  	    && flag_openacc
> -	    && flag_tree_parallelize_loops > 1);
> +	    /* Don't bother doing anything if the program has errors.  */
> +	    && !seen_error ());
>    }
>  
>  }; // class pass_ipa_oacc
> diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
> index fe28154..2fd3d52 100644
> --- gcc/config/nvptx/nvptx.c
> +++ gcc/config/nvptx/nvptx.c
> @@ -4140,7 +4140,7 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
>  	  bool avoid_offloading_p = true;
>  	  for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
>  	    {
> -	      if (dims[ix] > 1)
> +	      if (dims[ix] > 1 || dims[ix] == 0)
>  		{
>  		  avoid_offloading_p = false;
>  		  break;
> diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
> index bc24651..f795bf7 100644
> --- libgomp/oacc-parallel.c
> +++ libgomp/oacc-parallel.c
> @@ -103,6 +103,10 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
>        return;
>      }
>  
> +  /* Default: let the runtime choose.  */
> +  for (i = 0; i != GOMP_DIM_MAX; i++)
> +    dims[i] = 0;
> +
>    va_start (ap, kinds);
>    /* TODO: This will need amending when device_type is implemented.  */
>    while ((tag = va_arg (ap, unsigned)) != 0)
> diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
> index 7ec1810..3f1bb6d 100644
> --- libgomp/plugin/plugin-nvptx.c
> +++ libgomp/plugin/plugin-nvptx.c
> @@ -894,9 +894,21 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>    /* Initialize the launch dimensions.  Typically this is constant,
>       provided by the device compiler, but we must permit runtime
>       values.  */
> -  for (i = 0; i != 3; i++)
> -    if (targ_fn->launch->dim[i])
> -      dims[i] = targ_fn->launch->dim[i];
> +  int seen_zero = 0;
> +  for (i = 0; i != GOMP_DIM_MAX; i++)
> +    {
> +      if (targ_fn->launch->dim[i])
> +       dims[i] = targ_fn->launch->dim[i];
> +      if (!dims[i])
> +       seen_zero = 1;
> +    }
> +
> +  if (seen_zero)
> +    {
> +      for (i = 0; i != GOMP_DIM_MAX; i++)
> +       if (!dims[i])
> +         dims[i] = /* TODO */ 32;
> +    }
>  
>    /* This reserves a chunk of a pre-allocated page of memory mapped on both
>       the host and the device. HP is a host pointer to the new chunk, and DP is
> 
> The TODO in libgomp/plugin/plugin-nvptx.c:nvptx_exec will be resolved by
> Nathan's "Default compute dimensions (runtime)",
> <http://news.gmane.org/find-root.php?message_id=%3C56B21D23.5060209%40acm.org%3E>.
> 
> The remainder is just "mechanical" updates to the test cases:
> 
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
> index e8b5357..17f240e 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -51,4 +50,4 @@ main (void)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
> index c39d674..750f576 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -34,4 +33,4 @@ foo (unsigned int n)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
> index 3501d0d..df60d6a 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -34,4 +33,4 @@ foo (void)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
> index f97584d..913d91f 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -67,4 +66,4 @@ main (void)
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 3 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
> index 530d62a..1822d2a 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -45,5 +44,4 @@ main (void)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> -
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
> index 4f1c2c5..e946319 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
> @@ -1,6 +1,5 @@
>  /* { dg-additional-options "-O2" } */
>  /* { dg-additional-options "-g" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -13,5 +12,4 @@
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> -
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
> index 151db51..9b63b45 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -49,4 +48,4 @@ main (void)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
> index bee5f5a..279f797 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -52,5 +51,4 @@ foo (COUNTERTYPE n)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> -
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
> index ea0e342..db1071f 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -36,4 +35,4 @@ main (void)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop.c gcc/testsuite/c-c++-common/goacc/kernels-loop.c
> index ab5dfb9..abf7a3c 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-loop.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -52,5 +51,4 @@ main (void)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> -
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
> index b16a8cd..95f4817 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -50,5 +49,4 @@ main (void)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> -
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/c-c++-common/goacc/kernels-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
> index 61c5df3..6f5a418 100644
> --- gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
> +++ gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
> @@ -1,5 +1,4 @@
>  /* { dg-additional-options "-O2" } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>  /* { dg-additional-options "-fdump-tree-optimized" } */
>  
> @@ -32,5 +31,4 @@ foo (void)
>  /* Check that the loop has been split off into a function.  */
>  /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
> -
> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
> diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
> index 4db3a50..3334741 100644
> --- gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
> +++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
> @@ -1,5 +1,4 @@
>  ! { dg-additional-options "-O2" }
> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>  
>  program main
>     implicit none
> diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
> index fef3d10..fb92da8 100644
> --- gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
> +++ gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
> @@ -1,5 +1,4 @@
>  ! { dg-additional-options "-O2" }
> -! { dg-additional-options "-ftree-parallelize-loops=10" }
>  
>  program main
>     implicit none
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
> index 08745fc..366b4f5 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
> @@ -1,6 +1,5 @@
>  /* Test that the compiler decides to "avoid offloading".  */
>  
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* The ACC_DEVICE_TYPE environment variable gets set in the testing
>     framework, and that overrides the "avoid offloading" flag at run time.
>     { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } */
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
> index 724228a..a63ec97 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
> @@ -1,8 +1,6 @@
>  /* Test that a user can override the compiler's "avoid offloading"
>     decision at run time.  */
>  
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <openacc.h>
>  
>  int main(void)
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
> index 2fb5196..da01d02 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
> @@ -1,7 +1,6 @@
>  /* Test that a user can override the compiler's "avoid offloading"
>     decision at compile time.  */
>  
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* Override the compiler's "avoid offloading" decision.
>     { dg-additional-options "-foffload-force" } */
>  
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> index 87ca378..39899ab 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
> @@ -1,7 +1,5 @@
>  /* This test exercises combined directives.  */
>  
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  int
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> index 8f0144c..31da8b1 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include  <openacc.h>
>  
>  int test_parallel ()
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> index 3ef6f9b..51745ba 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
> @@ -1,5 +1,4 @@
>  /* { dg-do run { target openacc_nvidia_accel_selected } } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-lcuda -lcublas -lcudart" } */
>  
>  #include <stdlib.h>
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> index 614ad33..588e864 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  int i;
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
> index 13e57bd..c7592d6 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
> @@ -1,6 +1,3 @@
> -/* { dg-do run } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N (1024 * 512)
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
> index f61a74a..31114ac 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
> @@ -1,6 +1,3 @@
> -/* { dg-do run } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N (1024 * 512)
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> index 5cdc200..3ffdfe2 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N 32
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
> index 2e4d4d2..a554d66 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N 32
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
> index 5bf00db..f0144b4 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N 32
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> index d39b667..4719edd 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N 32
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> index bb2e85b..ca4f638 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N 32
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> index e513827..d2fff38 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N 32
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> index c4791a4..0df4b3f 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N 100
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
> index 96b6e4e..88258be 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
> @@ -1,5 +1,3 @@
> -/* { dg-do run } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>  /* { dg-additional-options "-g" } */
>  
>  #include "kernels-loop.c"
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
> index 1433cb2..147ebb5 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
> @@ -1,6 +1,3 @@
> -/* { dg-do run } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N ((1024 * 512) + 1)
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
> index fd0d5b1..9a3eaca 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
> @@ -1,6 +1,3 @@
> -/* { dg-do run } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N ((1024 * 512) + 1)
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
> index 21d2599..28c725a 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
> @@ -1,6 +1,3 @@
> -/* { dg-do run } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N 1000
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
> index 3762e5a..355123c 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
> @@ -1,6 +1,3 @@
> -/* { dg-do run } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define N (1024 * 512)
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
> index 511e25f..8647a94 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
> @@ -1,6 +1,3 @@
> -/* { dg-do run } */
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  #define n 10000
> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> index 94a5ae2..83cddb5 100644
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
> @@ -1,5 +1,3 @@
> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
> -
>  #include <stdlib.h>
>  
>  int
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
> index 5f18b94..ca5cd01 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
> @@ -2,7 +2,6 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-cpp" }
> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>  ! The "avoid offloading" warning is only triggered for -O2 and higher.
>  ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>  ! The ACC_DEVICE_TYPE environment variable gets set in the testing
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
> index 51801ad..6200b37 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
> @@ -3,7 +3,6 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-cpp" }
> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>  ! The "avoid offloading" warning is only triggered for -O2 and higher.
>  ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>  
> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
> index bea6ab8..865d09f 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
> @@ -3,7 +3,6 @@
>  
>  ! { dg-do run }
>  ! { dg-additional-options "-cpp" }
> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>  ! Override the compiler's "avoid offloading" decision.
>  ! { dg-additional-options "-foffload-force" }
>  
> diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> index 4b52579..12ff36c 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
> @@ -1,7 +1,6 @@
>  ! This test exercises combined directives.
>  
>  ! { dg-do run }
> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>  ! The "avoid offloading" warning is only triggered for -O2 and higher.
>  ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>  
> diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> index b9298c7..0643e89 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> +++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
> @@ -2,7 +2,6 @@
>  ! offloaded regions are properly mapped using present_or_copy.
>  
>  ! { dg-do run }
> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>  ! The "avoid offloading" warning is only triggered for -O2 and higher.
>  ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }


Regards
 Thomas
Tom de Vries Feb. 15, 2016, 4:53 p.m. UTC | #2
On 10/02/16 15:40, Thomas Schwinge wrote:
> Hi!
>
> Will this patch be acceptable for GCC trunk in the current development
> stage?  In its current incarnation, this patch depends on my
> 'Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid
> offloading"' patch,
> <http://news.gmane.org/find-root.php?message_id=%3C87zivg8rcy.fsf%40hertz.schwinge.homeip.net%3E>,
> which Bernd suggested "has to be considered after gcc-6".  So, I'll have
> to re-work this patch here, hence I'm first checking if it generally
> meets approval?
>
> On Fri, 5 Feb 2016 13:06:17 +0100, I wrote:
>> On Mon, 9 Nov 2015 18:39:19 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 09/11/15 16:35, Tom de Vries wrote:
>>>> this patch series for stage1 trunk adds support to:
>>>> - parallelize oacc kernels regions using parloops, and
>>>> - map the loops onto the oacc gang dimension.
>>
>>> Atm, the parallelization behaviour for the kernels region is controlled
>>> by flag_tree_parallelize_loops, which is also used to control generic
>>> auto-parallelization by autopar using omp. That is not ideal, and we may
>>> want a separate flag (or param) to control the behaviour for oacc
>>> kernels, f.i. -foacc-kernels-gang-parallelize=<n>. I'm open to suggestions.
>>
>> I suggest using plain -fopenacc to enable OpenACC kernels processing
>> (which just makes sense, I hope) ;-) and have later processing stages
>> determine the actual parametrization (currently: number of gangs) (that
>> is, Nathan's recent "Default compute dimensions" patches).
>>

Hi Thomas,

That makes a lot of sense.  Thanks for working on this.

>> The code changes are simple enough; OK for trunk?  (This patch depends on
>> my 'Un-parallelized OpenACC kernels constructs with nvptx offloading:
>> "avoid offloading"' pending review,
>> <http://news.gmane.org/find-root.php?message_id=%3C87zivg8rcy.fsf%40hertz.schwinge.homeip.net%3E>.)
>>
>> Originally, I wanted to use:
>>
>>      OMP_CLAUSE_NUM_GANGS_EXPR (clause) = build_int_cst (integer_type_node, n_threads == 0 ? -1 : n_threads);
>>
>> ... to store -1 "have the compiler decide" (instead of now 0 "have the
>> run-time decide", which might prevent some code optimizations, as I
>> understand it) for the n_threads == 0 case, but it seems that for an
>> offloaded OpenACC kernels region, gcc/omp-low.c:oacc_validate_dims is
>> called with the parameter "used" set to 0 instead of "gang", and then the
>> "Default anything left to 1 or a partitioned default" logic will default
>> dims["gang"] to oacc_min_dims["gang"] (that is, 1) instead of the
>> oacc_default_dims["gang"] (that is, 32).  Nathan, does that smell like a
>> bug (and could you look into that)?
>>
>> diff --git gcc/tree-parloops.c gcc/tree-parloops.c
>> index 139e38c..e498e5b 100644
>> --- gcc/tree-parloops.c
>> +++ gcc/tree-parloops.c
>> @@ -2016,7 +2016,8 @@ transform_to_exit_first_loop (struct loop *loop,
>>   /* Create the parallel constructs for LOOP as described in gen_parallel_loop.
>>      LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
>>      NEW_DATA is the variable that should be initialized from the argument
>> -   of LOOP_FN.  N_THREADS is the requested number of threads.  */
>> +   of LOOP_FN.  N_THREADS is the requested number of threads, which can be 0 if
>> +   that number is to be determined later.  */
>>
>>   static void
>>   create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
>> @@ -2049,6 +2050,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
>>         basic_block paral_bb = single_pred (bb);
>>         gsi = gsi_last_bb (paral_bb);
>>
>> +      gcc_checking_assert (n_threads != 0);
>>         t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
>>         OMP_CLAUSE_NUM_THREADS_EXPR (t)
>>   	= build_int_cst (integer_type_node, n_threads);
>> @@ -2221,7 +2223,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
>>   }
>>
>>   /* Generates code to execute the iterations of LOOP in N_THREADS
>> -   threads in parallel.
>> +   threads in parallel, which can be 0 if that number is to be determined
>> +   later.
>>
>>      NITER describes number of iterations of LOOP.
>>      REDUCTION_LIST describes the reductions existent in the LOOP.  */
>> @@ -2318,6 +2321,7 @@ gen_parallel_loop (struct loop *loop,
>>         else
>>   	m_p_thread=MIN_PER_THREAD;
>>
>> +      gcc_checking_assert (n_threads != 0);
>>         many_iterations_cond =
>>   	fold_build2 (GE_EXPR, boolean_type_node,
>>   		     nit, build_int_cst (type, m_p_thread * n_threads));
>> @@ -3177,7 +3181,7 @@ oacc_entry_exit_ok (struct loop *loop,
>>   static bool
>>   parallelize_loops (bool oacc_kernels_p)
>>   {
>> -  unsigned n_threads = flag_tree_parallelize_loops;
>> +  unsigned n_threads;
>>     bool changed = false;
>>     struct loop *loop;
>>     struct loop *skip_loop = NULL;
>> @@ -3199,6 +3203,13 @@ parallelize_loops (bool oacc_kernels_p)
>>     if (cfun->has_nonlocal_label)
>>       return false;
>>
>> +  /* For OpenACC kernels, n_threads will be determined later; otherwise, it's
>> +     the argument to -ftree-parallelize-loops.  */
>> +  if (oacc_kernels_p)
>> +    n_threads = 0;
>> +  else
>> +    n_threads = flag_tree_parallelize_loops;
>> +
>>     gcc_obstack_init (&parloop_obstack);
>>     reduction_info_table_type reduction_list (10);
>>
>> @@ -3361,7 +3372,13 @@ public:
>>     {}
>>
>>     /* opt_pass methods: */
>> -  virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
>> +  virtual bool gate (function *)
>> +  {
>> +    if (oacc_kernels_p)
>> +      return flag_openacc;
>> +    else
>> +      return flag_tree_parallelize_loops > 1;
>> +  }

I wouldn't mind using the ternary expression here, but I suppose that's 
a matter of taste.
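
For reference, the conditional-expression form of that gate could be sketched as below. This is a standalone C illustration rather than the actual opt_pass method: the two flag variables are hypothetical stand-ins for GCC's option globals, and parloops_gate is a name made up for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's global option flags; in the compiler
   these are set from the command line.  */
static int flag_openacc = 1;
static int flag_tree_parallelize_loops = 1;

/* Same decision as the if/else gate in the hunk above, written as a
   single conditional expression: the OpenACC kernels instance of the
   pass is gated on -fopenacc, the generic instance on
   -ftree-parallelize-loops=<n> with n > 1.  */
static bool
parloops_gate (bool oacc_kernels_p)
{
  return oacc_kernels_p ? flag_openacc != 0
			: flag_tree_parallelize_loops > 1;
}
```

Both forms make the same decision; the if/else version in the patch only spells out the two cases separately.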

>>     virtual unsigned int execute (function *);
>>     opt_pass * clone () { return new pass_parallelize_loops (m_ctxt); }
>>     void set_pass_param (unsigned int n, bool param)

The oacc-parloops changes look good to me. I approve them for 6.0 stage 
4 (given that using the -ftree-parallelize-loops=<n> flag for oacc 
kernels parallelization was just a placeholder waiting to be 
replaced by an oacc-based approach). [ And I'd expect that the 
tree-ssa-loop.c changes and the mechanical testsuite changes can be 
regarded as trivial. ]

Thanks,
- Tom

>> diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
>> index bdbade5..4c39fbc 100644
>> --- gcc/tree-ssa-loop.c
>> +++ gcc/tree-ssa-loop.c
>> @@ -148,7 +148,7 @@ make_pass_tree_loop (gcc::context *ctxt)
>>   static bool
>>   gate_oacc_kernels (function *fn)
>>   {
>> -  if (flag_tree_parallelize_loops <= 1)
>> +  if (!flag_openacc)
>>       return false;
>>
>>     tree oacc_function_attr = get_oacc_fn_attrib (fn->decl);
>> @@ -230,10 +230,9 @@ public:
>>     virtual bool gate (function *)
>>     {
>>       return (optimize
>> -	    /* Don't bother doing anything if the program has errors.  */
>> -	    && !seen_error ()
>>   	    && flag_openacc
>> -	    && flag_tree_parallelize_loops > 1);
>> +	    /* Don't bother doing anything if the program has errors.  */
>> +	    && !seen_error ());
>>     }
>>
>>   }; // class pass_ipa_oacc
>> diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
>> index fe28154..2fd3d52 100644
>> --- gcc/config/nvptx/nvptx.c
>> +++ gcc/config/nvptx/nvptx.c
>> @@ -4140,7 +4140,7 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
>>   	  bool avoid_offloading_p = true;
>>   	  for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
>>   	    {
>> -	      if (dims[ix] > 1)
>> +	      if (dims[ix] > 1 || dims[ix] == 0)
>>   		{
>>   		  avoid_offloading_p = false;
>>   		  break;
>> diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
>> index bc24651..f795bf7 100644
>> --- libgomp/oacc-parallel.c
>> +++ libgomp/oacc-parallel.c
>> @@ -103,6 +103,10 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
>>         return;
>>       }
>>
>> +  /* Default: let the runtime choose.  */
>> +  for (i = 0; i != GOMP_DIM_MAX; i++)
>> +    dims[i] = 0;
>> +
>>     va_start (ap, kinds);
>>     /* TODO: This will need amending when device_type is implemented.  */
>>     while ((tag = va_arg (ap, unsigned)) != 0)
>> diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
>> index 7ec1810..3f1bb6d 100644
>> --- libgomp/plugin/plugin-nvptx.c
>> +++ libgomp/plugin/plugin-nvptx.c
>> @@ -894,9 +894,21 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>>     /* Initialize the launch dimensions.  Typically this is constant,
>>        provided by the device compiler, but we must permit runtime
>>        values.  */
>> -  for (i = 0; i != 3; i++)
>> -    if (targ_fn->launch->dim[i])
>> -      dims[i] = targ_fn->launch->dim[i];
>> +  int seen_zero = 0;
>> +  for (i = 0; i != GOMP_DIM_MAX; i++)
>> +    {
>> +      if (targ_fn->launch->dim[i])
>> +       dims[i] = targ_fn->launch->dim[i];
>> +      if (!dims[i])
>> +       seen_zero = 1;
>> +    }
>> +
>> +  if (seen_zero)
>> +    {
>> +      for (i = 0; i != GOMP_DIM_MAX; i++)
>> +       if (!dims[i])
>> +         dims[i] = /* TODO */ 32;
>> +    }
>>
>>     /* This reserves a chunk of a pre-allocated page of memory mapped on both
>>        the host and the device. HP is a host pointer to the new chunk, and DP is
>>
>> The TODO in libgomp/plugin/plugin-nvptx.c:nvptx_exec will be resolved by
>> Nathan's "Default compute dimensions (runtime)",
>> <http://news.gmane.org/find-root.php?message_id=%3C56B21D23.5060209%40acm.org%3E>.
>>
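
A minimal sketch of the "zero means not yet decided" defaulting from the nvptx_exec hunk above, written as a standalone C function. The names resolve_launch_dims and DIM_MAX and the fallback parameter are assumptions made for this illustration; in the patch the fallback is the placeholder 32 marked TODO, and the real default is expected to come from the "Default compute dimensions (runtime)" work. The sketch folds the patch's two loops into one, which gives the same result.

```c
/* Assumed to mirror GOMP_DIM_MAX (gang, worker, vector).  */
#define DIM_MAX 3

/* COMPILED holds the launch dimensions fixed by the device compiler,
   with 0 meaning "not fixed at compile time".  DIMS holds the values
   requested at run time, with 0 meaning "let the runtime choose".
   A non-zero compiled value overrides the run-time one; anything
   still 0 afterwards is set to FALLBACK.  */
static void
resolve_launch_dims (const int compiled[DIM_MAX], int dims[DIM_MAX],
		     int fallback)
{
  for (int i = 0; i != DIM_MAX; i++)
    {
      if (compiled[i])
	dims[i] = compiled[i];
      if (!dims[i])
	dims[i] = fallback;
    }
}
```

With compiled = {0, 1, 32} and dims = {0, 0, 0}, a fallback of 32 leaves dims as {32, 1, 32}: only the gang dimension, which neither side fixed, takes the default.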
>> The remainder is just "mechanical" updates to the test cases:
>>
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
>> index e8b5357..17f240e 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -51,4 +50,4 @@ main (void)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
>> index c39d674..750f576 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -34,4 +33,4 @@ foo (unsigned int n)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
>> index 3501d0d..df60d6a 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -34,4 +33,4 @@ foo (void)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
>> index f97584d..913d91f 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -67,4 +66,4 @@ main (void)
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 3 "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
>> index 530d62a..1822d2a 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -45,5 +44,4 @@ main (void)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> -
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
>> index 4f1c2c5..e946319 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
>> @@ -1,6 +1,5 @@
>>   /* { dg-additional-options "-O2" } */
>>   /* { dg-additional-options "-g" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -13,5 +12,4 @@
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> -
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
>> index 151db51..9b63b45 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -49,4 +48,4 @@ main (void)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
>> index bee5f5a..279f797 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -52,5 +51,4 @@ foo (COUNTERTYPE n)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> -
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
>> index ea0e342..db1071f 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -36,4 +35,4 @@ main (void)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop.c gcc/testsuite/c-c++-common/goacc/kernels-loop.c
>> index ab5dfb9..abf7a3c 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-loop.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -52,5 +51,4 @@ main (void)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> -
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
>> index b16a8cd..95f4817 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -50,5 +49,4 @@ main (void)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> -
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/c-c++-common/goacc/kernels-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
>> index 61c5df3..6f5a418 100644
>> --- gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
>> +++ gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-additional-options "-O2" } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-fdump-tree-parloops1-all" } */
>>   /* { dg-additional-options "-fdump-tree-optimized" } */
>>
>> @@ -32,5 +31,4 @@ foo (void)
>>   /* Check that the loop has been split off into a function.  */
>>   /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
>>
>> -/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
>> -
>> +/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
>> diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
>> index 4db3a50..3334741 100644
>> --- gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
>> +++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
>> @@ -1,5 +1,4 @@
>>   ! { dg-additional-options "-O2" }
>> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>>
>>   program main
>>      implicit none
>> diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
>> index fef3d10..fb92da8 100644
>> --- gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
>> +++ gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
>> @@ -1,5 +1,4 @@
>>   ! { dg-additional-options "-O2" }
>> -! { dg-additional-options "-ftree-parallelize-loops=10" }
>>
>>   program main
>>      implicit none
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
>> index 08745fc..366b4f5 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
>> @@ -1,6 +1,5 @@
>>   /* Test that the compiler decides to "avoid offloading".  */
>>
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* The ACC_DEVICE_TYPE environment variable gets set in the testing
>>      framework, and that overrides the "avoid offloading" flag at run time.
>>      { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } */
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
>> index 724228a..a63ec97 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
>> @@ -1,8 +1,6 @@
>>   /* Test that a user can override the compiler's "avoid offloading"
>>      decision at run time.  */
>>
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <openacc.h>
>>
>>   int main(void)
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
>> index 2fb5196..da01d02 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
>> @@ -1,7 +1,6 @@
>>   /* Test that a user can override the compiler's "avoid offloading"
>>      decision at compile time.  */
>>
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* Override the compiler's "avoid offloading" decision.
>>      { dg-additional-options "-foffload-force" } */
>>
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
>> index 87ca378..39899ab 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
>> @@ -1,7 +1,5 @@
>>   /* This test exercises combined directives.  */
>>
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   int
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
>> index 8f0144c..31da8b1 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include  <openacc.h>
>>
>>   int test_parallel ()
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
>> index 3ef6f9b..51745ba 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
>> @@ -1,5 +1,4 @@
>>   /* { dg-do run { target openacc_nvidia_accel_selected } } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-lcuda -lcublas -lcudart" } */
>>
>>   #include <stdlib.h>
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
>> index 614ad33..588e864 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   int i;
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
>> index 13e57bd..c7592d6 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
>> @@ -1,6 +1,3 @@
>> -/* { dg-do run } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N (1024 * 512)
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
>> index f61a74a..31114ac 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
>> @@ -1,6 +1,3 @@
>> -/* { dg-do run } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N (1024 * 512)
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
>> index 5cdc200..3ffdfe2 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N 32
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
>> index 2e4d4d2..a554d66 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N 32
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
>> index 5bf00db..f0144b4 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N 32
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
>> index d39b667..4719edd 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N 32
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
>> index bb2e85b..ca4f638 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N 32
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
>> index e513827..d2fff38 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N 32
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
>> index c4791a4..0df4b3f 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N 100
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
>> index 96b6e4e..88258be 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-do run } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>>   /* { dg-additional-options "-g" } */
>>
>>   #include "kernels-loop.c"
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
>> index 1433cb2..147ebb5 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
>> @@ -1,6 +1,3 @@
>> -/* { dg-do run } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N ((1024 * 512) + 1)
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
>> index fd0d5b1..9a3eaca 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
>> @@ -1,6 +1,3 @@
>> -/* { dg-do run } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N ((1024 * 512) + 1)
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
>> index 21d2599..28c725a 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
>> @@ -1,6 +1,3 @@
>> -/* { dg-do run } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N 1000
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
>> index 3762e5a..355123c 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
>> @@ -1,6 +1,3 @@
>> -/* { dg-do run } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define N (1024 * 512)
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
>> index 511e25f..8647a94 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
>> @@ -1,6 +1,3 @@
>> -/* { dg-do run } */
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   #define n 10000
>> diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
>> index 94a5ae2..83cddb5 100644
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
>> @@ -1,5 +1,3 @@
>> -/* { dg-additional-options "-ftree-parallelize-loops=32" } */
>> -
>>   #include <stdlib.h>
>>
>>   int
>> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
>> index 5f18b94..ca5cd01 100644
>> --- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
>> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
>> @@ -2,7 +2,6 @@
>>
>>   ! { dg-do run }
>>   ! { dg-additional-options "-cpp" }
>> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>>   ! The "avoid offloading" warning is only triggered for -O2 and higher.
>>   ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>>   ! The ACC_DEVICE_TYPE environment variable gets set in the testing
>> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
>> index 51801ad..6200b37 100644
>> --- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
>> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
>> @@ -3,7 +3,6 @@
>>
>>   ! { dg-do run }
>>   ! { dg-additional-options "-cpp" }
>> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>>   ! The "avoid offloading" warning is only triggered for -O2 and higher.
>>   ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>>
>> diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
>> index bea6ab8..865d09f 100644
>> --- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
>> +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
>> @@ -3,7 +3,6 @@
>>
>>   ! { dg-do run }
>>   ! { dg-additional-options "-cpp" }
>> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>>   ! Override the compiler's "avoid offloading" decision.
>>   ! { dg-additional-options "-foffload-force" }
>>
>> diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
>> index 4b52579..12ff36c 100644
>> --- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
>> +++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
>> @@ -1,7 +1,6 @@
>>   ! This test exercises combined directives.
>>
>>   ! { dg-do run }
>> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>>   ! The "avoid offloading" warning is only triggered for -O2 and higher.
>>   ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>>
>> diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
>> index b9298c7..0643e89 100644
>> --- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
>> +++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
>> @@ -2,7 +2,6 @@
>>   ! offloaded regions are properly mapped using present_or_copy.
>>
>>   ! { dg-do run }
>> -! { dg-additional-options "-ftree-parallelize-loops=32" }
>>   ! The "avoid offloading" warning is only triggered for -O2 and higher.
>>   ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
>

Patch

diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index 139e38c..e498e5b 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -2016,7 +2016,8 @@  transform_to_exit_first_loop (struct loop *loop,
 /* Create the parallel constructs for LOOP as described in gen_parallel_loop.
    LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
    NEW_DATA is the variable that should be initialized from the argument
-   of LOOP_FN.  N_THREADS is the requested number of threads.  */
+   of LOOP_FN.  N_THREADS is the requested number of threads, which can be 0 if
+   that number is to be determined later.  */
 
 static void
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
@@ -2049,6 +2050,7 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
       basic_block paral_bb = single_pred (bb);
       gsi = gsi_last_bb (paral_bb);
 
+      gcc_checking_assert (n_threads != 0);
       t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
       OMP_CLAUSE_NUM_THREADS_EXPR (t)
 	= build_int_cst (integer_type_node, n_threads);
@@ -2221,7 +2223,8 @@  create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 }
 
 /* Generates code to execute the iterations of LOOP in N_THREADS
-   threads in parallel.
+   threads in parallel, which can be 0 if that number is to be determined
+   later.
 
    NITER describes number of iterations of LOOP.
    REDUCTION_LIST describes the reductions existent in the LOOP.  */
@@ -2318,6 +2321,7 @@  gen_parallel_loop (struct loop *loop,
       else
 	m_p_thread=MIN_PER_THREAD;
 
+      gcc_checking_assert (n_threads != 0);
       many_iterations_cond =
 	fold_build2 (GE_EXPR, boolean_type_node,
 		     nit, build_int_cst (type, m_p_thread * n_threads));
@@ -3177,7 +3181,7 @@  oacc_entry_exit_ok (struct loop *loop,
 static bool
 parallelize_loops (bool oacc_kernels_p)
 {
-  unsigned n_threads = flag_tree_parallelize_loops;
+  unsigned n_threads;
   bool changed = false;
   struct loop *loop;
   struct loop *skip_loop = NULL;
@@ -3199,6 +3203,13 @@  parallelize_loops (bool oacc_kernels_p)
   if (cfun->has_nonlocal_label)
     return false;
 
+  /* For OpenACC kernels, n_threads will be determined later; otherwise, it's
+     the argument to -ftree-parallelize-loops.  */
+  if (oacc_kernels_p)
+    n_threads = 0;
+  else
+    n_threads = flag_tree_parallelize_loops;
+
   gcc_obstack_init (&parloop_obstack);
   reduction_info_table_type reduction_list (10);
 
@@ -3361,7 +3372,13 @@  public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
+  virtual bool gate (function *)
+  {
+    if (oacc_kernels_p)
+      return flag_openacc;
+    else
+      return flag_tree_parallelize_loops > 1;
+  }
   virtual unsigned int execute (function *);
   opt_pass * clone () { return new pass_parallelize_loops (m_ctxt); }
   void set_pass_param (unsigned int n, bool param)
diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
index bdbade5..4c39fbc 100644
--- gcc/tree-ssa-loop.c
+++ gcc/tree-ssa-loop.c
@@ -148,7 +148,7 @@  make_pass_tree_loop (gcc::context *ctxt)
 static bool
 gate_oacc_kernels (function *fn)
 {
-  if (flag_tree_parallelize_loops <= 1)
+  if (!flag_openacc)
     return false;
 
   tree oacc_function_attr = get_oacc_fn_attrib (fn->decl);
@@ -230,10 +230,9 @@  public:
   virtual bool gate (function *)
   {
     return (optimize
-	    /* Don't bother doing anything if the program has errors.  */
-	    && !seen_error ()
 	    && flag_openacc
-	    && flag_tree_parallelize_loops > 1);
+	    /* Don't bother doing anything if the program has errors.  */
+	    && !seen_error ());
   }
 
 }; // class pass_ipa_oacc
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index fe28154..2fd3d52 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -4140,7 +4140,7 @@  nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 	  bool avoid_offloading_p = true;
 	  for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++)
 	    {
-	      if (dims[ix] > 1)
+	      if (dims[ix] > 1 || dims[ix] == 0)
 		{
 		  avoid_offloading_p = false;
 		  break;
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index bc24651..f795bf7 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -103,6 +103,10 @@  GOACC_parallel_keyed (int device, void (*fn) (void *),
       return;
     }
 
+  /* Default: let the runtime choose.  */
+  for (i = 0; i != GOMP_DIM_MAX; i++)
+    dims[i] = 0;
+
   va_start (ap, kinds);
   /* TODO: This will need amending when device_type is implemented.  */
   while ((tag = va_arg (ap, unsigned)) != 0)
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 7ec1810..3f1bb6d 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -894,9 +894,21 @@  nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
   /* Initialize the launch dimensions.  Typically this is constant,
      provided by the device compiler, but we must permit runtime
      values.  */
-  for (i = 0; i != 3; i++)
-    if (targ_fn->launch->dim[i])
-      dims[i] = targ_fn->launch->dim[i];
+  int seen_zero = 0;
+  for (i = 0; i != GOMP_DIM_MAX; i++)
+    {
+      if (targ_fn->launch->dim[i])
+       dims[i] = targ_fn->launch->dim[i];
+      if (!dims[i])
+       seen_zero = 1;
+    }
+
+  if (seen_zero)
+    {
+      for (i = 0; i != GOMP_DIM_MAX; i++)
+       if (!dims[i])
+         dims[i] = /* TODO */ 32;
+    }
 
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
      the host and the device. HP is a host pointer to the new chunk, and DP is

The TODO in libgomp/plugin/plugin-nvptx.c:nvptx_exec will be resolved by
Nathan's "Default compute dimensions (runtime)",
<http://news.gmane.org/find-root.php?message_id=%3C56B21D23.5060209%40acm.org%3E>.
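
For illustration only, here is the launch-dimension fallback from the
nvptx_exec hunk above as a standalone sketch (not the libgomp code).  The
names resolve_dims, SKETCH_DIM_MAX and SKETCH_DEFAULT_DIM are made up for
this example; SKETCH_DIM_MAX stands in for GOMP_DIM_MAX, assumed here to be
3 (gang, worker, vector), and 32 is the placeholder that the TODO marks.  The
two loops of the hunk are folded into one, which should give the same result:

```c
#include <assert.h>

/* Hypothetical stand-in for GOMP_DIM_MAX (assumed: gang, worker, vector).  */
#define SKETCH_DIM_MAX 3
/* The placeholder default from the patch; marked TODO there.  */
#define SKETCH_DEFAULT_DIM 32

/* COMPILED holds the dimensions fixed by the device compiler (0 means
   "not fixed").  DIMS holds the values passed in at run time (0 means
   "let the runtime choose") and is updated in place.  */
static void
resolve_dims (const int compiled[SKETCH_DIM_MAX], int dims[SKETCH_DIM_MAX])
{
  assert (compiled && dims);

  for (int i = 0; i != SKETCH_DIM_MAX; i++)
    {
      /* A value fixed at compile time overrides the run-time value.  */
      if (compiled[i])
        dims[i] = compiled[i];
      /* Anything still unset gets the placeholder default.  */
      if (!dims[i])
        dims[i] = SKETCH_DEFAULT_DIM;
    }
}
```

With compiled = { 0, 1, 32 } and dims = { 0, 0, 0 }, this yields
{ 32, 1, 32 }: only the gang dimension, left at 0 by the compiler, picks up
the default.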

The remainder consists of "mechanical" updates to the test cases:

diff --git gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
index e8b5357..17f240e 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -51,4 +50,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
index c39d674..750f576 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -34,4 +33,4 @@  foo (unsigned int n)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
index 3501d0d..df60d6a 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -34,4 +33,4 @@  foo (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
index f97584d..913d91f 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -67,4 +66,4 @@  main (void)
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 3 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index 530d62a..1822d2a 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -45,5 +44,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index 4f1c2c5..e946319 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -1,6 +1,5 @@ 
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-g" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -13,5 +12,4 @@ 
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
index 151db51..9b63b45 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -49,4 +48,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
index bee5f5a..279f797 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -52,5 +51,4 @@  foo (COUNTERTYPE n)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
index ea0e342..db1071f 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -36,4 +35,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop.c gcc/testsuite/c-c++-common/goacc/kernels-loop.c
index ab5dfb9..abf7a3c 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-loop.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -52,5 +51,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
index b16a8cd..95f4817 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -50,5 +49,4 @@  main (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-reduction.c gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
index 61c5df3..6f5a418 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -1,5 +1,4 @@ 
 /* { dg-additional-options "-O2" } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
@@ -32,5 +31,4 @@  foo (void)
 /* Check that the loop has been split off into a function.  */
 /* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(32," 1 "parloops1" } } */
-
+/* { dg-final { scan-tree-dump-times "(?n)oacc function \\(0," 1 "parloops1" } } */
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
index 4db3a50..3334741 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
@@ -1,5 +1,4 @@ 
 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-ftree-parallelize-loops=32" }
 
 program main
    implicit none
diff --git gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95 gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
index fef3d10..fb92da8 100644
--- gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
+++ gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
@@ -1,5 +1,4 @@ 
 ! { dg-additional-options "-O2" }
-! { dg-additional-options "-ftree-parallelize-loops=10" }
 
 program main
    implicit none
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
index 08745fc..366b4f5 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
@@ -1,6 +1,5 @@ 
 /* Test that the compiler decides to "avoid offloading".  */
 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* The ACC_DEVICE_TYPE environment variable gets set in the testing
    framework, and that overrides the "avoid offloading" flag at run time.
    { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } */
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
index 724228a..a63ec97 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
@@ -1,8 +1,6 @@ 
 /* Test that a user can override the compiler's "avoid offloading"
    decision at run time.  */
 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <openacc.h>
 
 int main(void)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
index 2fb5196..da01d02 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
@@ -1,7 +1,6 @@ 
 /* Test that a user can override the compiler's "avoid offloading"
    decision at compile time.  */
 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
index 87ca378..39899ab 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
@@ -1,7 +1,5 @@ 
 /* This test exercises combined directives.  */
 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 int
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
index 8f0144c..31da8b1 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include  <openacc.h>
 
 int test_parallel ()
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
index 3ef6f9b..51745ba 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c
@@ -1,5 +1,4 @@ 
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-lcuda -lcublas -lcudart" } */
 
 #include <stdlib.h>
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
index 614ad33..588e864 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 int i;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
index 13e57bd..c7592d6 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
index f61a74a..31114ac 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
index 5cdc200..3ffdfe2 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
index 2e4d4d2..a554d66 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
index 5bf00db..f0144b4 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
index d39b667..4719edd 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
index bb2e85b..ca4f638 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
index e513827..d2fff38 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 32
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
index c4791a4..0df4b3f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 100
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
index 96b6e4e..88258be 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
@@ -1,5 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
 /* { dg-additional-options "-g" } */
 
 #include "kernels-loop.c"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
index 1433cb2..147ebb5 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N ((1024 * 512) + 1)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
index fd0d5b1..9a3eaca 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N ((1024 * 512) + 1)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
index 21d2599..28c725a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N 1000
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
index 3762e5a..355123c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
index 511e25f..8647a94 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
@@ -1,6 +1,3 @@ 
-/* { dg-do run } */
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 #define n 10000
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
index 94a5ae2..83cddb5 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
@@ -1,5 +1,3 @@ 
-/* { dg-additional-options "-ftree-parallelize-loops=32" } */
-
 #include <stdlib.h>
 
 int
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
index 5f18b94..ca5cd01 100644
--- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
@@ -2,7 +2,6 @@ 
 
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
-! { dg-additional-options "-ftree-parallelize-loops=32" }
 ! The "avoid offloading" warning is only triggered for -O2 and higher.
 ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
 ! The ACC_DEVICE_TYPE environment variable gets set in the testing
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
index 51801ad..6200b37 100644
--- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
@@ -3,7 +3,6 @@ 
 
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
-! { dg-additional-options "-ftree-parallelize-loops=32" }
 ! The "avoid offloading" warning is only triggered for -O2 and higher.
 ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
 
diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
index bea6ab8..865d09f 100644
--- libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
+++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
@@ -3,7 +3,6 @@ 
 
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
-! { dg-additional-options "-ftree-parallelize-loops=32" }
 ! Override the compiler's "avoid offloading" decision.
 ! { dg-additional-options "-foffload-force" }
 
diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
index 4b52579..12ff36c 100644
--- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90
@@ -1,7 +1,6 @@ 
 ! This test exercises combined directives.
 
 ! { dg-do run }
-! { dg-additional-options "-ftree-parallelize-loops=32" }
 ! The "avoid offloading" warning is only triggered for -O2 and higher.
 ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }
 
diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
index b9298c7..0643e89 100644
--- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
@@ -2,7 +2,6 @@ 
 ! offloaded regions are properly mapped using present_or_copy.
 
 ! { dg-do run }
-! { dg-additional-options "-ftree-parallelize-loops=32" }
 ! The "avoid offloading" warning is only triggered for -O2 and higher.
 ! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } }