Report errors on inconsistent OpenACC nested reduction clauses

Message ID a59d2021-f4ed-a559-a5c4-46752e17de67@codesourcery.com
State New
Series
  • Report errors on inconsistent OpenACC nested reduction clauses

Commit Message

Frederik Harwath Oct. 21, 2019, 7:08 a.m. UTC
Hi,
OpenACC requires that, if a variable is used in reduction clauses on two nested loops, then there
must be reduction clauses for that variable on all loops that are nested in between the two loops,
and all these reduction clauses must use the same operator; this was first clarified in
OpenACC 2.6. This commit introduces a check for that property, which reports errors if the property
is violated.
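To make the property concrete, here is a minimal sketch (not part of the patch) that models it for a single variable in plain C. The `enum red_op` encoding and the `nested_reductions_ok` helper are hypothetical; level 0 may stand either for the outermost loop or for a reduction clause on the enclosing compute construct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy operator encoding (hypothetical); RED_NONE means "this level has no
   reduction clause for the variable under consideration".  */
enum red_op { RED_NONE, RED_PLUS, RED_MINUS, RED_MULT, RED_MAX };

/* OPS[0] describes the outermost level (a loop, or the enclosing compute
   construct), OPS[N - 1] the innermost loop.  Returns true if, for this one
   variable, every level between the outermost and the innermost reduction
   clause also has a reduction clause, and all of them use the same
   operator.  */
bool
nested_reductions_ok (const enum red_op *ops, size_t n)
{
  size_t first = n, last = n;

  for (size_t i = 0; i < n; i++)
    if (ops[i] != RED_NONE)
      {
        if (first == n)
          first = i;
        last = i;
      }

  /* No reduction on this variable anywhere in the nest.  */
  if (first == n)
    return true;

  /* A missing clause (RED_NONE) and a different operator both show up as
     a mismatch against the outermost clause.  */
  for (size_t i = first; i <= last; i++)
    if (ops[i] != ops[first])
      return false;

  return true;
}
```

With this model, `{ RED_PLUS, RED_NONE, RED_PLUS }` (missing intervening clause) and `{ RED_PLUS, RED_MINUS, RED_PLUS }` (operator switch) are both rejected, while a nest whose clauses only cover a contiguous inner range, such as `{ RED_NONE, RED_MAX, RED_MAX, RED_NONE }`, is accepted.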

I have tested the patch by comparing "make check" results and I am not aware of any regressions.

Gergö has implemented the check and it works, but I was wondering whether the way in which the patch
avoids issuing errors about operator switches more than once, by modifying the clauses (cf. the
corresponding comment in omp-low.c), could lead to problems: the processing might still continue
on the modified tree after the error, right? I was also wondering about the best place for such
checks. Should this be part of "pass_lower_omp" (as in the patch), or should it run earlier,
for instance in "pass_diagnose_omp_blocks"?

Can the patch be included in trunk?

Frederik

Comments

Thomas Schwinge Oct. 24, 2019, 2:31 p.m. UTC | #1
Hi Frederik and Jakub!

On 2019-10-21T09:08:28+0200, "Harwath, Frederik" <frederik@codesourcery.com> wrote:
> OpenACC requires that, if a variable is used in reduction clauses on two nested loops, then there
> must be reduction clauses for that variable on all loops that are nested in between the two loops
> and all these reduction clauses must use the same operator; this has been first clarified by
> OpenACC 2.6. This commit introduces a check for that property which reports errors if the property
> is violated.

So I previously (internally, 2018-11-29) noted:

| I wonder if these should really be diagnosed as hard errors, or rather as
| warnings?
| 
| The specification describes what the user is expected to do, and the
| compiler should assist to achieve that goal, but I wonder if there might
| be any reasonable cases where a compiler error diagnostic might be
| considered "too strong" here?  (Just a quick thought.  On the other hand,
| of course, "fail loudly for stupid things" is desirable.  Will have to
| think about that further.)

In line with the discussion in
<http://mid.mail-archive.com/20190529145245.GU19695@tucnak>, I would now
suggest that indeed we here demote error to warning diagnostics: there
isn't a problem for the compiler to generate code in presence of
nonsensical/missing 'reduction' clauses, so there is no reason for a hard error.
Does that make sense to you, too?

Obviously, then also adjust all mentions in the commit log etc. from
"error" to "warning".


> I have tested the patch by comparing "make check" results and I am not aware of any regressions.

(For avoidance of doubt, I have not yet tested the patch.)


> Gergö has implemented the check and it works, but I was wondering if the way in which the patch
> avoids issuing errors about operator switches more than once by modifying the clauses (cf. the
> corresponding comment in omp-low.c) could lead to problems - the processing might still continue
> after the error on the modified tree, right?

Yes, processing continues (in order to report more than just the first
error), but per my understanding a single 'error_at' call makes sure that
compilation will terminate at some later point, with an error exit code.

"Patching up" erroneous state or even completely removing OMP clauses is
-- as far as I understand -- acceptable to avoid "issuing errors about
operator switches more than once".  This doesn't affect code generation,
because no code will be generated at all.

(Does that answer your question?)


Regarding my suggestions to "demote error to warning diagnostics", I'd
suggest that at this point we do *not* try to fix for the user any
presumed wrong/missing 'reduction' clauses (difficult/impossible to do
correctly in the general case), but really only diagnose them.  Thus, no
more "modifying the clauses"; that code should disappear.  This may
result in more warning diagnostics being emitted, but that seems
reasonable, given that the user code is presumed buggy.  (So, unless it's
straightforward, please don't spend much time on trying to minimize the
number of warning diagnostics emitted.)
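The effect of "modifying the clauses" on the number of diagnostics can be illustrated with a small, hypothetical model in plain C (not GCC code; `struct clause`, `count_conflicts` and the integer "locations" are made up for illustration). As I read the posted `scan_omp_for` code, each clause is compared with the nearest enclosing clause on the same variable; the `patch` flag toggles the rewrite of a conflicting clause to the outer operator and location.

```c
#include <stddef.h>

enum red_op { RED_NONE, RED_PLUS, RED_MINUS, RED_MULT, RED_MAX };

/* One level of a loop nest, for a single variable: the reduction operator
   (RED_NONE if there is no clause) and a made-up "location", here simply
   the nesting level at which the clause was written.  */
struct clause
{
  enum red_op op;
  int loc;
};

/* Compare each clause, outermost first, with the nearest enclosing clause.
   On a mismatch, count one "conflicting reduction operations" diagnostic
   and, if NOTE_LOCS is non-null, store the location its note would refer
   to (NOTE_LOCS needs room for one entry per clause minus one).  If PATCH
   is nonzero, also rewrite the conflicting clause to the outer operator
   and location, like the posted patch; otherwise leave it untouched.  */
int
count_conflicts (struct clause *c, size_t n, int patch, int *note_locs)
{
  int diags = 0;
  const struct clause *prev = NULL;

  for (size_t i = 0; i < n; i++)
    {
      if (c[i].op == RED_NONE)
        continue;
      if (prev != NULL && c[i].op != prev->op)
        {
          if (note_locs != NULL)
            note_locs[diags] = prev->loc;
          diags++;
          if (patch)
            {
              c[i].op = prev->op;
              c[i].loc = prev->loc;
            }
        }
      prev = &c[i];
    }
  return diags;
}
```

Under this model, nested +, -, + reductions yield one diagnostic with patching and two without; +, -, - yield two with patching (both notes pointing at the outermost clause) and one without. The first mismatch, scanning from the outside in, is reported identically in both modes, so dropping the rewrite changes how many diagnostics are emitted and where, but not whether an inconsistent nest is diagnosed at all.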


> I was also wondering about the best place for such
> checks. Should this be a part of "pass_lower_omp" (as in the patch)

(..., and which, for example, is also where
'check_omp_nesting_restrictions' is called from 'scan_omp', running as
part of 'pass_lower_omp'...)

> or should it run earlier
> like, for instance, "pass_diagnose_omp_blocks".

(..., running as one of the first middle end passes, before
'pass_lower_omp'.)

Jakub, do you have an opinion on that?  (Full-quote of the patch is
below, for your easy review.)

I think the issue is balancing whether to have it in its own pass (for
clean separation, which generally certainly is preferable) vs. embedded
into existing code paths that already walk over all the GIMPLE statements
(to avoid introducing more compile-time processing overhead).


> Can the patch be included in trunk?

Normally I might say "OK to commit with the following requests
addressed", but as you're still new, it's maybe a good idea that you post
another revision (as a reply to this email, simply).


A few additional comments/requests:

> From 99796969c1bf91048c6383dfb1b8576bdd9efd7d Mon Sep 17 00:00:00 2001
> From: Frederik Harwath <frederik@codesourcery.com>
> Date: Mon, 21 Oct 2019 08:27:58 +0200
> Subject: [PATCH] Report errors on inconsistent OpenACC nested reduction
>  clauses
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
>     OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
>     this was first clarified by OpenACC 2.6) requires that, if a
>     variable is used in reduction clauses on two nested loops, then
>     there must be reduction clauses for that variable on all loops
>     that are nested in between the two loops and all these reduction
>     clauses must use the same operator.
>     This commit introduces a check for that property which reports
>     errors if it is violated.
>
>     In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
>     reductions on variable b; adding a reduction clause to make it compile cleanly
>     would make it a duplicate of the test for variable c.
>
>     2010-10-21  Gergö Barany  <gergo@codesourcery.com>
> 		Frederik Harwath  <frederik@codesourcery.com>

2010, eh?  ;-P

The 'gcc/testsuite/' and 'libgomp/testsuite/' ChangeLog updates should
list my name, too, as I've written large portions of these changes.  (See
the original og8 commit a9e48066198ffb1e7bc2b137167a61a6cb47748c -- but
that got lost in the og9 version, huh...)

> 	 gcc/
> 	 * omp-low.c (struct omp_context): New fields
> 	 local_reduction_clauses, outer_reduction_clauses.
> 	 (new_omp_context): Initialize these.
> 	 (scan_sharing_clauses): Record reduction clauses on OpenACC
> 	 constructs.
> 	 (scan_omp_for): Check reduction clauses for incorrect nesting.
> 	 gcc/testsuite/
> 	 * c-c++-common/goacc/nested-reductions-fail.c: New test.
> 	 * c-c++-common/goacc/nested-reductions.c: New test.
> 	 * c-c++-common/goacc/reduction-6.c: Adjust.
> 	 libgomp/
> 	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
> 	 Add missing reduction clauses.
> 	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
> 	 Likewise.
> 	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
> 	 Likewise.
> 	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
> 	 Likewise.
> ---
>  gcc/omp-low.c                                 | 107 +++-
>  .../goacc/nested-reductions-fail.c            | 492 ++++++++++++++++++
>  .../c-c++-common/goacc/nested-reductions.c    | 420 +++++++++++++++
>  .../c-c++-common/goacc/reduction-6.c          |  11 -
>  .../par-loop-comb-reduction-1.c               |   2 +-
>  .../par-loop-comb-reduction-2.c               |   2 +-
>  .../par-loop-comb-reduction-3.c               |   2 +-
>  .../par-loop-comb-reduction-4.c               |   2 +-
>  8 files changed, 1022 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
>  create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c

So just C/C++ testing, no Fortran at all.  This is not ideal, but
probably (hopefully) acceptable given that this is working on the middle
end representation shared between all front ends.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> @@ -127,6 +127,12 @@ struct omp_context
>       corresponding tracking loop iteration variables.  */
>    hash_map<tree, tree> *lastprivate_conditional_map;
>  
> +  /* A tree_list of the reduction clauses in this context.  */
> +  tree local_reduction_clauses;
> +
> +  /* A tree_list of the reduction clauses in outer contexts.  */
> +  tree outer_reduction_clauses;
> +
>    /* Nesting depth of this context.  Used to beautify error messages re
>       invalid gotos.  The outermost ctx is depth 1, with depth 0 being
>       reserved for the main body of the function.  */
|    int depth;

> @@ -902,6 +908,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>        ctx->cb = outer_ctx->cb;
>        ctx->cb.block = NULL;
>        ctx->depth = outer_ctx->depth + 1;
> +      ctx->local_reduction_clauses = NULL;
> +      ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
>      }
>    else
>      {
> @@ -917,6 +925,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>        ctx->cb.adjust_array_error_bounds = true;
>        ctx->cb.dont_remap_vla_if_no_change = true;
>        ctx->depth = 1;
> +      ctx->local_reduction_clauses = NULL;
> +      ctx->outer_reduction_clauses = NULL;
>      }
>  
>    ctx->cb.decl_map = new hash_map<tree, tree>;

To match the order in 'struct omp_context' (see above), move these new
initializations before those of 'ctx->depth'.  (Even if that also just
achieves "some local consistency".)  ;-)


> @@ -1131,6 +1141,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  
>  	case OMP_CLAUSE_REDUCTION:
>  	case OMP_CLAUSE_IN_REDUCTION:
> +		  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
> +			ctx->local_reduction_clauses
> +	      = tree_cons (NULL, c, ctx->local_reduction_clauses);
>  	  decl = OMP_CLAUSE_DECL (c);
>  	  if (TREE_CODE (decl) == MEM_REF)
>  	    {

I think this should really only apply to 'OMP_CLAUSE_REDUCTION' but not
'OMP_CLAUSE_IN_REDUCTION' (please verify)?  If that's correct, then move
the new code directly after 'case OMP_CLAUSE_REDUCTION:', followed by a
"magic" '/* FALLTHRU */' comment before 'case OMP_CLAUSE_IN_REDUCTION:'.
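For illustration, the requested structure looks roughly like this in a self-contained toy; the enum values, the `struct clause_counts` counters and the `is_oacc_compute` flag are hypothetical stand-ins for the real clause codes, `omp_context` and the `is_oacc_parallel`/`is_oacc_kernels` test. GCC's -Wimplicit-fallthrough recognizes the `/* FALLTHRU */` comment, so the intentional fallthrough stays warning-free.

```c
#include <stddef.h>

/* Hypothetical stand-ins for OMP_CLAUSE_REDUCTION and friends.  */
enum clause_code { CLAUSE_REDUCTION, CLAUSE_IN_REDUCTION, CLAUSE_PRIVATE };

/* Hypothetical stand-in for omp_context: just two counters.  */
struct clause_counts
{
  size_t local_reductions;   /* clauses recorded for the nesting check */
  size_t reduction_like;     /* clauses that went through the shared code */
};

void
scan_clause (struct clause_counts *ctx, enum clause_code code,
             int is_oacc_compute)
{
  switch (code)
    {
    case CLAUSE_REDUCTION:
      /* Only plain 'reduction' clauses are recorded for the check.  */
      if (is_oacc_compute)
        ctx->local_reductions++;
      /* FALLTHRU */
    case CLAUSE_IN_REDUCTION:
      /* Processing common to both clause kinds.  */
      ctx->reduction_like++;
      break;

    default:
      break;
    }
}
```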


> @@ -2410,7 +2423,99 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
>  	  gimple_omp_for_set_clauses (stmt, clauses);
>  	  check_oacc_kernel_gwv (stmt, ctx);
>  	}
> -    }
> +
> +	  /* Collect all variables named in reductions on this loop.  Ensure

I'm usually the last one to complain about such things ;-) -- but here
really the indentation of the new code seems to be off?  Please verify.
Maybe you had set a tab-stop to four spaces instead of eight?

> +		 that, if this loop has a reduction on some variable v, and there is
> +		 a reduction on v somewhere in an outer context, then there is a
> +		 reduction on v on all intervening loops as well.  */
> +	  tree local_reduction_clauses = NULL;
> +	  for (tree c = gimple_omp_for_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
> +		{
> +		  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
> +			local_reduction_clauses
> +		  = tree_cons (NULL, c, local_reduction_clauses);
> +		}
> +	  if (ctx->outer_reduction_clauses == NULL && ctx->outer != NULL)
> +		ctx->outer_reduction_clauses
> +	  = chainon (unshare_expr (ctx->outer->local_reduction_clauses),
> +			 ctx->outer->outer_reduction_clauses);
> +	  tree outer_reduction_clauses = ctx->outer_reduction_clauses;
> +	  tree local_iter = local_reduction_clauses;
> +	  for (; local_iter; local_iter = TREE_CHAIN (local_iter))
> +		{
> +		  tree local_clause = TREE_VALUE (local_iter);
> +		  tree local_var = OMP_CLAUSE_DECL (local_clause);
> +		  tree_code local_op = OMP_CLAUSE_REDUCTION_CODE (local_clause);
> +		  bool have_outer_reduction = false;
> +		  tree ctx_iter = outer_reduction_clauses;
> +		  for (; ctx_iter; ctx_iter = TREE_CHAIN (ctx_iter))
> +			{
> +			  tree outer_clause = TREE_VALUE (ctx_iter);
> +			  tree outer_var = OMP_CLAUSE_DECL (outer_clause);
> +			  tree_code outer_op = OMP_CLAUSE_REDUCTION_CODE (outer_clause);
> +			  if (outer_var == local_var && outer_op != local_op)
> +				{
> +				  error_at (gimple_location (stmt),
> +							"conflicting reduction operations for %qE",
> +							local_var);
> +				  inform (OMP_CLAUSE_LOCATION (outer_clause),
> +						  "location of the previous reduction for %qE",
> +						  outer_var);
> +				  /* Change this operation to be equal to the outer one.
> +					 This is meant to suppress spurious errors; for example,
> +					 in nested +, -, + reductions, we would generate errors
> +					 for both the change from + to - and from - to +.  */
> +				  OMP_CLAUSE_REDUCTION_CODE (local_clause) = outer_op;
> +				  /* Also change the location so that in nested +, -, -
> +					 reductions, the second error message also refers to the
> +					 outermost + reduction.  */
> +				  OMP_CLAUSE_LOCATION (local_clause)
> +			= OMP_CLAUSE_LOCATION (outer_clause);
> +				}
> +			  if (outer_var == local_var)
> +				{
> +				  have_outer_reduction = true;
> +				  break;
> +				}
> +			}
> +		  if (have_outer_reduction)
> +			{
> +			  /* There is a reduction on outer_var both on this loop and on
> +				 some enclosing loop.  Walk up the context tree until such a
> +				 loop with a reduction on outer_var is found, and complain
> +				 about all intervening loops that do not have such a
> +				 reduction.  */
> +			  struct omp_context *curr_loop = ctx->outer;
> +			  bool found = false;
> +			  while (curr_loop != NULL)
> +				{
> +				  tree curr_iter = curr_loop->local_reduction_clauses;
> +				  for (; curr_iter; curr_iter = TREE_CHAIN (curr_iter))
> +					{
> +					  tree curr_clause = TREE_VALUE (curr_iter);
> +					  tree curr_var = OMP_CLAUSE_DECL (curr_clause);
> +					  if (curr_var == local_var)
> +						{
> +						  found = true;
> +						  break;
> +						}
> +					}
> +				  if (!found)
> +					error_at (gimple_location (curr_loop->stmt),
> +							  "nested loop in reduction needs "
> +							  "reduction clause for %qE",
> +							  local_var);
> +				  else
> +					break;
> +				  curr_loop = curr_loop->outer;
> +				}
> +			}
> +		}
> +	  ctx->local_reduction_clauses = local_reduction_clauses;
> +	  ctx->outer_reduction_clauses
> +	= chainon (unshare_expr (ctx->local_reduction_clauses),
> +		   ctx->outer_reduction_clauses);
> +	}
>  
>    scan_sharing_clauses (clauses, ctx);
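The second half of the new `scan_omp_for` code (walking up the context chain once an outer reduction on the same variable is known to exist) can be modelled in isolation as below. This is a simplified sketch under assumptions: the hypothetical `struct ctx` only keeps the enclosing context, the reduction variable names and a line number standing in for `gimple_location`, and names are compared with `strcmp` rather than by decl identity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for omp_context.  */
struct ctx
{
  const struct ctx *outer;   /* enclosing context, NULL at the top */
  const char *const *vars;   /* variables with a reduction clause here */
  size_t n_vars;
  int line;                  /* stands in for gimple_location (stmt) */
};

static int
has_reduction (const struct ctx *c, const char *var)
{
  for (size_t i = 0; i < c->n_vars; i++)
    if (strcmp (c->vars[i], var) == 0)
      return 1;
  return 0;
}

/* Starting at the context enclosing LOOP, walk outwards until a context
   with a reduction on VAR is found, recording the line of each intervening
   context that lacks one (innermost first, at most MAX entries).  Like the
   patch, this assumes the caller already knows that some enclosing context
   has a reduction on VAR.  Returns the number of missing clauses.  */
size_t
report_missing (const struct ctx *loop, const char *var,
                int *lines, size_t max)
{
  size_t n = 0;

  for (const struct ctx *c = loop->outer; c != NULL; c = c->outer)
    {
      if (has_reduction (c, var))
        break;
      if (n < max)
        lines[n] = c->line;
      n++;
    }
  return n;
}
```

For `parallel reduction(+:sum)` enclosing two plain loops and an innermost `loop reduction(+:sum)`, the walk reports both intervening loops, innermost first, which corresponds to the two "nested loop in reduction needs reduction clause" expectations in the first `acc_parallel_reduction` test case.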


Regarding my suggestions to "demote error to warning diagnostics", I'd
then adjust the test cases as follows:

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c

Rename to '*-warn.c', and instead of 'dg-error' use 'dg-warning'
(possibly more than currently).

> @@ -0,0 +1,492 @@
> +/* Test erroneous cases of nested reduction loops.  */
> +
> +void acc_parallel (void)
> +{
> +  int i, j, k, l, sum, diff;
> +
> +  #pragma acc parallel
> +  {
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(+:sum) reduction(-:diff)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +/* The same tests as above, but using a combined parallel loop construct.  */
> +
> +void acc_parallel_loop (void)
> +{
> +  int i, j, k, l, sum, diff;
> +
> +  #pragma acc parallel loop
> +  for (int h = 0; h < 10; ++h)
> +  {
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(+:sum) reduction(-:diff)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +/* The same tests as above, but now the outermost reduction clause is on
> +   the parallel region, not the outermost loop.  */
> +void acc_parallel_reduction (void)
> +{
> +  int i, j, k, l, sum, diff;
> +
> +  #pragma acc parallel reduction(+:sum)
> +  {
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(max:sum) // { dg-error "conflicting reduction operations for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(max:sum) // { dg-error "conflicting reduction operations for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +/* The same tests as above, but using a combined parallel loop construct, and
> +   the outermost reduction clause is on that one, not the outermost loop.  */
> +void acc_parallel_loop_reduction (void)
> +{
> +  int i, j, k, l, sum, diff;
> +
> +  #pragma acc parallel loop reduction(+:sum)
> +  for (int h = 0; h < 10; ++h)
> +  {
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(max:sum) // { dg-error "conflicting reduction operations for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(max:sum) // { dg-error "conflicting reduction operations for .sum." }
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +/* The same tests as above, but inside a routine construct.  */
> +#pragma acc routine gang
> +void acc_routine (void)
> +{
> +  int i, j, k, l, sum, diff;
> +
> +  {
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +          #pragma acc loop reduction(+:sum)
> +          for (l = 0; l < 10; l++)
> +            sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
> +	  for (l = 0; l < 10; l++)
> +	    sum = 1;
> +
> +    #pragma acc loop reduction(+:sum) reduction(-:diff)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +void acc_kernels (void)
> +{
> +  int i, j, k, sum, diff;
> +
> +  /* FIXME:  No diagnostics are produced for these loops because reductions
> +     in kernels regions are not supported yet.  */
> +  #pragma acc kernels
> +  {
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:diff)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(-:sum)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +  }
> +}


> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions.c

No changes.

> @@ -0,0 +1,420 @@
> +/* Test cases of nested reduction loops that should compile cleanly.  */
> +
> +void acc_parallel (void)
> +{
> +  int i, j, k, sum, diff;
> +
> +  #pragma acc parallel
> +  {
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop collapse(2) reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop collapse(2) reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum) reduction(-:diff)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(-:diff)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +/* The same tests as above, but using a combined parallel loop construct.  */
> +
> +void acc_parallel_loop (void)
> +{
> +  int i, j, k, l, sum, diff;
> +
> +  #pragma acc parallel loop
> +  for (int h = 0; h < 10; ++h)
> +  {
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop collapse(2) reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop collapse(2) reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum) reduction(-:diff)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(-:diff) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +/* The same tests as above, but now the outermost reduction clause is on
> +   the parallel region, not the outermost loop.  */
> +
> +void acc_parallel_reduction (void)
> +{
> +  int i, j, k, sum, diff;
> +
> +  #pragma acc parallel reduction(+:sum)
> +  {
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum) reduction(-:diff)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(-:diff)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(-:diff)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +/* The same tests as above, but using a combined parallel loop construct, and
> +   the outermost reduction clause is on that one, not the outermost loop.  */
> +void acc_parallel_loop_reduction (void)
> +{
> +  int i, j, k, sum, diff;
> +
> +  #pragma acc parallel loop reduction(+:sum)
> +  for (int h = 0; h < 10; ++h)
> +  {
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum) reduction(-:diff)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(-:diff) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(-:diff) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop // { dg-warning "insufficient partitioning available to parallelize loop" }
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +/* The same tests as above, but inside a routine construct.  */
> +#pragma acc routine gang
> +void acc_routine (void)
> +{
> +  int i, j, k, sum, diff;
> +
> +  {
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop collapse(2) reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop collapse(2) reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum) reduction(-:diff)
> +    for (i = 0; i < 10; i++)
> +      {
> +        #pragma acc loop reduction(+:sum)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(+:sum)
> +          for (k = 0; k < 10; k++)
> +            sum = 1;
> +
> +        #pragma acc loop reduction(-:diff)
> +        for (j = 0; j < 10; j++)
> +          #pragma acc loop reduction(-:diff)
> +          for (k = 0; k < 10; k++)
> +            diff = 1;
> +      }
> +  }
> +}
> +
> +void acc_kernels (void)
> +{
> +  int i, j, k, sum, diff;
> +
> +  /* FIXME:  These tests are not meaningful yet because reductions in
> +     kernels regions are not supported yet.  */
> +  #pragma acc kernels
> +  {
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +
> +    #pragma acc loop reduction(+:sum)
> +    for (i = 0; i < 10; i++)
> +      #pragma acc loop reduction(+:sum)
> +      for (j = 0; j < 10; j++)
> +        #pragma acc loop reduction(+:sum)
> +        for (k = 0; k < 10; k++)
> +          sum = 1;
> +  }
> +}

> --- a/gcc/testsuite/c-c++-common/goacc/reduction-6.c
> +++ b/gcc/testsuite/c-c++-common/goacc/reduction-6.c
> @@ -16,17 +16,6 @@ int foo (int N)
>        }
>    }
>  
> -  #pragma acc parallel
> -  {
> -    #pragma acc loop reduction(+:b)
> -    for (int i = 0; i < N; i++)
> -      {
> -        #pragma acc loop
> -	for (int j = 0; j < N; j++)
> -	  b += 1;
> -      }
> -  }
> -
>    #pragma acc parallel
>    {
>      #pragma acc loop reduction(+:c)

That one stays in, but gets a 'dg-warning'.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
> @@ -15,7 +15,7 @@ main (int argc, char *argv[])
>    #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
>      reduction(+:res) copy(res)
>    {
> -    #pragma acc loop gang
> +    #pragma acc loop gang reduction(+:res)
>      for (j = 0; j < 32; j++)
>        {
>  	#pragma acc loop worker reduction(+:res)
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
> @@ -14,7 +14,7 @@ main (int argc, char *argv[])
>    #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
>  		       reduction(^:res)
>    {
> -    #pragma acc loop gang
> +    #pragma acc loop gang reduction(^:res)
>      for (j = 0; j < 32; j++)
>        {
>  	#pragma acc loop worker vector reduction(^:res)
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
> @@ -16,7 +16,7 @@ main (int argc, char *argv[])
>    #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
>      reduction(+:res) copy(res)
>    {
> -    #pragma acc loop gang
> +    #pragma acc loop gang reduction(+:res)
>      for (j = 0; j < 32; j++)
>        {
>  	#pragma acc loop worker vector reduction(+:res)
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
> @@ -16,7 +16,7 @@ main (int argc, char *argv[])
>    #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
>      reduction(+:res) reduction(max:mres) copy(res, mres)
>    {
> -    #pragma acc loop gang
> +    #pragma acc loop gang reduction(+:res) reduction(max:mres)
>      for (j = 0; j < 32; j++)
>        {
>  	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)

Leave these as they are as well, and add 'dg-warning's.  I previously
(internally, 2018-12-20) noted:

| [...] I have, however, not verified at this point why
| these PASSed before, without those reduction clauses, and whether they're
| now still testing what they're meant to be testing.  (We can do that
| later, independently; I've made a note.)


Grüße
 Thomas
Frederik Harwath Oct. 29, 2019, 12:20 p.m. UTC | #2
On 24.10.19 16:31, Thomas Schwinge wrote:

Hi,
I have attached a revised patch.

>> [...] I was wondering if the way in which the patch
>> avoids issuing errors about operator switches more than once by modifying the clauses (cf. the
>> corresponding comment in omp-low.c) could lead to problems [...]
>  
> "Patching up" erroneous state or even completely removing OMP clauses is
> -- as far as I understand -- acceptable to avoid "issuing errors about
> operator switches more than once".  This doesn't affect code generation,
> because no code will be generated at all.
> 
> (Does that answer your question?)
>

Yes, thank you.

> 
> Regarding my suggestions to "demote error to warning diagnostics", I'd
> suggest that at this point we do *not* try to fix for the user any
> presumed wrong/missing 'reduction' clauses (difficult/impossible to do
> correctly in the general case), but really only diagnose them.

Ok, I have changed the errors into warnings and I have removed the
code for avoiding repeated messages.

> So just C/C++ testing, no Fortran at all.  This is not ideal, but
> probably (hopefully) acceptable given that this is working on the middle
> end representation shared between all front ends.

Thanks to Tobias, we now also have Fortran tests.

> To match the order in 'struct omp_context' (see above), move these new
> initializations before those of 'ctx->depth'.  (Even if that also just
> achieves "some local consistency".)  ;-)

Done.

>> @@ -1131,6 +1141,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>>   
>>   	case OMP_CLAUSE_REDUCTION:
>>   	case OMP_CLAUSE_IN_REDUCTION:
>> +		  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
>> +			ctx->local_reduction_clauses
>> +	      = tree_cons (NULL, c, ctx->local_reduction_clauses);
>>   	  decl = OMP_CLAUSE_DECL (c);
>>   	  if (TREE_CODE (decl) == MEM_REF)
>>   	    {
> 
> I think this should really only apply to 'OMP_CLAUSE_REDUCTION' but not
> 'OMP_CLAUSE_IN_REDUCTION' (please verify)?

Right, I have moved the new code to the OMP_CLAUSE_REDUCTION case above.
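The distinction in this exchange comes down to how C `case` labels share a body. A generic sketch, using a hypothetical enum and counters rather than GCC's `scan_sharing_clauses`: with stacked labels, code placed at the top of the shared body also runs for the in-reduction kind, whereas a separate label that falls through records only the plain reduction kind while both kinds still reach the shared processing.

```c
#include <assert.h>

/* Hypothetical clause kinds; not GCC's enum omp_clause_code.  */
enum clause_kind { CLAUSE_REDUCTION, CLAUSE_IN_REDUCTION, CLAUSE_PRIVATE };

/* RECORDED: clauses remembered for the nesting check.
   SHARED: clauses that reached the processing common to both kinds.  */
struct counts { int recorded; int shared; };

/* Stacked labels: the recording code also runs for IN_REDUCTION.  */
static struct counts
scan_stacked (const enum clause_kind *kinds, int n)
{
  struct counts c = { 0, 0 };
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    switch (kinds[i])
      {
      case CLAUSE_REDUCTION:
      case CLAUSE_IN_REDUCTION:
        c.recorded++;   /* runs for IN_REDUCTION too */
        c.shared++;
        break;
      default:
        break;
      }
  return c;
}

/* Separate label with fallthrough: only REDUCTION is recorded.  */
static struct counts
scan_fallthrough (const enum clause_kind *kinds, int n)
{
  struct counts c = { 0, 0 };
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    switch (kinds[i])
      {
      case CLAUSE_REDUCTION:
        c.recorded++;   /* REDUCTION only */
        /* FALLTHRU */
      case CLAUSE_IN_REDUCTION:
        c.shared++;     /* both kinds still get the shared processing */
        break;
      default:
        break;
      }
  return c;
}
```

With the clause list `{ REDUCTION, IN_REDUCTION, PRIVATE }`, `scan_stacked` records two clauses and `scan_fallthrough` records one; both report two clauses reaching the shared code. GCC's `-Wimplicit-fallthrough` accepts a `/* FALLTHRU */` comment as marking the fallthrough as intentional.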


> I'm usually the last one to complain about such things ;-) -- but here
> really the indentation of the new code seems to be off?  Please verify.
> Maybe you had set a tab-stop to four spaces instead of eight?

Oh, it should look better now.

>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
> 
> Rename to '*-warn.c', and instead of 'dg-error' use 'dg-warning'
> (possibly more than currently).

Ok.

>> --- a/gcc/testsuite/c-c++-common/goacc/reduction-6.c
>> +++ b/gcc/testsuite/c-c++-common/goacc/reduction-6.c
>> @@ -16,17 +16,6 @@ int foo (int N)
>>         }
>>     }
>>   
>> -  #pragma acc parallel
>> -  {
>> -    #pragma acc loop reduction(+:b)
>> -    for (int i = 0; i < N; i++)
>> -      {
>> -        #pragma acc loop
>> -	for (int j = 0; j < N; j++)
>> -	  b += 1;
>> -      }
>> -  }
>> -
>>     #pragma acc parallel
>>     {
>>       #pragma acc loop reduction(+:c)
> 
> That one stays in, but gets a 'dg-warning'.

What warning would you expect to see here? I do not get any warnings.

Best regards,
Frederik
Thomas Schwinge Nov. 5, 2019, 2:22 p.m. UTC | #3
Hi Frederik!

On 2019-10-29T13:20:53+0100, "Harwath, Frederik" <frederik@codesourcery.com> wrote:
> On 24.10.19 16:31, Thomas Schwinge wrote:
>> So just C/C++ testing, no Fortran at all.  This is not ideal, but
>> probably (hopefully) acceptable given that this is working on the middle
>> end representation shared between all front ends.
>
> Thanks to Tobias, we now also have Fortran tests.

Indeed, thanks, Tobias.  I have not reviewed these in great detail, but
they certainly do look plausible.


>>> --- a/gcc/testsuite/c-c++-common/goacc/reduction-6.c
>>> +++ b/gcc/testsuite/c-c++-common/goacc/reduction-6.c
>>> @@ -16,17 +16,6 @@ int foo (int N)
>>>         }
>>>     }
>>>   
>>> -  #pragma acc parallel
>>> -  {
>>> -    #pragma acc loop reduction(+:b)
>>> -    for (int i = 0; i < N; i++)
>>> -      {
>>> -        #pragma acc loop
>>> -	for (int j = 0; j < N; j++)
>>> -	  b += 1;
>>> -      }
>>> -  }
>>> -
>>>     #pragma acc parallel
>>>     {
>>>       #pragma acc loop reduction(+:c)
>> 
>> That one stays in, but gets a 'dg-warning'.
>
> What warning would you expect to see here? I do not get any warnings.

What I meant was that you should re-instantiate the code removed here,
and then add the expected 'dg-warning'.

..., but upon having a look myself, I notice that there actually is no
"nested loop in reduction needs reduction clause" diagnostic printed
here, huh.  Should there be?  (OK to address separately, later on.)


Similarly for the libgomp execution test cases: undo the 'reduction' clause
additions, and instead add the expected 'dg-warning's (here, they're
really necessary), for the reason I had given at the end of my email.

Sorry if that was unclear.

For the same reason, please also leave out Tobias' translated
'libgomp.oacc-fortran/par-loop-comb-reduction-1.f90' -- we shall later
consider that one, separately.


For your convenience, I'm attaching an incremental patch, to be merged
into yours.


> From 22f45d4c2c11febce171272f9289c487aed4f9d7 Mon Sep 17 00:00:00 2001
> From: Frederik Harwath <frederik@codesourcery.com>
> Date: Tue, 29 Oct 2019 12:39:23 +0100
> Subject: [PATCH] Warn about inconsistent OpenACC nested reduction clauses
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
>     OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
>     this was first clarified by OpenACC 2.6) requires that, if a
>     variable is used in reduction clauses on two nested loops, then
>     there must be reduction clauses for that variable on all loops
>     that are nested in between the two loops and all these reduction
>     clauses must use the same operator.
>     This commit introduces a check for that property which reports
>     warnings if it is violated.
>
>     In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
>     reductions on variable b; adding a reduction clause to make it compile cleanly
>     would make it a duplicate of the test for variable c.

The latter paragraph then is not needed anymore.

>     2019-10-29  Gergö Barany  <gergo@codesourcery.com>
> 		Tobias Burnus  <tobias@codesourcery.com>
> 		Frederik Harwath  <frederik@codesourcery.com>
> 		Thomas Schwinge  <thomas@codesourcery.com>
>
> 	 gcc/
> 	 * omp-low.c (struct omp_context): New fields
> 	 local_reduction_clauses, outer_reduction_clauses.
> 	 (new_omp_context): Initialize these.
> 	 (scan_sharing_clauses): Record reduction clauses on OpenACC constructs.
> 	 (scan_omp_for): Check reduction clauses for incorrect nesting.
> 	 gcc/testsuite/
> 	 * c-c++-common/goacc/nested-reductions-warn.c: New test.
> 	 * c-c++-common/goacc/nested-reductions.c: New test.
> 	 * c-c++-common/goacc/reduction-6.c: Adjust.
> 	 * gfortran.dg/goacc/nested-reductions-warn.f90: New test.
> 	 * gfortran.dg/goacc/nested-reductions.f90: New test.
> 	 libgomp/
> 	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
> 	 Add missing reduction clauses.
> 	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
> 	 Likewise.
> 	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
> 	 Likewise.
> 	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
> 	 Likewise.
> 	 * testsuite/libgomp.oacc-fortran/par-loop-comb-reduction-1.f90:
> 	 New test.

The ChangeLog updates still have to be adjusted per my incremental patch.

With that addressed, OK for trunk.


A few more comments to address separately, later on.


I noticed in the 'libgomp.log' that we currently print:

    [...]/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: In function 'main':
    [...]/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:18:13: warning: nested loop in reduction needs reduction clause for 'res'
    [...]/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:18:13: warning: nested loop in reduction needs reduction clause for 'res'

Duplicate diagnostic, due to the two nested inner loops.  (I'm just
noting that, not complaining.)
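A minimal, hypothetical C model of the check in the omp-low.c hunk quoted below may help with both observations (the missing diagnostic for the reduction-6.c pattern and this duplicate). It is not GCC code: constructs are plain structs, clauses are variable/operator pairs, and warnings are printed and counted. It assumes the gang loop at line 18 directly contains two loops with `reduction(+:res)`, which is what two identical warnings at 18:13 suggest. The model also shows why the reduction-6.c pattern (a reduction only on the outer loop, none on the inner one) yields nothing: the check starts from a loop's own reduction clauses, so a loop without any never triggers it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RED 4

/* reduction(op:var), e.g. { "res", '+' } for reduction(+:res).  */
struct red_clause { const char *var; char op; };

/* A compute construct or loop; OUTER is the enclosing construct.  */
struct construct
{
  const char *where;               /* stands in for the statement location */
  const struct construct *outer;
  int n_red;
  struct red_clause red[MAX_RED];  /* this construct's own clauses */
};

/* Operator of C's own reduction on VAR, or 0 if C has none.  */
static char
own_op (const struct construct *c, const char *var)
{
  assert (c->n_red <= MAX_RED);
  for (int i = 0; i < c->n_red; i++)
    if (strcmp (c->red[i].var, var) == 0)
      return c->red[i].op;
  return 0;
}

/* Check LOOP's own reduction clauses; return the number of warnings.  */
static int
check_loop (const struct construct *loop)
{
  int warnings = 0;
  for (int i = 0; i < loop->n_red; i++)
    {
      const char *var = loop->red[i].var;

      /* Nearest enclosing construct that also reduces VAR.  */
      const struct construct *match = loop->outer;
      while (match != NULL && own_op (match, var) == 0)
        match = match->outer;
      if (match == NULL)
        continue;               /* no enclosing reduction: nothing to check */

      if (own_op (match, var) != loop->red[i].op)
        {
          printf ("%s: warning: conflicting reduction operations for '%s'\n",
                  loop->where, var);
          warnings++;
        }

      /* Every construct strictly between LOOP and MATCH lacks a clause.  */
      for (const struct construct *c = loop->outer; c != match; c = c->outer)
        {
          printf ("%s: warning: nested loop in reduction needs reduction "
                  "clause for '%s'\n", c->where, var);
          warnings++;
        }
    }
  return warnings;
}

/* reduction-6.c pattern: reduction(+:b) on the outer loop only.  */
static int
demo_reduction_6 (void)
{
  struct construct par = { .where = "parallel" };
  struct construct outer = { .where = "loop reduction(+:b)", .outer = &par,
                             .n_red = 1, .red = { { "b", '+' } } };
  struct construct inner = { .where = "loop", .outer = &outer };
  return check_loop (&outer) + check_loop (&inner);
}

/* par-loop-comb-reduction-1.c pattern, under the assumption stated above.  */
static int
demo_par_loop_comb (void)
{
  struct construct par = { .where = "parallel reduction(+:res)",
                           .n_red = 1, .red = { { "res", '+' } } };
  struct construct gang = { .where = "18:13 (loop gang)", .outer = &par };
  struct construct inner1 = { .where = "inner loop 1", .outer = &gang,
                              .n_red = 1, .red = { { "res", '+' } } };
  struct construct inner2 = { .where = "inner loop 2", .outer = &gang,
                              .n_red = 1, .red = { { "res", '+' } } };
  return check_loop (&gang) + check_loop (&inner1) + check_loop (&inner2);
}
```

Under these assumptions, `demo_reduction_6 ()` prints nothing and returns 0, and `demo_par_loop_comb ()` prints the "nested loop in reduction" warning for the gang loop twice, once per inner loop, and returns 2. Finding the nearest enclosing reduction directly is a simplification of the patch, which consults the accumulated `outer_reduction_clauses` list instead.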


> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> @@ -2410,6 +2425,89 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)

> +	ctx->outer_reduction_clauses
> +	  = chainon (unshare_expr (ctx->outer->local_reduction_clauses),
> +		     ctx->outer->outer_reduction_clauses);

I have not quickly re-understood -- if such a word exists ;-) -- the
logic here, but maybe we don't need anymore to 'unshare_expr', as we're
now no longer modifying the 'local_clause'...

> +      tree outer_reduction_clauses = ctx->outer_reduction_clauses;
> +      tree local_iter = local_reduction_clauses;
> +      for (; local_iter; local_iter = TREE_CHAIN (local_iter))
> +	{
> +	  tree local_clause = TREE_VALUE (local_iter);
> +	  tree local_var = OMP_CLAUSE_DECL (local_clause);
> +	  tree_code local_op = OMP_CLAUSE_REDUCTION_CODE (local_clause);
> +	  bool have_outer_reduction = false;
> +	  tree ctx_iter = outer_reduction_clauses;
> +	  for (; ctx_iter; ctx_iter = TREE_CHAIN (ctx_iter))
> +	    {
> +	      tree outer_clause = TREE_VALUE (ctx_iter);
> +	      tree outer_var = OMP_CLAUSE_DECL (outer_clause);
> +	      tree_code outer_op = OMP_CLAUSE_REDUCTION_CODE (outer_clause);
> +	      if (outer_var == local_var && outer_op != local_op)
> +		{
> +		  warning_at (gimple_location (stmt), 0,
> +			      "conflicting reduction operations for %qE",
> +			      local_var);
> +		  inform (OMP_CLAUSE_LOCATION (outer_clause),
> +			  "location of the previous reduction for %qE",
> +			  outer_var);
> +
> +		}

... here.  (Once done to "suppress spurious errors".)

> +	      if (outer_var == local_var)
> +		{
> +		  have_outer_reduction = true;
> +		  break;
> +		}
> +	    }
> +	  if (have_outer_reduction)
> +	    {
> +	      /* There is a reduction on outer_var both on this loop and on
> +		 some enclosing loop.  Walk up the context tree until such a
> +		 loop with a reduction on outer_var is found, and complain
> +		 about all intervening loops that do not have such a
> +		 reduction.  */
> +	      struct omp_context *curr_loop = ctx->outer;
> +	      bool found = false;
> +	      while (curr_loop != NULL)
> +		{
> +		  tree curr_iter = curr_loop->local_reduction_clauses;
> +		  for (; curr_iter; curr_iter = TREE_CHAIN (curr_iter))
> +		    {
> +		      tree curr_clause = TREE_VALUE (curr_iter);
> +		      tree curr_var = OMP_CLAUSE_DECL (curr_clause);
> +		      if (curr_var == local_var)
> +			{
> +			  found = true;
> +			  break;
> +			}
> +		    }
> +		  if (!found)
> +		    warning_at (gimple_location (curr_loop->stmt), 0,
> +				"nested loop in reduction needs "
> +				"reduction clause for %qE",
> +				local_var);

Might also be useful (if feasible?) to 'inform' what the outer/inner
'reduction' clauses are that trigger this.

> +		  else
> +		    break;
> +		  curr_loop = curr_loop->outer;
> +		}
> +	    }
> +	}
> +      ctx->local_reduction_clauses = local_reduction_clauses;
> +      ctx->outer_reduction_clauses
> +	= chainon (unshare_expr (ctx->local_reduction_clauses),
> +		   ctx->outer_reduction_clauses);
>      }

Same (?) comment as above about 'unshare_expr'.
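On the 'unshare_expr' question, one point independent of whether the clause nodes are modified: GCC's `chainon` concatenates destructively (its documented behavior is to modify the last node of the first chain to point to the second). The hypothetical linked-list sketch below (not GCC code; `append_destructive` plays the role of `chainon`, `copy_spine` the role of a copy of the list) shows why some copy of the list spine presumably still matters: appending the enclosing clauses directly onto `local_reduction_clauses` would make that list include them too, and the "walk up the context tree" loop above reads exactly that list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list standing in for a TREE_LIST chain.
   Nodes are not freed in this sketch.  */
struct node { int value; struct node *next; };

static struct node *
cons (int value, struct node *next)
{
  struct node *n = malloc (sizeof *n);
  assert (n != NULL);
  n->value = value;
  n->next = next;
  return n;
}

/* Destructive concatenation: the last node of A is modified to point
   to B, and A itself is returned (B if A is empty).  */
static struct node *
append_destructive (struct node *a, struct node *b)
{
  if (a == NULL)
    return b;
  struct node *last = a;
  while (last->next != NULL)
    last = last->next;
  last->next = b;
  return a;
}

/* Fresh copy of A's nodes, so appending to the copy leaves A intact.  */
static struct node *
copy_spine (const struct node *a)
{
  struct node *head = NULL, **tail = &head;
  for (; a != NULL; a = a->next)
    {
      *tail = cons (a->value, NULL);
      tail = &(*tail)->next;
    }
  return head;
}

static int
length (const struct node *a)
{
  int n = 0;
  for (; a != NULL; a = a->next)
    n++;
  return n;
}
```

Appending to a copy of a two-element "local" list yields three elements while the local list keeps two; appending to the local list itself makes it grow to three, i.e. it now also lists the enclosing clauses.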


Grüße
 Thomas
Frederik Harwath Nov. 6, 2019, 12:47 p.m. UTC | #4
Hi Thomas,

On 05.11.19 15:22, Thomas Schwinge wrote:

> For your convenience, I'm attaching an incremental patch, to be merged
> into yours.
> [...]
> With that addressed, OK for trunk.

Thank you. I have merged the patches and committed.

> A few more comments to address separately, later on.

I will look into your remaining questions.

Best regards,
Frederik

Patch

From 99796969c1bf91048c6383dfb1b8576bdd9efd7d Mon Sep 17 00:00:00 2001
From: Frederik Harwath <frederik@codesourcery.com>
Date: Mon, 21 Oct 2019 08:27:58 +0200
Subject: [PATCH] Report errors on inconsistent OpenACC nested reduction
 clauses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

    OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
    this was first clarified by OpenACC 2.6) requires that, if a
    variable is used in reduction clauses on two nested loops, then
    there must be reduction clauses for that variable on all loops
    that are nested in between the two loops and all these reduction
    clauses must use the same operator.
    This commit introduces a check for that property which reports
    errors if it is violated.

    In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
    reductions on variable b; adding a reduction clause to make it compile cleanly
    would make it a duplicate of the test for variable c.

    2019-10-21  Gergö Barany  <gergo@codesourcery.com>
		Frederik Harwath  <frederik@codesourcery.com>

	 gcc/
	 * omp-low.c (struct omp_context): New fields
	 local_reduction_clauses, outer_reduction_clauses.
	 (new_omp_context): Initialize these.
	 (scan_sharing_clauses): Record reduction clauses on OpenACC
	 constructs.
	 (scan_omp_for): Check reduction clauses for incorrect nesting.
	 gcc/testsuite/
	 * c-c++-common/goacc/nested-reductions-fail.c: New test.
	 * c-c++-common/goacc/nested-reductions.c: New test.
	 * c-c++-common/goacc/reduction-6.c: Adjust.
	 libgomp/
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
	 Add missing reduction clauses.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	 Likewise.
---
 gcc/omp-low.c                                 | 107 +++-
 .../goacc/nested-reductions-fail.c            | 492 ++++++++++++++++++
 .../c-c++-common/goacc/nested-reductions.c    | 420 +++++++++++++++
 .../c-c++-common/goacc/reduction-6.c          |  11 -
 .../par-loop-comb-reduction-1.c               |   2 +-
 .../par-loop-comb-reduction-2.c               |   2 +-
 .../par-loop-comb-reduction-3.c               |   2 +-
 .../par-loop-comb-reduction-4.c               |   2 +-
 8 files changed, 1022 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 279b6ef893a..a2212274685 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -127,6 +127,12 @@  struct omp_context
      corresponding tracking loop iteration variables.  */
   hash_map<tree, tree> *lastprivate_conditional_map;
 
+  /* A tree_list of the reduction clauses in this context.  */
+  tree local_reduction_clauses;
+
+  /* A tree_list of the reduction clauses in outer contexts.  */
+  tree outer_reduction_clauses;
+
   /* Nesting depth of this context.  Used to beautify error messages re
      invalid gotos.  The outermost ctx is depth 1, with depth 0 being
      reserved for the main body of the function.  */
@@ -902,6 +908,8 @@  new_omp_context (gimple *stmt, omp_context *outer_ctx)
       ctx->cb = outer_ctx->cb;
       ctx->cb.block = NULL;
       ctx->depth = outer_ctx->depth + 1;
+      ctx->local_reduction_clauses = NULL;
+      ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
     }
   else
     {
@@ -917,6 +925,8 @@  new_omp_context (gimple *stmt, omp_context *outer_ctx)
       ctx->cb.adjust_array_error_bounds = true;
       ctx->cb.dont_remap_vla_if_no_change = true;
       ctx->depth = 1;
+      ctx->local_reduction_clauses = NULL;
+      ctx->outer_reduction_clauses = NULL;
     }
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
@@ -1131,6 +1141,9 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 
 	case OMP_CLAUSE_REDUCTION:
 	case OMP_CLAUSE_IN_REDUCTION:
+		  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
+			ctx->local_reduction_clauses
+	      = tree_cons (NULL, c, ctx->local_reduction_clauses);
 	  decl = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (decl) == MEM_REF)
 	    {
@@ -2410,7 +2423,99 @@  scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
 	  gimple_omp_for_set_clauses (stmt, clauses);
 	  check_oacc_kernel_gwv (stmt, ctx);
 	}
-    }
+
+	  /* Collect all variables named in reductions on this loop.  Ensure
+		 that, if this loop has a reduction on some variable v, and there is
+		 a reduction on v somewhere in an outer context, then there is a
+		 reduction on v on all intervening loops as well.  */
+	  tree local_reduction_clauses = NULL;
+	  for (tree c = gimple_omp_for_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
+		{
+		  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+			local_reduction_clauses
+		  = tree_cons (NULL, c, local_reduction_clauses);
+		}
+	  if (ctx->outer_reduction_clauses == NULL && ctx->outer != NULL)
+		ctx->outer_reduction_clauses
+	  = chainon (unshare_expr (ctx->outer->local_reduction_clauses),
+			 ctx->outer->outer_reduction_clauses);
+	  tree outer_reduction_clauses = ctx->outer_reduction_clauses;
+	  tree local_iter = local_reduction_clauses;
+	  for (; local_iter; local_iter = TREE_CHAIN (local_iter))
+		{
+		  tree local_clause = TREE_VALUE (local_iter);
+		  tree local_var = OMP_CLAUSE_DECL (local_clause);
+		  tree_code local_op = OMP_CLAUSE_REDUCTION_CODE (local_clause);
+		  bool have_outer_reduction = false;
+		  tree ctx_iter = outer_reduction_clauses;
+		  for (; ctx_iter; ctx_iter = TREE_CHAIN (ctx_iter))
+			{
+			  tree outer_clause = TREE_VALUE (ctx_iter);
+			  tree outer_var = OMP_CLAUSE_DECL (outer_clause);
+			  tree_code outer_op = OMP_CLAUSE_REDUCTION_CODE (outer_clause);
+			  if (outer_var == local_var && outer_op != local_op)
+				{
+				  error_at (gimple_location (stmt),
+							"conflicting reduction operations for %qE",
+							local_var);
+				  inform (OMP_CLAUSE_LOCATION (outer_clause),
+						  "location of the previous reduction for %qE",
+						  outer_var);
+				  /* Change this operation to be equal to the outer one.
+					 This is meant to suppress spurious errors; for example,
+					 in nested +, -, + reductions, we would generate errors
+					 for both the change from + to - and from - to +.  */
+				  OMP_CLAUSE_REDUCTION_CODE (local_clause) = outer_op;
+				  /* Also change the location so that in nested +, -, -
+					 reductions, the second error message also refers to the
+					 outermost + reduction.  */
+				  OMP_CLAUSE_LOCATION (local_clause)
+			= OMP_CLAUSE_LOCATION (outer_clause);
+				}
+			  if (outer_var == local_var)
+				{
+				  have_outer_reduction = true;
+				  break;
+				}
+			}
+		  if (have_outer_reduction)
+			{
+			  /* There is a reduction on outer_var both on this loop and on
+				 some enclosing loop.  Walk up the context tree until such a
+				 loop with a reduction on outer_var is found, and complain
+				 about all intervening loops that do not have such a
+				 reduction.  */
+			  struct omp_context *curr_loop = ctx->outer;
+			  bool found = false;
+			  while (curr_loop != NULL)
+				{
+				  tree curr_iter = curr_loop->local_reduction_clauses;
+				  for (; curr_iter; curr_iter = TREE_CHAIN (curr_iter))
+					{
+					  tree curr_clause = TREE_VALUE (curr_iter);
+					  tree curr_var = OMP_CLAUSE_DECL (curr_clause);
+					  if (curr_var == local_var)
+						{
+						  found = true;
+						  break;
+						}
+					}
+				  if (!found)
+					error_at (gimple_location (curr_loop->stmt),
+							  "nested loop in reduction needs "
+							  "reduction clause for %qE",
+							  local_var);
+				  else
+					break;
+				  curr_loop = curr_loop->outer;
+				}
+			}
+		}
+	  ctx->local_reduction_clauses = local_reduction_clauses;
+	  ctx->outer_reduction_clauses
+	= chainon (unshare_expr (ctx->local_reduction_clauses),
+		   ctx->outer_reduction_clauses);
+	}
 
   scan_sharing_clauses (clauses, ctx);
 
diff --git a/gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c b/gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
new file mode 100644
index 00000000000..a642dd03870
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
@@ -0,0 +1,492 @@ 
+/* Test erroneous cases of nested reduction loops.  */
+
+void acc_parallel (void)
+{
+  int i, j, k, l, sum, diff;
+
+  #pragma acc parallel
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+/* The same tests as above, but using a combined parallel loop construct.  */
+
+void acc_parallel_loop (void)
+{
+  int i, j, k, l, sum, diff;
+
+  #pragma acc parallel loop
+  for (int h = 0; h < 10; ++h)
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+/* The same tests as above, but now the outermost reduction clause is on
+   the parallel region, not the outermost loop.  */
+void acc_parallel_reduction (void)
+{
+  int i, j, k, l, sum, diff;
+
+  #pragma acc parallel reduction(+:sum)
+  {
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(max:sum) // { dg-error "conflicting reduction operations for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(max:sum) // { dg-error "conflicting reduction operations for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+/* The same tests as above, but using a combined parallel loop construct, and
+   the outermost reduction clause is on that one, not the outermost loop.  */
+void acc_parallel_loop_reduction (void)
+{
+  int i, j, k, l, sum, diff;
+
+  #pragma acc parallel loop reduction(+:sum)
+  for (int h = 0; h < 10; ++h)
+  {
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(max:sum) // { dg-error "conflicting reduction operations for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(max:sum) // { dg-error "conflicting reduction operations for .sum." }
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+/* The same tests as above, but inside a routine construct.  */
+#pragma acc routine gang
+void acc_routine (void)
+{
+  int i, j, k, l, sum, diff;
+
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+          #pragma acc loop reduction(+:sum)
+          for (l = 0; l < 10; l++)
+            sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum) // { dg-error "conflicting reduction operations for .sum." }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+	  #pragma acc loop reduction(*:sum) // { dg-error "conflicting reduction operations for .sum." }
+	  for (l = 0; l < 10; l++)
+	    sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(-:diff) // { dg-error "nested loop in reduction needs reduction clause for .sum." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(+:sum) // { dg-error "nested loop in reduction needs reduction clause for .diff." }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+void acc_kernels (void)
+{
+  int i, j, k, sum, diff;
+
+  /* FIXME:  No diagnostics are produced for these loops because reductions
+     in kernels regions are not supported yet.  */
+  #pragma acc kernels
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:diff)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(-:sum)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/nested-reductions.c b/gcc/testsuite/c-c++-common/goacc/nested-reductions.c
new file mode 100644
index 00000000000..15385c4da99
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions.c
@@ -0,0 +1,420 @@ 
+/* Test cases of nested reduction loops that should compile cleanly.  */
+
+void acc_parallel (void)
+{
+  int i, j, k, sum, diff;
+
+  #pragma acc parallel
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop collapse(2) reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(-:diff)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+/* The same tests as above, but using a combined parallel loop construct.  */
+
+void acc_parallel_loop (void)
+{
+  int i, j, k, l, sum, diff;
+
+  #pragma acc parallel loop
+  for (int h = 0; h < 10; ++h)
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop collapse(2) reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(-:diff) // { dg-warning "insufficient partitioning available to parallelize loop" }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+/* The same tests as above, but now the outermost reduction clause is on
+   the parallel region, not the outermost loop.  */
+
+void acc_parallel_reduction (void)
+{
+  int i, j, k, sum, diff;
+
+  #pragma acc parallel reduction(+:sum)
+  {
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    for (i = 0; i < 10; i++)
+      #pragma acc loop
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(-:diff)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(-:diff)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+/* The same tests as above, but using a combined parallel loop construct that
+   carries the outermost reduction clause instead of the outermost loop.  */
+void acc_parallel_loop_reduction (void)
+{
+  int i, j, k, sum, diff;
+
+  #pragma acc parallel loop reduction(+:sum)
+  for (int h = 0; h < 10; ++h)
+  {
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    for (i = 0; i < 10; i++)
+      #pragma acc loop
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(-:diff) // { dg-warning "insufficient partitioning available to parallelize loop" }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(-:diff) // { dg-warning "insufficient partitioning available to parallelize loop" }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum) // { dg-warning "insufficient partitioning available to parallelize loop" }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop // { dg-warning "insufficient partitioning available to parallelize loop" }
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+/* The same tests as above, but inside a routine construct.  */
+#pragma acc routine gang
+void acc_routine (void)
+{
+  int i, j, k, sum, diff;
+
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop collapse(2) reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop collapse(2) reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum) reduction(-:diff)
+    for (i = 0; i < 10; i++)
+      {
+        #pragma acc loop reduction(+:sum)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(+:sum)
+          for (k = 0; k < 10; k++)
+            sum = 1;
+
+        #pragma acc loop reduction(-:diff)
+        for (j = 0; j < 10; j++)
+          #pragma acc loop reduction(-:diff)
+          for (k = 0; k < 10; k++)
+            diff = 1;
+      }
+  }
+}
+
+void acc_kernels (void)
+{
+  int i, j, k, sum, diff;
+
+  /* FIXME:  These tests are not meaningful because reductions in
+     kernels regions are not supported yet.  */
+  #pragma acc kernels
+  {
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+
+    #pragma acc loop reduction(+:sum)
+    for (i = 0; i < 10; i++)
+      #pragma acc loop reduction(+:sum)
+      for (j = 0; j < 10; j++)
+        #pragma acc loop reduction(+:sum)
+        for (k = 0; k < 10; k++)
+          sum = 1;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-6.c b/gcc/testsuite/c-c++-common/goacc/reduction-6.c
index 619f82b9d8b..3c10b4dddaf 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-6.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-6.c
@@ -16,17 +16,6 @@  int foo (int N)
       }
   }
 
-  #pragma acc parallel
-  {
-    #pragma acc loop reduction(+:b)
-    for (int i = 0; i < N; i++)
-      {
-        #pragma acc loop
-	for (int j = 0; j < N; j++)
-	  b += 1;
-      }
-  }
-
   #pragma acc parallel
   {
     #pragma acc loop reduction(+:c)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
index 5e82e1d350c..91fe772045d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
@@ -15,7 +15,7 @@  main (int argc, char *argv[])
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
     reduction(+:res) copy(res)
   {
-    #pragma acc loop gang
+    #pragma acc loop gang reduction(+:res)
     for (j = 0; j < 32; j++)
       {
 	#pragma acc loop worker reduction(+:res)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
index a339f327956..8a7bc724070 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c
@@ -14,7 +14,7 @@  main (int argc, char *argv[])
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
 		       reduction(^:res)
   {
-    #pragma acc loop gang
+    #pragma acc loop gang reduction(^:res)
     for (j = 0; j < 32; j++)
       {
 	#pragma acc loop worker vector reduction(^:res)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
index 6369d7fbb33..eba5c6a544e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
@@ -16,7 +16,7 @@  main (int argc, char *argv[])
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
     reduction(+:res) copy(res)
   {
-    #pragma acc loop gang
+    #pragma acc loop gang reduction(+:res)
     for (j = 0; j < 32; j++)
       {
 	#pragma acc loop worker vector reduction(+:res)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
index 140c3226327..12b823f33ab 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
@@ -16,7 +16,7 @@  main (int argc, char *argv[])
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
     reduction(+:res) reduction(max:mres) copy(res, mres)
   {
-    #pragma acc loop gang
+    #pragma acc loop gang reduction(+:res) reduction(max:mres)
     for (j = 0; j < 32; j++)
       {
 	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
-- 
2.17.1
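The rule these tests exercise can be modelled outside the compiler. The C code below is a minimal sketch under stated assumptions. It is not the omp-low.c implementation from this patch, and every name in it (struct red_clause, struct level, level_op, check_nested_reduction, check_comb_reduction_1) is hypothetical.

The sketch makes the following assumptions:

- A single nesting path is described from the outermost construct inward.
- A parallel construct or an "acc loop" counts as one level. The acc_parallel_reduction tests place the outermost clause on the parallel region, which motivates counting it.
- Plain C for loops are not levels. Several tests above put an un-annotated for loop between two reduction loops.

For each variable, the check finds the outermost and innermost levels that reduce it. It then reports every level in between that either lacks a clause for the variable or switches operator relative to the nearest enclosing clause.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one clause such as "reduction(+:sum)".  */
struct red_clause
{
  const char *var;   /* reduced variable, e.g. "sum" */
  char op;           /* reduction operator, e.g. '+' or '^' */
};

/* Hypothetical model of one OpenACC construct on a single nesting path
   (a parallel construct or an "acc loop"; plain C loops are not levels).  */
struct level
{
  const struct red_clause *clauses;
  int n_clauses;
};

/* Return the operator LVL uses for VAR, or 0 if it has no clause for VAR.  */
static char
level_op (const struct level *lvl, const char *var)
{
  for (int i = 0; i < lvl->n_clauses; i++)
    if (strcmp (lvl->clauses[i].var, var) == 0)
      return lvl->clauses[i].op;
  return 0;
}

/* Check VAR along NEST[0..N-1], outermost first.  Every level between the
   outermost and the innermost level that reduce VAR must also reduce VAR
   with the same operator.  Prints one line per violation and returns the
   number of violations.  */
int
check_nested_reduction (const struct level *nest, int n, const char *var)
{
  assert (n >= 0);

  int outer = -1, inner = -1;
  for (int i = 0; i < n; i++)
    if (level_op (&nest[i], var) != 0)
      {
        if (outer < 0)
          outer = i;
        inner = i;
      }
  if (outer < 0)
    return 0;   /* VAR is not reduced anywhere on this path.  */

  int errors = 0;
  char prev_op = level_op (&nest[outer], var);
  for (int i = outer + 1; i <= inner; i++)
    {
      char op = level_op (&nest[i], var);
      if (op == 0)
        {
          printf ("level %d: missing reduction clause for '%s'\n", i, var);
          errors++;
        }
      else if (op != prev_op)
        {
          printf ("level %d: conflicting reduction operator for '%s'\n",
                  i, var);
          errors++;
          prev_op = op;   /* Report each operator switch only once.  */
        }
    }
  return errors;
}

/* The nest from par-loop-comb-reduction-1.c: "parallel ... reduction(+:res)",
   then "loop gang" with or without reduction(+:res), then
   "loop worker reduction(+:res)".  */
int
check_comb_reduction_1 (int gang_has_clause)
{
  static const struct red_clause plus_res[] = { { "res", '+' } };
  struct level nest[3] = {
    { plus_res, 1 },                         /* acc parallel reduction(+:res) */
    { plus_res, gang_has_clause ? 1 : 0 },   /* acc loop gang [reduction] */
    { plus_res, 1 },                         /* acc loop worker reduction(+:res) */
  };
  return check_nested_reduction (nest, 3, "res");
}
```

In this model, check_comb_reduction_1 (0) reports one missing clause for the gang loop. check_comb_reduction_1 (1) reports none. That second case corresponds to the libgomp par-loop-comb-reduction-*.c updates in this patch, which add the reduction clause to the gang loop. A nest such as reduction(+:sum) > reduction(-:sum) > reduction(-:sum) yields a single operator diagnostic, because the comparison is made against the nearest enclosing clause rather than against every outer level.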